Skip to content

Commit d0b8457

Browse files
authored
Merge pull request #58 from codefuse-ai/v0.4.dev
V0.4.dev
2 parents cf38e74 + a2a0bff commit d0b8457

31 files changed

+3689
-1059
lines changed

README.md

+23-17
Original file line numberDiff line numberDiff line change
@@ -46,14 +46,19 @@
4646

4747

4848
## News
49+
🔥🔥🔥 [2024/05/20] We released **MFTCoder v0.4.2**, mainly for MFTCoder-accelerate. It supports **QLoRA + DeepSpeed Zero3** and **QLoRA + FSDP** as options allowing you training very large models. It now supports new models like Qwen2, Qwen2-MoE, Starcoder2, Gemma, etc.
4950

50-
🔥🔥🔥 [2024/01/30] The model [CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) fine-tuned with MFTCoder ranks first in HuggingFace [Big Code Models LeaderBoard](https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard)
51+
🔥🔥🔥 [2024/05/20] Our paper [MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning](https://arxiv.org/abs/2311.02303) has been accepted by KDD2024.
5152

52-
🔥🔥🔥 [2024/01/17] We released MFTCoder v0.3.0, mainly for MFTCoder-accelerate. It now supports new models like Mixtral(MoE), DeepSeek-coder, chatglm3. It supports FSDP as an option. It also supports Self-paced Loss as a solution for convergence balance in Multitask Fine-tuning.
53+
🔥🔥🔥 [2024/05/20] [CodeFuse-StarCoder2-15B](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder2-15B) has been released, achieving a pass@1 (greedy decoding) score of 73.2% on HumanEval.
5354

54-
🔥🔥🔥 [2024/01/17] [CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) has been released, achieving a pass@1 (greedy decoding) score of 78.7% on HumanEval. It lists as top-1 LLM on Bigcode Leardboard in terms of win-rate, the official result is going to be published later.
55+
🔥🔥 [2024/01/30] The model [CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) fine-tuned with MFTCoder ranks first in HuggingFace [Big Code Models LeaderBoard](https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard)
5556

56-
🔥🔥🔥 [2024/01/17] [CodeFuse-Mixtral-8x7B](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8X7B) has been released, achieving a pass@1 (greedy decoding) score of 56.1% on HumanEval.
57+
🔥🔥 [2024/01/17] We released MFTCoder v0.3.0, mainly for MFTCoder-accelerate. It now supports new models like Mixtral(MoE), DeepSeek-coder, chatglm3. It supports FSDP as an option. It also supports Self-paced Loss as a solution for convergence balance in Multitask Fine-tuning.
58+
59+
🔥🔥 [2024/01/17] [CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) has been released, achieving a pass@1 (greedy decoding) score of 78.7% on HumanEval. It lists as top-1 LLM on Bigcode Leardboard in terms of win-rate, the official result is going to be published later.
60+
61+
🔥🔥 [2024/01/17] [CodeFuse-Mixtral-8x7B](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8X7B) has been released, achieving a pass@1 (greedy decoding) score of 56.1% on HumanEval.
5762

5863
🔥🔥 [2023/11/07] [MFTCoder Paper](https://arxiv.org/abs/2311.02303) has been released on Arxiv, which discloses technique details of multi-task-fine-tuning.
5964

@@ -73,6 +78,7 @@
7378
| **CodeFuse-DeepSeek-33B** | **78.7%** | 2024/01 |
7479
| **CodeFuse-CodeLlama-34B** | **74.4%** | 2023/09 |
7580
| **CodeFuse-CodeLlama-34B-4bits** | **73.8%** | 2023/09 |
81+
| **CodeFuse-StarCoder2-15B** | **73.2%** | 2023/05 |
7682
| WizardCoder-Python-34B-V1.0 | 73.2% | 2023/08 |
7783
| GPT-4(zero-shot) | 67.0% | 2023/03 |
7884
| PanGu-Coder2 15B | 61.6% | 2023/08 |
@@ -88,7 +94,7 @@
8894

8995

9096
## Articles
91-
[MFT Arxiv paper](https://arxiv.org/abs/2311.02303)
97+
[MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning (KDD2024)](https://arxiv.org/abs/2311.02303)
9298

9399
## Introduction
94100

@@ -125,13 +131,13 @@ The main components of this project include:
125131

126132

127133
## Requirements
128-
To begin, ensure that you have successfully installed CUDA (version >= 11.4, preferably 11.7) along with the necessary drivers. Additionally, make sure you have installed torch (version 2.0.1).
134+
To begin, ensure that you have successfully installed CUDA (version >= 11.4, preferably 12.1) along with the necessary drivers. Additionally, make sure you have installed torch (version >= 2.1.0).
129135

130136
Next, we have provided an init_env.sh script to simplify the installation of required packages. Execute the following command to run the script:
131137
```bash
132138
sh init_env.sh
133139
```
134-
We highly recommend training with flash attention(version >= 2.1.0, preferably 2.3.6), please refer to the following link for installation instructions: https://github.com/Dao-AILab/flash-attention
140+
We highly recommend training with flash attention(version >= 2.3.0), please refer to the following link for installation instructions: https://github.com/Dao-AILab/flash-attention
135141

136142

137143
## Training
@@ -152,16 +158,16 @@ If you want to explore some new framework like atorch, you could check:
152158
We are excited to release the following two CodeLLMs trained by MFTCoder, now available on both HuggingFace and ModelScope:
153159

154160

155-
| Model | HuggingFace Links | ModelScope Links | Base Model | Num of examples trained | Batch Size | Seq Length |
156-
|--------------------------------------|---------------------------------------------------------------------------|---------------------------------------------------------------------------------|----------------------|------|------------|------------|
157-
| 🔥 CodeFuse-DeepSeek-33B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-DeepSeek-33B) | DeepSeek-coder-33B | 60万 | 80 | 4096 |
158-
| 🔥 CodeFuse-Mixtral-8x7B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-Mixtral-8x7B) | Mixtral-8x7B | 60万 | 80 | 4096 |
159-
| 🔥 CodeFuse-CodeLlama-34B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B) | CodeLlama-34b-Python | 60万 | 80 | 4096 |
160-
| 🔥 CodeFuse-CodeLlama-34B-4bits | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
161-
| 🔥 CodeFuse-StarCoder-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder-15B) | StarCoder-15B | 60万 | 80 | 4096 |
162-
| 🔥 CodeFuse-QWen-14B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 110万 | 256 | 4096 |
163-
| 🔥 CodeFuse-CodeGeex2-6B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeGeex2-6B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeGeex2-6B) | CodeGeex2-6B | 110万 | 256 | 4096 |
164-
161+
| Model | HuggingFace Links | ModelScope Links | Base Model | Num of examples trained | Batch Size | Seq Length |
162+
|----------------------------------|---------------------------------------------------------------------------|---------------------------------------------------------------------------------|----------------------|-------------------------|------------|------------|
163+
| 🔥 CodeFuse-DeepSeek-33B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-DeepSeek-33B) | DeepSeek-coder-33B | 600K | 80 | 4096 |
164+
| 🔥 CodeFuse-Mixtral-8x7B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-Mixtral-8x7B) | Mixtral-8x7B | 600K | 80 | 4096 |
165+
| 🔥 CodeFuse-CodeLlama-34B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B) | CodeLlama-34b-Python | 600K | 80 | 4096 |
166+
| 🔥 CodeFuse-CodeLlama-34B-4bits | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
167+
| 🔥 CodeFuse-StarCoder-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder-15B) | StarCoder-15B | 600K | 80 | 4096 |
168+
| 🔥 CodeFuse-QWen-14B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 1.1 Million | 256 | 4096 |
169+
| 🔥 CodeFuse-CodeGeex2-6B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeGeex2-6B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeGeex2-6B) | CodeGeex2-6B | 1.1 Million | 256 | 4096 |
170+
| 🔥 CodeFuse-StarCoder2-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder2-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder2-15B) | Starcoder2-15B | 700K | 128 | 4096 |
165171

166172
## Datasets
167173
We are also pleased to release two code-related instruction datasets, meticulously selected from a range of datasets to facilitate multitask training. Moving forward, we are committed to releasing additional instruction datasets covering various code-related tasks.

README_cn.md

+16-9
Original file line numberDiff line numberDiff line change
@@ -45,11 +45,17 @@
4545

4646

4747
## 新闻
48-
🔥🔥🔥 [2024/01/17] **MFTCoder-v0.3.0**发布。新增对Mixtral(MoE), DeepSeek等模型的支持;新增支持FSDP(Fully Sharded Data Parallel);新增Self-paced Loss, 支持多任务收敛均衡。 感兴趣详见微信公众号CodeFuse的文章[MFTCoder 重磅升级v0.3.0发布](https://mp.weixin.qq.com/s/xI3f0iUKq9TIIKZ_kMtcQg)
48+
🔥🔥🔥 [2024/05/20] **MFTCoder-v0.4.2**发布。新增支持**QLoRA+ DeepSpeed Zero3**, **QLoRA + FSDP**训练模式,可以更好的支持微调更大的模型,比如Qwen1.5-70B等。新增对Qwen2, Qwen2-MoE, Starcoder2, Gemma等模型的支持。
4949

50-
🔥🔥🔥 [2024/01/17] 开源了[CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B)模型,在HumanEval pass@1(greedy decoding)上可以达到78.7%。该模型在Big Code榜单的结果近期发布,请关注公众号获取最新信息。
50+
🔥🔥🔥 [2024/05/20] 我们的论文 [MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning](https://arxiv.org/abs/2311.02303) 已被 KDD 2024 接收.
5151

52-
🔥🔥🔥 [2024/01/17] 开源了[CodeFuse-Mixtral-8x7B](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B)模型,在HumanEval pass@1(greedy decoding)上可以达到56.1%。感兴趣详见微信公众号CodeFuse的文章[MFTCoder提升Mixtral-8x7B混合专家模型的代码能力实践](https://mp.weixin.qq.com/s/xI3f0iUKq9TIIKZ_kMtcQg)
52+
🔥🔥🔥 开源了[CodeFuse-StarCoder2-15B](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder2-15B)模型,在HumanEval上可以达到73.2%,多代码语言能力均衡.
53+
54+
🔥🔥 [2024/01/17] **MFTCoder-v0.3.0**发布。新增对Mixtral(MoE), DeepSeek等模型的支持;新增支持FSDP(Fully Sharded Data Parallel);新增Self-paced Loss, 支持多任务收敛均衡。 感兴趣详见微信公众号CodeFuse的文章[MFTCoder 重磅升级v0.3.0发布](https://mp.weixin.qq.com/s/xI3f0iUKq9TIIKZ_kMtcQg)
55+
56+
🔥🔥 [2024/01/17] 开源了[CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B)模型,在HumanEval pass@1(greedy decoding)上可以达到78.7%。该模型在Big Code榜单的结果近期发布,请关注公众号获取最新信息。
57+
58+
🔥🔥 [2024/01/17] 开源了[CodeFuse-Mixtral-8x7B](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B)模型,在HumanEval pass@1(greedy decoding)上可以达到56.1%。感兴趣详见微信公众号CodeFuse的文章[MFTCoder提升Mixtral-8x7B混合专家模型的代码能力实践](https://mp.weixin.qq.com/s/xI3f0iUKq9TIIKZ_kMtcQg)
5359

5460
🔥🔥 [2023/11/07] [MFTCoder论文](https://arxiv.org/abs/2311.02303)在Arxiv公布,介绍了多任务微调的技术细节。
5561

@@ -69,6 +75,7 @@
6975
| **CodeFuse-DeepSeek-33B** | **78.7%** | 2024/01 |
7076
| **CodeFuse-CodeLlama-34B** | **74.4%** | 2023/09 |
7177
| **CodeFuse-CodeLlama-34B-4bits** | **73.8%** | 2023/09 |
78+
| **CodeFuse-StarCoder2-15B** | **73.2%** | 2023/05 |
7279
| WizardCoder-Python-34B-V1.0 | 73.2% | 2023/08 |
7380
| GPT-4(zero-shot) | 67.0% | 2023/03 |
7481
| PanGu-Coder2 15B | 61.6% | 2023/08 |
@@ -118,12 +125,12 @@
118125

119126

120127
## 环境
121-
首先, 你需要将CUDA(>=11.4, 推荐11.7)及其相关驱动安装成功,并确保其工作正常, 并且安装基本的torch(>=2.0.0)
128+
首先, 你需要将CUDA(>=11.4, 推荐12.1)及其相关驱动安装成功,并确保其工作正常, 并且安装基本的torch(>=2.1.0)
122129
在requirements.txt下固定了几个主要的python包的版本,执行如下脚本即可:
123130
```bash
124131
sh init_env.sh
125132
```
126-
我们强烈建议您安装flash attention(>=2.1.0, 推荐2.3.6), 安装请参考 https://github.com/Dao-AILab/flash-attention
133+
我们强烈建议您安装flash attention(>=2.3.0), 安装请参考 https://github.com/Dao-AILab/flash-attention
127134

128135
## 训练
129136
如果你熟悉大模型训练的各种主流开源资源,例如 ```transformers```, ```DeepSpeed```, ```FSDP```等, 为了用开源项目快速上手高性能微调,我们建议您尝试:
@@ -145,11 +152,11 @@ sh init_env.sh
145152
| 🔥🔥🔥 CodeFuse-DeepSeek-33B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-DeepSeek-33B) | DeepSeek-coder-33B | 60万 | 80 | 4096 |
146153
| 🔥🔥🔥 CodeFuse-Mixtral-8x7B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-Mixtral-8x7B) | Mixtral-8x7B | 60万 | 80 | 4096 |
147154
| 🔥🔥🔥 CodeFuse-CodeLlama-34B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B) | CodeLlama-34b-Python | 60万 | 80 | 4096 |
148-
| 🔥🔥🔥 CodeFuse-CodeLlama-34B-4bits | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
155+
| 🔥🔥🔥 CodeFuse-CodeLlama-34B-4bits | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
149156
| 🔥🔥🔥 CodeFuse-StarCoder-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder-15B) | StarCoder-15B | 60万 | 80 | 4096 |
150-
| 🔥🔥🔥 CodeFuse-QWen-14B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 110万 | 256 | 4096 |
151-
| 🔥🔥🔥 CodeFuse-CodeGeex2-6B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeGeex2-6B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeGeex2-6B) | CodeGeex2-6B | 110万 | 256 | 4096 |
152-
157+
| 🔥🔥🔥 CodeFuse-QWen-14B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 110万 | 256 | 4096 |
158+
| 🔥🔥🔥 CodeFuse-CodeGeex2-6B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeGeex2-6B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeGeex2-6B) | CodeGeex2-6B | 110万 | 256 | 4096 |
159+
| 🔥🔥🔥 CodeFuse-StarCoder2-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder2-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder2-15B) | Starcoder2-15B | 70万 | 128 | 4096 |
153160

154161

155162

0 commit comments

Comments
 (0)