Merge pull request #161 from LDLINGLINGLING/main

Added a llama_factory example
This commit is contained in:
LDLINGLINGLING 2024-06-27 17:28:17 +08:00 committed by GitHub
commit 21266f3d88
10 changed files with 17939 additions and 26 deletions

View File

@@ -46,19 +46,28 @@ MiniCPM 是面壁智能与清华大学自然语言处理实验室共同开源的
## Table of Contents
- [Changelog](#0)
- [Model Downloads](#1)
- [Quick Start](#2)
- [Model Quantization](#quantize)
- [Community](#community)
- [Evaluation Results](#3)
- [Mobile Deployment](#4)
- [Demo & API Deployment](#5)
- [Further Development](#6)
- [License](#7)
- [Citation](#8)
- [Typical Examples](#9)
## Quick Navigation
| [Inference](#2) | [Fine-tuning](#6) | [Mobile Deployment](#4) | [Quantization](#quantize) |
|-------------|------------|-----------|-----------|
|[Transformers](#Huggingface模型)|[Transformers](#transformer_finetune)|[MLC Deployment](#MLC)|[GPTQ](#gptq)|
|[vLLM](#vllm-推理)|[mlx_finetune](#mlx)|[llama.cpp](#llama.cpp)|[AWQ](#awq)|
|[llama.cpp](#llama.cpp)|[llama_factory](https://github.com/OpenBMB/MiniCPM/tree/main/finetune/llama_factory_example/README.md)||[Perplexity Test](#quantize_test)|
|[ollama](#ollama)||||
|[fastllm](#fastllm)||||
|[mlx_lm](#mlx_lm)||||
<p id="0"></p> <p id="0"></p>
## 更新日志 ## 更新日志
@@ -104,6 +113,8 @@ MiniCPM 是面壁智能与清华大学自然语言处理实验室共同开源的
- [Colab](https://colab.research.google.com/drive/1tJcfPyWGWA5HezO7GKLeyeIso0HyOc0l?usp=sharing)
<p id="Huggingface模型"></p>
#### Huggingface Models
##### MiniCPM-2B
@@ -195,7 +206,9 @@ python inference/inference_vllm.py --model_path <hf_repo_path> --prompt_path pro
#### Inference with llama.cpp, Ollama, fastllm, and mlx_lm
MiniCPM supports inference with [llama.cpp](https://github.com/ggerganov/llama.cpp/), [ollama](https://github.com/ollama/ollama), [fastllm](https://github.com/ztxz16/fastllm), and [mlx_lm](https://github.com/ml-explore/mlx-examples). Thanks to [@runfuture](https://github.com/runfuture) for adapting llama.cpp and ollama.
<p id="llama.cpp"></p>
#### llama.cpp
1. [Install llama.cpp](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#build)
2. Download the model in gguf format: [fp16 version](https://huggingface.co/runfuture/MiniCPM-2B-dpo-fp16-gguf) [q4km version](https://huggingface.co/runfuture/MiniCPM-2B-dpo-q4km-gguf)
3. Run the example command in the terminal (a minimal sketch is given below):
@@ -204,8 +217,9 @@ MiniCPM支持[llama.cpp](https://github.com/ggerganov/llama.cpp/) 、[ollama](ht
```
See [the llama.cpp docs](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md) for more parameter options.
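For reference, a minimal invocation sketch; the gguf filename, prompt, and token count below are placeholders, not necessarily the repository's exact example:
```bash
# placeholder path: point -m at the gguf file downloaded in step 2
./main -m ./MiniCPM-2B-dpo-q4km.gguf -p "<用户>Recommend three sights to see in Beijing.<AI>" -n 256
```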
<p id="ollama"></p>
#### ollama
***Automatic model installation with ollama***
1. [Install ollama](https://github.com/ollama/ollama)
2. Run in the terminal:
@@ -233,8 +247,9 @@ ollama create ollama_model_name -f model_name.Modelfile
```
ollama run ollama_model_name
```
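The manual route above builds a model from a Modelfile (`ollama create ollama_model_name -f model_name.Modelfile`). A condensed sketch of that flow, assuming a locally downloaded gguf file; the model name, path, TEMPLATE, and parameters are illustrative rather than the repository's official Modelfile:
```bash
# write a minimal Modelfile pointing at the local gguf weights (placeholder path)
cat > minicpm-2b.Modelfile <<'EOF'
FROM ./MiniCPM-2B-dpo-q4km.gguf
TEMPLATE """<用户>{{ .Prompt }}<AI>"""
PARAMETER temperature 0.5
PARAMETER top_p 0.8
EOF

# register the model with ollama and chat with it
ollama create minicpm-2b -f minicpm-2b.Modelfile
ollama run minicpm-2b
```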
<p id="fastllm"></p>
#### fastllm
1. [Compile and install fastllm](https://github.com/ztxz16/fastllm)
2. Model inference:
```python
@@ -248,8 +263,9 @@ llm.set_device_map("cpu")
model = llm.from_hf(model, tokenizer, dtype = "float16") # dtype supports "float16", "int8", "int4"
print(model.response("<用户>山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?<AI>", top_p=0.8, temperature=0.5, repeat_penalty=1.02))
```
<p id="mlx_lm"></p>
#### mlx_lm
1. Install the mlx_lm library
```shell
pip install mlx_lm
@@ -259,9 +275,11 @@ print(model.response("<用户>山东省最高的山是哪座山, 它比黄山高
```shell
python -m mlx_lm.generate --model mlx-community/MiniCPM-2B-sft-bf16-llama-format-mlx --prompt "hello, tell me a joke." --trust-remote-code
```
<p id="community"></p> <p id="quantize"></p>
## Model Quantization
<p id="gptq"></p>
**GPTQ quantization**
1. First, git clone the [minicpm_gptqd code](https://github.com/LDLINGLINGLING/AutoGPTQ/tree/minicpm_gptq)
2. Enter the minicpm_gptqd root directory ./AutoGPTQ and run in the terminal:
@@ -275,14 +293,37 @@ print(model.response("<用户>山东省最高的山是哪座山, 它比黄山高
```
5. You can run inference with ./AutoGPTQ/examples/quantization/inference.py, or serve the quantized model with vLLM as described earlier (a minimal sketch follows); on a single 4090, vLLM inference with the minicpm-1b-int4 model reaches roughly 2000 token/s.
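As a reference for step 5, a minimal vLLM sketch for loading the quantized checkpoint; the model path is a placeholder for wherever AutoGPTQ saved the int4 weights, and the sampling settings are illustrative:
```python
from vllm import LLM, SamplingParams

# placeholder: directory that AutoGPTQ wrote the quantized MiniCPM-1B int4 weights to
llm = LLM(model="./minicpm-1b-int4-gptq", trust_remote_code=True)

params = SamplingParams(temperature=0.5, top_p=0.8, max_tokens=256)
outputs = llm.generate(["<用户>山东省最高的山是哪座山?<AI>"], params)
print(outputs[0].outputs[0].text)
```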
<p id="awq"></p>
**AWQ quantization**
1. In quantize/awq_quantize.py, modify the configuration parameters according to the comments:
```python
model_path = '/root/ld/ld_model_pretrained/MiniCPM-1B-sft-bf16' # model_path or model_id
quant_path = '/root/ld/ld_project/pull_request/MiniCPM/quantize/awq_cpm_1b_4bit' # quant_save_path
quant_data_path='/root/ld/ld_project/pull_request/MiniCPM/quantize/quantize_data/wikitext' # path to a bundled calibration set: alpaca or wikitext under quantize_data
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" } # "w_bit":4 or 8
quant_samples=512 # how many samples to use for calibration
custom_data=[{'question':'你叫什么名字。','answer':'我是openmbmb开源的小钢炮minicpm。'}, # a custom dataset can also be used
{'question':'你有什么特色。','answer':'我很小,但是我很强。'}]
```
2. Two calibration datasets, alpaca and wiki_text, are already provided under quantize/quantize_data; set quant_data_path above to the path of one of these folders.
3. If you need a custom dataset, modify the custom_data variable in quantize/awq_quantize.py, for example:
```python
custom_data=[{'question':'过敏性鼻炎有什么症状?','answer':'过敏性鼻炎可能鼻塞,流鼻涕,头痛等症状反复发作,严重时建议及时就医。'}, custom_data=[{'question':'过敏性鼻炎有什么症状?','answer':'过敏性鼻炎可能鼻塞,流鼻涕,头痛等症状反复发作,严重时建议及时就医。'},
{'question':'1+1等于多少','answer':'等于2'}] {'question':'1+1等于多少','answer':'等于2'}]
``` ```
4. Depending on the dataset you chose, replace line 38 of quantize/awq_quantize.py with one of the following lines:
```python
# quantize with the wikitext calibration set
model.quantize(tokenizer, quant_config=quant_config, calib_data=load_wikitext(quant_data_path=quant_data_path))
# quantize with the alpaca calibration set
model.quantize(tokenizer, quant_config=quant_config, calib_data=load_alpaca(quant_data_path=quant_data_path))
# quantize with a custom dataset
model.quantize(tokenizer, quant_config=quant_config, calib_data=load_cust_data(quant_data_path=quant_data_path))
```
5. Run quantize/awq_quantize.py; the AWQ-quantized model will be produced under the quant_path directory you set.
<p id="quantize_test"></p>
**Quantization test**
1. In the terminal, change into the MiniCPM/quantize directory
@@ -750,6 +791,7 @@ print(model.response("<用户>山东省最高的山是哪座山, 它比黄山高
<p id="4"></p> <p id="4"></p>
## 手机部署 ## 手机部署
<p id="MLC"></p>
#### Deployment Steps
@@ -821,14 +863,17 @@ python demo/hf_based_demo.py --model_path <hf_repo_path>
<p id="6"></p> <p id="6"></p>
## 二次开发 ## 二次开发
<p id="transformer_finetune"></p>
* Parameter-efficient fine-tuning
* Parameter-efficient fine-tuning is feasible on a single 1080/2080
* [Parameter-efficient fine-tuning code](https://github.com/OpenBMB/MiniCPM/tree/main/finetune) (a minimal LoRA sketch follows)
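A minimal LoRA setup sketch with the peft library; the hyperparameters and target module names below are illustrative and not necessarily the defaults used by the repository's finetune scripts:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_path = "openbmb/MiniCPM-2B-sft-bf16"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16,
                                             trust_remote_code=True)

# illustrative LoRA config: adapt only the attention projections, keeping the base weights frozen
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters is trainable
```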
<p id="BMTrain"></p>
* Full-parameter fine-tuning or continued training
* With [BMTrain](https://github.com/OpenBMB/BMTrain), using recomputation and ZeRO-3, full-parameter fine-tuning fits on a single 3090/4090, and continued training can be done on a single machine
* The related code will be released progressively
<p id="mlx"></p>
* Parameter-efficient fine-tuning with mlx
* Environment setup
@@ -842,7 +887,7 @@ python demo/hf_based_demo.py --model_path <hf_repo_path>
# test
python mlx_finetune.py --model MiniCPM-2B-sft-bf16-llama-format-mlx --data data/AdvertiseGen --test --seed 2024
```
* [Fine-tuning with llama_factory](https://github.com/OpenBMB/MiniCPM/tree/main/finetune/llama_factory_example/README.md)
<p id="9"></p> <p id="9"></p>

View File

@@ -0,0 +1,101 @@
# Fine-tuning MiniCPM with llama_factory
MiniCPM supports fine-tuning with llama_factory, which covers continue_pretrain, sft, ppo, dpo, kto, orpo, and other fine-tuning methods.
Since llama_factory is powerful but can be hard for beginners to get started with, we have recorded a fine-tuning tutorial.
**We provide the llama_factory_example folder for fine-tuning the MiniCPM-1B and MiniCPM-2B models.**
1. First, install the llama_factory dependencies.
```bash
git clone https://github.com/hiyouga/LLaMA-Factory
cd LLaMA-Factory
pip install -r requirements.txt
```
2. Process your dataset into the formats shown in the MiniCPM/finetune/llama_factory_example/llama_factory_data folder (examples are provided for the dpo, kto, and sft fine-tuning methods) and place it under the llama_factory/data directory. Taking dpo as an example (an sft-style sample is sketched after it):
```json
[
  {
    "conversations": [
      {
        "from": "human",
        "value": "Hi! I'd like to create a new language game simulating the first person perspective of a character named Angela."
      }
    ],
    "chosen": {
      "from": "gpt",
      "value": "That sounds like a fun and engaging idea! Here are some tips to help you create the game:\n1. ......"
    },
    "rejected": {
      "from": "gpt",
      "value": "Hello! I'd be happy to help you create a language game simulating the first-person perspective ....."
    }
  }
]
```
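For the sft case mentioned above, a minimal alpaca-style sample in the format LLaMA-Factory accepts; the field values here are illustrative:
```json
[
  {
    "instruction": "Introduce MiniCPM in one sentence.",
    "input": "",
    "output": "MiniCPM is a family of efficient edge-side large language models open-sourced by OpenBMB."
  }
]
```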
3. Add your dataset's information to llama_factory/data/dataset_info.json so that it can be found there, as in the following example (a sketch for registering a custom dpo file follows it):
```json
{"identity": {
"file_name": "identity.json"
},
"sft_zh_demo": {
"file_name": "alpaca_zh_demo.json"
},
"kto_en_demo": {
"file_name": "kto_en_demo.json",
"formatting": "sharegpt",
"columns": {
"messages": "messages",
"kto_tag": "label"
},
"tags": {
"role_tag": "role",
"content_tag": "content",
"user_tag": "user",
"assistant_tag": "assistant"
}
},
"dpo_en_demo": {
"file_name": "dpo_en_demo.json",
"ranking": true,
"formatting": "sharegpt",
"columns": {
"messages": "conversations",
"chosen": "chosen",
"rejected": "rejected"
}
}
}
```
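For instance, if your own dpo file were saved as data/my_dpo_data.json (a hypothetical name), the corresponding entry would mirror dpo_en_demo:
```json
"my_dpo_data": {
  "file_name": "my_dpo_data.json",
  "ranking": true,
  "formatting": "sharegpt",
  "columns": {
    "messages": "conversations",
    "chosen": "chosen",
    "rejected": "rejected"
  }
}
```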
4. Copy the files from MiniCPM/finetune/llama_factory_example into the LLaMA-Factory/examples directory.
```bash
cd LLaMA-Factory/examples
mkdir minicpm
# replace /your/path below with the paths to your MiniCPM repo and your LLaMA-Factory checkout
cp -r /your/path/MiniCPM/finetune/llama_factory_example/* /your/path/LLaMA-Factory/examples/minicpm
```
5. Taking dpo as an example, first modify minicpm_dpo.yaml; the fields that need to be changed are:
```yaml
model_name_or_path: openbmb/MiniCPM-2B-sft-bf16 # or the local path where you saved the model
dataset: dpo_en_demo # use a key name from dataset_info.json here
output_dir: your/finetune_minicpm/save/path
bf16: true # true if your device supports bf16, otherwise false
deepspeed: examples/deepspeed/ds_z2_config.json # switch to ds_z3_config.json if GPU memory is insufficient
```
6. Modify the following in the single_node.sh file:
- 1. If you are running on A100 or higher-end servers, delete the following two lines:
```bash
export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1
```
- 2. Set the GPUs that you want to take part in fine-tuning; in the example below, the 1st through 8th GPUs all participate:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
```
- 3. Change the argument after src/train.py in the line below to the absolute path of minicpm_dpo.yaml inside llama_factory:
```bash
src/train.py /root/ld/ld_project/LLaMA-Factory/examples/minicpm/minicpm_sft.yaml
```
7. Run:
```bash
cd LLaMA-Factory
bash single_node.sh
```
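After training finishes, a minimal sketch to sanity-check the saved checkpoint with transformers; the path follows the output_dir in minicpm_dpo.yaml, and it assumes the checkpoint keeps MiniCPM's custom chat interface (adjust to your setup):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "saves/minicpm/dpo"  # output_dir from minicpm_dpo.yaml; change if you used another path
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16,
                                             device_map="auto", trust_remote_code=True)

responds, history = model.chat(tokenizer, "山东省最高的山是哪座山?", temperature=0.5, top_p=0.8)
print(responds)
```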

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

View File

@@ -0,0 +1,42 @@
### model
model_name_or_path: /root/ld/ld_project/LLaMA-Factory/saves/minicpm/full/sft/
### method
stage: dpo
do_train: true
finetuning_type: full
### ddp
ddp_timeout: 180000000
deepspeed: examples/deepspeed/ds_z2_config.json
### dataset
dataset: dpo_en_demo
template: cpm
cutoff_len: 1200
max_samples: 50000000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/minicpm/dpo
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_strategy: epoch
### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
learning_rate: 0.00001
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_steps: 0.1
bf16: true
### eval
val_size: 0.1
per_device_eval_batch_size: 4
evaluation_strategy: steps
eval_steps: 500

View File

@@ -0,0 +1,42 @@
### model
model_name_or_path: /root/ld/ld_model_pretrain/MiniCPM-1B-sft-bf16/
### method
stage: kto
do_train: true
finetuning_type: full
kto_ftx: 0.1
### ddp
ddp_timeout: 180000000
deepspeed: examples/deepspeed/ds_z2_config.json
### dataset
dataset: kto_harmless
template: cpm
cutoff_len: 1200
max_samples: 500000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/minicpm/kto
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 0.000005
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_steps: 0.1
bf16: true
### eval
val_size: 0.1
per_device_eval_batch_size: 16
evaluation_strategy: steps
eval_steps: 500

View File

@@ -0,0 +1,41 @@
### model
model_name_or_path: /root/ld/ld_model_pretrained/miniCPM-bf16/
### method
stage: sft
do_train: true
finetuning_type: full
### ddp
ddp_timeout: 180000000
deepspeed: examples/deepspeed/ds_z2_config.json
### dataset
dataset: glaive_toolcall_en,glaive_toolcall_zh
template: cpm
cutoff_len: 1800
max_samples: 500000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/minicpm/fuction_call
logging_steps: 10
save_strategy: epoch
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
learning_rate: 0.0001
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_steps: 0.1
bf16: true
### eval
val_size: 0.1
per_device_eval_batch_size: 4
evaluation_strategy: steps
eval_steps: 500

View File

@@ -0,0 +1,16 @@
#!/bin/bash
NPROC_PER_NODE=8
NNODES=1
RANK=0
MASTER_ADDR=127.0.0.1
MASTER_PORT=29500
export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun \
--nproc_per_node $NPROC_PER_NODE \
--nnodes $NNODES \
--node_rank $RANK \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT \
src/train.py /root/ld/ld_project/LLaMA-Factory/examples/minicpm/minicpm_sft.yaml

View File

@@ -7,10 +7,10 @@ import os
model_path = '/root/ld/ld_model_pretrained/MiniCPM-1B-sft-bf16' # model_path or model_id
quant_path = '/root/ld/ld_project/pull_request/MiniCPM/quantize/awq_cpm_1b_4bit' # quant_save_path
quant_data_path='/root/ld/ld_project/pull_request/MiniCPM/quantize/quantize_data/wikitext' # path to a bundled calibration dataset
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" } # "w_bit":4 or 8
quant_samples=512 # how many samples to use for calibration
custom_data=[{'question':'你叫什么名字。','answer':'我是openmbmb开源的小钢炮minicpm。'}, # a custom dataset can also be used
{'question':'你有什么特色。','answer':'我很小,但是我很强。'}]
# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path)