Merge pull request #161 from LDLINGLINGLING/main

Add a llama_factory example
2026-01-19 12:53:36 +08:00 · 2024-06-27 17:28:17 +08:00 · 2024-06-27 17:28:17 +08:00 · 21266f3d88
commit 21266f3d88
parent dc76234d43 2e6b0171f7
10 changed files with 17939 additions and 26 deletions
--- a/README.md
+++ b/README.md
@ -46,19 +46,28 @@ MiniCPM 是面壁智能与清华大学自然语言处理实验室共同开源的

 ## Contents

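- [Changelog](#0)
- [Model Download](#1)
- [Quick Start](#2)
- [Model Quantization](#quantize)
- [Open-Source Community](#community)
- [Evaluation Results](#3)
- [Mobile Deployment](#4)
- [Demo & API Deployment](#5)
- [Further Development](#6)
- [License](#7)
- [Citation](#8)
- [Typical Examples](#9)
+- [Changelog](#0)｜
+- [Model Download](#1)｜
+- [Quick Start](#2)｜
+- [Model Quantization](#quantize)｜
+- [Open-Source Community](#community)｜
+- [Evaluation Results](#3)｜
+- [Mobile Deployment](#4)｜
+- [Demo & API Deployment](#5)｜
+- [Further Development](#6)｜
+- [License](#7)｜
+- [Citation](#8)｜
+- [Typical Examples](#9)｜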

+## Common Module Navigation
+| [Inference](#2) | [Fine-tuning](#6) | [Mobile Deployment](#4) | [Quantization](#quantize) |
+|-------------|------------|-----------|-----------|
+|[Transformers](#Huggingface模型)|[Transformers](#transformer_finetune)|[MLC Deployment](#MLC)|[GPTQ](#gptq)|
+|[vLLM](#vllm-推理)|[mlx_finetune](#mlx)|[llama.cpp](#llama.cpp)|[AWQ](#awq)|
+|[llama.cpp](#llama.cpp)|[llama_factory](https://github.com/OpenBMB/MiniCPM/tree/main/finetune/llama_factory_example/README.md)||[Perplexity Test](#quantize_test)|
+|[ollama](#ollama)||||
+|[fastllm](#fastllm)||||
+|[mlx_lm](#mlx_lm)||||
 <p id="0"></p>

 ## Changelog
@ -104,6 +113,8 @@ MiniCPM 是面壁智能与清华大学自然语言处理实验室共同开源的

 - [Colab](https://colab.research.google.com/drive/1tJcfPyWGWA5HezO7GKLeyeIso0HyOc0l?usp=sharing)

+<p id="Huggingface模型"></p>
+
 #### Huggingface Models

 ##### MiniCPM-2B
@ -195,7 +206,9 @@ python inference/inference_vllm.py --model_path <hf_repo_path> --prompt_path pro
 #### Inference with llama.cpp, Ollama, fastllm, and mlx_lm
 MiniCPM supports inference with [llama.cpp](https://github.com/ggerganov/llama.cpp/), [ollama](https://github.com/ollama/ollama), [fastllm](https://github.com/ztxz16/fastllm), and [mlx_lm](https://github.com/ml-explore/mlx-examples). Thanks to [@runfuture](https://github.com/runfuture) for adapting llama.cpp and ollama.

-**llama.cpp**
+<p id="llama.cpp"></p>
+
+#### llama.cpp
 1. [Install llama.cpp](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#build)
 2. Download the model in GGUF format. [Download link (fp16)](https://huggingface.co/runfuture/MiniCPM-2B-dpo-fp16-gguf) [Download link (q4km)](https://huggingface.co/runfuture/MiniCPM-2B-dpo-q4km-gguf)
 3. Run the example command on the command line:
@ -204,8 +217,9 @@ MiniCPM支持[llama.cpp](https://github.com/ggerganov/llama.cpp/) 、[ollama](ht
 ```
 For more parameter options, see the [llama.cpp main example README](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)

-**ollama**
+<p id="ollama"></p>

+#### ollama
 ***Installing the model automatically with ollama***
 1. [Install ollama](https://github.com/ollama/ollama)
 2. Run on the command line:
@ -233,8 +247,9 @@ ollama create ollama_model_name -f model_name.Modelfile
 ```
 ollama run ollama_model_name
 ```
+<p id="fastllm"></p>

-**fastllm**
+#### fastllm
 1. [Build and install fastllm](https://github.com/ztxz16/fastllm)
 2. Model inference
 ```python
@ -248,8 +263,9 @@ llm.set_device_map("cpu")
 model = llm.from_hf(model, tokenizer, dtype = "float16") # dtype supports "float16", "int8", "int4"
 print(model.response("<用户>山东省最高的山是哪座山, 它比黄山高还是矮？差距多少？<AI>", top_p=0.8, temperature=0.5, repeat_penalty=1.02))
 ```
+<p id="mlx_lm"></p>

-**mlx_lm**
+#### mlx_lm
 1. Install the mlx_lm library
    ```shell
    pip install mlx_lm
@ -259,9 +275,11 @@ print(model.response("<用户>山东省最高的山是哪座山, 它比黄山高
    ```shell
    python -m mlx_lm.generate --model mlx-community/MiniCPM-2B-sft-bf16-llama-format-mlx --prompt "hello, tell me a joke." --trust-remote-code
    ```
-<p id="community"></p>
+<p id="quantize"></p>

 ## Model Quantization
+<p id="gptq"></p>
+
 **GPTQ quantization**
 1. First, use git to fetch the [minicpm_gptqd code](https://github.com/LDLINGLINGLING/AutoGPTQ/tree/minicpm_gptq)
 2. Enter the minicpm_gptqd root directory ./AutoGPTQ and run on the command line:
@ -275,14 +293,37 @@ print(model.response("<用户>山东省最高的山是哪座山, 它比黄山高
    ```
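 5. You can run inference with ./AutoGPTQ/examples/quantization/inference.py, or follow the vLLM section above to run inference on the quantized model with vllm. On a single 4090, vllm inference with the minicpm-1b-int4 model reaches about 2000 token/s.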

+<p id="awq"></p>
+
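 **AWQ quantization**
-1. In quantize/awq_quantize.py, edit the configuration parameters as described in the comments: model_path, quant_path, quant_data_path, quant_config, quant_samples. To use a custom dataset, also edit custom_data.
-2. The quantize/quantize_data folder already provides two calibration datasets, alpaca and wikitext. To use a custom dataset, edit the custom_data variable in quantize/awq_quantize.py, for example:
-    ```
+1. In quantize/awq_quantize.py, edit the configuration parameters as described in the comments: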
+  ```python
+  model_path = '/root/ld/ld_model_pretrained/MiniCPM-1B-sft-bf16' # model_path or model_id
+  quant_path = '/root/ld/ld_project/pull_request/MiniCPM/quantize/awq_cpm_1b_4bit' # quant_save_path
+  quant_data_path='/root/ld/ld_project/pull_request/MiniCPM/quantize/quantize_data/wikitext'# path to a bundled calibration dataset: alpaca or wikitext under quantize_data
+  quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" } # "w_bit":4 or 8
+  quant_samples=512 # how many samples to use for calibration
+  custom_data=[{'question':'你叫什么名字。','answer':'我是openmbmb开源的小钢炮minicpm。'}, # used when calibrating with a custom dataset
+                 {'question':'你有什么特色。','answer':'我很小，但是我很强。'}]
+  ```
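+2. The quantize/quantize_data folder already provides two calibration datasets, alpaca and wikitext. Set quant_data_path above to the path of one of these folders.
+3. To use a custom dataset, edit the custom_data variable in quantize/awq_quantize.py, for example: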
+    ```python
    custom_data=[{'question':'过敏性鼻炎有什么症状？','answer':'过敏性鼻炎可能鼻塞，流鼻涕，头痛等症状反复发作，严重时建议及时就医。'},
                 {'question':'1+1等于多少？','answer':'等于2'}]
    ```
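-3. Run quantize/awq_quantize.py. The AWQ-quantized model is saved in the configured quant_path directory.
+4. Depending on the dataset you chose, replace line 38 of quantize/awq_quantize.py with one of the following lines: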
+  ```python
+    # Quantize with wikitext
+    model.quantize(tokenizer, quant_config=quant_config, calib_data=load_wikitext(quant_data_path=quant_data_path))
+    # Quantize with alpaca
+    model.quantize(tokenizer, quant_config=quant_config, calib_data=load_alpaca(quant_data_path=quant_data_path))
+    # Quantize with a custom dataset
+    model.quantize(tokenizer, quant_config=quant_config, calib_data=load_cust_data(quant_data_path=quant_data_path))
+    
+  ```
+5. Run quantize/awq_quantize.py. The AWQ-quantized model is saved in the configured quant_path directory.
+<p id="quantize_test"></p>

 **Quantization test**
 1. On the command line, change into the MiniCPM/quantize directory
@ -750,6 +791,7 @@ print(model.response("<用户>山东省最高的山是哪座山, 它比黄山高
 <p id="4"></p>

 ## Mobile Deployment
+<p id="MLC"></p>

 #### Deployment Steps

@ -821,14 +863,17 @@ python demo/hf_based_demo.py --model_path <hf_repo_path>
 <p id="6"></p>

 ## Further Development
+<p id="transformer_finetune"></p>

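 * Parameter-efficient fine-tuning
  * A single 1080/2080 GPU is enough for parameter-efficient fine-tuning
-  * [Parameter-efficient fine-tuning code](https://github.com/OpenBMB/MiniCPM/tree/main/finetune)
-  
+  * [Parameter-efficient fine-tuning code](https://github.com/OpenBMB/MiniCPM/tree/main/finetune) 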
+<p id="BMTrain"></p>  
+
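 * Full-parameter fine-tuning or continued training
  * Using [BMTrain](https://github.com/OpenBMB/BMTrain) with recomputation and ZeRO-3, a single 3090/4090 can run full-parameter fine-tuning, and a single machine can run continued training
  * Related code will be released progressively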
+<p id="mlx"></p> 

 * Parameter-efficient fine-tuning with mlx
  * Environment setup
@ -842,7 +887,7 @@ python demo/hf_based_demo.py --model_path <hf_repo_path>
    # test
    python mlx_finetune.py --model MiniCPM-2B-sft-bf16-llama-format-mlx  --data data/AdvertiseGen  --test --seed 2024
    ```
-
+* [Fine-tuning with llama_factory](https://github.com/OpenBMB/MiniCPM/tree/main/finetune/llama_factory_example/README.md)

 <p id="9"></p>

--- a/finetune/llama_factory_example/README.md
+++ b/finetune/llama_factory_example/README.md
@ -0,0 +1,101 @@
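+# Fine-tuning MiniCPM with llama_factory
+MiniCPM now supports fine-tuning with llama_factory, which offers continue_pretrain, sft, ppo, dpo, kto, orpo, and other training methods.
+llama_factory is powerful but can be hard for beginners to pick up, so we recorded a fine-tuning tutorial.
+
+**We provide the llama_factory_example folder for fine-tuning the minicpm1b and minicpm2b models.**
+1. First, install the llama_factory dependencies.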
+```bash
+git clone https://github.com/hiyouga/LLaMA-Factory
+cd LLaMA-Factory
+pip install -r requirements.txt
+```
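+2. Convert your dataset to the format used in the MiniCPM/finetune/llama_factory_example/llama_factory_data folder (the examples cover three fine-tuning methods: dpo, kto, and sft) and place it in the llama_factory/data directory. Taking dpo as an example: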
+```json
+  [
+    {
+      "conversations": [
+        {
+          "from": "human",
+          "value": "Hi! I'd like to create a new language game simulating the first person perspective of a character named Angela."
+        }
+      ],
+      "chosen": {
+        "from": "gpt",
+        "value": "That sounds like a fun and engaging idea! Here are some tips to help you create the game:\n1. ......"
+      },
+      "rejected": {
+        "from": "gpt",
+        "value": "Hello! I'd be happy to help you create a language game simulating the first-person perspective ....."
+      }
+    }
+  ]
+```
+3. Add the dataset information to llama_factory/data/dataset_info.json so that your dataset can be found there, as in the following example:
+``` json
+  {"identity": {
+    "file_name": "identity.json"
+  },
+    "sft_zh_demo": {
+      "file_name": "alpaca_zh_demo.json"
+    },
+    "kto_en_demo": {
+      "file_name": "kto_en_demo.json",
+      "formatting": "sharegpt",
+      "columns": {
+        "messages": "messages",
+        "kto_tag": "label"
+      },
+      "tags": {
+        "role_tag": "role",
+        "content_tag": "content",
+        "user_tag": "user",
+        "assistant_tag": "assistant"
+      }
+    },
+    "dpo_en_demo": {
+      "file_name": "dpo_en_demo.json",
+      "ranking": true,
+      "formatting": "sharegpt",
+      "columns": {
+        "messages": "conversations",
+        "chosen": "chosen",
+        "rejected": "rejected"
+      }
+    }
+  }
+```
+4. Copy the files in MiniCPM/finetune/llama_factory_example to the LLaMA-Factory/examples directory.
+  ```bash
+    cd LLaMA-Factory/examples
+    mkdir minicpm
+    # Replace /your/path below with the paths to your MiniCPM code and LLaMA-Factory
+    cp -r /your/path/MiniCPM/finetune/llama_factory_example/*  /your/path/LLaMA-Factory/examples/minicpm
+  ```
+5. Taking dpo as an example, first edit minicpm_dpo.yaml. The fields to change are:
+```yaml
+  model_name_or_path: openbmb/MiniCPM-2B-sft-bf16 # or the local path where you saved the model
+  dataset: dpo_en_demo # the key name used in dataset_info.json
+  output_dir: your/finetune_minicpm/save/path
+  bf16: true # set to false if your device does not support bf16
+  deepspeed: examples/deepspeed/ds_z2_config.json # switch to ds_z3_config.json if GPU memory is insufficient
+```
+6. Edit single_node.sh:
+
+  - 1. On A100 or higher-end servers, delete the following two lines
+  ```bash
+    export NCCL_P2P_DISABLE=1
+    export NCCL_IB_DISABLE=1 
+  ```
+  - 2. Set the GPUs that should take part in fine-tuning. The example below uses all eight GPUs (indices 0 to 7)
+  ```bash
+    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+  ```
+  - 3. In the following line, change the argument after src/train.py to the absolute path of minicpm_dpo.yaml inside LLaMA-Factory
+  ```bash
+    src/train.py /root/ld/ld_project/LLaMA-Factory/examples/minicpm/minicpm_sft.yaml
+  ```
+7. Run:
+```bash
+  cd LLaMA-Factory
+  bash examples/minicpm/single_node.sh
+```
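
Steps 2 and 3 depend on the data file matching the sharegpt-style DPO layout and on the dataset name being registered in dataset_info.json. The following is a minimal sketch of a pre-flight check for both, assuming the field names shown in the example above. The helper names `validate_dpo_records` and `dataset_is_registered` are hypothetical and not part of LLaMA-Factory.

```python
import json


def validate_dpo_records(records):
    """Return a list of problems found; an empty list means the records look well-formed."""
    problems = []
    for i, rec in enumerate(records):
        conv = rec.get("conversations")
        if not isinstance(conv, list) or not conv:
            problems.append(f"record {i}: 'conversations' must be a non-empty list")
        elif not isinstance(conv[-1], dict) or conv[-1].get("from") != "human":
            problems.append(f"record {i}: the last conversation turn should come from 'human'")
        for key in ("chosen", "rejected"):
            reply = rec.get(key)
            if not isinstance(reply, dict) or reply.get("from") != "gpt" or not reply.get("value"):
                problems.append(f"record {i}: '{key}' needs from='gpt' and a non-empty 'value'")
    return problems


def dataset_is_registered(dataset_info, name):
    """Check that `name` is a key in dataset_info.json and points at a data file."""
    entry = dataset_info.get(name)
    return isinstance(entry, dict) and "file_name" in entry


# Inline stand-ins for dpo_en_demo.json and dataset_info.json (shortened for illustration).
sample_records = json.loads("""
[{"conversations": [{"from": "human", "value": "Hi!"}],
  "chosen": {"from": "gpt", "value": "Hello, how can I help?"},
  "rejected": {"from": "gpt", "value": "What?"}}]
""")
sample_info = {"dpo_en_demo": {"file_name": "dpo_en_demo.json", "ranking": True}}

print(validate_dpo_records(sample_records))
print(dataset_is_registered(sample_info, "dpo_en_demo"))
```

In practice the two inline samples would be replaced by `json.load` calls on the real files before launching training.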
--- a/finetune/llama_factory_example/llama_factory_data/dpo_en_demo.json
+++ b/finetune/llama_factory_example/llama_factory_data/dpo_en_demo.json
--- a/finetune/llama_factory_example/llama_factory_data/kto_en_demo.json
+++ b/finetune/llama_factory_example/llama_factory_data/kto_en_demo.json
--- a/finetune/llama_factory_example/llama_factory_data/sft_zh_demo.json
+++ b/finetune/llama_factory_example/llama_factory_data/sft_zh_demo.json
--- a/finetune/llama_factory_example/minicpm_dpo.yaml
+++ b/finetune/llama_factory_example/minicpm_dpo.yaml
@ -0,0 +1,42 @@
+### model
+model_name_or_path: /root/ld/ld_project/LLaMA-Factory/saves/minicpm/full/sft/
+
+### method
+stage: dpo
+do_train: true
+finetuning_type: full
+
+### ddp
+ddp_timeout: 180000000
+deepspeed: examples/deepspeed/ds_z2_config.json
+
+### dataset
+dataset: dpo_en_demo
+template: cpm
+cutoff_len: 1200
+max_samples: 50000000
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+
+### output
+output_dir: saves/minicpm/dpo
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_strategy: epoch
+### train
+per_device_train_batch_size: 2
+gradient_accumulation_steps: 4
+learning_rate: 0.00001
+num_train_epochs: 2.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+
+### eval
+val_size: 0.1
+per_device_eval_batch_size: 4
+evaluation_strategy: steps
+eval_steps: 500
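
The training section above interacts with the GPU count set in single_node.sh. As a small worked example, assuming all 8 processes from `NPROC_PER_NODE=8` take part, the effective global batch size for this DPO config can be computed as follows (the function name is hypothetical):

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus):
    """Samples consumed per optimizer step across all data-parallel workers."""
    return per_device_batch * grad_accum_steps * num_gpus


# per_device_train_batch_size=2 and gradient_accumulation_steps=4 from minicpm_dpo.yaml,
# 8 GPUs from NPROC_PER_NODE in single_node.sh.
dpo_global_batch = effective_batch_size(2, 4, 8)
print(dpo_global_batch)  # 64
```

Reducing the number of GPUs without raising gradient_accumulation_steps shrinks this value, which may call for revisiting the learning rate.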
--- a/finetune/llama_factory_example/minicpm_kto.yaml
+++ b/finetune/llama_factory_example/minicpm_kto.yaml
@ -0,0 +1,42 @@
+### model
+model_name_or_path: /root/ld/ld_model_pretrain/MiniCPM-1B-sft-bf16/
+
+### method
+stage: kto
+do_train: true
+finetuning_type: full
+kto_ftx: 0.1
+
+### ddp
+ddp_timeout: 180000000
+deepspeed: examples/deepspeed/ds_z2_config.json
+
+### dataset
+dataset: kto_harmless
+template: cpm
+cutoff_len: 1200
+max_samples: 500000
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+### output
+output_dir: saves/minicpm/kto
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+### train
+per_device_train_batch_size: 4
+gradient_accumulation_steps: 4
+learning_rate: 0.000005
+num_train_epochs: 1.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+
+### eval
+val_size: 0.1
+per_device_eval_batch_size: 16
+evaluation_strategy: steps
+eval_steps: 500
--- a/finetune/llama_factory_example/minicpm_sft.yaml
+++ b/finetune/llama_factory_example/minicpm_sft.yaml
@ -0,0 +1,41 @@
+### model
+model_name_or_path: /root/ld/ld_model_pretrained/miniCPM-bf16/
+
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+
+### ddp
+ddp_timeout: 180000000
+deepspeed: examples/deepspeed/ds_z2_config.json
+
+### dataset
+dataset: glaive_toolcall_en,glaive_toolcall_zh
+template: cpm
+cutoff_len: 1800
+max_samples: 500000
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+### output
+output_dir: saves/minicpm/fuction_call
+logging_steps: 10
+save_strategy: epoch
+plot_loss: true
+overwrite_output_dir: true
+
+### train
+per_device_train_batch_size: 2
+gradient_accumulation_steps: 4
+learning_rate: 0.0001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+
+### eval
+val_size: 0.1
+per_device_eval_batch_size: 4
+evaluation_strategy: steps
+eval_steps: 500
--- a/finetune/llama_factory_example/single_node.sh
+++ b/finetune/llama_factory_example/single_node.sh
@ -0,0 +1,16 @@
+#!/bin/bash
+
+NPROC_PER_NODE=8
+NNODES=1
+RANK=0
+MASTER_ADDR=127.0.0.1
+MASTER_PORT=29500
+export NCCL_P2P_DISABLE=1
+export NCCL_IB_DISABLE=1 
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun \
+    --nproc_per_node $NPROC_PER_NODE \
+    --nnodes $NNODES \
+    --node_rank $RANK \
+    --master_addr $MASTER_ADDR \
+    --master_port $MASTER_PORT \
+    src/train.py /root/ld/ld_project/LLaMA-Factory/examples/minicpm/minicpm_sft.yaml
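
The script hard-codes both `NPROC_PER_NODE` and `CUDA_VISIBLE_DEVICES`, so the two must be kept in sync by hand. One possible way to avoid a mismatch is to derive the process count from the device list. The sketch below only assembles the same single-node torchrun command and does not launch it. The function name and the relative config path are assumptions for illustration.

```python
def build_torchrun_command(visible_devices, config_path,
                           master_addr="127.0.0.1", master_port=29500):
    """Build the environment and argv for a single-node torchrun launch."""
    device_ids = [d.strip() for d in visible_devices.split(",") if d.strip()]
    env = {"CUDA_VISIBLE_DEVICES": ",".join(device_ids)}
    argv = [
        "torchrun",
        "--nproc_per_node", str(len(device_ids)),  # one process per visible GPU
        "--nnodes", "1",
        "--node_rank", "0",
        "--master_addr", master_addr,
        "--master_port", str(master_port),
        "src/train.py", config_path,
    ]
    return env, argv


env, argv = build_torchrun_command("0,1,2,3", "examples/minicpm/minicpm_dpo.yaml")
print(env)
print(" ".join(argv))
```

The returned pair could be passed to `subprocess.run(argv, env={**os.environ, **env})` from the LLaMA-Factory root directory.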
--- a/quantize/awq_quantize.py
+++ b/quantize/awq_quantize.py
@ -7,10 +7,10 @@ import os

 model_path = '/root/ld/ld_model_pretrained/MiniCPM-1B-sft-bf16' # model_path or model_id
 quant_path = '/root/ld/ld_project/pull_request/MiniCPM/quantize/awq_cpm_1b_4bit' # quant_save_path
-quant_data_path='/root/ld/ld_project/pull_request/MiniCPM/quantize/quantize_data/wikitext'# path to a bundled
+quant_data_path='/root/ld/ld_project/pull_request/MiniCPM/quantize/quantize_data/wikitext'# path to a bundled calibration dataset
 quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" } # "w_bit":4 or 8
 quant_samples=512 # how many samples to use for calibration
-custom_data=[{'question':'你叫什么名字。','answer':'我是openmbmb开源的小钢炮minicpm。'},
+custom_data=[{'question':'你叫什么名字。','answer':'我是openmbmb开源的小钢炮minicpm。'}, # used when calibrating with a custom dataset
                 {'question':'你有什么特色。','answer':'我很小，但是我很强。'}]
 # Load model
 model = AutoAWQForCausalLM.from_pretrained(model_path)
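
To make the `quant_config` fields more concrete, here is a plain-Python sketch of group-wise zero-point (asymmetric) quantization, which is what `w_bit`, `q_group_size`, and `zero_point` conventionally describe. This is an illustration under that assumption only. It is not AutoAWQ's implementation, and it omits the activation-aware scaling that AWQ applies before rounding.

```python
def quantize_group(weights, w_bit=4):
    """Quantize one group of floats to unsigned w_bit integers with a zero point."""
    qmax = 2 ** w_bit - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0  # fall back to 1.0 for a constant group
    zero_point = round(-w_min / scale)
    codes = [min(qmax, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def quantize(weights, q_group_size=128, w_bit=4):
    """Split the weights into groups; each group gets its own scale and zero point."""
    return [quantize_group(weights[i:i + q_group_size], w_bit)
            for i in range(0, len(weights), q_group_size)]


def dequantize(groups):
    """Map integer codes back to approximate float weights."""
    restored = []
    for codes, scale, zero_point in groups:
        restored.extend((c - zero_point) * scale for c in codes)
    return restored


weights = [(i % 37) * 0.03 - 0.5 for i in range(256)]
groups = quantize(weights, q_group_size=128, w_bit=4)
restored = dequantize(groups)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(len(groups), max_error)
```

Smaller groups track the local weight range more closely at the cost of storing more scales and zero points, which is the trade-off behind the choice of `q_group_size`.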