update readme

zh-zheng 2024-04-11 01:41:17 +08:00
parent ff3dba000c
commit 552d09e72b
2 changed files with 52 additions and 42 deletions

View File

@ -11,7 +11,7 @@
</h4>
<p align="center">
<a href="https://shengdinghu.notion.site/MiniCPM-Unveiling-the-Potential-of-End-side-Large-Language-Models-d4d3a8c426424654a4e80e42a711cb20?pvs=4" target="_blank">Technical Blog</a> |
<a href="https://openbmb.vercel.app/" target="_blank">Technical Blog</a> |
<a href="https://github.com/OpenBMB/OmniLMM/" target="_blank">Multi-modal Model OmniLMM</a> |
<a href="https://luca.cn/" target="_blank">CPM-C 100B Model Trial</a> |
Join our <a href="https://discord.gg/3cGQn9b3YM" target="_blank">discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">wechat</a>
@ -21,19 +21,20 @@ MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with onl
- After SFT, MiniCPM achieves performance very close to Mistral-7B on open-source general benchmarks, with stronger ability in Chinese, mathematics and coding. Its overall performance exceeds Llama2-13B, MPT-30B, Falcon-40B, etc.
- After DPO, MiniCPM outperforms Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, Zephyr-7B-alpha, etc. on MTBench.
- MiniCPM-V, based on MiniCPM-2B, achieves the best overall performance among multimodal models of the same scale, surpassing existing multimodal large models built on Phi-2 and achieving performance comparable to or even better than 9.6B Qwen-VL-Chat on some tasks.
- MiniCPM-V 2.0, based on MiniCPM-2B, achieves state-of-the-art performance on multiple benchmarks among models under 7B parameters. It even outperforms strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass. MiniCPM-V 2.0 also shows strong OCR capability, achieving comparable performance to Gemini Pro in scene-text understanding.
- MiniCPM can be deployed and run inference on smartphones, with streaming output slightly faster than human speaking speed. MiniCPM-V has also been successfully deployed on smartphones.
- The cost of development based on MiniCPM is low: parameter-efficient finetuning can be done on a single 1080/2080 GPU, and full-parameter finetuning on a single 3090/4090 GPU (see the finetuning sketch below).
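
As a concrete illustration of the low finetuning cost, below is a minimal parameter-efficient finetuning sketch using Hugging Face `peft` with LoRA. The checkpoint name comes from the download table below, but the target module names and hyperparameters are assumptions for a Llama-style attention layout rather than an official recipe, and the actual training loop (data, Trainer) is omitted.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

path = "openbmb/MiniCPM-2B-sft-bf16"
tokenizer = AutoTokenizer.from_pretrained(path)  # tokenizer for preparing training data
# the 2.7B bf16 weights are small enough to fit on a single consumer GPU
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="cuda"
)

# LoRA keeps the base weights frozen and trains only small adapter matrices;
# the target module names are an assumption, check the model config before use
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```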
We release all model parameters for research and limited commercial use.
- SFT and DPO version based on MiniCPM-2B and human preference: **MiniCPM-2B-SFT/DPO**
- The multi-modal model **MiniCPM-V** based on MiniCPM-2B, which outperforms multimodal models of similar size, e.g., those built on Phi-2
- SFT and DPO version based on MiniCPM-2B: **MiniCPM-2B-SFT/DPO**
- The multi-modal model **MiniCPM-V 2.0** based on MiniCPM-2B.
- The INT4 quantized version **MiniCPM-2B-SFT/DPO-Int4** based on MiniCPM-2B-SFT/DPO
- The 128k long context version of MiniCPM-2B: **MiniCPM-2B-128k**.
- The MoE version of MiniCPM-2B: **MiniCPM-MoE-8x2B**.
- SFT version of MiniCPM-1B, a lighter-weight model: **MiniCPM-1B-SFT**.
- Mobile phone applications based on MLC-LLM and LLMFarm. Both the language model and the multimodal model can run inference on smartphones.
- 30 intermediate [checkpoints](https://huggingface.co/openbmb/MiniCPM-2B-history) for academic research purposes.
- 30 intermediate [checkpoints](https://huggingface.co/openbmb/MiniCPM-2B-history) of MiniCPM-2B for academic research purposes.
### Limitations
@ -59,7 +60,7 @@ We release all model parameters for research and limited commercial use.
<p id="0"></p>
## Update Log
- 2024/04/11 We release [MiniCPM-V-v2.0](https://huggingface.co/openbmb/MiniCPM-V-v2.0), [MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k) and [MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B).
- 2024/04/11 We release [MiniCPM-V 2.0](https://huggingface.co/openbmb/MiniCPM-V-2.0), [MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k), [MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) and [MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)!
- 2024/03/16 Intermediate checkpoints were released [here](https://huggingface.co/openbmb/MiniCPM-2B-history)!
- 2024/02/13 We support llama.cpp
- 2024/02/09 We have included a [Community](#community) section in the README to encourage support for MiniCPM from the open-source community.
@ -75,18 +76,18 @@ We release all model parameters for research and limited commercial use.
| HuggingFace | ModelScope | WiseModel |
|-------------|------------|-----------|
|[MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)|[MiniCPM-2B-sft-bf16](https://modelscope.cn/models/OpenBMB/miniCPM-bf16)|[MiniCPM-2B-sft-bf16](https://wisemodel.cn/models/OpenBMB/miniCPM-bf16)|
|[MiniCPM-2B-dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16)|[MiniCPM-2B-dpo-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16/summary)|[MiniCPM-2B-dpo-bf16](https://wisemodel.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16)|[MiniCPM-2B-dpo-bf16](https://replicate.com/tuantuanzhang/minicpm)
|[MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k) |[MiniCPM-2B-128k](https://modelscope.cn/models/openbmb/MiniCPM-2B-128k/summary)|
|[MiniCPM-MoE-8x2B]() |[MiniCPM-MoE-8x2B]()|
|[MiniCPM-2B-dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16)|[MiniCPM-2B-dpo-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16/summary)|[MiniCPM-2B-dpo-bf16](https://wisemodel.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16)|
|[MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k) |[MiniCPM-2B-128k](https://modelscope.cn/models/openbmb/MiniCPM-2B-128k/summary)|
|[MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) |[MiniCPM-MoE-8x2B](https://modelscope.cn/models/OpenBMB/MiniCPM-MoE-8x2B)|
|[MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16) |
Note:
1. The model training was conducted in bf16 format, so inference using bf16 will yield the best results. Other formats might experience a slight performance decline due to precision issues.
2. More model versions can be found [here](https://huggingface.co/collections/openbmb/minicpm-2b-65d48bf958302b9fd25b698f).
Note: More model versions can be found [here](https://huggingface.co/collections/openbmb/minicpm-2b-65d48bf958302b9fd25b698f).
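
Since the checkpoints were trained in bf16, loading them in bf16 generally gives the best results. Below is a minimal inference sketch with Hugging Face `transformers`; the `chat` helper is provided by the model's remote code, and the prompt and sampling values are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "openbmb/MiniCPM-2B-dpo-bf16"
tokenizer = AutoTokenizer.from_pretrained(path)
# load in bf16, the precision the model was trained in
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map="cuda", trust_remote_code=True
)

# chat() comes from the model's remote code and returns (response, history)
response, history = model.chat(
    tokenizer,
    "Write an acrostic poem with the word MINICPM (one line per letter).",
    temperature=0.5,
    top_p=0.8,
)
print(response)
```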
* Multimodal Model
| HuggingFace | ModelScope | WiseModel |
|-------------|------------|-----------|
| [MiniCPM-V 2.0](https://huggingface.co/openbmb/MiniCPM-V-2.0) | [MiniCPM-V 2.0](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2.0/) |
| [MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V) | [MiniCPM-V](https://modelscope.cn/models/OpenBMB/MiniCPM-V/) | [MiniCPM-V](https://wisemodel.cn/models/OpenBMB/MiniCPM-V) |
| [OmniLMM](https://huggingface.co/openbmb/OmniLMM-12B) | [OmniLMM](https://modelscope.cn/models/OpenBMB/OmniLMM-12B) | [OmniLMM](https://wisemodel.cn/models/OpenBMB/OmniLMM-12B) |
@ -195,13 +196,6 @@ More parameters adjustment [see this](https://github.com/ggerganov/llama.cpp/blo
**ollama**
Solving [this issue](https://github.com/ollama/ollama/issues/2383)
<p id="Community"></p>
## Community
- [ChatLLM](https://github.com/foldl/chatllm.cpp): [Run MiniCPM on CPU](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16/discussions/2#65c59c4f27b8c11e43fc8796)
**fastllm**
1. Install [fastllm](https://github.com/ztxz16/fastllm).
2. Run inference:
@ -217,6 +211,11 @@ model = llm.from_hf(model, tokenizer, dtype = "float16") # dtype supports "float16"
print(model.response("<用户>Write an acrostic poem with the word MINICPM (One line per letter)<AI>", top_p=0.8, temperature=0.5, repeat_penalty=1.02))
```
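
The hunk above shows only the tail of the fastllm snippet. A minimal sketch of the surrounding steps, assuming the model is first loaded with `transformers` and then converted through fastllm's `fastllm_pytools` bridge (the checkpoint path and sampling values are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from fastllm_pytools import llm  # fastllm's Python bindings

path = "openbmb/MiniCPM-2B-dpo-bf16"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.float16, trust_remote_code=True
)

# convert the Hugging Face model into a fastllm model for accelerated inference
model = llm.from_hf(model, tokenizer, dtype="float16")
print(model.response(
    "<用户>Write an acrostic poem with the word MINICPM (One line per letter)<AI>",
    top_p=0.8, temperature=0.5, repeat_penalty=1.02,
))
```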
<p id="Community"></p>
## Community
- [ChatLLM](https://github.com/foldl/chatllm.cpp): [Run MiniCPM on CPU](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16/discussions/2#65c59c4f27b8c11e43fc8796)
<p id="3"></p>
@ -530,9 +529,7 @@ MBPP, instead of the hand-verified set.
#### Performance
* We did not conduct in-depth optimization and system testing on the mobile inference model, only verifying the feasibility of MiniCPM using mobile phone chips for inference.
* Besides our work, there are also other [efforts](https://github.com/ggerganov/llama.cpp/blob/master/examples/llava/MobileVLM-README.md) to deploy multimodal models on mobile phones based on llama.cpp. This time we verified the feasibility of deploying MiniCPM-V on mobile phones with MLC-LLM; it can take input and produce output normally, but image processing is still slow and needs further optimization :)
* **We welcome more developers to continuously improve the inference performance of LLMs on mobile phones and update the test results below.**
* We did not conduct in-depth optimization and system testing on the mobile inference model, only verifying the feasibility of MiniCPM using mobile phone chips for inference. **We welcome more developers to continuously improve the inference performance of LLMs on mobile phones and update the test results below.**
| Mobile Phones | OS | Processor | Memory (GB) | Inference Throughput (token/s) |
| ----------------- | ------------- | ------------------ | ------------ | ------------------------------- |
@ -556,7 +553,14 @@ MBPP, instead of the hand-verified set.
| iPhone 11 | iOS 16.6 | A13 | 4 | 4.6 |
|Xiaomi Redmi K50 | HyperOS 1.0.2 | MediaTek Dimensity 8100 |12 |3.5|
![multimodal demo](https://github.com/OpenBMB/OmniLMM/blob/main/assets/gif_cases/Snake_en.gif)
* We have also verified the feasibility of deploying MiniCPM-V series models on mobile phones with MLC-LLM; they can take input and produce output normally. However, image processing is still slow and needs further optimization. The demo below is a raw screen recording on a Xiaomi 14 Pro without any editing.
<table align="center">
<p align="center">
<img src="https://github.com/OpenBMB/OmniLMM/blob/main/assets/gif_cases/station.gif" width=36%/>
<img src="https://github.com/OpenBMB/OmniLMM/blob/main/assets/gif_cases/english_menu.gif" width=36%/>
</p>
</table>
<p id="5"></p>

View File

@ -12,7 +12,7 @@
<p align="center">
<a href="https://shengdinghu.notion.site/MiniCPM-c805a17c5c8046398914e47f0542095a" target="_blank">MiniCPM 技术博客</a> |
<a href="https://openbmb.vercel.app/?category=Chinese+Blog" target="_blank">MiniCPM 技术博客</a> |
<a href="https://github.com/OpenBMB/OmniLMM/" target="_blank">OmniLMM 多模态模型</a> |
<a href="https://luca.cn/" target="_blank">CPM-C 千亿模型试用</a> |
加入我们的 <a href="https://discord.gg/3cGQn9b3YM" target="_blank">discord</a><a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">wechat</a>
@ -20,21 +20,22 @@
</p>
MiniCPM is a series of end-side LLMs jointly open-sourced by ModelBest Inc. (面壁智能) and TsinghuaNLP. The main language model, MiniCPM-2B, has only 2.4 billion (2.4B) non-embedding parameters, or 2.7B parameters in total.
- After SFT, MiniCPM performs comparably to Mistral-7B on public comprehensive benchmarks (with stronger Chinese, mathematics and coding ability), and its overall performance exceeds models such as Llama2-13B, MPT-30B and Falcon-40B.
- After DPO, MiniCPM-2B also surpasses many representative open-source models such as Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1 and Zephyr-7B-alpha on MTBench, currently the benchmark closest to real user experience.
- MiniCPM-V, an end-side multimodal model built on MiniCPM-2B, achieves the best overall performance among models of the same scale, surpassing existing multimodal models built on Phi-2 and reaching performance comparable to or even better than the 9.6B Qwen-VL-Chat on some benchmarks.
- After SFT, MiniCPM-2B performs comparably to Mistral-7B on public comprehensive benchmarks (with stronger Chinese, mathematics and coding ability), and its overall performance exceeds models such as Llama2-13B, MPT-30B and Falcon-40B.
- After DPO, MiniCPM-2B also surpasses many representative open-source models such as Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1 and Zephyr-7B-alpha on MTBench, currently the benchmark closest to real user experience.
- MiniCPM-V 2.0, an end-side multimodal model built on MiniCPM-2B, achieves the best performance among models under 7B parameters on multiple benchmarks, surpassing larger models such as Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B and Yi-VL 34B on the OpenCompass leaderboard. MiniCPM-V 2.0 also shows leading OCR capability, approaching Gemini Pro in scene-text recognition.
- After Int4 quantization, MiniCPM can be deployed for inference on smartphones, with streaming output slightly faster than human speaking speed. MiniCPM-V has also been deployed directly on smartphones as a multimodal model.
- A single 1080/2080 GPU is enough for parameter-efficient finetuning, a single 3090/4090 for full-parameter finetuning, and a single machine for continual training of MiniCPM, so the cost of secondary development is low.
We fully open-source the model parameters of MiniCPM-2B for academic research and limited commercial use.
We fully open-source the model parameters of the MiniCPM series for academic research and limited commercial use.
Specifically, we have currently released the following models; see the [Model Download](#1) section for links.
- The instruction-finetuned and human-preference-aligned versions of MiniCPM-2B: **MiniCPM-2B-SFT/DPO**.
- The multimodal model based on MiniCPM-2B: **MiniCPM-V**, which outperforms multimodal models of the same parameter scale built on Phi-2.
- The instruction-finetuned and human-preference-aligned versions of MiniCPM-2B: **MiniCPM-2B-SFT/DPO**.
- The multimodal model based on MiniCPM-2B: **MiniCPM-V 2.0**.
- The Int4-quantized versions of MiniCPM-2B-SFT/DPO: **MiniCPM-2B-SFT/DPO-Int4**.
- The 128k long-context version of MiniCPM-2B: **MiniCPM-2B-128k**.
- The MoE version of MiniCPM-2B: **MiniCPM-MoE-8x2B**.
- The instruction-finetuned version of the lighter-weight MiniCPM-1B: **MiniCPM-1B-SFT**.
- MiniCPM mobile applications built on MLC-LLM and LLMFarm; **both the text and multimodal models can run inference on smartphones**.
- [30 checkpoints](https://huggingface.co/openbmb/MiniCPM-2B-history) from the training process, for research on model mechanisms.
- [30 checkpoints](https://huggingface.co/openbmb/MiniCPM-2B-history) from the MiniCPM-2B training process, for research on model mechanisms.
### Limitations:
@ -60,7 +61,7 @@ MiniCPM is a series of end-side LLMs jointly open-sourced by ModelBest Inc. (面壁智能) and TsinghuaNLP
<p id="0"></p>
## Update Log
- 2024/04/11 Open-sourced [MiniCPM-V-v2.0](https://huggingface.co/openbmb/MiniCPM-V-v2.0), [MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k) and [MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B).
- 2024/04/11 Open-sourced [MiniCPM-V-2.0](https://huggingface.co/openbmb/MiniCPM-V-2.0), [MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k), [MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) and [MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)!
- 2024/03/16 Released 30+ intermediate checkpoints of MiniCPM-2B on [huggingface](https://huggingface.co/openbmb/MiniCPM-2B-history)!
- 2024/02/13 Added support for llama.cpp.
- 2024/02/09 Added an [open-source community](#community) section to the README to collect examples of community support for MiniCPM.
@ -76,18 +77,18 @@ MiniCPM is a series of end-side LLMs jointly open-sourced by ModelBest Inc. (面壁智能) and TsinghuaNLP
| HuggingFace | ModelScope | WiseModel |
|-------------|------------|-----------|
|[MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)|[MiniCPM-2B-sft-bf16](https://modelscope.cn/models/OpenBMB/miniCPM-bf16)|[MiniCPM-2B-sft-bf16](https://wisemodel.cn/models/OpenBMB/miniCPM-bf16)|
|[MiniCPM-2B-dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16)|[MiniCPM-2B-dpo-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16/summary)|[MiniCPM-2B-dpo-bf16](https://wisemodel.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16)|[MiniCPM-2B-dpo-bf16](https://replicate.com/tuantuanzhang/minicpm)
|[MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k) |[MiniCPM-2B-128k](https://modelscope.cn/models/openbmb/MiniCPM-2B-128k/summary)|
|[MiniCPM-MoE-8x2B]() |[MiniCPM-MoE-8x2B]()|
|[MiniCPM-2B-dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16)|[MiniCPM-2B-dpo-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16/summary)|[MiniCPM-2B-dpo-bf16](https://wisemodel.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16)|
|[MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k) |[MiniCPM-2B-128k](https://modelscope.cn/models/openbmb/MiniCPM-2B-128k/summary)|
|[MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) |[MiniCPM-MoE-8x2B](https://modelscope.cn/models/OpenBMB/MiniCPM-MoE-8x2B)|
|[MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16) |
Note:
1. The model was trained in bf16, so inference in bf16 yields the best results; other formats may cause a slight performance drop due to precision issues.
2. More model versions can be found [here](https://huggingface.co/collections/openbmb/minicpm-2b-65d48bf958302b9fd25b698f).
Note: More model versions can be found [here](https://huggingface.co/collections/openbmb/minicpm-2b-65d48bf958302b9fd25b698f).
* Multimodal Model
| HuggingFace | ModelScope | WiseModel |
|-------------|------------|-----------|
| [MiniCPM-V 2.0](https://huggingface.co/openbmb/MiniCPM-V-2.0) | [MiniCPM-V 2.0](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2.0/) |
| [MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V) | [MiniCPM-V](https://modelscope.cn/models/OpenBMB/MiniCPM-V/) | [MiniCPM-V](https://wisemodel.cn/models/OpenBMB/MiniCPM-V) |
| [OmniLMM](https://huggingface.co/openbmb/OmniLMM-12B) | [OmniLMM](https://modelscope.cn/models/OpenBMB/OmniLMM-12B) | [OmniLMM](https://wisemodel.cn/models/OpenBMB/OmniLMM-12B) |
@ -541,9 +542,7 @@ print(model.response("<用户>山东省最高的山是哪座山, 它比黄山高
#### Deployment Performance
* We have not done in-depth optimization or systematic testing of the mobile inference setup; we have only verified the feasibility of running MiniCPM inference on mobile phone chips.
* [Correction] Before this work there were already preliminary [efforts](https://github.com/ggerganov/llama.cpp/blob/master/examples/llava/MobileVLM-README.md) to deploy multimodal models on phones based on llama.cpp. This time we verified the feasibility of deploying MiniCPM-V on phones with MLC-LLM; it can take input and produce output normally, but image processing is still slow and compatibility issues also need to be resolved :)
* **We also welcome more developers to further tune performance and update the test list below, continually improving on-phone inference of end-side LLMs.**
* We have not done in-depth optimization or systematic testing of the mobile inference setup; we have only verified the feasibility of running MiniCPM inference on mobile phone chips. **We also welcome more developers to further tune performance and update the test list below, continually improving on-phone inference of end-side LLMs.**
|Phone Model|OS|Processor|Memory (GB)|Text Throughput (token/s)|
|-|-|-|-|-|
@ -567,7 +566,14 @@ print(model.response("<用户>山东省最高的山是哪座山, 它比黄山高
|iPhone 11|iOS 16.6|A13|4|4.6|
|Xiaomi Redmi K50|HyperOS 1.0.2|MediaTek Dimensity 8100|12|3.5|
![Multimodal demo](https://github.com/OpenBMB/OmniLMM/blob/main/assets/Snake_cn_Mushroom_en.gif)
* We have also used MLC-LLM to verify the feasibility of deploying MiniCPM-V series models on mobile phones; they can take input and produce output normally, but image processing is still slow and compatibility issues also need to be resolved. The animation below is a raw screen recording of MiniCPM-V 2.0 running on a Xiaomi 14 Pro, without any editing.
<table align="center">
<p align="center">
<img src="https://github.com/OpenBMB/OmniLMM/blob/main/assets/gif_cases/station.gif" width=36%/>
<img src="https://github.com/OpenBMB/OmniLMM/blob/main/assets/gif_cases/english_menu.gif" width=36%/>
</p>
</table>
<p id="5"></p>