diff --git a/README-en.md b/README-en.md
index 2276636..d3b5959 100644
--- a/README-en.md
+++ b/README-en.md
@@ -1,7 +1,5 @@
 Technical Blog |
+MiniCPM Wiki (in Chinese) |
 MiniCPM Paper |
 MiniCPM-V Repo |
 Join our discord and WeChat
-MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings (2.7B in total).
+## Changelog🔥
-- MiniCPM has very close performance compared with Mistral-7B on open-sourced general benchmarks with better ability on Chinese, Mathematics and Coding after SFT. The overall performance exceeds Llama2-13B, MPT-30B, Falcon-40B, etc.
-- After DPO, MiniCPM outperforms Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, Zephyr-7B-alpha, etc. on MTBench.
-- MiniCPM-V 2.0, based on MiniCPM-2B, achieves state-of-the-art performance on multiple benchmarks among models under 7B parameters. It even outperforms strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass. MiniCPM-V 2.0 also shows strong OCR capability, achieving comparable performance to Gemini Pro in scene-text understanding.
-- MiniCPM can be deployed and infer on smartphones, and the speed of streaming output is relatively higher than human verbal speed. MiniCPM-V has also successfully deployed multi-modal models on smartphones.
-- The cost of developing based on MiniCPM is low. Parameter efficient finetuning can be conducted with a single 1080/2080 GPU and full parameter finetuning can be conducted with a 3090/4090 GPU.
-
-We release all model parameters for research and limited commercial use.
-
-- SFT and DPO version based on MiniCPM-2B: **MiniCPM-2B-SFT/DPO**
-- The multi-modal model **MiniCPM-V 2.0** based on MiniCPM-2B.
-- The INT4 quantized version **MiniCPM-2B-SFT/DPO-Int4** based on MiniCPM-2B-SFT/DPO
-- The 128k long context version of MiniCPM-2B: **MiniCPM-2B-128k**.
-- The MoE version of MiniCPM-2B: **MiniCPM-MoE-8x2B**.
-- SFT version of MiniCPM-1B, a lighter-weight model: **MiniCPM-1B-SFT**.
-- Mobile phone application based on MLC-LLM and LLMFarm. Both language model and multimodel model can conduct inference on smartphones.
-- 30 Intermidiate [checkpoints](https://huggingface.co/openbmb/MiniCPM-2B-history) of MiniCPM-2B for academic purpose.
-
-### Limitations
-
-- Due to limitations in model size, the model may experience hallucinatory issues. As DPO model tend to generate longer response, hallucinations are more likely to occur. We will also continue to iterate and improve the MiniCPM model.
-- To ensure the generality of the model for academic research purposes, we have not subject it to any identity-specific training. Meanwhile, as we use ShareGPT open-source corpus as part of the training data, the model may output identity-related information similar to the GPT series models.
-- Due to the limitation of model size, the output of the model is greatly influenced by prompts, which may result in inconsistent results from multiple attempts.
-- Due to limited model capacity, the model's knowledge recall may not be accurate. In the future, we will combine the RAG method to enhance the model's knowledge retention ability.
+- [2024.09.05] We release [**MiniCPM3-4B**](https://huggingface.co/openbmb/MiniCPM3-4B)! This model outperforms Phi-3.5-mini-instruct and GPT-3.5-Turbo-0125 and is comparable to several models with 7B-9B parameters like Llama3.1-8B-Instruct, Qwen2-7B-Instruct, and GLM-4-9B-Chat.
+- [2024.07.05] Released [MiniCPM-S-1B](https://huggingface.co/openbmb/MiniCPM-S-1B-sft)! This model achieves an average sparsity of 87.89% in the FFN layer, reducing FFN FLOPs by 84%, while maintaining downstream task performance.
+- [2024.04.11] Released [MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k), [MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) and [MiniCPM-1B](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)! Click [here](https://openbmb.vercel.app/) to read our technical blog.
+- [2024.03.16] Intermediate checkpoints of MiniCPM-2B were released [here](https://huggingface.co/openbmb/MiniCPM-2B-history)!
+- [2024.02.01] Released [**MiniCPM-2B**](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)! This model performs similarly to Mistral-7B on public benchmarks (with better performance in Chinese, math, and code abilities) and overall outperforms models like Llama2-13B, MPT-30B, and Falcon-40B.
 ## Quick Links
-- [Updates](#0)
-- [Downloading](#1)
-- [Quick Start](#2)
-- [Community](#community)
-- [Benchmark](#3)
-- [Deployment on Mobile Phones](#4)
-- [Demo & API](#5)
-- [Fine-tuning Models](#6)
-- [LICENSE](#7)
-- [Citation](#8)
-- [Show Cases](#9)
--
-
+- [Model Downloads](#model-downloads)
+- [MiniCPM 3.0](#minicpm-30)
+  - [Evaluation Results](#evaluation-results)
+    - [Comprehensive Evaluation](#comprehensive-evaluation)
+    - [Function Calling](#function-calling)
+    - [Long Context](#long-context)
+  - [Inference](#inference)
+    - [HuggingFace](#huggingface)
+    - [vLLM](#vllm)
+    - [llama.cpp](#llamacpp)
+  - [Fine-Tuning](#fine-tuning)
+    - [LLaMA-Factory](#llama-factory)
+  - [Advanced Features](#advanced-features)
+    - [Function Calling](#function-calling-1)
+    - [Code Interpreter](#code-interpreter)
+- [MiniCPM 2.0](#minicpm-20)
+- [MiniCPM 1.0](#minicpm-10)
-## Common Modules
-The following table allows you quick access to commonly used engineering modules. If you need extensive and detailed tutorials, please click on [Tutorials]((https://modelbest.feishu.cn/wiki/D2tFw8Pcsi5CIzkaHNacLK64npg?from=from_copylink)).
-| [infer](#2) | [finetune](#6) | [deployment](#4) | [quantize](#quantize)
-|-------------|------------|-----------|-----------|
-|[Transformers](#Huggingface)|[Transformers](#6)|[MLC](#MLC)|[GPTQ](#gptq)|
-|[vLLM](#vLLM)|[mlx_finetune](#mlx_finetune)|[llama.cpp](#llama.cpp)|[AWQ](#awq)|
-|[llama.cpp](#llama.cpp)|[LLaMA-Factory](./finetune/llama_factory_example/README.md)||[bnb](#bnb)|
-|[ollama](#ollama)|||[quantize_test](#quantize_test)|
-|[fastllm](#fastllm)||||
-|[mlx_lm](#mlx)||||
-|[powerinfer](#powerinfer)||||
 ## Update Log
-- **2024/04/11 We release [MiniCPM-V 2.0](https://huggingface.co/openbmb/MiniCPM-V-2.0), [MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k), [MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) and [MiniCPM-1B](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)! Click [here](https://openbmb.vercel.app/) to read our technical blog.**
-- 2024/03/16 Intermediate checkpoints were released [here](https://huggingface.co/openbmb/MiniCPM-2B-history)!
-- 2024/02/13 We support llama.cpp
-- 2024/02/09 We have included a [Community](#community) section in the README to encourage support for MiniCPM from the open-source community.
-- 2024/02/08 We updated the [llama-format model weights](#llamaformat), which can be loaded into LlamaModel directly, making it more convenient for everyone to use our model quickly.
-- 2024/02/01 Initial release.
+## Model Downloads
-
-
-## Downloading
-
-* Language Model
-
- | HuggingFace | ModelScope | WiseModel |
- |-------------|------------|-----------|
- |[MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)|[MiniCPM-2B-sft-bf16](https://modelscope.cn/models/OpenBMB/miniCPM-bf16)|[MiniCPM-2B-sft-bf16](https://wisemodel.cn/models/OpenBMB/miniCPM-bf16)|
- |[MiniCPM-2B-dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16)|[MiniCPM-2B-dpo-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16/summary)|[MiniCPM-2B-dpo-bf16](https://wisemodel.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16)|
+ | HuggingFace | ModelScope |
+ |-------------|------------|
+ |[MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B)|[MiniCPM3-4B](https://modelscope.cn/models/OpenBMB/MiniCPM3-4B)|
+ |[MiniCPM-2B-sft](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)|[MiniCPM-2B-sft](https://modelscope.cn/models/OpenBMB/miniCPM-bf16)|
+ |[MiniCPM-2B-dpo](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16)|[MiniCPM-2B-dpo](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16/summary)|
  |[MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k) |[MiniCPM-2B-128k](https://modelscope.cn/models/openbmb/MiniCPM-2B-128k/summary)|
  |[MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) |[MiniCPM-MoE-8x2B](https://modelscope.cn/models/OpenBMB/MiniCPM-MoE-8x2B)|
- |[MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16) | [MiniCPM-1B-sft-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16) |
+ |[MiniCPM-1B](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16) | [MiniCPM-1B](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16) |
+ |[MiniCPM-S-1B](https://huggingface.co/openbmb/MiniCPM-S-1B-sft)|[MiniCPM-S-1B](https://modelscope.cn/models/OpenBMB/MiniCPM-S-1B-sft)|
- Note: More model versions can be found [here](https://huggingface.co/collections/openbmb/minicpm-2b-65d48bf958302b9fd25b698f).
-
-* Multimodel Model
-
- | HuggingFace | ModelScope | WiseModel |
- |-------------|------------|-----------|
- | [MiniCPM-V 2.0](https://huggingface.co/openbmb/MiniCPM-V-2) | [MiniCPM-V 2.0](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) |
- | [MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V) | [MiniCPM-V](https://modelscope.cn/models/OpenBMB/MiniCPM-V/) | [MiniCPM-V](https://wisemodel.cn/models/OpenBMB/MiniCPM-V) |
- | [OmniLMM-12B](https://huggingface.co/openbmb/OmniLMM-12B) | [OmniLMM-12B](https://modelscope.cn/models/OpenBMB/OmniLMM-12B) | [OmniLMM-12B](https://wisemodel.cn/models/OpenBMB/OmniLMM-12B) |
+Note: More model versions can be found [here](https://huggingface.co/collections/openbmb/minicpm-2b-65d48bf958302b9fd25b698f).
+## MiniCPM 3.0
-
+MiniCPM 3.0 is a language model with 4 billion parameters. Compared to MiniCPM 1.0/2.0, it offers more comprehensive features and a significant improvement in overall capability: on most evaluation benchmarks it rivals, or even surpasses, many 7B-9B models.
-## Quick Start
+* **Supports Function Calling🛠️ and Code Interpreter💻**: Achieves SOTA among models with fewer than 9B parameters on the [Berkeley Function Calling Leaderboard (BFCL)](https://gorilla.cs.berkeley.edu/leaderboard.html), outperforming GLM-4-9B-Chat and Qwen2-7B-Instruct.
+* **Exceptional Reasoning Ability🧮**: In math, it outperforms GPT-3.5-Turbo and several 7B-9B models on [MathBench](https://open-compass.github.io/MathBench/). On the highly challenging [LiveCodeBench](https://livecodebench.github.io/), it surpasses Llama3.1-8B-Instruct.
+* **Outstanding Instruction-Following in English and Chinese🤖**: Exceeds GLM-4-9B-Chat and Qwen2-7B-Instruct on English instruction following with [IFEval](https://huggingface.co/datasets/google/IFEval) and on Chinese instruction following with [FollowBench-zh](https://huggingface.co/datasets/YuxinJiang/FollowBench).
+* **Long Context Capability**: Natively supports a 32k context length, with flawless results on the needle-in-a-haystack evaluation within that range. We also introduce the **LLM x MapReduce** approach, which in theory lets the model process arbitrarily long contexts.
+* **RAG Capability**: We release the [MiniCPM RAG Suite](https://huggingface.co/collections/openbmb/minicpm-rag-suite-66d976b4204cd0a4f8beaabb). Based on the MiniCPM series models, [MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding) and [MiniCPM-Reranker](https://huggingface.co/openbmb/MiniCPM-Reranker) achieve SOTA performance on Chinese and Chinese-English cross-lingual retrieval tests. Specifically designed for the RAG scenario, [MiniCPM3-RAG-LoRA](https://huggingface.co/openbmb/MiniCPM3-RAG-LoRA) outperforms models like Llama3-8B and Baichuan2-13B on multiple tasks, such as open-domain question answering.
-#### Online
+## Evaluation Results
-
-- [Colab](https://colab.research.google.com/drive/1tJcfPyWGWA5HezO7GKLeyeIso0HyOc0l?usp=sharing)
+### Comprehensive Evaluation
-
+| Benchmarks | Qwen2-7B-Instruct | GLM-4-9B-Chat | Gemma2-9B-it | Llama3.1-8B-Instruct | GPT-3.5-Turbo-0125 | Phi-3.5-mini-Instruct(3.8B) | MiniCPM3-4B |
+|---|---|---|---|---|---|---|---|
+| English | | | | | | | |
+| MMLU | 70.5 | 72.4 | 72.6 | 69.4 | 69.2 | 68.4 | 67.2 |
+| BBH | 64.9 | 76.3 | 65.2 | 67.8 | 70.3 | 68.6 | 70.2 |
+| MT-Bench | 8.41 | 8.35 | 7.88 | 8.28 | 8.17 | 8.60 | 8.41 |
+| IFEVAL (Prompt Strict-Acc.) | 51.0 | 64.5 | 71.9 | 71.5 | 58.8 | 49.4 | 68.4 |
+| Chinese | | | | | | | |
+| CMMLU | 80.9 | 71.5 | 59.5 | 55.8 | 54.5 | 46.9 | 73.3 |
+| CEVAL | 77.2 | 75.6 | 56.7 | 55.2 | 52.8 | 46.1 | 73.6 |
+| AlignBench v1.1 | 7.10 | 6.61 | 7.10 | 5.68 | 5.82 | 5.73 | 6.74 |
+| FollowBench-zh (SSR) | 63.0 | 56.4 | 57.0 | 50.6 | 64.6 | 58.1 | 66.8 |
+| Mathematics | | | | | | | |
+| MATH | 49.6 | 50.6 | 46.0 | 51.9 | 41.8 | 46.4 | 46.6 |
+| GSM8K | 82.3 | 79.6 | 79.7 | 84.5 | 76.4 | 82.7 | 81.1 |
+| MathBench | 63.4 | 59.4 | 45.8 | 54.3 | 48.9 | 54.9 | 65.6 |
+| Coding | | | | | | | |
+| HumanEval+ | 70.1 | 67.1 | 61.6 | 62.8 | 66.5 | 68.9 | 68.3 |
+| MBPP+ | 57.1 | 62.2 | 64.3 | 55.3 | 71.4 | 55.8 | 63.2 |
+| LiveCodeBench | 22.2 | 20.2 | 19.2 | 20.4 | 24.0 | 19.6 | 22.6 |
+| Tool Use | | | | | | | |
+| BFCL | 71.6 | 70.1 | 19.2 | 73.3 | 75.4 | 48.4 | 76.0 |
+| Overall | | | | | | | |
+| Average | 65.3 | 65.0 | 57.9 | 60.8 | 61.0 | 57.2 | 66.3 |
+#### Function Calling
+
+| Model | Overall Accuracy | AST Summary | Exec Summary | Irrelevance Detection | Relevance Detection |
+|---|---|---|---|---|---|
+| MiniCPM3-4B | 76.03% | 68.55% | 85.54% | 53.71% | 90.24% |
+| Llama3.1-8B-Instruct | 73.28% | 64.61% | 86.48% | 43.12% | 85.37% |
+| Qwen2-7B-Instruct | 71.61% | 65.71% | 79.57% | 44.70% | 90.24% |
+| GLM-4-9B-Chat | 70.08% | 60.69% | 80.02% | 55.02% | 82.93% |
+| Phi-3.5-mini-instruct | 48.44% | 38.89% | 54.04% | 46.78% | 65.85% |
+| Gemma2-9B-it | 19.18% | 5.41% | 18.50% | 88.88% | 7.32% |
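+
+As a quick taste before the dedicated [Inference](#inference) sections, here is a minimal chat-inference sketch using HuggingFace `transformers`. The sampling settings below are illustrative assumptions rather than official recommendations; see the [MiniCPM3-4B model card](https://huggingface.co/openbmb/MiniCPM3-4B) for authoritative usage.
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "openbmb/MiniCPM3-4B"
+# trust_remote_code is needed because MiniCPM3 ships custom modeling code.
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
+)
+
+messages = [{"role": "user", "content": "推荐5个北京的景点。"}]
+input_ids = tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True, return_tensors="pt"
+).to(model.device)
+
+output_ids = model.generate(
+    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
+)
+# Decode only the newly generated tokens, skipping the echoed prompt.
+print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
+```
+
+For serving-oriented workloads, see the vLLM sketch further below.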
-| Model | Size | TextVQA val | DocVQA test | OCRBench | OpenCompass | MME | MMB dev(en) | MMB dev(zh) | MMMU val | MathVista | LLaVA Bench | Object HalBench |
-|---|---|---|---|---|---|---|---|---|---|---|---|---|
-| Proprietary models | | | | | | | | | | | | |
-| Gemini Pro Vision | - | 74.6 | 88.1 | 680 | 63.8 | 2148.9 | 75.2 | 74.0 | 48.9 | 45.8 | 79.9 | - |
-| GPT-4V | - | 78.0 | 88.4 | 645 | 63.2 | 1771.5 | 75.1 | 75.0 | 53.8 | 47.8 | 93.1 | 86.4 / 92.7 |
-| Open-source models 6B~34B | | | | | | | | | | | | |
-| Yi-VL-6B | 6.7B | 45.5* | 17.1* | 290 | 49.3 | 1915.1 | 68.6 | 68.3 | 40.3 | 28.8 | 51.9 | - |
-| Qwen-VL-Chat | 9.6B | 61.5 | 62.6 | 488 | 52.1 | 1860.0 | 60.6 | 56.7 | 37.0 | 33.8 | 67.7 | 56.2 / 80.0 |
-| Yi-VL-34B | 34B | 43.4* | 16.9* | 290 | 52.6 | 2050.2 | 71.1 | 71.4 | 45.1 | 30.7 | 62.3 | - |
-| DeepSeek-VL-7B | 7.3B | 64.7* | 47.0* | 435 | 55.6 | 1765.4 | 74.1 | 72.8 | 38.3 | 36.8 | 77.8 | - |
-| TextMonkey | 9.7B | 64.3 | 66.7 | 558 | - | - | - | - | - | - | - | - |
-| CogVLM-Chat | 17.4B | 70.4 | 33.3* | 590 | 52.5 | 1736.6 | 63.7 | 53.8 | 37.3 | 34.7 | 73.9 | 73.6 / 87.4 |
-| Open-source models 1B~3B | | | | | | | | | | | | |
-| DeepSeek-VL-1.3B | 1.7B | 58.4* | 37.9* | 413 | 46.0 | 1531.6 | 64.0 | 61.2 | 33.8 | 29.4 | 51.1 | - |
-| MobileVLM V2 | 3.1B | 57.5 | 19.4* | - | - | 1440.5(P) | 63.2 | - | - | - | - | - |
-| Mini-Gemini | 2.2B | 56.2 | 34.2* | - | - | 1653.0 | 59.8 | - | 31.7 | - | - | - |
-| MiniCPM-V | 2.8B | 60.6 | 38.2 | 366 | 47.6 | 1650.2 | 67.9 | 65.3 | 38.3 | 28.9 | 51.3 | 78.4 / 88.5 |
-| MiniCPM-V 2.0 | 2.8B | 74.1 | 71.9 | 605 | 55.0 | 1808.6 | 69.6 | 68.1 | 38.2 | 38.7 | 69.2 | 85.5 / 92.2 |
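+
+For high-throughput serving, a hedged vLLM sketch follows. It assumes a vLLM version recent enough to support MiniCPM3 and the `LLM.chat` API; `max_model_len` is a placeholder to adapt to your GPU memory.
+
+```python
+from vllm import LLM, SamplingParams
+
+# trust_remote_code pulls in the custom MiniCPM3 model definition.
+llm = LLM(model="openbmb/MiniCPM3-4B", trust_remote_code=True, max_model_len=4096)
+params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)
+
+# chat() applies the model's chat template before generation.
+outputs = llm.chat([{"role": "user", "content": "推荐5个北京的景点。"}], params)
+print(outputs[0].outputs[0].text)
+```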
[Modelbest Inc.](https://modelbest.cn/)
+-
[THUNLP](https://nlp.csai.tsinghua.edu.cn/)
## Citation
diff --git a/README.md b/README.md
index dc03df9..eb60413 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,5 @@
 MiniCPM 技术博客 |
+MiniCPM 知识库 |
 MiniCPM 论文 |
 MiniCPM-V 仓库 |
 加入我们的 discord 和 微信群
-MiniCPM 是面壁智能与清华大学自然语言处理实验室共同开源的系列端侧大模型,主体语言模型 MiniCPM-2B 仅有 24亿(2.4B)的非词嵌入参数量, 总计2.7B参数量。
-- 经过 SFT 后,MiniCPM-2B 在公开综合性评测集上与 Mistral-7B 表现相近(中文、数学、代码能力更优),整体性能超越 Llama2-13B、MPT-30B、Falcon-40B 等模型。
-- 经过 DPO 后,MiniCPM-2B 在当前最接近用户体感的评测集 MTBench 上也超越了 Llama2-70B-Chat、Vicuna-33B、Mistral-7B-Instruct-v0.1、Zephyr-7B-alpha 等众多代表性开源大模型。
-- 以 MiniCPM-2B 为基础构建端侧多模态大模型 MiniCPM-V 2.0,在多个测试基准中实现了 7B 以下模型的最佳性能,在 OpenCompass 榜单上超过了 Qwen-VL-Chat 9.6B、CogVLM-Chat 17.4B 和 Yi-VL 34B 等更大参数规模的模型。MiniCPM-V 2.0 还展现出领先的 OCR 能力,在场景文字识别能力上接近 Gemini Pro。
-- 经过 Int4 量化后,MiniCPM 可在手机上进行部署推理,流式输出速度略高于人类说话速度。MiniCPM-V 也直接跑通了多模态大模型在手机上的部署。
-- 一张1080/2080可高效参数微调,一张3090/4090可全参数微调,一台机器可持续训练 MiniCPM,二次开发成本较低。
+## 更新日志🔥
-我们完全开源MiniCPM系列的模型参数供学术研究和有限商用。
-具体而言,我们目前已公开以下模型,地址详见 [模型下载](#1) 部分
-- 基于MiniCPM-2B的指令微调与人类偏好对齐版本**MiniCPM-2B-SFT/DPO**。
-- 基于MiniCPM-2B的多模态模型**MiniCPM-V 2.0**。
-- MiniCPM-2B-SFT/DPO的Int4量化版**MiniCPM-2B-SFT/DPO-Int4**。
-- MiniCPM-2B的128k长文本版本**MiniCPM-2B-128k**。
-- MiniCPM-2B的MoE版本**MiniCPM-MoE-8x2B**。
-- 更轻量级的MiniCPM-1B指令微调版本**MiniCPM-1B-SFT**。
-- 基于MLC-LLM、LLMFarm开发的MiniCPM手机端程序,**文本及多模态模型均可在手机端进行推理**。
-- MiniCPM-2B训练过程中的[30个Checkpoints](https://huggingface.co/openbmb/MiniCPM-2B-history)供模型机理研究。
-
-### 局限性:
-
-- 受限于模型规模,模型可能出现**幻觉性问题**。其中由于DPO模型生成的回复内容更长,更容易出现幻觉。我们也将持续进行MiniCPM模型的迭代改进。
-- 为了保证在学术研究用途上模型的通用性,我们**未对模型进行任何身份认同训练**。同时由于我们用ShareGPT开源语料作为部分训练数据,模型可能会输出类似GPT系列模型的身份认同信息。
-- 受限于模型规模,模型的**输出受到提示词(prompt)的影响较大**,可能多次尝试产生不一致的结果。
-- 受限于模型容量,模型的**知识记忆较不准确**,后续我们将结合RAG方法来增强模型的知识记忆能力。
+- [2024.09.05] 发布 [**MiniCPM3-4B**](https://huggingface.co/openbmb/MiniCPM3-4B)!该模型的表现超越 Phi-3.5-mini-instruct 和 GPT-3.5-Turbo-0125,并且能够比肩 Llama3.1-8B-Instruct、Qwen2-7B-Instruct、GLM-4-9B-Chat 等多个 7B-9B 参数量的模型。
+- [2024.07.05] 发布 [MiniCPM-S-1B](https://huggingface.co/openbmb/MiniCPM-S-1B-sft)!该模型在保持下游任务性能无损的前提下,FFN 层实现了 87.89% 的平均稀疏度,将 FFN FLOPs 降低了 84%。
+- [2024.04.11] 发布 [MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k)、[MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) 和 [MiniCPM-1B](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)!点击[这里](https://openbmb.vercel.app/?category=Chinese+Blog)查看技术博客。
+- [2024.03.16] MiniCPM-2B 的 30 余个中间[检查点](https://huggingface.co/openbmb/MiniCPM-2B-history)开放了!
+- [2024.02.01] 发布 [**MiniCPM-2B**](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)!该模型在公开评测集上与 Mistral-7B 表现相近(中文、数学、代码能力更优),整体性能超越 Llama2-13B、MPT-30B、Falcon-40B 等模型。
 ## 目录
-- [更新日志](#0)|
-- [模型下载](#1)|
-- [快速上手](#2)|
-- [模型量化](#quantize)|
-- [开源社区](#community)|
-- [评测结果](#3)|
-- [手机部署](#4)|
-- [Demo & API 部署](#5)|
-- [二次开发](#6)|
-- [开源协议](#7)|
-- [工作引用](#8)|
-- [典型示例](#9)|
+- [模型下载](#模型下载)
+- [MiniCPM 3.0](#minicpm-30)
+  - [评测结果](#评测结果)
+    - [综合评测](#综合评测)
+    - [工具调用能力](#工具调用能力)
+    - [长文本能力](#长文本能力)
+  - [模型推理](#模型推理)
+    - [HuggingFace](#huggingface)
+    - [vLLM](#vllm)
+    - [llama.cpp](#llamacpp)
+  - [模型微调](#模型微调)
+    - [LLaMA-Factory](#llama-factory)
+  - [进阶功能](#进阶功能)
+    - [工具调用](#工具调用)
+    - [代码解释器](#代码解释器)
+- [MiniCPM 2.0](#minicpm-20)
+- [MiniCPM 1.0](#minicpm-10)
-## 常用模块导航
-以下表格可以让你快速访问常用的工程模块,如果你需要广泛而详细的教程请点击[教程](https://modelbest.feishu.cn/wiki/D2tFw8Pcsi5CIzkaHNacLK64npg?from=from_copylink)
-
-| [推理](#2) | [微调](#6) | [手机部署](#4) | [量化](#quantize)
-|-------------|------------|-----------|-----------|
-|[Transformers](#Huggingface模型)|[Transformers](#transformer_finetune)|[MLC部署](#MLC)|[GPTQ](#gptq)|
-|[vLLM](#vllm-推理)|[mlx_finetune](#mlx)|[llama.cpp](#llama.cpp)|[AWQ](#awq)|
-|[llama.cpp](#llama.cpp)|[LLaMA-Factory](./finetune/llama_factory_example/README.md)||[bnb](#bnb)|
-|[ollama](#ollama)|||[量化测试](#quantize_test)|
-|[fastllm](#fastllm)||||
-|[mlx_lm](#mlx_lm)||||
-|[powerinfer](#powerinfer)||||
-
-
-## 更新日志
-- **2024/04/11 开源[MiniCPM-V-2.0](https://huggingface.co/openbmb/MiniCPM-V-2.0)、[MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k)、[MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B)和[MiniCPM-1B](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)!点击[这里](https://openbmb.vercel.app/?category=Chinese+Blog)查看技术博客。**
-- 2024/03/16 MiniCPM-2B 的30余个中间检查点开放了
-- 2024/02/13 支持了llama.cpp
-- 2024/02/09 我们在README里加入了一个[开源社区](#community)章节,用来收集开源社区对MiniCPM的支持案例。
-- 2024/02/08 我们更新了[llama-format的模型权重](#llamaformat),方便大家更加快捷地使用我们的模型。
-- 2024/02/01 初始发布。
-
-
 ## 模型下载
-
-* 语言模型
- | HuggingFace | ModelScope | WiseModel |
- |-------------|------------|-----------|
- |[MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)|[MiniCPM-2B-sft-bf16](https://modelscope.cn/models/OpenBMB/miniCPM-bf16)|[MiniCPM-2B-sft-bf16](https://wisemodel.cn/models/OpenBMB/miniCPM-bf16)|
- |[MiniCPM-2B-dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16)|[MiniCPM-2B-dpo-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16/summary)|[MiniCPM-2B-dpo-bf16](https://wisemodel.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16)|
+ | HuggingFace | ModelScope |
+ |-------------|------------|
+ |[MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B)|[MiniCPM3-4B](https://modelscope.cn/models/OpenBMB/MiniCPM3-4B)|
+ |[MiniCPM-2B-sft](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)|[MiniCPM-2B-sft](https://modelscope.cn/models/OpenBMB/miniCPM-bf16)|
+ |[MiniCPM-2B-dpo](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16)|[MiniCPM-2B-dpo](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16/summary)|
  |[MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k) |[MiniCPM-2B-128k](https://modelscope.cn/models/openbmb/MiniCPM-2B-128k/summary)|
  |[MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) |[MiniCPM-MoE-8x2B](https://modelscope.cn/models/OpenBMB/MiniCPM-MoE-8x2B)|
- |[MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16) | [MiniCPM-1B-sft-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16) |
+ |[MiniCPM-1B](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16) | [MiniCPM-1B](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16) |
+ |[MiniCPM-S-1B](https://huggingface.co/openbmb/MiniCPM-S-1B-sft)|[MiniCPM-S-1B](https://modelscope.cn/models/OpenBMB/MiniCPM-S-1B-sft)|
 注: 更多模型版本见[这里](https://huggingface.co/collections/openbmb/minicpm-2b-65d48bf958302b9fd25b698f)。
-* 多模态模型
- | HuggingFace | ModelScope | WiseModel |
- |-------------|------------|-----------|
- | [MiniCPM-V 2.0](https://huggingface.co/openbmb/MiniCPM-V-2) | [MiniCPM-V 2.0](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2) |
- | [MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V) | [MiniCPM-V](https://modelscope.cn/models/OpenBMB/MiniCPM-V/) | [MiniCPM-V](https://wisemodel.cn/models/OpenBMB/MiniCPM-V) |
- | [OmniLMM-12B](https://huggingface.co/openbmb/OmniLMM-12B) | [OmniLMM-12B](https://modelscope.cn/models/OpenBMB/OmniLMM-12B) | [OmniLMM-12B](https://wisemodel.cn/models/OpenBMB/OmniLMM-12B) |
+## MiniCPM 3.0
-
+MiniCPM 3.0 是一个 4B 参数量的语言模型,相比 MiniCPM 1.0/2.0,功能更加全面,综合能力大幅提升,多数评测集上的效果比肩甚至超越众多 7B-9B 模型。
+* **支持工具调用🛠️(Function Calling)和代码解释器💻(Code Interpreter)**:在 [Berkeley Function Calling Leaderboard (BFCL)](https://gorilla.cs.berkeley.edu/leaderboard.html) 上取得 9B 规模以下 SOTA,超越 GLM-4-9B-Chat、Qwen2-7B-Instruct。
+* **超强的推理能力🧮**:数学能力方面,[MathBench](https://open-compass.github.io/MathBench/) 上的效果超越 GPT-3.5-Turbo 以及多个 7B-9B 模型。在非常具有挑战性的 [LiveCodeBench](https://livecodebench.github.io/) 上,效果超越 Llama3.1-8B-Instruct。
+* **出色的中英文指令遵循能力🤖**:英文指令遵循 [IFEval](https://huggingface.co/datasets/google/IFEval)、中文指令遵循 [FollowBench-zh](https://huggingface.co/datasets/YuxinJiang/FollowBench) 效果超越 GLM-4-9B-Chat、Qwen2-7B-Instruct。
+* **长文本能力**:原生支持 32k 上下文长度,32k 长度内大海捞针全绿。提出 **LLM x MapReduce**,理论上可处理的上下文长度达到 +∞。
+* **RAG 能力**:我们发布了 [MiniCPM RAG 套件](https://huggingface.co/collections/openbmb/minicpm-rag-suite-66d976b4204cd0a4f8beaabb)。基于 MiniCPM 系列模型的 [MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding)、[MiniCPM-Reranker](https://huggingface.co/openbmb/MiniCPM-Reranker) 在中文、中英跨语言检索测试中取得 SOTA 表现;针对 RAG 场景的 [MiniCPM3-RAG-LoRA](https://huggingface.co/openbmb/MiniCPM3-RAG-LoRA) 在开放域问答等多项任务上超越 Llama3-8B、Baichuan2-13B 等模型。
+### 评测结果
-
+#### 综合评测
-## 快速上手
+| 评测集 | Qwen2-7B-Instruct | GLM-4-9B-Chat | Gemma2-9B-it | Llama3.1-8B-Instruct | GPT-3.5-Turbo-0125 | Phi-3.5-mini-Instruct(3.8B) | MiniCPM3-4B |
+|---|---|---|---|---|---|---|---|
+| 英文能力 | | | | | | | |
+| MMLU | 70.5 | 72.4 | 72.6 | 69.4 | 69.2 | 68.4 | 67.2 |
+| BBH | 64.9 | 76.3 | 65.2 | 67.8 | 70.3 | 68.6 | 70.2 |
+| MT-Bench | 8.41 | 8.35 | 7.88 | 8.28 | 8.17 | 8.60 | 8.41 |
+| IFEVAL (Prompt Strict-Acc.) | 51.0 | 64.5 | 71.9 | 71.5 | 58.8 | 49.4 | 68.4 |
+| 中文能力 | | | | | | | |
+| CMMLU | 80.9 | 71.5 | 59.5 | 55.8 | 54.5 | 46.9 | 73.3 |
+| CEVAL | 77.2 | 75.6 | 56.7 | 55.2 | 52.8 | 46.1 | 73.6 |
+| AlignBench v1.1 | 7.10 | 6.61 | 7.10 | 5.68 | 5.82 | 5.73 | 6.74 |
+| FollowBench-zh (SSR) | 63.0 | 56.4 | 57.0 | 50.6 | 64.6 | 58.1 | 66.8 |
+| 数学能力 | | | | | | | |
+| MATH | 49.6 | 50.6 | 46.0 | 51.9 | 41.8 | 46.4 | 46.6 |
+| GSM8K | 82.3 | 79.6 | 79.7 | 84.5 | 76.4 | 82.7 | 81.1 |
+| MathBench | 63.4 | 59.4 | 45.8 | 54.3 | 48.9 | 54.9 | 65.6 |
+| 代码能力 | | | | | | | |
+| HumanEval+ | 70.1 | 67.1 | 61.6 | 62.8 | 66.5 | 68.9 | 68.3 |
+| MBPP+ | 57.1 | 62.2 | 64.3 | 55.3 | 71.4 | 55.8 | 63.2 |
+| LiveCodeBench | 22.2 | 20.2 | 19.2 | 20.4 | 24.0 | 19.6 | 22.6 |
+| 工具调用能力 | | | | | | | |
+| BFCL | 71.6 | 70.1 | 19.2 | 73.3 | 75.4 | 48.4 | 76.0 |
+| 综合能力 | | | | | | | |
+| 平均分 | 65.3 | 65.0 | 57.9 | 60.8 | 61.0 | 57.2 | 66.3 |
+#### 工具调用能力
+
+| 模型 | 总体准确率 | AST Summary | Exec Summary | Irrelevance Detection | Relevance Detection |
+|---|---|---|---|---|---|
+| MiniCPM3-4B | 76.03% | 68.55% | 85.54% | 53.71% | 90.24% |
+| Llama3.1-8B-Instruct | 73.28% | 64.61% | 86.48% | 43.12% | 85.37% |
+| Qwen2-7B-Instruct | 71.61% | 65.71% | 79.57% | 44.70% | 90.24% |
+| GLM-4-9B-Chat | 70.08% | 60.69% | 80.02% | 55.02% | 82.93% |
+| Phi-3.5-mini-instruct | 48.44% | 38.89% | 54.04% | 46.78% | 65.85% |
+| Gemma2-9B-it | 19.18% | 5.41% | 18.50% | 88.88% | 7.32% |
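+
+在进入下文的[模型推理](#模型推理)小节之前,这里给出一个基于 HuggingFace transformers 的最小推理示例(采样参数仅作示意,并非官方推荐配置,权威用法请以 [MiniCPM3-4B 模型卡](https://huggingface.co/openbmb/MiniCPM3-4B) 为准):
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "openbmb/MiniCPM3-4B"
+# MiniCPM3 自带自定义建模代码,因此需要 trust_remote_code
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
+)
+
+messages = [{"role": "user", "content": "推荐5个北京的景点。"}]
+input_ids = tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True, return_tensors="pt"
+).to(model.device)
+
+output_ids = model.generate(
+    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
+)
+# 仅解码新生成的部分,跳过输入提示
+print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
+```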
-| Model | Size | TextVQA val | DocVQA test | OCRBench | OpenCompass | MME | MMB dev(en) | MMB dev(zh) | MMMU val | MathVista | LLaVA Bench | Object HalBench |
-|---|---|---|---|---|---|---|---|---|---|---|---|---|
-| Proprietary models | | | | | | | | | | | | |
-| Gemini Pro Vision | - | 74.6 | 88.1 | 680 | 63.8 | 2148.9 | 75.2 | 74.0 | 48.9 | 45.8 | 79.9 | - |
-| GPT-4V | - | 78.0 | 88.4 | 645 | 63.2 | 1771.5 | 75.1 | 75.0 | 53.8 | 47.8 | 93.1 | 86.4 / 92.7 |
-| Open-source models 6B~34B | | | | | | | | | | | | |
-| Yi-VL-6B | 6.7B | 45.5* | 17.1* | 290 | 49.3 | 1915.1 | 68.6 | 68.3 | 40.3 | 28.8 | 51.9 | - |
-| Qwen-VL-Chat | 9.6B | 61.5 | 62.6 | 488 | 52.1 | 1860.0 | 60.6 | 56.7 | 37.0 | 33.8 | 67.7 | 56.2 / 80.0 |
-| Yi-VL-34B | 34B | 43.4* | 16.9* | 290 | 52.6 | 2050.2 | 71.1 | 71.4 | 45.1 | 30.7 | 62.3 | - |
-| DeepSeek-VL-7B | 7.3B | 64.7* | 47.0* | 435 | 55.6 | 1765.4 | 74.1 | 72.8 | 38.3 | 36.8 | 77.8 | - |
-| TextMonkey | 9.7B | 64.3 | 66.7 | 558 | - | - | - | - | - | - | - | - |
-| CogVLM-Chat | 17.4B | 70.4 | 33.3* | 590 | 52.5 | 1736.6 | 63.7 | 53.8 | 37.3 | 34.7 | 73.9 | 73.6 / 87.4 |
-| Open-source models 1B~3B | | | | | | | | | | | | |
-| DeepSeek-VL-1.3B | 1.7B | 58.4* | 37.9* | 413 | 46.0 | 1531.6 | 64.0 | 61.2 | 33.8 | 29.4 | 51.1 | - |
-| MobileVLM V2 | 3.1B | 57.5 | 19.4* | - | - | 1440.5(P) | 63.2 | - | - | - | - | - |
-| Mini-Gemini | 2.2B | 56.2 | 34.2* | - | - | 1653.0 | 59.8 | - | 31.7 | - | - | - |
-| MiniCPM-V | 2.8B | 60.6 | 38.2 | 366 | 47.6 | 1650.2 | 67.9 | 65.3 | 38.3 | 28.9 | 51.3 | 78.4 / 88.5 |
-| MiniCPM-V 2.0 | 2.8B | 74.1 | 71.9 | 605 | 55.0 | 1808.6 | 69.6 | 68.1 | 38.2 | 38.7 | 69.2 | 85.5 / 92.2 |
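+
+如需高吞吐推理,下面是一个 vLLM 示例草稿(假设所用 vLLM 版本已支持 MiniCPM3 和 `LLM.chat` 接口;`max_model_len` 为示意值,请按显存调整):
+
+```python
+from vllm import LLM, SamplingParams
+
+# trust_remote_code 用于加载 MiniCPM3 的自定义模型代码
+llm = LLM(model="openbmb/MiniCPM3-4B", trust_remote_code=True, max_model_len=4096)
+params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)
+
+# chat() 会在生成前自动套用模型的对话模板
+outputs = llm.chat([{"role": "user", "content": "推荐5个北京的景点。"}], params)
+print(outputs[0].outputs[0].text)
+```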
[面壁智能](https://modelbest.cn/)
+-
[清华大学自然语言处理实验室](https://nlp.csai.tsinghua.edu.cn/)
## 工作引用
diff --git a/assets/COCO_test2015_000000262144.jpg b/assets/COCO_test2015_000000262144.jpg
deleted file mode 100644
index 012f88d..0000000
Binary files a/assets/COCO_test2015_000000262144.jpg and /dev/null differ
diff --git a/assets/code.case1.gif b/assets/code.case1.gif
deleted file mode 100644
index 2218ed6..0000000
Binary files a/assets/code.case1.gif and /dev/null differ
diff --git a/assets/code.case2.gif b/assets/code.case2.gif
deleted file mode 100644
index f0a036c..0000000
Binary files a/assets/code.case2.gif and /dev/null differ
diff --git a/assets/code_interpreter.gif b/assets/code_interpreter.gif
new file mode 100644
index 0000000..98d72d1
Binary files /dev/null and b/assets/code_interpreter.gif differ
diff --git a/assets/creation.case1.png b/assets/creation.case1.png
deleted file mode 100644
index 3f6d1aa..0000000
Binary files a/assets/creation.case1.png and /dev/null differ
diff --git a/assets/creation.case2.png b/assets/creation.case2.png
deleted file mode 100644
index e4a7b1c..0000000
Binary files a/assets/creation.case2.png and /dev/null differ
diff --git a/assets/creation.case3.png b/assets/creation.case3.png
deleted file mode 100644
index 08e1eba..0000000
Binary files a/assets/creation.case3.png and /dev/null differ
diff --git a/assets/en.code.case1.gif b/assets/en.code.case1.gif
deleted file mode 100644
index 6c6e04f..0000000
Binary files a/assets/en.code.case1.gif and /dev/null differ
diff --git a/assets/en.creation.case1.png b/assets/en.creation.case1.png
deleted file mode 100644
index 2c390cc..0000000
Binary files a/assets/en.creation.case1.png and /dev/null differ
diff --git a/assets/en.creation.case2.png b/assets/en.creation.case2.png
deleted file mode 100644
index 08b72f3..0000000
Binary files a/assets/en.creation.case2.png and /dev/null differ
diff --git a/assets/en.instruction_following.case1.png b/assets/en.instruction_following.case1.png
deleted file mode 100644
index 69a6484..0000000
Binary files a/assets/en.instruction_following.case1.png and /dev/null differ
diff --git a/assets/en.math.case1.png b/assets/en.math.case1.png
deleted file mode 100644
index 4f6a0fc..0000000
Binary files a/assets/en.math.case1.png and /dev/null differ
diff --git a/assets/en.math.case2.png b/assets/en.math.case2.png
deleted file mode 100644
index 908885d..0000000
Binary files a/assets/en.math.case2.png and /dev/null differ
diff --git a/assets/en.special_char.case1.png b/assets/en.special_char.case1.png
deleted file mode 100644
index 5d80129..0000000
Binary files a/assets/en.special_char.case1.png and /dev/null differ
diff --git a/assets/en.special_char.case2.png b/assets/en.special_char.case2.png
deleted file mode 100644
index bd7a50e..0000000
Binary files a/assets/en.special_char.case2.png and /dev/null differ
diff --git a/assets/en.translation.case1.png b/assets/en.translation.case1.png
deleted file mode 100644
index adaffb4..0000000
Binary files a/assets/en.translation.case1.png and /dev/null differ
diff --git a/assets/eval_needle.jpeg b/assets/eval_needle.jpeg
new file mode 100644
index 0000000..cfb7e5f
Binary files /dev/null and b/assets/eval_needle.jpeg differ
diff --git a/assets/instruction_following.case1.png b/assets/instruction_following.case1.png
deleted file mode 100644
index 43229e9..0000000
Binary files a/assets/instruction_following.case1.png and /dev/null differ
diff --git a/assets/instruction_following.case2.png b/assets/instruction_following.case2.png
deleted file mode 100644
index 8f3c146..0000000
Binary files a/assets/instruction_following.case2.png and /dev/null differ
diff --git a/assets/knowledge.case1.png b/assets/knowledge.case1.png
deleted file mode 100644
index bbe6f7b..0000000
Binary files a/assets/knowledge.case1.png and /dev/null differ
diff --git a/assets/math.case1.png b/assets/math.case1.png
deleted file mode 100644
index 617f0b4..0000000
Binary files a/assets/math.case1.png and /dev/null differ
diff --git a/assets/math.case2.png b/assets/math.case2.png
deleted file mode 100644
index fab8b54..0000000
Binary files a/assets/math.case2.png and /dev/null differ
diff --git a/assets/minicpm_logo.png b/assets/minicpm_logo.png
new file mode 100644
index 0000000..3da1191
Binary files /dev/null and b/assets/minicpm_logo.png differ
diff --git a/assets/modelbest.png b/assets/modelbest.png
new file mode 100644
index 0000000..c5d0b86
Binary files /dev/null and b/assets/modelbest.png differ
diff --git a/assets/special_char.case1.png b/assets/special_char.case1.png
deleted file mode 100644
index 7f51de1..0000000
Binary files a/assets/special_char.case1.png and /dev/null differ
diff --git a/assets/special_char.case2.png b/assets/special_char.case2.png
deleted file mode 100644
index 43a5d6c..0000000
Binary files a/assets/special_char.case2.png and /dev/null differ
diff --git a/assets/thunlp.png b/assets/thunlp.png
new file mode 100644
index 0000000..85f5128
Binary files /dev/null and b/assets/thunlp.png differ
diff --git a/assets/translation.case1.png b/assets/translation.case1.png
deleted file mode 100644
index c2a23af..0000000
Binary files a/assets/translation.case1.png and /dev/null differ
diff --git a/assets/translation.case2.png b/assets/translation.case2.png
deleted file mode 100644
index 95dee81..0000000
Binary files a/assets/translation.case2.png and /dev/null differ
diff --git a/demo/code_interpreter.py b/demo/code_interpreter.py
new file mode 100644
index 0000000..0cea893
--- /dev/null
+++ b/demo/code_interpreter.py
@@ -0,0 +1,186 @@
+import contextlib
+import io
+import json
+import os
+import re
+import sys
+import traceback
+
+import fire
+from vllm import LLM, SamplingParams
+
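+# Cap on the number of code-generation/execution rounds in the agent loop.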
+max_turns = 5
+system_prompt_template = """You are an AI Agent who is proficient in solving complicated tasks.
+At each step you should write executable code to fulfill the user query. Any response without code means the task is completed, and you do not have another chance to submit code.
+
+You are equipped with a code interpreter. You can submit your code and receive its execution result. You should use the code interpreter in the following format:
+<|execute_start|>
+```python
+
+