Hugging Face | ModelScope | Technical Report
## Quick Links

- [Introduction](#1)
- [Downloading](#2)
- [Benchmark](#3)
  - [Chinese](#3.1)
  - [English](#3.2)
  - [Code](#3.3)
  - [Logic](#3.4)
  - [Multi-modal](#3.5)
- [Deployment on mobile phones](#4)
- [Demo & API](#5)
- [Parameter-efficient Fine-tuning](#6)
- [LICENSE](#7)
- [Citation](#8)
- [Show Cases](#9)

# Introduction

# Downloading

| HuggingFace | ModelScope | WiseModel |
|-------------|------------|-----------|
|[sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)|[sft-bf16](https://modelscope.cn/models/OpenBMB/miniCPM-bf16)|[sft-bf16](https://wisemodel.cn/models/OpenBMB/miniCPM-bf16)|
|[sft-fp32](https://huggingface.co/openbmb/MiniCPM-2B-sft-fp32)|[sft-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-fp32)|[sft-fp32](https://wisemodel.cn/models/OpenBMB/miniCPM-dpo-fp32)|
|[dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16)|[dpo-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16/summary)|[dpo-bf16](https://wisemodel.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16)|
|[dpo-fp16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp16)|[dpo-fp16](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp16/)|[dpo-fp16](https://wisemodel.cn/models/OpenBMB/MiniCPM-2B-dpo-fp16)|
|[dpo-fp32](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp32)|[dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32)|[dpo-fp32](https://wisemodel.cn/models/OpenBMB/miniCPM-dpo-fp32)|
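The checkpoints above can be loaded directly with Hugging Face Transformers. Below is a minimal sketch (not the project's official snippet), assuming a recent `transformers`/`torch` install and a GPU with bfloat16 support; `trust_remote_code=True` is required because the repos ship custom modeling code.

```python
# Minimal sketch: load MiniCPM-2B-sft-bf16 and generate a reply with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "openbmb/MiniCPM-2B-sft-bf16"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda()

# Plain generate() call; the checkpoint may also expose a higher-level chat helper.
prompt = "Please recommend three sights to visit in Beijing."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, top_p=0.8, temperature=0.8
)
# Strip the prompt tokens before decoding the newly generated text.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```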
# Benchmark

## Multi-modal

|Models|MME (P)|MMB-dev (en)|MMB-dev (zh)|MMMU-val|CMMMU-val|
|-|-|-|-|-|-|
|LLaVA-Phi|1335.1|59.8|/|/|/|
|MobileVLM|1288.9|59.6|/|/|/|
|Imp-v1|1434.0|66.5|/|/|/|
|Qwen-VL-Chat|**1487**|60.6|56.7|**35.9**|30.7|
|**MiniCPM-V**|1446|**67.3**|**61.9**|34.7|**32.1**|

## DPO

|Models|MT-bench|
|---|---|
|GPT-4-turbo|9.32|
|GPT-3.5-turbo|8.39|
|Mistral-8*7b-Instruct-v0.1|8.30|
|Claude-2.1|8.18|
|Zephyr-7B-beta|7.34|
|**MiniCPM-2B**|**7.25**|
|Vicuna-33B|7.12|
|Zephyr-7B-alpha|6.88|
|LLaMA-2-70B-chat|6.86|
|Mistral-7B-Instruct-v0.1|6.84|
|LLaMA-2-13B-chat|6.65|
|Vicuna-13B|6.57|
|MPT-34B-instruct|6.39|
|LLaMA-2-7B-chat|6.27|
|Vicuna-7B|6.17|
|MPT-7B-chat|5.42|

## Deployment on mobile phones

After INT4 quantization, MiniCPM occupies only about 2 GB of space, which meets the memory requirements for inference on edge devices. We use the open-source framework [MLC-LLM](https://github.com/mlc-ai/mlc-llm) for deployment on Android and HarmonyOS, and adapt MiniCPM with [LLMFarm](https://github.com/guinmoon/LLMFarm) for deployment on iOS. We selected a range of mobile phones for testing.

### Tutorial

#### Android

[Compilation and installation on Android](https://github.com/OpenBMB/mlc-MiniCPM/blob/main/README.md)

#### iOS

[Compilation and installation on iOS](https://github.com/OpenBMB/LLMFarm)

#### Multimodal

### Performance

We have not performed in-depth optimization for mobile deployment; these tests only verify that MiniCPM can run inference on mobile chips.

**We welcome more developers to continuously improve the inference performance of LLMs on mobile phones and to update the test results below.**

|Mobile Phone|OS|Processor|Memory (GB)|Inference Throughput (tokens/s)|
|-|-|-|-|-|
|OPPO Find N3|Android 13|Snapdragon 8 Gen 2|12|6.5|
|Samsung S23 Ultra|Android 14|Snapdragon 8 Gen 2|12|6.4|
|Meizu M182Q|Android 11|Snapdragon 888 Plus|8|3.7|
|Xiaomi 12 Pro|Android 13|Snapdragon 8 Gen 1|8+3|3.7|
|Xiaomi Redmi K40|Android 11|Snapdragon 870|8|3.5|
|Oneplus LE 2100|Android 13|Snapdragon 870|12|3.5|
|Oneplus HD1900|Android 11|Snapdragon 865|8|3.2|
|Oneplus HD1900|Android 11|Snapdragon 855|8|3.0|
|Oneplus HD1905|Android 10|Snapdragon 855|8|3.0|
|Oneplus HD1900|Android 11|Snapdragon 855|8|3.0|
|Xiaomi MI 8|Android 9|Snapdragon 845|6|2.3|
|Huawei Nova 11SE|HarmonyOS 4.0.0|Snapdragon 778|12|1.9|
|Xiaomi MIX 2|Android 9|Snapdragon 835|6|1.3|
|iPhone 15 Pro|iOS 17.2.1|A16|8|18.0|
|iPhone 15|iOS 17.2.1|A16|6|15.0|
|iPhone 12 Pro|iOS 16.5.1|A14|6|5.8|
|iPhone 12|iOS 17.2.1|A14|4|5.8|
|iPhone 11|iOS 16.6|A13|4|4.6|

## Demo & API

#### Web demo based on Gradio

Launch the Gradio-based demo with the following command:

```shell
python demo/gradio_based_demo.py
```

#### Inference with vLLM (Recommended!)

* Install the vLLM version that supports MiniCPM
  - vLLM 0.2.2 has been adapted to MiniCPM in `inference/vllm`. More vLLM versions will be supported in the future.

```shell
pip install inference/vllm
```

* Transfer the Hugging Face Transformers repo to a vLLM-MiniCPM repo, where `