Mirror of https://github.com/RYDE-WORK/MiniCPM.git, synced 2026-01-19 12:53:36 +08:00
mlx inference
commit 9e1438682e (parent: a1013b1ad2)
.gitignore (vendored): 4 additions

@@ -2,3 +2,7 @@
 *.pyc
 finetune/output/*
 wip.*
+.idea
+venv
+.venv
+.env
@@ -488,6 +488,12 @@ python demo/vllm_based_demo.py --model_path <vllmcpm_repo_path>
 python demo/hf_based_demo.py --model_path <hf_repo_path>
 ```

+#### Launch inference on the Mac MLX acceleration framework with the following command
+
+You need to install the `mlx_lm` library and download the corresponding converted model weights [MiniCPM-2B-sft-bf16-llama-format-mlx](https://huggingface.co/mlx-community/MiniCPM-2B-sft-bf16-llama-format-mlx), then run:
+```shell
+python -m mlx_lm.generate --model mlx-community/MiniCPM-2B-sft-bf16-llama-format-mlx --prompt "hello, tell me a joke." --trust-remote-code
+```

 <p id="6"></p>
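The same generation can also be driven from Python instead of the CLI; here is a minimal sketch using `mlx_lm`'s `load` and `generate` calls (the same API the new demo file below relies on; the `<用户>`/`<AI>` markers follow MiniCPM's chat format as used in that demo):

```python
from mlx_lm import load, generate

# Download (on first use) and load the community-converted weights.
model, tokenizer = load("mlx-community/MiniCPM-2B-sft-bf16-llama-format-mlx")

# MiniCPM expects its chat markers around the user turn rather than a bare prompt.
prompt = "<用户>hello, tell me a joke.<AI>"
print(generate(model, tokenizer, prompt=prompt))
```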
demo/mlx_based_demo.py (new file): 42 lines

@@ -0,0 +1,42 @@
"""
|
||||
使用 MLX 快速推理 MiniCPM
|
||||
|
||||
如果你使用 Mac 设备进行推理,可以直接使用MLX进行推理。
|
||||
由于 MiniCPM 暂时不支持 mlx 格式转换。您可以下载由 MLX 社群转换好的模型 [MiniCPM-2B-sft-bf16-llama-format-mlx](https://huggingface.co/mlx-community/MiniCPM-2B-sft-bf16-llama-format-mlx)。
|
||||
|
||||
并安装对应的依赖包
|
||||
|
||||
|
||||
```bash
|
||||
pip install mlx-lm
|
||||
```
|
||||
|
||||
这是一个简单的推理代码,使用 Mac 设备推理 MiniCPM-2
|
||||
```python
|
||||
python -m mlx_lm.generate --model mlx-community/MiniCPM-2B-sft-bf16-llama-format-mlx --prompt "hello, tell me a joke." --trust-remote-code
|
||||
```
|
||||
|
||||
"""
|
||||
from mlx_lm import load, generate
from jinja2 import Template


def chat_with_model():
    model, tokenizer = load("mlx-community/MiniCPM-2B-sft-bf16-llama-format-mlx")
    print("Model loaded. Start chatting! (Type 'quit' to stop)")

    messages = []
    # MiniCPM's chat format: each user turn is wrapped as <用户>...<AI>,
    # and the model's earlier replies are appended verbatim in between.
    chat_template = Template(
        "{% for message in messages %}{% if message['role'] == 'user' %}"
        "{{'<用户>' + message['content'].strip() + '<AI>'}}"
        "{% else %}{{message['content'].strip()}}{% endif %}{% endfor %}"
    )

    while True:
        user_input = input("You: ")
        if user_input.lower() == 'quit':
            break
        messages.append({"role": "user", "content": user_input})
        # Render the full history into one prompt so the model sees the context.
        prompt = chat_template.render(messages=messages)
        response = generate(model, tokenizer, prompt=prompt, verbose=True)
        print("Model:", response)
        messages.append({"role": "ai", "content": response})


if __name__ == "__main__":
    chat_with_model()
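For reference, this is what `chat_template.render(...)` produces for a short exchange, assuming the message structure built by the loop above (the history values here are made up for illustration):

```python
# Hypothetical two-turn history, shaped like the one chat_with_model() accumulates.
messages = [
    {"role": "user", "content": "hi"},
    {"role": "ai", "content": "Hello! How can I help?"},
    {"role": "user", "content": "tell me a joke"},
]
# chat_template.render(messages=messages) produces:
# "<用户>hi<AI>Hello! How can I help?<用户>tell me a joke<AI>"
# Each user turn is wrapped in <用户>...<AI> with replies in between, so the
# trailing <AI> cues the model to answer the latest user message.
```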
requirements.txt (new file): 11 lines

@@ -0,0 +1,11 @@
transformers>=4.38.2
torch>=2.0.0
triton>=2.2.0
httpx>=0.27.0
gradio>=4.21.0
flash_attn>=2.4.1
accelerate>=0.28.0
sentence_transformers>=2.6.0
sse_starlette>=2.0.0
tiktoken>=0.6.0
mlx_lm>=0.5.0