From 4f87756c2eb3f4dff5b8abaf043d4b0864430816 Mon Sep 17 00:00:00 2001
From: TangJingqi
Date: Fri, 16 Aug 2024 11:10:30 +0800
Subject: [PATCH 1/2] fix broken link

---
 doc/en/injection_tutorial.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/en/injection_tutorial.md b/doc/en/injection_tutorial.md
index 655163e..5ebb327 100644
--- a/doc/en/injection_tutorial.md
+++ b/doc/en/injection_tutorial.md
@@ -165,7 +165,7 @@ Through these two rules, we place all previously unmatched layers (and their sub
 ## Muti-GPU
 
 If you have multiple GPUs, you can set the device for each module to different GPUs.
-DeepseekV2-Chat got 60 layers, if we got 2 GPUs, we can allocate 30 layers to each GPU. Complete multi GPU rule examples [here](ktransformers/optimize/optimize_rules).
+DeepseekV2-Chat got 60 layers, if we got 2 GPUs, we can allocate 30 layers to each GPU. Complete multi GPU rule examples [here](https://github.com/kvcache-ai/ktransformers/blob/main/ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat-multi-gpu.yaml).
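For orientation, the optimize-rule files that the fixed link points to are YAML lists of `match`/`replace` entries. Below is a minimal sketch of how a 30/30 layer split across two GPUs could be expressed, assuming the `match`/`replace` schema with `generate_device`/`prefill_device` kwargs described in the injection tutorial; the regexes and the `"default"` class shown here are illustrative, not the verbatim contents of the linked file.

```yaml
# Hypothetical sketch of a two-GPU split for a 60-layer model:
# layers 0-29 are pinned to cuda:0 and layers 30-59 to cuda:1
# by matching on module names.
- match:
    name: "^model\\.layers\\.([0-9]|[12][0-9])\\."   # layers 0-29
  replace:
    class: "default"              # keep the original module class
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"
- match:
    name: "^model\\.layers\\.([345][0-9])\\."        # layers 30-59
  replace:
    class: "default"
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"
```

The linked DeepSeek-V2-Chat-multi-gpu.yaml remains the complete, authoritative example.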

From 4a0f1cbbfa6b945d0dfbc6116c91491256d698a9 Mon Sep 17 00:00:00 2001
From: TangJingqi
Date: Fri, 16 Aug 2024 15:22:16 +0800
Subject: [PATCH 2/2] Update readme

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index f0c2bae..3cafb01 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,7 @@ interface, RESTful APIs compliant with OpenAI and Ollama, and even a simplified

 Our vision for KTransformers is to serve as a flexible platform for experimenting with innovative LLM inference optimizations. Please let us know if you need any other features.
 
-<h2 id="Updates">🔥 Updates</h2>
+<h2 id="Updates">✨ Updates</h2>
 
 * **Aug 15, 2024**: Update detailed [TUTORIAL](doc/en/injection_tutorial.md) for injection and multi-GPU.
 * **Aug 14, 2024**: Support llamfile as linear backend,