Mirror of https://github.com/RYDE-WORK/ktransformers.git, synced 2026-01-19 21:03:18 +08:00
⚡ update v0.3 preview
This commit is contained in:
parent 6dd4fa0e87
commit fd481af193
@@ -23,7 +23,7 @@ Our vision for KTransformers is to serve as a flexible platform for experimenting
<h2 id="Updates">🔥 Updates</h2>
* **Feb 10, 2025**: Support DeepseekR1 and V3 on single (24GB VRAM)/multi GPU and 382G DRAM, up to 3~64x speedup. The detailed tutorial is [here](./doc/en/DeepseekR1_V3_tutorial.md).
* **Aug 28, 2024**: Support 1M context under the InternLM2.5-7B-Chat-1M model, utilizing 24GB of VRAM and 150GB of DRAM. The detailed tutorial is [here](./doc/en/long_context_tutorial.md).
* **Aug 28, 2024**: Decrease DeepseekV2's required VRAM from 21G to 11G.
* **Aug 15, 2024**: Update detailed [TUTORIAL](doc/en/injection_tutorial.md) for injection and multi-GPU.
@@ -47,6 +47,12 @@ The main acceleration comes from
- Intel AMX instruction set and our specially designed cache-friendly memory layout
- Expert selection strategy that activates fewer experts, based on offline profiling results on out-of-domain data
*From our research on DeepSeekV2, DeepSeekV3, and DeepSeekR1: when we slightly decrease the number of activated experts at inference time, the output quality does not change, but both decoding and prefill speed up, which is encouraging. Our showcase makes use of this finding.*
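
As a minimal illustration of this point (not KTransformers' actual routing code; the function, tensor names, and expert counts below are assumptions for the sketch), a standard MoE top-k router can simply be asked to activate fewer experts per token at serving time:

```python
import torch

def route_tokens(router_logits: torch.Tensor, num_active_experts: int):
    """Pick the top `num_active_experts` experts per token.

    router_logits: [num_tokens, num_experts] gating scores.
    Returns per-token expert weights and indices, each of shape
    [num_tokens, num_active_experts].
    """
    weights, expert_ids = torch.topk(router_logits, num_active_experts, dim=-1)
    weights = torch.softmax(weights, dim=-1)  # renormalize over the kept experts
    return weights, expert_ids

# Toy example: a model trained to activate 8 of 256 routed experts can be
# served with, say, 6, cutting per-token expert compute while (per the
# observation above) leaving output quality essentially unchanged.
logits = torch.randn(4, 256)                  # 4 tokens, 256 routed experts
w_train, ids_train = route_tokens(logits, 8)  # training-time setting
w_fast, ids_fast = route_tokens(logits, 6)    # slightly reduced activation
```
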
## How to run
### v0.2 showcase
#### Single-socket version (32 cores)
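
The concrete commands for the v0.2 showcase are in the linked DeepseekR1/V3 tutorial. As a rough sketch only (paths are placeholders, and flags such as `--cpu_infer` and `--max_new_tokens` should be checked against that tutorial), a single-socket run typically binds threads and memory to one NUMA node:

```bash
# Hypothetical invocation: pin compute and memory to NUMA node 0 and tell
# ktransformers how many CPU threads to use for expert inference.
# <model_config_dir> and <gguf_dir> are placeholders.
numactl -N 0 -m 0 python ./ktransformers/local_chat.py \
    --model_path <model_config_dir> \
    --gguf_path <gguf_dir> \
    --cpu_infer 33 \
    --max_new_tokens 1000
```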