mirror of https://github.com/RYDE-WORK/ktransformers.git (synced 2026-02-06 22:55:50 +08:00)
commit 3c6035aa8a
@@ -9,7 +9,7 @@
 - [Why KTransformers So Fast](en/deepseek-v2-injection.md)
 - [Injection Tutorial](en/injection_tutorial.md)
 - [Multi-GPU Tutorial](en/multi-gpu-tutorial.md)
-# Server(Temperary Deprected)
+# Server (Temporary Deprecated)
 - [Server](en/api/server/server.md)
 - [Website](en/api/server/website.md)
 - [Tabby](en/api/server/tabby.md)
@@ -83,7 +83,7 @@ Memory: standard DDR5-4800 server DRAM (1 TB), each socket with 8×DDR5-4800
 #### Change Log
 - Longer Context (from 4K to 8K for 24GB VRAM) and Slightly Faster Speed (+15%):<br>
 Integrated the highly efficient Triton MLA Kernel from the fantastic sglang project, enable much longer context length and slightly faster prefill/decode speed
-- We suspect the impressive improvement comes from the change of hardwre platform (4090D->4090)
+- We suspect that some of the improvements come from the change of hardwre platform (4090D->4090)
 #### Benchmark Results
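The change log above attributes the longer context and faster prefill/decode to swapping in sglang's Triton MLA kernel through KTransformers' YAML injection rules (see the Injection Tutorial linked in the table of contents). As a rough sketch only: the module regex, operator class path, and kwargs below are illustrative assumptions, not values taken from this commit, so consult the shipped optimize-rule files for the real ones.

```yaml
# Sketch of a KTransformers injection rule that replaces each decoder layer's
# attention module with an MLA-based operator. Regex, class path, and kwargs
# are assumptions for illustration; the actual rules live in the repo's
# optimize-rule YAML files and the Injection Tutorial.
- match:
    name: "^model\\.layers\\..*\\.self_attn$"   # match every layer's attention module by name
  replace:
    class: ktransformers.operators.attention.KDeepseekV2Attention  # optimized MLA attention operator
    kwargs:
      generate_device: "cuda"   # device used during decode
      prefill_device: "cuda"    # device used during prefill
```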