346 Commits

Author SHA1 Message Date
liam
ddf3339339 release v0.2.2rc1 2025-02-25 22:06:36 +08:00
Azure
8333a4d874 Merge pull request #663 from kvcache-ai/develop-0.2.2
[release]  Release 0.2.2rc.
2025-02-25 21:47:36 +08:00
Azure
c6e4e1c3c5 Merge pull request #662 from Azure-Tang/support-fp8
[update] Update readme.
2025-02-25 21:45:19 +08:00
Azure
91c1619296 Merge branch 'develop-0.2.2' into support-fp8
Update README.md
2025-02-25 13:43:26 +00:00
Atream
13974eb264 Update DeepseekR1_V3_tutorial.md 2025-02-25 21:36:52 +08:00
Atream
03f8bc9f79 Update DeepseekR1_V3_tutorial.md add long context 2025-02-25 21:35:31 +08:00
Azure
2c0cce90d0 add fp8 multi gpu yaml example 2025-02-25 13:32:09 +00:00
Atream
d9b2895bd3 Merge branch 'fix-update-flashinfer_wrapper_local_chat' into develop-0.2.2 2025-02-25 12:47:48 +00:00
Atream
477ac28a9c fix-update-flashinfer_wrapper_local_chat 2025-02-25 12:47:31 +00:00
Azure
7e5962af3d fix fp8 multi gpu; update FQA 2025-02-25 10:52:29 +00:00
ZiWei Yuan
89b55052b8 Merge pull request #659 from KMSorSMS/develop-0.2.2
📝 add benchmark.md
2025-02-25 17:47:05 +08:00
liam
1b5ac67fca 📝 add benchmark.md 2025-02-25 17:45:17 +08:00
ZiWei Yuan
1aa10e93b3 Merge pull request #658 from KMSorSMS/develop-0.2.2
update git ignore add docker dev container
2025-02-25 17:22:34 +08:00
liam
0ca0b99fab update git ignore add docker dev container 2025-02-25 17:22:11 +08:00
Azure
5474be5299 Merge branch 'main' into develop-0.2.2 2025-02-25 09:04:22 +00:00
Azure
021822dd01 update FAQ 2025-02-25 09:02:32 +00:00
Atream
b443c7dfa2 Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill
Feat absorb for long prefill
2025-02-25 16:53:21 +08:00
Atream
f4c198bd42 support absorb for prefill long context 2025-02-25 08:52:02 +00:00
Azure
050b745a6e Merge pull request #643 from Azure-Tang/support-fp8
[feat] Support fp8 linear kernel;
2025-02-25 16:22:12 +08:00
Azure
36fbeee341 Update doc 2025-02-25 08:21:18 +00:00
Azure
4dc5518e4d update fp8 kernel tutorial 2025-02-24 15:37:01 +00:00
Atream
7b2a6690ab Merge pull request #608 from makllama/fix_musa_ext
musa: support bf16
2025-02-24 23:12:54 +08:00
Atream
6f9ea689a9 Merge pull request #645 from makllama/torch2.2
Ensure backward compatibility with PyTorch 2.2
2025-02-24 23:12:33 +08:00
Xiaodong Ye
f88c05a6f1 Ensure backward compatibility with Torch 2.2
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-24 21:55:30 +08:00
Azure
ca7366d2db Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8 2025-02-24 11:58:10 +00:00
Azure
581a524f65 Add data loader to read special weights for fp8; Add special weight process script 2025-02-24 11:34:17 +00:00
Atream
e9b1216a9a Merge branch 'main' into feat-absorb-for-long-prefill 2025-02-24 09:44:17 +00:00
Atream
4b5991e77e Merge pull request #638 from kvcache-ai/feat-moonlight
fix KExpertsMarlin on GPU with out CUDA Graph
2025-02-24 17:32:05 +08:00
Atream
f327695079 fix KExpertsMarlin on GPU with out CUDA Graph 2025-02-24 09:30:54 +00:00
Atream
eb039b723d Merge pull request #621 from kvcache-ai/feat-moonlight
support moonlight, use ktransformers/optimize/optimize_rules/Moonlight-16B-A3B.yaml
2025-02-23 22:39:08 +08:00
Atream
f5f6c6b95d update yaml 2025-02-23 14:33:58 +00:00
Atream
e8e02e5ccc support Moonlight 2025-02-23 14:21:18 +00:00
DDong Jianwei
95d937c51d tmp 2025-02-23 18:51:42 +08:00
Atream
006e8c6abc remove causal mask 2025-02-23 07:40:47 +00:00
Atream
cdb6f896bb Merge pull request #612 from kvcache-ai/fix-bf16-load
fix bf16 load, TODO: refactor cpu dequant
2025-02-23 15:37:23 +08:00
Atream
036ae25a89 fix bf16 load, TODO: refactor cpu dequant 2025-02-23 15:37:09 +08:00
Xiaodong Ye
18b1d18367 musa: support bf16
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-23 10:19:19 +08:00
Azure
7b7c6a657d Add fp8 linear kernel;
Add empty cache to fit in 16G VRAM; By 'wkGCaSS - 知乎 https://zhuanlan.zhihu.com/p/25491611225'
2025-02-22 13:05:08 +00:00
Atream
94ab2de3b9 Merge pull request #523 from miaooo0000OOOO/main
optimize CMake multi core parallel
2025-02-22 17:38:18 +08:00
Atream
72d09f3f6e Merge pull request #597 from kvcache-ai/feat-more-context
Feat more context
2025-02-22 17:17:09 +08:00
Atream
f7f1059873 fix merge bug, this branch also padding Marlin 2025-02-22 09:00:09 +00:00
Atream
e90896314c Merge pull request #577 from JiamingMai/dev
Fix the link address in the doc install.md
2025-02-22 16:45:41 +08:00
Atream
954796123c Merge pull request #582 from twobob/patch-1
Adjust the installation link to the correct section of docs
2025-02-22 16:44:48 +08:00
Atream
024009675e Merge branch 'main' into feat-more-context 2025-02-22 06:17:39 +00:00
Atream
5ec33d046d optimize gguf dequant, save mem, support Q2_K
use marlin for lm_head, lm_head only calc last token for prefill
extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM
2025-02-22 06:13:01 +00:00
_
5ed441a0f5 Update README.md 2025-02-21 14:15:50 +00:00
JiamingMai
45faddf668 fix the link addresses 2025-02-21 17:53:20 +08:00
Atream
7e1fe256c8 optimize GPU 2025-02-21 05:06:57 +00:00
Azure
25c5bddd08 Merge pull request #506 from makllama/musa
feat: Support Moore Threads GPU
2025-02-20 22:50:31 +08:00
ZiWei Yuan
1dd84b4a5b Merge pull request #550 from kvcache-ai/docker_dev
Docker dev
2025-02-20 22:29:56 +08:00