ktransformers

mirror of https://github.com/RYDE-WORK/ktransformers.git synced 2026-07-22 04:31:37 +08:00

Author	SHA1	Message	Date
liam	ddf3339339	⚡ release v0.2.2rc1	2025-02-25 22:06:36 +08:00
Azure	8333a4d874	Merge pull request #663 from kvcache-ai/develop-0.2.2 [release] Release 0.2.2rc.	2025-02-25 21:47:36 +08:00
Azure	c6e4e1c3c5	Merge pull request #662 from Azure-Tang/support-fp8 [update] Update readme.	2025-02-25 21:45:19 +08:00
Azure	91c1619296	Merge branch 'develop-0.2.2' into support-fp8 Update README.md	2025-02-25 13:43:26 +00:00
Atream	13974eb264	Update DeepseekR1_V3_tutorial.md	2025-02-25 21:36:52 +08:00
Atream	03f8bc9f79	Update DeepseekR1_V3_tutorial.md add long context	2025-02-25 21:35:31 +08:00
Azure	2c0cce90d0	add fp8 multi gpu yaml example	2025-02-25 13:32:09 +00:00
Atream	d9b2895bd3	Merge branch 'fix-update-flashinfer_wrapper_local_chat' into develop-0.2.2	2025-02-25 12:47:48 +00:00
Atream	477ac28a9c	fix-update-flashinfer_wrapper_local_chat	2025-02-25 12:47:31 +00:00
Azure	7e5962af3d	fix fp8 multi gpu; update FQA	2025-02-25 10:52:29 +00:00
ZiWei Yuan	89b55052b8	Merge pull request #659 from KMSorSMS/develop-0.2.2 📝 add benchmark.md	2025-02-25 17:47:05 +08:00
liam	1b5ac67fca	📝 add benchmark.md	2025-02-25 17:45:17 +08:00
ZiWei Yuan	1aa10e93b3	Merge pull request #658 from KMSorSMS/develop-0.2.2 ⚡ update git ignore add docker dev container	2025-02-25 17:22:34 +08:00
liam	0ca0b99fab	⚡ update git ignore add docker dev container	2025-02-25 17:22:11 +08:00
Azure	5474be5299	Merge branch 'main' into develop-0.2.2	2025-02-25 09:04:22 +00:00
Azure	021822dd01	update FAQ	2025-02-25 09:02:32 +00:00
Atream	b443c7dfa2	Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill Feat absorb for long prefill	2025-02-25 16:53:21 +08:00
Atream	f4c198bd42	support absorb for prefill long context	2025-02-25 08:52:02 +00:00
Azure	050b745a6e	Merge pull request #643 from Azure-Tang/support-fp8 [feat] Support fp8 linear kernel;	2025-02-25 16:22:12 +08:00
Azure	36fbeee341	Update doc	2025-02-25 08:21:18 +00:00
Azure	4dc5518e4d	update fp8 kernel tutorial	2025-02-24 15:37:01 +00:00
Atream	7b2a6690ab	Merge pull request #608 from makllama/fix_musa_ext musa: support bf16	2025-02-24 23:12:54 +08:00
Atream	6f9ea689a9	Merge pull request #645 from makllama/torch2.2 Ensure backward compatibility with PyTorch 2.2	2025-02-24 23:12:33 +08:00
Xiaodong Ye	f88c05a6f1	Ensure backward compatibility with Torch 2.2 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-02-24 21:55:30 +08:00
Azure	ca7366d2db	Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8	2025-02-24 11:58:10 +00:00
Azure	581a524f65	Add data loader to read special weights for fp8; Add special weight process script	2025-02-24 11:34:17 +00:00
Atream	e9b1216a9a	Merge branch 'main' into feat-absorb-for-long-prefill	2025-02-24 09:44:17 +00:00
Atream	4b5991e77e	Merge pull request #638 from kvcache-ai/feat-moonlight fix KExpertsMarlin on GPU with out CUDA Graph	2025-02-24 17:32:05 +08:00
Atream	f327695079	fix KExpertsMarlin on GPU with out CUDA Graph	2025-02-24 09:30:54 +00:00
Atream	eb039b723d	Merge pull request #621 from kvcache-ai/feat-moonlight support moonlight, use ktransformers/optimize/optimize_rules/Moonlight-16B-A3B.yaml	2025-02-23 22:39:08 +08:00
Atream	f5f6c6b95d	update yaml	2025-02-23 14:33:58 +00:00
Atream	e8e02e5ccc	support Moonlight	2025-02-23 14:21:18 +00:00
DDong Jianwei	95d937c51d	tmp	2025-02-23 18:51:42 +08:00
Atream	006e8c6abc	remove causal mask	2025-02-23 07:40:47 +00:00
Atream	cdb6f896bb	Merge pull request #612 from kvcache-ai/fix-bf16-load fix bf16 load, TODO: refactor cpu dequant	2025-02-23 15:37:23 +08:00
Atream	036ae25a89	fix bf16 load, TODO: refactor cpu dequant	2025-02-23 15:37:09 +08:00
Xiaodong Ye	18b1d18367	musa: support bf16 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-02-23 10:19:19 +08:00
Azure	7b7c6a657d	Add fp8 linear kernel;\n Add empty cache to fit in 16G VRAM; By 'wkGCaSS - 知乎 https://zhuanlan.zhihu.com/p/25491611225 '	2025-02-22 13:05:08 +00:00
Atream	94ab2de3b9	Merge pull request #523 from miaooo0000OOOO/main optimize CMake multi core parallel	2025-02-22 17:38:18 +08:00
Atream	72d09f3f6e	Merge pull request #597 from kvcache-ai/feat-more-context Feat more context	2025-02-22 17:17:09 +08:00
Atream	f7f1059873	fix merge bug, this branch also padding Marlin	2025-02-22 09:00:09 +00:00
Atream	e90896314c	Merge pull request #577 from JiamingMai/dev Fix the link address in the doc install.md	2025-02-22 16:45:41 +08:00
Atream	954796123c	Merge pull request #582 from twobob/patch-1 Adjust the installation link to the correct section of docs	2025-02-22 16:44:48 +08:00
Atream	024009675e	Merge branch 'main' into feat-more-context	2025-02-22 06:17:39 +00:00
Atream	5ec33d046d	optimize gguf dequant, save mem, support Q2_K use marlin for lm_head, lm_head only calc last token for prefill extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM	2025-02-22 06:13:01 +00:00
_	5ed441a0f5	Update README.md	2025-02-21 14:15:50 +00:00
JiamingMai	45faddf668	fix the link addresses	2025-02-21 17:53:20 +08:00
Atream	7e1fe256c8	optimize GPU	2025-02-21 05:06:57 +00:00
Azure	25c5bddd08	Merge pull request #506 from makllama/musa feat: Support Moore Threads GPU	2025-02-20 22:50:31 +08:00
ZiWei Yuan	1dd84b4a5b	Merge pull request #550 from kvcache-ai/docker_dev Docker dev	2025-02-20 22:29:56 +08:00


346 Commits