Commit Graph

  • 494469d4c5
    Merge pull request #722 from ZhangShuaiyi/remove_unused main Atream 2025-02-28 15:04:21 +08:00
  • 1264f9407b
    Merge pull request #732 from KMSorSMS/main ZiWei Yuan 2025-02-28 11:28:06 +08:00
  • a0e7afa432 fix docker build liam 2025-02-28 11:25:34 +08:00
  • add415124f
    Merge pull request #731 from Azure-Tang/update-template Azure 2025-02-28 11:19:52 +08:00
  • bc52969918 fix name Azure 2025-02-28 03:17:33 +00:00
  • 0439cb36d4
    Merge pull request #730 from Azure-Tang/update-template Azure 2025-02-28 11:10:29 +08:00
  • 31b01f5b99 update ZH/EN template Azure 2025-02-28 03:09:06 +00:00
  • a34a25d5cc Delete unused code Shuaiyi 2025-02-27 13:18:19 +00:00
  • 7a19f3b781
    Merge pull request #721 from kvcache-ai/fix_temperature wang jiahao 2025-02-27 21:01:21 +08:00
  • 22df52e94e fix temperature qiyuxinlin 2025-02-27 21:00:44 +08:00
  • 85e2cc7bf4
    Merge pull request #719 from kvcache-ai/fix-use-generation-json Atream 2025-02-27 19:49:41 +08:00
  • e645d84794 use generation config from json file in official repo Atream 2025-02-27 11:48:34 +00:00
  • 5e3c6b4f97
    Merge pull request #644 from wtdcode/temperature_top_p_from_request wang jiahao 2025-02-27 18:13:13 +08:00
  • b121ca4df8
    Fix according to upstream changes lazymio 2025-02-27 18:11:35 +08:00
  • 26f7b4af11
    Merge branch 'main' into temperature_top_p_from_request wang jiahao 2025-02-27 18:08:55 +08:00
  • 1f28f75f55
    Merge pull request #717 from kvcache-ai/issue-template Azure 2025-02-27 18:02:34 +08:00
  • c61805dd0a
    Update issue templates Azure 2025-02-27 17:53:27 +08:00
  • 50c691297f
    Merge pull request #622 from akemimadoka/fix-msvc Atream 2025-02-27 17:42:00 +08:00
  • 0422152cf3
    Merge pull request #670 from akemimadoka/fix-win Atream 2025-02-27 17:40:27 +08:00
  • 798e1d0cfa
    Merge pull request #532 from xv44586/fix-sse-formatting Atream 2025-02-27 12:19:23 +08:00
  • f403cde6d4
    Merge pull request #650 from ceerRep/main Atream 2025-02-27 12:16:53 +08:00
  • 1d5d5faef6
    Merge pull request #626 from cyhasuka/main Atream 2025-02-27 12:13:10 +08:00
  • 8db6a4d402
    Merge branch 'main' into main Atream 2025-02-27 12:12:32 +08:00
  • 3c8c580580
    Merge pull request #691 from swu-hyk/ollama_api_chat wang jiahao 2025-02-27 11:17:48 +08:00
  • ca93cf7548
    Merge pull request #702 from Azure-Tang/update-readme Azure 2025-02-26 23:45:24 +08:00
  • c05ebb74b1 Update fp8 doc; Update install.md broken link Azure 2025-02-26 15:43:08 +00:00
  • 3ebe17eb63
    Merge pull request #699 from kvcache-ai/Atream-patch-1 Atream 2025-02-26 22:04:45 +08:00
  • 369f4d917d
    Update DeepseekR1_V3_tutorial.md Atream 2025-02-26 22:04:29 +08:00
  • 9650893adc
    Merge pull request #697 from kvcache-ai/fix-yaml Atream 2025-02-26 21:54:01 +08:00
  • 90eb87b3fc
    Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml Atream 2025-02-26 21:53:50 +08:00
  • ec7e912fee modify swu-hyk 2025-02-26 19:21:30 +08:00
  • 68e7df3a25 implementation of chat routing for Ollama swu-hyk 2025-02-26 17:05:00 +08:00
  • 9660b2cc1e
    Merge pull request #685 from vproxy-tools/main Chen Hongtao 2025-02-26 15:35:19 +08:00
  • e7ebb26370
    Merge pull request #684 from KMSorSMS/main ZiWei Yuan 2025-02-26 15:06:51 +08:00
  • ffb86c66e3 fix experts torch liam 2025-02-26 15:04:25 +08:00
  • de082f141c fix cd error liam 2025-02-26 14:54:47 +08:00
  • b2bff17775 fix numa cpu distribution wkgcass 2025-02-26 14:48:22 +08:00
  • 8817777e11 Fix RuntimeError on Windows caused by integer overflow in np.prod akemimadoka 2025-02-26 03:50:12 +08:00
  • 99f6e42113
    Merge pull request #668 from KMSorSMS/main Azure 2025-02-26 00:21:09 +08:00
  • 3ad12751cf 📝 update more detail and fix typo liam 2025-02-26 00:17:02 +08:00
  • 31bc990677
    Merge pull request #667 from Azure-Tang/update-readme Azure 2025-02-26 00:01:46 +08:00
  • 05339ad0ef 📝 update benchmark.md liam 2025-02-25 23:56:19 +08:00
  • bb6920ed72 update doc Azure 2025-02-25 15:46:15 +00:00
  • 9c71bcb0bb
    Merge pull request #665 from KMSorSMS/v0.2.2rc1 ZiWei Yuan 2025-02-25 22:07:19 +08:00
  • ddf3339339 release v0.2.2rc1 liam 2025-02-25 22:06:36 +08:00
  • 8333a4d874
    Merge pull request #663 from kvcache-ai/develop-0.2.2 Azure 2025-02-25 21:47:36 +08:00
  • c6e4e1c3c5
    Merge pull request #662 from Azure-Tang/support-fp8 Azure 2025-02-25 21:45:19 +08:00
  • 91c1619296 Merge branch 'develop-0.2.2' into support-fp8; Update README.md Azure 2025-02-25 13:36:21 +00:00
  • 13974eb264
    Update DeepseekR1_V3_tutorial.md Atream 2025-02-25 21:36:52 +08:00
  • 03f8bc9f79
    Update DeepseekR1_V3_tutorial.md add long context Atream 2025-02-25 21:35:31 +08:00
  • 2c0cce90d0 add fp8 multi gpu yaml example Azure 2025-02-25 13:32:09 +00:00
  • d9b2895bd3 Merge branch 'fix-update-flashinfer_wrapper_local_chat' into develop-0.2.2 Atream 2025-02-25 12:47:48 +00:00
  • 477ac28a9c fix-update-flashinfer_wrapper_local_chat Atream 2025-02-25 12:47:31 +00:00
  • 7e5962af3d fix fp8 multi gpu; update FAQ Azure 2025-02-25 10:52:29 +00:00
  • 89b55052b8
    Merge pull request #659 from KMSorSMS/develop-0.2.2 ZiWei Yuan 2025-02-25 17:47:05 +08:00
  • 1b5ac67fca 📝 add benchmark.md liam 2025-02-25 17:45:17 +08:00
  • 1aa10e93b3
    Merge pull request #658 from KMSorSMS/develop-0.2.2 ZiWei Yuan 2025-02-25 17:22:34 +08:00
  • 0ca0b99fab update git ignore add docker dev container liam 2025-02-25 17:19:19 +08:00
  • 5474be5299 Merge branch 'main' into develop-0.2.2 Azure 2025-02-25 09:04:22 +00:00
  • 021822dd01 update FAQ Azure 2025-02-25 09:02:32 +00:00
  • b443c7dfa2
    Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill Atream 2025-02-25 16:53:21 +08:00
  • f4c198bd42 support absorb for prefill long context Atream 2025-02-25 08:52:02 +00:00
  • 050b745a6e
    Merge pull request #643 from Azure-Tang/support-fp8 Azure 2025-02-25 16:22:12 +08:00
  • 36fbeee341 Update doc Azure 2025-02-25 08:21:18 +00:00
  • f639fbc19e feat: basic api key support ceerrep 2025-02-25 14:11:39 +08:00
  • 4dc5518e4d update fp8 kernel tutorial Azure 2025-02-24 15:37:01 +00:00
  • 7b2a6690ab
    Merge pull request #608 from makllama/fix_musa_ext Atream 2025-02-24 23:12:54 +08:00
  • 6f9ea689a9
    Merge pull request #645 from makllama/torch2.2 Atream 2025-02-24 23:12:33 +08:00
  • f88c05a6f1 Ensure backward compatibility with Torch 2.2 Xiaodong Ye 2025-02-24 21:55:30 +08:00
  • 07eb712a73
    Left out lazymio 2025-02-24 21:51:14 +08:00
  • 91062a834f
    Default values lazymio 2025-02-24 21:38:01 +08:00
  • 76487c4dcb
    Revert repetition_penalty as it is not in API spec lazymio 2025-02-24 21:30:03 +08:00
  • 05ad288453
    Also /chat/completions lazymio 2025-02-24 21:08:36 +08:00
  • bf36547f98
    Also allow repetition_penalty lazymio 2025-02-24 21:07:35 +08:00
  • 8704c09192
    Allow temperature and top_p from requests lazymio 2025-02-24 21:01:33 +08:00
  • ca7366d2db Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8 Azure 2025-02-24 11:58:10 +00:00
  • 581a524f65 Add data loader to read special weights for fp8; Add special weight process script Azure 2025-02-24 11:16:23 +00:00
  • e9b1216a9a Merge branch 'main' into feat-absorb-for-long-prefill Atream 2025-02-24 09:44:17 +00:00
  • 4b5991e77e
    Merge pull request #638 from kvcache-ai/feat-moonlight Atream 2025-02-24 17:32:05 +08:00
  • f327695079 fix KExpertsMarlin on GPU without CUDA Graph Atream 2025-02-24 09:30:54 +00:00
  • cea07d1998
    Feat: Clear cache during weight loading to prevent OOM on GPUs with <=8GB VRAM Yuhao Tsui 2025-02-24 10:09:42 +08:00
  • 706e69f4fc Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC akemimadoka 2025-02-24 01:37:50 +08:00
  • eb039b723d
    Merge pull request #621 from kvcache-ai/feat-moonlight Atream 2025-02-23 22:39:08 +08:00
  • f5f6c6b95d update yaml Atream 2025-02-23 14:33:58 +00:00
  • e8e02e5ccc support Moonlight Atream 2025-02-23 14:21:18 +00:00
  • 95d937c51d tmp DDong Jianwei 2025-02-23 18:51:42 +08:00
  • 006e8c6abc remove causal mask Atream 2025-02-23 07:40:47 +00:00
  • cdb6f896bb
    Merge pull request #612 from kvcache-ai/fix-bf16-load Atream 2025-02-23 15:37:23 +08:00
  • 036ae25a89
    fix bf16 load, TODO: refactor cpu dequant Atream 2025-02-23 15:37:09 +08:00
  • 18b1d18367 musa: support bf16 Xiaodong Ye 2025-02-23 10:19:19 +08:00
  • 7b7c6a657d Add fp8 linear kernel; Add empty cache to fit in 16G VRAM; By 'wkGCaSS - 知乎 https://zhuanlan.zhihu.com/p/25491611225' Azure 2025-02-22 13:05:08 +00:00
  • 94ab2de3b9
    Merge pull request #523 from miaooo0000OOOO/main Atream 2025-02-22 17:38:18 +08:00
  • 72d09f3f6e
    Merge pull request #597 from kvcache-ai/feat-more-context Atream 2025-02-22 17:17:09 +08:00
  • f7f1059873 fix merge bug, this branch also padding Marlin Atream 2025-02-22 09:00:09 +00:00
  • e90896314c
    Merge pull request #577 from JiamingMai/dev Atream 2025-02-22 16:45:41 +08:00
  • 954796123c
    Merge pull request #582 from twobob/patch-1 Atream 2025-02-22 16:44:48 +08:00
  • 024009675e Merge branch 'main' into feat-more-context Atream 2025-02-22 06:17:39 +00:00
  • 5ec33d046d optimize gguf dequant, save mem, support Q2_K; use marlin for lm_head, lm_head only calc last token for prefill; extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM Atream 2025-02-22 06:13:01 +00:00
  • 5ed441a0f5
    Update README.md _ 2025-02-21 14:15:50 +00:00
  • 45faddf668 fix the link addresses JiamingMai 2025-02-21 17:53:20 +08:00