Commit Graph

  • 5a50b34627 fix hard coding caused by rope dim calculation, load from config now Azure 2025-01-31 15:25:50 +00:00
  • 476b1d8dc6 support deepseekv3; runnable but has precision problem Azure 2025-01-31 08:27:24 +00:00
  • de7e892f72
    Merge pull request #115 from KMSorSMS/main UnicornChan 2024-11-14 20:07:12 +08:00
  • 04cebec4bb rm opt config path default value and fix some config logic bug liam 2024-11-14 19:32:28 +08:00
  • dddc42038d
    Merge pull request #110 from KMSorSMS/main UnicornChan 2024-11-06 09:44:12 +08:00
  • 28ae7e24ed finish patch liam 2024-11-04 14:04:17 +08:00
  • c2b4dc805c 🚑️: roll back transformer.py and find that multiple chat history has a minor accuracy error liam 2024-11-01 11:01:30 +08:00
  • a148da2cfe rm sensitive info in config.yaml, add readme of makefile. support old model_path config liam 2024-10-31 21:28:17 +08:00
  • 9a2e7057c8 wjh fix change anyanqilin 2024-10-30 10:50:25 +00:00
  • a72dc6ed15 wjh change anyanqilin 2024-10-30 10:16:45 +00:00
  • 2d67016d14 wjh-change anyanqilin 2024-10-29 03:35:05 +00:00
  • 7c94df4bcf 🚑️: back transformer.py bugs version, and fix typo error in local_chat.py liam 2024-10-28 21:09:40 +08:00
  • dd1d8667f3 refactor local_chat and fix message slice bug in server liam 2024-10-21 22:49:05 +08:00
  • a6a1cc054f
    Merge pull request #106 from Azure-Tang/main UnicornChan 2024-10-30 16:36:59 +08:00
  • c7d62a67db update supported models TangJingqi 2024-10-28 14:37:10 +08:00
  • d8ddaf0ea0 fix readme TangJingqi 2024-10-28 14:21:12 +08:00
  • 43fc7f44a6
    Merge pull request #99 from chenht2022/main Chen Hongtao 2024-10-09 19:09:58 +08:00
  • 14869b55ad Adapt Windows chenht2022 2024-10-09 11:08:32 +00:00
  • a81a7ffe21
    Merge pull request #77 from TKONIY/fix-prefill-and-generate UnicornChan 2024-10-09 19:04:27 +08:00
  • b4904537e3
    Merge pull request #83 from sayap/task-queue-cond-var Chen Hongtao 2024-10-09 18:57:17 +08:00
  • 43e8848ddd
    Merge pull request #86 from xhedit/main UnicornChan 2024-09-19 10:37:55 +08:00
  • 49539ac424
    Merge pull request #88 from Azure-Tang/main UnicornChan 2024-09-15 15:17:05 +08:00
  • 7953bd9c0d update readme Azure 2024-09-13 08:47:35 +00:00
  • 3758afb526 fix bug where some dequant functions don't support multi gpu Azure 2024-09-13 08:34:23 +00:00
  • 234faf7987 typo fix: KMisrtal -> KMistral xhedit 2024-09-12 15:58:01 +00:00
  • 6666d62237 Use cond var to avoid busy loop Yap Sok Ann 2024-09-10 19:26:17 +07:00
  • 3ed8a0437f
    Merge pull request #72 from sayap/dequantize-iq4-xs UnicornChan 2024-09-06 10:06:10 +08:00
  • ee72cee050 Fix: the tokens returned by prefill_and_generate yangshen 2024-09-05 05:29:23 +00:00
  • be81269e38
    Merge pull request #71 from Azure-Tang/main UnicornChan 2024-09-02 14:51:05 +08:00
  • c55de02f7b fix qlen > 1000 mask is none error Azure 2024-09-02 02:58:10 +00:00
  • be356c1b8d Support IQ4_XS dequantize Yap Sok Ann 2024-09-02 09:03:42 +07:00
  • 022b893819
    Merge pull request #67 from UnicornChan/main UnicornChan 2024-08-30 17:57:49 +08:00
  • 49cce0c437 [fix] bugs about Qwen57B, install requirement, Dockerfile chenxl 2024-08-30 03:24:26 +00:00
  • 351698c3b5
    Merge pull request #64 from eltociear/patch-1 UnicornChan 2024-08-30 09:03:47 +08:00
  • c80490a95e [fix] some bugs while packaging in github action chenxl 2024-08-29 16:03:59 +00:00
  • e961adde15
    docs: update long_context_introduction.md Ikko Eltociear Ashimine 2024-08-30 03:34:39 +09:00
  • f536a7085f
    Merge pull request #62 from Azure-Tang/main UnicornChan 2024-08-29 23:40:21 +08:00
  • 8747c099f2 update yaml example; update version idx; update docker file TangJingqi 2024-08-29 22:39:20 +08:00
  • 6735beb5b6 Fix cannot offload whole layer in cpu TangJingqi 2024-08-29 19:10:14 +08:00
  • 35d7aed207
    Merge pull request #60 from sammcj/patch-1 Atream 2024-08-29 19:01:16 +08:00
  • 0b57627bb9
    fix(docs): fix broken link Sam 2024-08-29 20:00:55 +10:00
  • 1dcb8dae5b
    Merge pull request #58 from Azure-Tang/main Azure 2024-08-29 14:39:58 +08:00
  • 440d827e7c update readme TangJingqi 2024-08-29 12:04:56 +08:00
  • abd4214b56 fix readme; adjust param TangJingqi 2024-08-29 10:40:08 +08:00
  • 233bbb8c55
    Merge pull request #57 from UnicornChan/develop-0.1.3 UnicornChan 2024-08-29 01:57:34 +08:00
  • 4d1d561d28 [feature] release 0.1.3 chenxl 2024-08-28 16:11:43 +00:00
  • 67f8b370c3
    Merge pull request #56 from hyx1999/patch-1 UnicornChan 2024-08-28 14:30:38 +08:00
  • ea1143e54e
    Update README.md _HYX_ 2024-08-28 12:15:38 +08:00
  • 0f054fe4ff
    Merge pull request #52 from UnicornChan/fix-bug-load-config UnicornChan 2024-08-22 23:54:17 +08:00
  • b9f0819a86 None for load config chenxl 2024-08-22 15:52:25 +00:00
  • 1f85db3d73
    Merge pull request #51 from molamooo/fix-f16-dequantize-device UnicornChan 2024-08-22 16:31:45 +08:00
  • 29f4151ebc
    [fix] f16 dequantize device ignored molamooo 2024-08-22 15:10:06 +08:00
  • cbc47d0b68
    Merge pull request #48 from Azure-Tang/main UnicornChan 2024-08-21 23:24:46 +08:00
  • 170b7a6001 fix server not accepting yaml path as param; fix server static cache device problem TangJingqi 2024-08-21 14:19:43 +08:00
  • 4358722891
    Merge pull request #41 from Azure-Tang/main Azure 2024-08-16 15:27:01 +08:00
  • 4a0f1cbbfa Update readme TangJingqi 2024-08-16 15:22:16 +08:00
  • f681eb4a0f
    Merge branch 'kvcache-ai:main' into main Azure 2024-08-16 15:21:41 +08:00
  • b25ad4ec32
    Merge pull request #39 from Azure-Tang/develop-0.1.2 Azure 2024-08-16 11:29:11 +08:00
  • 4f87756c2e fix broken link TangJingqi 2024-08-16 11:10:30 +08:00
  • 7199699d78 Fix the broken link TangJingqi 2024-08-16 10:59:34 +08:00
  • e81fc482ff update README Azure 2024-08-15 16:59:26 +00:00
  • 6bcec86fa0 update README Azure 2024-08-15 16:54:31 +00:00
  • 77a34c289c
    Merge pull request #36 from kvcache-ai/develop-0.1.2 UnicornChan 2024-08-15 20:59:50 +08:00
  • 395cd3e786
    Merge pull request #35 from Azure-Tang/develop-0.1.2 UnicornChan 2024-08-15 20:56:22 +08:00
  • de3faaf55d Update readme; add pipeline tutorial; add detailed inject tutorial TangJingqi 2024-08-15 20:42:54 +08:00
  • c9bf79299b
    Merge pull request #34 from Azure-Tang/develop-0.1.2 UnicornChan 2024-08-15 11:28:19 +08:00
  • c47205dce9 fix name TangJingqi 2024-08-15 11:25:12 +08:00
  • 67043b4b5c [fix] format classes and files name TangJingqi 2024-08-15 10:44:59 +08:00
  • 1db4a67dca [feature] add github action for pre compile chenxl 2024-08-14 16:54:50 +00:00
  • 412055d450 [feature] experts can be injected using CPUInfer; [fix] fix ktransformers interface when using new CUDAGraphRunner; [fix] fix YAML and optimize logic, the top rule has the highest priority Atream 2024-08-14 16:10:54 +08:00
  • 80815dbc50
    Merge pull request #30 from BITcyman/feature-lc-dequantize_q2k_q3k_gpu UnicornChan 2024-08-12 21:02:32 +08:00
  • 7c4cb520bd [feature] support q2_k & q3_k dequantize on gpu BITcyman 2024-08-12 12:53:12 +00:00
  • 650c368c18 Merge remote-tracking branch 'upstream/main' into develop-0.1.2 chenxl 2024-08-12 12:31:49 +00:00
  • 44f57270c9
    Merge pull request #29 from kvcache-ai/fix]-fix-linux-mutex UnicornChan 2024-08-12 20:14:04 +08:00
  • 3c675af61a
    Update task_queue.h Atream 2024-08-12 20:06:19 +08:00
  • f5f79f5c0e [ADD] support multi-gpu qlen>1 q5_k chenxl 2024-08-12 11:17:29 +00:00
  • f293803156
    Merge pull request #27 from chenht2022/develop-0.1.2 UnicornChan 2024-08-09 17:57:27 +08:00
  • cb7f8e7817
    Merge pull request #26 from kvcache-ai/windows UnicornChan 2024-08-09 17:57:00 +08:00
  • 782a17e4e6 [feature] add bat for windows, update readme chenxl 2024-08-09 09:39:42 +00:00
  • c1cc7d2cd2 1) Linear and MLP operators support qlen>1; 2) All operators now share a single memory buffer; 3) Refactor CPUInfer submit/sync logic. chenht2022 2024-08-08 09:04:36 +00:00
  • 1f92f7cc61 [fix] linux and windows can all find CPUInfer in current Directory Atream 2024-08-08 16:02:42 +08:00
  • 11544ef2b0
    Merge pull request #25 from kvcache-ai/windows UnicornChan 2024-08-08 16:01:54 +08:00
  • 0e613b602d [fix] recover fp16 support Atream 2024-08-08 15:42:21 +08:00
  • 1d9d397525 fix some bugs when compiling on linux chenxl 2024-08-08 04:26:58 +00:00
  • 0a2fd52cea support windows; support q4_0 and q5_0 dequant on cpu; add copyright from pygguf (it was added before, but disappeared after merge); add some TODOs in the code. Atream 2024-08-07 12:19:06 +08:00
  • 442e13bc97
    Merge pull request #18 from UnicornChan/update-docker-readme UnicornChan 2024-08-01 15:40:47 +08:00
  • 69fa2f4395 update docker.md support docker pull image chenxl 2024-08-01 07:39:48 +00:00
  • 5e83bc0c82
    Merge pull request #17 from UnicornChan/feature-support-multi-instruct UnicornChan 2024-08-01 12:29:07 +08:00
  • 86ba1336a9 [feature] add support for building docker image chenxl 2024-08-01 04:01:00 +00:00
  • 112cb3c962 [feature] support python 310 and multi instruction chenxl 2024-07-31 13:58:17 +00:00
  • 25620829ce
    Merge pull request #14 from UnicornChan/feature-support-pypi UnicornChan 2024-07-29 19:58:18 +08:00
  • dd18a11cab [feature] support for pypi install chenxl 2024-07-29 11:51:28 +00:00
  • a25320b703
    Update README.md Allen 2024-07-27 23:40:08 +08:00
  • 8629301b7f
    Update README.md UnicornChan 2024-07-27 23:36:13 +08:00
  • de13c3e7cd
    Update README.md UnicornChan 2024-07-27 23:27:32 +08:00
  • 1ac7bfd75e
    Update README.md UnicornChan 2024-07-27 23:19:49 +08:00
  • 6977ebb555
    Merge pull request #7 from kvcache-ai/james0zan-patch-1 Mingxing Zhang 2024-07-27 23:12:59 +08:00
  • 935c28c277
    Update README.md Mingxing Zhang 2024-07-27 23:12:00 +08:00
  • 01df4a6b3f
    Update README.md Mingxing Zhang 2024-07-27 23:07:31 +08:00
  • 18c42e67df Initial commit chenxl 2024-07-27 16:06:58 +08:00