Commit Graph

  • 4047be74da
    scripts: update compare-llama-bench.py (#10319) Johannes Gäßler 2024-11-15 21:19:03 +01:00
  • 883d206fbd
    ggml : fix some build issues slaren 2024-11-15 20:20:54 +01:00
  • 09ecbcb596
    cmake : fix ppc64 check (whisper/0) Georgi Gerganov 2024-11-15 15:35:22 +02:00
  • 3225008973
    ggml : vulkan logs (whisper/2547) thewh1teagle 2024-11-15 15:33:53 +02:00
  • cbf5541a82
    sync : ggml Georgi Gerganov 2024-11-15 15:31:16 +02:00
  • 18429220bd
    AVX BF16 and single scale quant optimizations (#10212) Eve 2024-11-15 11:47:58 +00:00
  • f0204a0ec7
    ci: build test musa with cmake (#10298) R0CKSTAR 2024-11-15 19:47:25 +08:00
  • 57f8355b29
    sycl: Update Intel docker images to use DPC++ 2025.0 (#10305) Romain Biessy 2024-11-15 12:10:45 +01:00
  • 9901068ac7
    server : (web UI) add copy button for code block, fix api key (#10242) Xuan Son Nguyen 2024-11-15 05:48:49 -04:00
  • 231f9360d9
    cann: dockerfile and doc adjustment (#10302) Chenguang Li 2024-11-15 15:09:35 +08:00
  • 4802ad350b
    scripts : fix regex in sync [no ci] Georgi Gerganov 2024-11-15 08:38:43 +02:00
  • 5a54af4d4f
    sycl: Use syclcompat::dp4a (#10267) Romain Biessy 2024-11-15 04:09:12 +01:00
  • 1607a5e5b0
    backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921) Charles Xu 2024-11-15 01:28:50 +01:00
  • ae8de6d50a
    ggml : build backends as libraries (#10256) Diego Devesa 2024-11-14 18:04:35 +01:00
  • 4a8ccb37ad
    CUDA: no -sm row for very small matrices (#10185) Johannes Gäßler 2024-11-14 13:00:15 +01:00
  • 2a82891a85
    speculative : fix out-of-bounds access (#10289) Georgi Gerganov 2024-11-14 11:44:15 +02:00
  • af148c9386
    vulkan: Optimize binary ops (#10270) Jeff Bolz 2024-11-13 23:22:55 -06:00
  • 66798e42fb
    vulkan: Use macros to make the mat mul pipeline creation more concise (#10259) Jeff Bolz 2024-11-13 14:59:47 -06:00
  • fb4a0ec083
    llama : propagate the results of graph_compute (#9525) Michael Podvitskiy 2024-11-13 20:00:35 +02:00
  • 5ea926dad7
    sync : ggml Georgi Gerganov 2024-11-13 18:11:54 +02:00
  • 1ee9eea094
    docs : update bindings list (#10261) Small Grass Forest 2024-11-13 19:17:10 +08:00
  • ff7fb670d0
    server : add missing docs (#10269) Alexey Parfenov 2024-11-13 11:16:30 +00:00
  • 0e712a5acb
    server : fix incorrect res in validate_model_chat_template (#10272) Jhen-Jie Hong 2024-11-13 19:15:23 +08:00
  • a0ec17b32e
    metadata: Detailed Dataset Authorship Metadata (#8875) Brian 2024-11-13 21:10:38 +11:00
  • 2e82ffa4af
    sycl : Fixes to broken builds and test-backend-ops (#10257) Alberto Cabrera Pérez 2024-11-13 09:40:57 +00:00
  • 80dd7ff22f
    vulkan: Optimize contiguous copies (#10254) Jeff Bolz 2024-11-13 00:58:57 -06:00
  • 54ef9cfc72
    vulkan: Throttle the number of shader compiles during the build step. (#10222) Jeff Bolz 2024-11-11 11:13:51 -06:00
  • b0cefea58a
    metal : more precise Q*K in FA vec kernel (#10247) Georgi Gerganov 2024-11-11 08:39:13 +02:00
  • b141e5f6ef
    server : enable KV cache defrag by default (#10233) Georgi Gerganov 2024-11-11 08:38:43 +02:00
  • 4b3a9212b6
    flake.lock: Update (#10243) Georgi Gerganov 2024-11-10 21:45:25 +02:00
  • 505f33274d
    server : (web UI) Add back sampler settings (#10239) MaggotHATE 2024-11-11 00:42:25 +05:00
  • 160687b3ed
    vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10226) Jeff Bolz 2024-11-10 05:37:56 -06:00
  • 6423c65aa8
    metal : reorder write loop in mul mat kernel + style (#10231) Georgi Gerganov 2024-11-09 11:53:13 +02:00
  • 39a334a9aa
    metal : fix build and some more comments (#10229) Georgi Gerganov 2024-11-09 11:53:02 +02:00
  • bb38cdd8ba
    metal : fix F32 accumulation in FA vec kernel (#10232) Georgi Gerganov 2024-11-09 11:52:45 +02:00
  • f018acba22
    llama : fix Qwen model type strings Georgi Gerganov 2024-11-09 11:26:34 +02:00
  • 46323fa9ef
    metal : hide debug messages from normal log Georgi Gerganov 2024-11-09 11:21:49 +02:00
  • 5b359bb1e3
    ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213) SXX 2024-11-09 15:35:46 +08:00
  • e89213492d
    ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156) amritahs-ibm 2024-11-09 12:47:50 +05:30
  • 8fc393f246
    scripts : fix pattern and get n_tokens in one go (#10221) haopeng 2024-11-09 15:06:54 +08:00
  • ec450d3bbf
    metal : opt-in compile flag for BF16 (#10218) Georgi Gerganov 2024-11-08 21:59:46 +02:00
  • 695ad752b2
    metal : improve clarity (minor) (#10171) Georgi Gerganov 2024-11-08 18:37:41 +02:00
  • 841f27abdb
    metal : optimize FA kernels (#10171) Georgi Gerganov 2024-11-08 13:47:22 +02:00
  • d05b3127bd
    swift : exclude ggml-metal-embed.metal (#10211) Jhen-Jie Hong 2024-11-08 17:34:06 +08:00
  • 76c6e7f105
    server : minor UI fix (#10207) Xuan Son Nguyen 2024-11-07 18:44:38 -04:00
  • a71d81cf8c
    server : revamp chat UI with vuejs and daisyui (#10175) Xuan Son Nguyen 2024-11-07 17:31:10 -04:00
  • eec4d71737
    scripts : add amx to sync-ggml.sh [no ci] Georgi Gerganov 2024-11-07 23:11:36 +02:00
  • 3b08828674
    sync : ggml Georgi Gerganov 2024-11-07 23:08:24 +02:00
  • a2c6fd747c
    scripts : sync update Georgi Gerganov 2024-11-07 23:07:55 +02:00
  • 97404c4a03
    ggml : add ggml-cpu.h to the public headers (#10204) Diego Devesa 2024-11-07 18:16:08 +01:00
  • 60e17ce23c
    Remove identical wte/etw logic for jais (#10203) Faisal Zaghloul 2024-11-07 11:46:12 -05:00
  • 5107e8cea3
    DRY: Fixes clone functionality (#10192) wwoodsTM 2024-11-07 08:20:25 -07:00
  • 2319126a70
    fix q4_0_8_8 format for corrupted tokens issue (#10198) snadampal 2024-11-07 02:02:08 -06:00
  • 3bcd40b3c5
    Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) Zhiyuan Li 2024-11-07 18:19:10 +11:00
  • 5c333e0140
    metal : add BF16 support (#8439) Georgi Gerganov 2024-11-06 19:53:51 +02:00
  • b11f9ba9b8
    server : remove hack for extra parallel slot (#10187) Georgi Gerganov 2024-11-06 13:29:01 +02:00
  • 94d8cb8be1
    metal : fix from ptr buffer name (#10189) Diego Devesa 2024-11-06 12:10:07 +01:00
  • 1dc04b2dee
    ggml : adjust is_first_call init value (#10193) Georgi Gerganov 2024-11-06 11:20:10 +02:00
  • a1eaf6a960
    metal : add quantized FA support (#10149) Georgi Gerganov 2024-11-06 10:24:23 +02:00
  • b8deef0ec0
    llama : add <|tool_call|> formatting to Granite template (#10177) Gabe Goodhart 2024-11-05 05:23:04 -07:00
  • a9e8a9a030
    ggml : fix arch check in bf16_to_fp32 (#10164) Diego Devesa 2024-11-04 23:17:01 +01:00
  • 3407364776
    Q6_K AVX improvements (#10118) Eve 2024-11-04 22:06:31 +00:00
  • d5a409e57f
    ggml : fix gelu tables initialization (#10172) Diego Devesa 2024-11-04 20:06:58 +01:00
  • 401558b7ba
    ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167) Diego Devesa 2024-11-04 17:34:08 +01:00
  • 9e0ecfb697
    server : clarify /slots endpoint, add is_processing (#10162) Xuan Son Nguyen 2024-11-04 16:33:29 +01:00
  • 6a066b9978
    fix build break on arm64 linux (#10166) snadampal 2024-11-04 09:08:33 -06:00
  • ea02c753eb
    cuda : clear error after changing peer access (#10153) Diego Devesa 2024-11-04 13:10:23 +01:00
  • 05697f670b
    metal : simplify f16 and f32 dequant kernels (#0) Georgi Gerganov 2024-11-04 13:49:34 +02:00
  • f8e58135cf
    metal : move dequantize templates to beginning of MSL source (#0) Georgi Gerganov 2024-11-04 13:43:32 +02:00
  • 329ed914c9
    CANN: adjust backend registry refactor. (#10158) leo-pony 2024-11-04 19:08:22 +08:00
  • ce027adfb3
    sync : ggml Georgi Gerganov 2024-11-04 10:33:37 +02:00
  • 284e5b0275
    cmake : make it possible linking ggml as external lib (ggml/1003) Yuri Khrustalev 2024-11-02 05:09:12 -04:00
  • e2292aaa17
    metal : fix minor string leaks (ggml/1004) Plamen Minev 2024-11-01 16:55:10 +02:00
  • 9f40989351
    ggml : move CPU backend to a separate file (#10144) Diego Devesa 2024-11-03 19:34:08 +01:00
  • 08828a6d7d
    metal : minor fixup in FA kernel (#10143) Georgi Gerganov 2024-11-03 15:18:40 +02:00
  • 1839f69130
    flake.lock: Update (#10146) Georgi Gerganov 2024-11-03 15:14:15 +02:00
  • 9830b6923b
    Add apple arm to presets (#10134) Christian Köhnenkamp 2024-11-02 23:35:31 +01:00
  • 42cadc74bd
    server : fix slot selection by lru (#10126) sasha0552 2024-11-02 16:34:56 +00:00
  • 45950415ed
    server : fix endpoint checks (#10135) Georgi Gerganov 2024-11-02 18:34:00 +02:00
  • 1926d6e39d
    llama : adjust default context size + print warnings (#10136) Georgi Gerganov 2024-11-02 15:18:56 +02:00
  • b634f8a26f
    simple-chat : only add bos on first prompt (#10129) Diego Devesa 2024-11-02 13:08:53 +01:00
  • 7554aa4655
    convert-lora : make --base optional (#10110) Xuan Son Nguyen 2024-11-02 12:53:17 +01:00
  • a6744e43e8
    llama : add simple-chat example (#10124) Diego Devesa 2024-11-01 23:50:59 +01:00
  • e991e3127f
    llama : use smart pointers for ggml resources (#10117) Diego Devesa 2024-11-01 23:48:26 +01:00
  • 418f5eef26
    vulkan : improve ggml_vk_create_buffer error handling (#9898) Shupei Fan 2024-11-02 02:33:14 +08:00
  • ba6f62eb79
    readme : update hot topics Georgi Gerganov 2024-11-01 17:31:51 +02:00
  • d865d1478c
    server : fix smart selection of available slot (#10120) sasha0552 2024-11-01 13:33:14 +00:00
  • 1804adb0cf
    ggml : remove ggml_scratch (#10121) Georgi Gerganov 2024-11-01 12:58:45 +02:00
  • 815fe72adc
    sync : ggml Georgi Gerganov 2024-11-01 10:28:24 +02:00
  • f221d56220
    ggml : alloc ggml_contexts on the heap (whisper/2525) Georgi Gerganov 2024-11-01 10:23:05 +02:00
  • e597e50794
    build: fix build error in Windows env with OneAPI setup (#10107) Zhenwei Jin 2024-11-01 11:09:59 +08:00
  • 85679d37f3
    llama : improve output buffer type selection (#10098) Diego Devesa 2024-11-01 00:49:53 +01:00
  • 1e9f94994e
    quantize : fix --keep-split (#10114) Diego Devesa 2024-11-01 00:45:34 +01:00
  • c02e5ab2a6
    llama : fix buffer checks for mamba and rwk (#10111) Diego Devesa 2024-10-31 22:54:23 +01:00
  • ab3d71f97f
    loader: refactor tensor weights storage (#9935) Zhenwei Jin 2024-11-01 02:50:39 +08:00
  • 0a683e8088
    server : include scheme when printing URL (#10106) Kevin Gibbons 2024-10-31 06:02:35 -07:00
  • dea5e86051
    ggml : check tensor name lengths in gguf files (#10100) Diego Devesa 2024-10-31 11:40:59 +01:00
  • 1329c0a75e
    kompute: add mul_mat_q4_k shader (#10097) Sergio López 2024-10-31 10:09:52 +01:00
  • 61408e7fad
    kompute: add backend registry / device interfaces (#10045) Sergio López 2024-10-30 17:01:52 +01:00
  • b9e02e8184
    ggml : fix memory leaks when loading invalid gguf files (#10094) Diego Devesa 2024-10-30 14:51:21 +01:00