Commit Graph

  • d79d8f39b4
    vulkan: multi-row k quants (#10846) Eve 2024-12-26 10:54:44 -05:00
  • d283d02bf2
    examples, ggml : fix GCC compiler warnings (#10983) Peter 2024-12-27 00:59:11 +11:00
  • 9ba399dfa7
    server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) Reza Kakhki 2024-12-24 21:33:04 +01:00
  • 2cd43f4900
    ggml : more performance with llamafile tinyblas on x86_64 (#10714) Djip007 2024-12-24 18:54:49 +01:00
  • 09fe2e7613
    server: allow filtering llama server response fields (#10940) NeverLucky 2024-12-24 19:39:49 +03:00
  • 30caac3a68
    llama : the WPM vocabs use the CLS token as BOS (#10930) Georgi Gerganov 2024-12-24 09:44:20 +02:00
  • 60cfa728e2
    ggml : use wstring for backend search paths (#10960) Diego Devesa 2024-12-24 04:05:27 +01:00
  • 3327bb0f8d
    ggml : fix arm enabled features check (#10961) Diego Devesa 2024-12-24 04:05:17 +01:00
  • 32d6ee6385
    ggml : fix const usage in SSE path (#10962) Diego Devesa 2024-12-23 20:25:52 +01:00
  • 14b699ecde
    server : fix missing model id in /model endpoint (#10957) Xuan Son Nguyen 2024-12-23 12:52:25 +01:00
  • 485dc01214
    server : add system_fingerprint to chat/completion (#10917) Xuan Son Nguyen 2024-12-23 12:02:44 +01:00
  • 86bf31cfe6
    rpc-server : add support for the SYCL backend (#10934) Radoslav Gerganov 2024-12-23 10:39:30 +02:00
  • b92a14a841
    llama : support InfiniAI Megrez 3b (#10893) Yun Dou 2024-12-23 08:35:44 +08:00
  • 6f0c9e034b
    llama : support for Llama-3_1-Nemotron-51B (#10669) ymcki 2024-12-23 08:22:33 +08:00
  • dab76c92cc
    llama-run : include temperature option (#10899) Eric Curtin 2024-12-23 00:21:40 +00:00
  • 7024d59e6a
    ggml : fix run-time on FreeBSD in get_executable_path() (#10948) yuri@FreeBSD 2024-12-22 16:20:11 -08:00
  • 7c0e285858
    devops : add docker-multi-stage builds (#10832) Rudi Servo 2024-12-22 21:22:58 -01:00
  • 7ae33a616f
    llama : add Falcon3 support (#10883) Billel Mokeddem 2024-12-23 01:09:58 +03:00
  • ebdee9478c
    vulkan: build fixes for 32b (#10927) Jeff Bolz 2024-12-22 03:44:01 -06:00
  • 5cd85b5e00
    convert : add BertForMaskedLM (#10919) Georgi Gerganov 2024-12-21 10:10:18 +02:00
  • a91a41364b
    vulkan: optimize coopmat2 dequant functions (#10855) Jeff Bolz 2024-12-21 01:04:45 -06:00
  • e34c5af43f
    ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() (#10874) Adrien Gallouët 2024-12-21 00:33:37 +01:00
  • eb5c3dc64b
    SYCL: Migrate away from deprecated ggml_tensor->backend (#10840) Akarshan Biswas 2024-12-20 21:01:28 +05:30
  • 0ca416c91a
    server : (UI) fix copy to clipboard function (#10916) Xuan Son Nguyen 2024-12-20 14:12:06 +01:00
  • 21ae3b9be8
    ggml : add test for SVE and disable when it fails (#10906) Diego Devesa 2024-12-20 13:31:28 +01:00
  • 0a11f8b7b5
    convert : fix RWKV v6 model conversion (#10913) Molly Sophia 2024-12-20 17:44:58 +08:00
  • d408bb9268
    clip : disable GPU support (#10896) Georgi Gerganov 2024-12-19 18:47:15 +02:00
  • 5cab3e4aaa
    llama : minor grammar refactor (#10897) Georgi Gerganov 2024-12-19 17:42:13 +02:00
  • 36319dec5d
    tts : small QoL for easy model fetch (#10903) Georgi Gerganov 2024-12-19 17:35:15 +02:00
  • 57bb2c40cd
    server : fix logprobs, make it OAI-compatible (#10783) Xuan Son Nguyen 2024-12-19 15:40:08 +01:00
  • a3c33b1dce
    ggml: fix arm build with gcc (#10895) Adrien Gallouët 2024-12-19 14:20:41 +01:00
  • 2fffc52b50
    llama : fix Roberta embeddings (#10856) Sukriti Sharma 2024-12-19 06:04:51 -07:00
  • 7585edbdeb
    convert : Add support for Microsoft Phi-4 model (#10817) fairydreaming 2024-12-19 10:37:12 +01:00
  • cd920d0ac3
    tests: disable GGUF test for bad value size (#10886) Johannes Gäßler 2024-12-19 08:53:58 +01:00
  • 7909e8588d
    llama-run : improve progress bar (#10821) Eric Curtin 2024-12-19 02:58:00 +00:00
  • 9177484f58
    ggml : fix arm build (#10890) Diego Devesa 2024-12-18 23:21:42 +01:00
  • 0bf2d10c55
    tts : add OuteTTS support (#10784) Georgi Gerganov 2024-12-18 19:27:21 +02:00
  • 7bbb5acf12
    server: avoid overwriting Authorization header (#10878) Gaetan Bisson 2024-12-18 04:00:07 -10:00
  • 152610eda9
    server : output embeddings for all tokens when pooling = none (#10861) Georgi Gerganov 2024-12-18 13:01:41 +02:00
  • 0e70ba686e
    server : add "tokens" output (#10853) Georgi Gerganov 2024-12-18 11:05:29 +02:00
  • 46828872c3
    server : (embeddings) using same format for "input" and "content" (#10872) Xuan Son Nguyen 2024-12-18 09:55:09 +01:00
  • 6b064c92b4
    docs: Fix HIP (née hipBLAS) in README (#10880) redbeard 2024-12-18 00:35:00 -08:00
  • 4da69d1abd
    Revert "llama : add Falcon3 support (#10864)" (#10876) Diego Devesa 2024-12-18 01:36:46 +01:00
  • d62b532c52
    Use model->gguf_kv for loading the template instead of using the C API. (#10868) DAN™ 2024-12-17 17:24:22 -05:00
  • 081b29bd2a
    tests: add tests for GGUF (#10830) Johannes Gäßler 2024-12-17 19:09:35 +01:00
  • 5437d4aaf5
    sync : ggml Georgi Gerganov 2024-12-17 18:36:02 +02:00
  • 78f766768d
    cmake : fix "amd64" processor string (whisper/2638) Georgi Gerganov 2024-12-17 18:34:32 +02:00
  • 8dd19a4812
    vulkan : fix soft_max.comp division by zero (whisper/2633) gn64 2024-12-16 19:34:38 +09:00
  • 130d0c90bd
    ggml : remove return from ggml_gallocr_allocate_node (ggml/1048) Daniel Bevenius 2024-12-14 03:23:08 +01:00
  • 3919da8e33
    ggml : add check for grad_accs (ggml/1046) Daniel Bevenius 2024-12-13 08:19:38 +01:00
  • 0006f5a74a
    ggml : update ggml_backend_cpu_device_supports_op (#10867) Georgi Gerganov 2024-12-17 18:35:42 +02:00
  • 05c3a444b8
    server : fill usage info in embeddings and rerank responses (#10852) krystiancha 2024-12-17 16:00:24 +00:00
  • 382bc7f2e8
    llama : add Falcon3 support (#10864) Billel Mokeddem 2024-12-17 19:24:56 +04:00
  • 4f51968aca
    readme : update typos (#10863) Ruan 2024-12-17 17:47:20 +08:00
  • 227d7c5a7f
    server : (UI) fix missing async generator on safari (#10857) Xuan Son Nguyen 2024-12-17 09:52:09 +01:00
  • 7b1ec53f56
    vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10809) Eve 2024-12-17 05:52:55 +00:00
  • 160bc039c8
    rwkv6: add wkv6 support for Vulkan backend (#10829) Zhiyuan Li 2024-12-17 05:00:46 +08:00
  • 08ea539df2
    unicode : improve naming style (#10838) Georgi Gerganov 2024-12-16 12:31:45 +02:00
  • 644fd71b44
    sampling : refactor + optimize penalties sampler (#10803) Georgi Gerganov 2024-12-16 12:31:14 +02:00
  • 4ddd199f6f
    llava : Allow locally downloaded models for QwenVL (#10833) Bartowski 2024-12-15 15:43:25 -05:00
  • a0974156f3
    llama : add Deepseek MoE v1 & GigaChat models (#10827) Valentin Mamedov 2024-12-16 00:02:46 +07:00
  • 87cf323cef
    scripts : change build path to "build-bench" for compare-commits.sh (#10836) Georgi Gerganov 2024-12-15 18:44:47 +02:00
  • 5478bbcd17
    server: (UI) add syntax highlighting and latex math rendering (#10808) Vinesh Janarthanan 2024-12-15 05:55:54 -06:00
  • b5ae1ddff9
    gguf-py : bump to v0.13.0 Georgi Gerganov 2024-12-15 13:16:42 +02:00
  • 89d604f2c8
    server: Fix has_next_line in JSON response (#10818) Michelle Tan 2024-12-14 22:29:45 +00:00
  • e52aba537a
    nix: allow to override rocm gpu targets (#10794) Evgeny Kurnevsky 2024-12-14 18:17:36 +00:00
  • ba1cb19cdd
    llama : add Qwen2VL support + multimodal RoPE (#10361) HimariO 2024-12-14 20:43:46 +08:00
  • 56eea0781c
    Remove spurious \r in output that causes journalctl to treat log lines as binary and hide them by default (#10771) cduk 2024-12-13 23:21:49 +01:00
  • a76c56fa1a
    Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693) lhez 2024-12-13 12:23:52 -08:00
  • c27ac678dd
    Opt class for positional argument handling (#10508) Eric Curtin 2024-12-13 18:34:25 +00:00
  • 11e07fd63b
    fix: graceful shutdown for Docker images (#10815) Corentin REGAL 2024-12-13 18:23:50 +01:00
  • 4601a8bb67
    gguf-py : numpy 2 newbyteorder fix (#9772) Jett Janiak 2024-12-13 15:48:44 +01:00
  • 9f35e44592
    Fix crash caused by ggml_backend_load_all when launching on Android Activity (#10812) 谢乃闻 2024-12-13 12:56:07 +00:00
  • 64ae065511
    vulkan: small mul_mat_vec optimizations (#10665) Eve 2024-12-13 08:42:04 +00:00
  • 83ed24a97b
    SYCL: Reduce most of the compiler warnings (#10748) Akarshan Biswas 2024-12-13 12:12:15 +05:30
  • d583cd03f6
    ggml : Fix compilation issues on ARM platform when building without fp16 (#10811) Karol Kontny 2024-12-13 01:04:19 +01:00
  • adffa6ffd5
    common : improve -ctv -ctk CLI arguments (#10806) Xuan Son Nguyen 2024-12-12 22:53:05 +01:00
  • 274ec65af6
    contrib : add ngxson as codeowner (#10804) Xuan Son Nguyen 2024-12-12 20:52:28 +01:00
  • 8faa1d4dd4
    CUDA: faster non-contiguous concat (#10760) a3sh 2024-12-13 02:09:50 +08:00
  • cb13ef85a4
    remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) Diego Devesa 2024-12-12 19:02:49 +01:00
  • 4064c0e3b6
    Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders (#10798) 0cc4m 2024-12-12 18:36:00 +01:00
  • dc5301d565
    Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats (#10721) 0cc4m 2024-12-12 18:35:37 +01:00
  • 9fdb124304
    common : add missing env var for speculative (#10801) Xuan Son Nguyen 2024-12-12 16:57:32 +01:00
  • 5555c0c1f6
    docs: update server streaming mode documentation (#9519) CentricStorm 2024-12-11 22:40:40 +00:00
  • 973f328b1e
    Merge pull request #10788 from ggerganov/gg/gguf-py-0.11.0 Georgi Gerganov 2024-12-11 23:14:46 +02:00
  • fb18934a97
    gguf-py : bump version to 0.11.0 Georgi Gerganov 2024-12-11 23:13:31 +02:00
  • 235f6e14bf
    server : (UI) add tok/s, get rid of completion.js (#10786) Xuan Son Nguyen 2024-12-11 20:52:14 +01:00
  • 1a31d0dc00
    Update README.md (#10772) qingy1337 2024-12-11 07:16:32 -08:00
  • 92f77a640f
    ci : pin nodejs to 22.11.0 (#10779) Xuan Son Nguyen 2024-12-11 14:59:41 +01:00
  • 484d2f31ae
    bug-fix: snprintf prints NULL in place of the last character (#10419) kallewoof 2024-12-11 22:48:04 +09:00
  • 4b4d92b098
    docs: fix server documentation formatting (#10776) CentricStorm 2024-12-11 10:47:43 +00:00
  • 43041d2eb3
    ggml: load all backends from a user-provided search path (#10699) Gilad S. 2024-12-11 02:47:21 +02:00
  • b685daf386
    vulkan: request round-to-even for fp16 in im2col/rope_head (#10767) Jeff Bolz 2024-12-10 14:23:17 -06:00
  • dafae66cc2
    vulkan: dynamic subgroup size for the remaining k quants (#10745) Eve 2024-12-10 19:33:23 +00:00
  • ae4b922614
    imatrix : Add imatrix to --no-context-shift (#10766) Bartowski 2024-12-10 12:23:50 -05:00
  • 750cb3e246
    CUDA: rename macros to avoid conflicts with WinAPI (#10736) Andreas Kieslinger 2024-12-10 18:23:24 +01:00
  • a86ad841f1
    server : add flag to disable the web-ui (#10762) (#10751) Yüg 2024-12-10 17:22:34 +00:00
  • a05e2afcc2
    vulkan: disable spirv-opt for coopmat shaders (#10763) Jeff Bolz 2024-12-10 11:22:20 -06:00
  • 26a8406ba9
    CUDA: fix shared memory access condition for mmv (#10740) Johannes Gäßler 2024-12-09 20:07:12 +01:00
  • c37fb4cf62
    Changes to CMakePresets.json to add ninja clang target on windows (#10668) Srihari-mcw 2024-12-09 23:10:19 +05:30