Commit Graph

  • 9c8dcefe17
    CUDA: backwards pass for misc. ops, add tests (#11257) Johannes Gäßler 2025-01-16 16:43:38 +01:00
  • 681149ced2
    llama : add llama_model_load_from_splits (#11255) Xuan Son Nguyen 2025-01-16 13:54:08 +01:00
  • c67cc9837d
    ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227) fj-y-saito 2025-01-16 18:11:49 +09:00
  • adc5dd92e8
    vulkan: scale caching for k quants + misc fixes (#11081) Eve 2025-01-15 19:50:13 +00:00
  • f11cfdfd7f
    ci : use -no-cnv in gguf-split tests (#11254) Georgi Gerganov 2025-01-15 18:28:35 +02:00
  • 1d8504338e
    fix: ggml: fix vulkan-shaders-gen build (#10448) Junil Kim 2025-01-15 22:17:42 +09:00
  • 432df2d5f9
    RoPE: fix back, CUDA support for back + noncont. (#11240) Johannes Gäßler 2025-01-15 12:51:37 +01:00
  • 0ccd7f3eb2
    examples : add embd_to_audio to tts-outetts.py [no ci] (#11235) Daniel Bevenius 2025-01-15 05:44:38 +01:00
  • f446c2cf6a
    SYCL: Add gated linear attention kernel (#11175) Akarshan Biswas 2025-01-15 08:50:17 +05:30
  • b4d92a59a2
    ci : add -no-cnv for tests (#11238) Xuan Son Nguyen 2025-01-14 15:42:23 +01:00
  • bbf3e55e35
    vocab : add dummy tokens for "no_vocab" type (#11231) Georgi Gerganov 2025-01-14 12:54:58 +02:00
  • c5bf0d1bd7
    server : Improve code snippets direction between RTL text (#11221) ebraminio 2025-01-14 14:09:33 +03:30
  • 091592d758
    Refactor test-chat-template.cpp (#11224) Olivier Chafik 2025-01-14 10:16:41 +00:00
  • 44d1e796d0
    sync : ggml Georgi Gerganov 2025-01-14 10:39:42 +02:00
  • a4f3f5d8e6
    scripts : sync gguf (cont) Georgi Gerganov 2025-01-14 09:40:15 +02:00
  • 48e1ae0e61
    scripts : sync gguf Georgi Gerganov 2025-01-14 09:36:58 +02:00
  • d00a80e89d
    scripts : sync opencl Georgi Gerganov 2025-01-14 09:19:58 +02:00
  • 504af20ee4
    server : (UI) Improve messages bubble shape in RTL (#11220) ebraminio 2025-01-13 22:53:31 +03:30
  • 84a44815f7
    cli : auto activate conversation mode if chat template is available (#11214) Xuan Son Nguyen 2025-01-13 20:18:12 +01:00
  • 39509fb082
    cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (#11042) Andreas Kieslinger 2025-01-13 16:45:53 +01:00
  • a29f0870d4
    contrib : add naming guidelines (cont) (#11177) Georgi Gerganov 2025-01-13 15:59:26 +02:00
  • 437e05f714
    server : (UI) Support for RTL text as models input or output (#11208) ebraminio 2025-01-13 17:16:39 +03:30
  • ca001f6656
    contrib : add naming guidelines (cont) (#11177) Georgi Gerganov 2025-01-13 15:08:44 +02:00
  • 00b4c3da62
    common : support tag-based --hf-repo like on ollama (#11195) Xuan Son Nguyen 2025-01-13 13:56:23 +01:00
  • 7426a26b24
    contrib : add naming guidelines (#11177) Georgi Gerganov 2025-01-13 14:46:36 +02:00
  • 8f70fc3d1b
    llama : remove 'd' from bad special token log (#11212) Daniel Bevenius 2025-01-13 13:38:20 +01:00
  • 1244cdcf14
    ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (#11211) Radoslav Gerganov 2025-01-13 13:31:41 +02:00
  • 924518e2e5
    Reset color before we exit (#11205) Eric Curtin 2025-01-12 18:23:10 +00:00
  • 9a483999a6
    llama : fix chat template gguf key (#11201) Xuan Son Nguyen 2025-01-12 13:45:14 +01:00
  • 08f10f69c3
    llama : remove notion of CLS token (#11064) Georgi Gerganov 2025-01-12 12:15:53 +02:00
  • afa8a9ec9b
    llama : add llama_vocab, functions -> methods, naming (#11110) Georgi Gerganov 2025-01-12 11:32:42 +02:00
  • c05e8c9934
    gguf-py: fixed local detection of gguf package (#11180) Vinesh Janarthanan 2025-01-11 03:42:31 -06:00
  • 2739a71e4b
    convert : sort print supported models [no ci] (#11179) Daniel Bevenius 2025-01-11 05:50:33 +01:00
  • ba8a1f9c5b
    examples : add README.md to tts example [no ci] (#11155) Daniel Bevenius 2025-01-10 13:16:16 +01:00
  • ff3fcabc72
    convert : add --print-supported-models option (#11172) Daniel Bevenius 2025-01-10 11:30:53 +01:00
  • c3f9d25706
    Vulkan: Fix float16 use on devices without float16 support + fix subgroup_size_control validation error (#11161) 0cc4m 2025-01-10 06:39:33 +01:00
  • ee7136c6d1
    llama: add support for QRWKV6 model architecture (#11001) Molly Sophia 2025-01-10 09:58:08 +08:00
  • c6860cc734
    SYCL: Refactor ggml_sycl_compute_forward (#11121) Akarshan Biswas 2025-01-10 05:43:03 +05:30
  • 1204f97270
    doc: add cuda guide for fedora (#11135) Tei Home 2025-01-09 19:32:06 +08:00
  • 8eceb888d7
    server : add tooltips to settings and themes btn (#11154) Daniel Bevenius 2025-01-09 11:28:29 +01:00
  • f8feb4b01a
    model: Add support for PhiMoE arch (#11003) Pierrick Hymbert 2025-01-09 11:21:41 +01:00
  • be0e950c91
    media : remove old img [no ci] Georgi Gerganov 2025-01-09 11:15:15 +02:00
  • d9feae1c06
    llama-chat : add phi 4 template (#11148) Xuan Son Nguyen 2025-01-09 10:07:33 +01:00
  • 8d59d91171
    fix: add missing msg in static_assert (#11143) hydai 2025-01-09 04:03:28 +08:00
  • 8a1d9c25fa
    gguf-py : move scripts directory (#11116) Vinesh Janarthanan 2025-01-08 12:54:58 -06:00
  • 1bf839b1e8
    Enhance user input handling for llama-run (#11138) Eric Curtin 2025-01-08 18:47:05 +00:00
  • f7cd13301c
    ci : use actions from ggml-org (#11140) Xuan Son Nguyen 2025-01-08 16:09:20 +01:00
  • 4d2b3d8804
    lora : improve compat with mergekit-extract-lora (#11131) Xuan Son Nguyen 2025-01-08 15:59:53 +01:00
  • c07d437bbd
    llama : avoid hardcoded QK_K (#11061) Georgi Gerganov 2025-01-08 16:19:36 +02:00
  • 99a3755a3c
    sync : ggml Georgi Gerganov 2025-01-08 13:40:30 +02:00
  • c792dcf488
    ggml : allow loading backend with env variable (ggml/1059) Radoslav Gerganov 2025-01-05 09:50:37 +02:00
  • 80ccf5d725
    ci : pin dependency to specific version (#11137) Xuan Son Nguyen 2025-01-08 12:07:20 +01:00
  • a3c1232c3f
    arg : option to exclude arguments from specific examples (#11136) Georgi Gerganov 2025-01-08 12:55:36 +02:00
  • 8cef75c743
    llamafile : ppc64le MMA INT8 implementation (#10912) amritahs-ibm 2025-01-08 16:24:19 +05:30
  • 0d52a69e4b
    ci : fix cmake option (#11125) Georgi Gerganov 2025-01-08 11:29:34 +02:00
  • 02f0430141
    Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117) Mathieu Baudier 2025-01-08 09:18:13 +01:00
  • bec2183f2c
    fix: Vulkan shader gen binary path when Cross-compiling (#11096) ag2s20150909 2025-01-08 16:17:29 +08:00
  • 53ff6b9b9f
    GGUF: C++ refactor, backend support, misc fixes (#11030) Johannes Gäßler 2025-01-07 18:01:58 +01:00
  • 017cc5f446
    ggml-backend : only offload from host buffers (fix) (#11124) Diego Devesa 2025-01-07 16:11:57 +01:00
  • a3d50bc022
    ggml-backend : only offload from host buffers (#11120) Diego Devesa 2025-01-07 12:38:05 +01:00
  • a4dd490069
    rpc : code cleanup (#11107) Radoslav Gerganov 2025-01-07 08:37:02 +02:00
  • c0d6f790d0
    SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087) Akarshan Biswas 2025-01-07 11:56:07 +05:30
  • dc7cef9f37
    llama-run : fix context size (#11094) Eric Curtin 2025-01-06 22:45:28 +00:00
  • ecebbd292d
    llama : remove unused headers (#11109) Georgi Gerganov 2025-01-06 17:52:35 +02:00
  • 96be8c3264
    github : add cmd line field to bug report (#11090) Xuan Son Nguyen 2025-01-06 16:34:49 +01:00
  • e6e7c75d94
    server : fix extra BOS in infill endpoint (#11106) Georgi Gerganov 2025-01-06 15:36:08 +02:00
  • 09186fabbe
    llama : remove check flash_attn with lora (#11104) Xuan Son Nguyen 2025-01-06 13:41:12 +01:00
  • 96a1dc27c3
    llama : prevent system info string accumulation across calls (#11101) Asghar Ghorbani 2025-01-06 12:21:46 +01:00
  • 6369f867a4
    llama : rename missed batch params/vars to ubatch (#10059) Daniel Bevenius 2025-01-06 10:28:17 +01:00
  • 47182dd03f
    llama : update llama_model API names (#11063) Georgi Gerganov 2025-01-06 10:55:18 +02:00
  • 3e6e7a6bc2
    tokenize : escape the prompt (#11058) Georgi Gerganov 2025-01-06 10:54:25 +02:00
  • ae2f606bb5
    mmap : fix fileno macro clash (#11076) Georgi Gerganov 2025-01-06 10:52:38 +02:00
  • 727368c60f
    llama : use LLAMA_TOKEN_NULL (#11062) Georgi Gerganov 2025-01-06 10:52:15 +02:00
  • 5047dd3546
    llama : use _impl suffix instead of _internal (#11060) Georgi Gerganov 2025-01-06 10:52:01 +02:00
  • 46e3556e01
    CUDA: add BF16 support (#11093) Johannes Gäßler 2025-01-06 02:33:52 +01:00
  • b56f079e28
    Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (#11074) 0cc4m 2025-01-04 21:09:59 +01:00
  • 9394bbd484
    llama : Add support for DeepSeek V3 (#11049) fairydreaming 2025-01-04 21:06:11 +01:00
  • f922a9c542
    [GGML][RPC] Support for models with non-512-aligned tensors over RPC. (#11047) matt23654 2025-01-04 16:10:30 +00:00
  • 46be942214
    llama : add support for the cohere2 model architecture (#10900) DAN™ 2025-01-04 09:33:31 -05:00
  • 78c6785175
    sync : ggml Georgi Gerganov 2025-01-04 10:54:01 +02:00
  • 5e3b08d606
    ggml : do not install metal source when embed library (ggml/1054) Georgi Gerganov 2025-01-04 10:53:54 +02:00
  • db68c93b57
    ggml : improve inputs log sched_print_assignments (ggml/1053) Daniel Bevenius 2024-12-19 03:50:12 +01:00
  • c31fc8b966
    fix: Vulkan shader gen binary path (#11037) Gilad S. 2025-01-04 10:17:31 +02:00
  • 4b0c638b9a
    common : disable KV cache shifting automatically for unsupported models (#11053) Molly Sophia 2025-01-03 20:13:18 +08:00
  • e7da954ecc
    metal : avoid uint (#11019) Georgi Gerganov 2025-01-03 11:26:14 +02:00
  • f66f582927
    llama : refactor src/llama.cpp (#10902) Georgi Gerganov 2025-01-03 10:18:53 +02:00
  • 2f0ee84b9b
    server: bench: minor fixes (#10765) Pierrick Hymbert 2025-01-02 18:06:12 +01:00
  • 0da5d86026
    server : allow using LoRA adapters per-request (#10994) Xuan Son Nguyen 2025-01-02 15:05:18 +01:00
  • a45433ba20
    readme : add llama-swap to infrastructure section (#11032) Benson Wong 2025-01-01 23:14:54 -08:00
  • 0827b2c1da
    ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027) Srihari-mcw 2024-12-31 19:53:33 +05:30
  • 45095a61bf
    server : clean up built-in template detection (#11026) Xuan Son Nguyen 2024-12-31 15:22:01 +01:00
  • 5896c65232
    server : add OAI compat for /v1/completions (#10974) Xuan Son Nguyen 2024-12-31 12:34:13 +01:00
  • bc7b1f8632
    convert : fix Llama-3_1-Nemotron-51B rope settings (#11008) ymcki 2024-12-31 19:04:48 +08:00
  • 6e1531aca5
    common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013) Peter 2024-12-31 11:46:06 +11:00
  • 716bd6dec3
    vulkan: optimize mul_mat for small values of N (#10991) Jeff Bolz 2024-12-30 11:27:11 -06:00
  • c250ecb315
    android : fix llama_batch free (#11014) ag2s20150909 2024-12-30 20:35:13 +08:00
  • a813badbbd
    vulkan: im2col and matmul optimizations for stable diffusion (#10942) Jeff Bolz 2024-12-29 03:16:34 -06:00
  • fdd2188912
    vulkan: Use push constant offset to handle misaligned descriptors (#10987) Jeff Bolz 2024-12-29 02:35:11 -06:00
  • f865ea149d
    server: added more docs for response_fields field (#10995) Isaac McFadyen 2024-12-28 10:09:19 -05:00
  • 16cdce7b68
    server : fix token duplication when streaming with stop strings (#10997) Alexey Parfenov 2024-12-28 15:08:54 +00:00