Commit Graph

  • 198b1ec611
    ggml-cpu: Fix duplicate MATMUL_INT8 (#11817) Weizhao Ouyang 2025-02-12 20:22:58 +08:00
  • c3d6af7cd2
    CUDA: fix CUDART_VERSION checks (#11821) Johannes Gäßler 2025-02-12 13:16:39 +01:00
  • 369be5598a
    llama : fix typo in llama-grammar.h [no ci] (#11816) Daniel Bevenius 2025-02-12 08:40:01 +01:00
  • 4078c77f98
    docs: add OpenCL (#11697) lhez 2025-02-11 14:04:13 -08:00
  • 90e4dba461
    Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (#11803) Sheldon Robinson 2025-02-11 10:55:45 -05:00
  • a18f481f99
    server : use common_token_to_piece instead of common_detokenize (#11740) Daniel Bevenius 2025-02-11 14:06:45 +01:00
  • b9ab0a4d0b
    CUDA: use arch list for compatibility check (#11775) Johannes Gäßler 2025-02-11 00:17:22 +01:00
  • 7b891bdc86
    fix: typos in documentation files (#11791) Maxim Evtush 2025-02-10 23:21:31 +01:00
  • 81732619fd
    docs: utilize the forward slash (/) as the path separator for Unix-like systems (#11770) jason_w 2025-02-11 06:17:48 +08:00
  • 507f9174fe
    server : (webui) introduce conversation branching + idb storage (#11792) Xuan-Son Nguyen 2025-02-10 21:23:17 +01:00
  • 19b392d58d
    llama-mmap: fix missing include (#11796) Wilken Gottwalt 2025-02-10 19:58:18 +01:00
  • 0893e0114e
    server : correct signal handler (#11795) Xuan-Son Nguyen 2025-02-10 18:03:28 +01:00
  • d7b31a9d84
    sync: minja (a72057e519) (#11774) Olivier Chafik 2025-02-10 09:34:09 +00:00
  • 9ac3457b39
    Update README.md [no ci] (#11781) pascal-lc 2025-02-10 16:05:57 +08:00
  • c2a67efe38
    vulkan: Make Vulkan optional at runtime (#11493). (#11494) Danny Milosavljevic 2025-02-10 07:17:21 +01:00
  • b044a0fe3c
    vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (#11592) Wagner Bruna 2025-02-10 03:08:22 -03:00
  • 19d3c8293b
    There's a better way of clearing lines (#11756) Eric Curtin 2025-02-09 10:34:49 +00:00
  • 98f6b0fd1e
    vulkan: account for lookup tables when checking shared memory size (#11502) Jeff Bolz 2025-02-09 01:43:51 -06:00
  • 55ac8c7791
    server : (webui) revamp Settings dialog, add Pyodide interpreter (#11759) Xuan-Son Nguyen 2025-02-08 21:54:50 +01:00
  • e6e6583199
    server : (webui) increase edit textarea size (#11763) Woof Dog 2025-02-08 19:09:55 +00:00
  • aaa5505307
    server : minor log updates (#11760) Georgi Gerganov 2025-02-08 18:08:43 +02:00
  • bdcf8b6a56
    cont : fix mmap flag print (#11699) Georgi Gerganov 2025-02-08 16:49:38 +02:00
  • 4d3465c5ae
    ggml: Fix data race in ggml threadpool (#11736) Karol Kontny 2025-02-08 15:30:53 +01:00
  • d80be897ac
    CUDA: fix min. version for movmatrix (#11751) Johannes Gäßler 2025-02-08 10:46:07 +01:00
  • 3ab410f55f
    readme : update front-end framework (#11753) Nikolaos Pothitos 2025-02-08 11:43:04 +02:00
  • 0cf867160c
    server : (webui) fix numeric settings being saved as string (#11739) Xuan-Son Nguyen 2025-02-08 10:42:34 +01:00
  • d2fe216fb2
    Make logging more verbose (#11714) Eric Curtin 2025-02-07 14:42:46 +00:00
  • ed926d8833
    llama : fix defrag logic (#11707) Georgi Gerganov 2025-02-07 16:05:34 +02:00
  • 2d219b389e
    vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729) Christian Fillion 2025-02-07 08:55:47 -05:00
  • 333820d749
    llama : fix progress dots (#11730) magicse 2025-02-07 15:48:47 +02:00
  • c026ba3c23
    vulkan: print shared memory size (#11719) Jeff Bolz 2025-02-07 04:26:03 -06:00
  • 7ee953a64a
    llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727) Christian Fillion 2025-02-07 04:33:27 -05:00
  • ec3bc8270b
    SYCL: remove XMX info from print devices (#11712) Akarshan Biswas 2025-02-07 14:57:53 +05:30
  • b7552cfcbc
    common : add default embeddings presets (#11677) Daniel Bevenius 2025-02-07 09:15:22 +01:00
  • 225bbbfa39
    ggml : optimize and build warning fix for LoongArch (#11709) Jinyang He 2025-02-07 15:38:31 +08:00
  • 855cd0734a
    llama : fix old glm4 models (#11670) tv1wnd 2025-02-06 22:48:51 +01:00
  • 8a59053f63
    sync : ggml Georgi Gerganov 2025-02-06 21:23:03 +02:00
  • 1d20e53c40
    rpc: fix known RCE in rpc-server (ggml/1103) Patrick Peng 2025-02-06 09:29:13 -05:00
  • 2fb3c32a16
    server : (webui) migrate project to ReactJS with typescript (#11688) Xuan-Son Nguyen 2025-02-06 17:32:29 +01:00
  • 9ab42dc722
    docs: update fedora cuda guide for 12.8 release (#11393) Tei Home 2025-02-06 20:16:15 +08:00
  • 194b2e69f8
    SYCL: Adjust support condition for norm operators (#11674) Akarshan Biswas 2025-02-06 17:12:35 +05:30
  • 9dd7a0390f
    llama : add log about loading model tensors (#11699) Georgi Gerganov 2025-02-06 13:41:37 +02:00
  • c0d4843225
    build : fix llama.pc (#11658) Adrien Gallouët 2025-02-06 12:08:13 +01:00
  • 8d4d2be143
    ggml : fix LoongArch compile error with 128-bit SIMD (#11701) junchao-zhao 2025-02-06 17:20:00 +08:00
  • 2c6c8df56d
    vulkan: optimize coopmat2 iq2/iq3 callbacks (#11521) Jeff Bolz 2025-02-06 00:15:30 -06:00
  • 8a7e3bf17a
    vulkan: initial support for IQ4_XS quantization (#11501) Rémy O 2025-02-06 07:09:59 +01:00
  • 1b598b3058
    vulkan: use smaller combined allocations to avoid fragmentation (#11551) Jeff Bolz 2025-02-06 00:02:18 -06:00
  • 902368a06b
    metal : avoid breaking build when metal API predates TARGET_OS_VISION (#11690) Charles Duffy 2025-02-05 19:52:31 -06:00
  • c3db0480bb
    readme : add link to Autopen under UIs (#11684) Matvey Soloviev 2025-02-06 01:55:25 +01:00
  • d774ab3acc
    metal : adjust support conditions for norm operators (#11671) Georgi Gerganov 2025-02-05 10:57:42 +02:00
  • fa62da9b2d
    CUDA: support for mat. mul. with ne03 != ne13 (#11656) Johannes Gäßler 2025-02-05 08:58:31 +01:00
  • 1ec208083c
    llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644) SAMI 2025-02-05 14:45:40 +07:00
  • 9f4cc8f8d3
    sync: minja (#11641) Olivier Chafik 2025-02-05 01:00:12 +00:00
  • fd08255d0d
    CUDA: non-contiguous (RMS) norm support (#11659) Johannes Gäßler 2025-02-04 22:21:42 +01:00
  • 3ec9fd4b77
    HIP: force max threads per block to be 1024 (#11621) fxzjshm 2025-02-05 02:18:38 +08:00
  • 3962fc1a79
    server : add try..catch to places not covered by set_exception_handler (#11620) Xuan-Son Nguyen 2025-02-04 18:25:42 +01:00
  • 1bef571f6a
    arg : list RPC devices first when using --list-devices (#11655) Radoslav Gerganov 2025-02-04 18:16:20 +02:00
  • db288b60cb
    tool-call: command r7b fix for normal responses (#11608) Olivier Chafik 2025-02-04 15:48:53 +00:00
  • 106045e7bb
    readme : add llm_client Rust crate to readme bindings (#11628) Shelby Jenkins 2025-02-04 05:20:55 -06:00
  • f117d84b48
    swift : fix llama-vocab api usage (#11645) Jhen-Jie Hong 2025-02-04 19:15:24 +08:00
  • 534c46b53c
    metal : use residency set for other platforms (#11648) Jhen-Jie Hong 2025-02-04 19:07:18 +08:00
  • 387a1598ca
    authors : update Georgi Gerganov 2025-02-04 13:04:10 +02:00
  • 7c9e0ca520
    sync : ggml Georgi Gerganov 2025-02-04 12:59:21 +02:00
  • 8f8290ada9
    cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096) Christian Kastner 2025-02-04 00:17:15 +01:00
  • b34aedd558
    ci : do not stale-close roadmap issues Georgi Gerganov 2025-02-04 09:30:42 +02:00
  • cde3833239
    tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616) Olivier Chafik 2025-02-03 23:49:27 +00:00
  • b3451785ac
    server : (webui) revert hacky solution from #11626 (#11634) Xuan-Son Nguyen 2025-02-04 00:10:52 +01:00
  • 1d1e6a90bc
    server : (webui) allow typing and submitting during llm response (#11626) Woof Dog 2025-02-03 22:16:27 +00:00
  • 5598f475be
    server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622) Daniel Bevenius 2025-02-03 16:45:38 +01:00
  • 8ec05832fa
    sync : ggml Georgi Gerganov 2025-02-03 14:57:08 +02:00
  • 21c84b5d2d
    CUDA: fix Volta FlashAttention logic (#11615) Johannes Gäßler 2025-02-03 13:25:56 +01:00
  • d92cb67e37
    server : (webui) Fix Shift+Enter handling (#11609) mashdragon 2025-02-03 09:42:55 +00:00
  • 6eecde3cc8
    HIP: fix flash_attn_stream_k_fixup warning (#11604) Johannes Gäßler 2025-02-02 23:48:29 +01:00
  • 396856b400
    CUDA/HIP: add support for selectable warp size to mmv (#11519) uvos 2025-02-02 22:40:09 +01:00
  • 4d0598e144
    HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (#11601) uvos 2025-02-02 22:08:05 +01:00
  • 90f9b88afb
    nit: more informative crash when grammar sampler fails (#11593) Olivier Chafik 2025-02-02 19:58:34 +00:00
  • 864a0b67a6
    CUDA: use mma PTX instructions for FlashAttention (#11583) Johannes Gäßler 2025-02-02 19:31:09 +01:00
  • 84ec8a58f7
    Name colors (#11573) Eric Curtin 2025-02-02 16:14:48 +01:00
  • bfcce4d693
    tool-call: support Command R7B (+ return tool_plan "thoughts" in API) (#11585) Olivier Chafik 2025-02-02 09:25:38 +00:00
  • 69804487e0
    Fix exotic ci env that lacks ostringstream::str (#11581) Olivier Chafik 2025-02-02 09:10:15 +00:00
  • ff227703d6
    sampling : support for llguidance grammars (#10224) Michał Moskal 2025-02-01 23:55:32 -08:00
  • 0cec062a63
    llama : add support for GLM-Edge and GLM-Edge-V series models (#10573) piDack 2025-02-02 15:48:46 +08:00
  • 53debe6f3c
    ci: use sccache on windows HIP jobs (#11553) Olivier Chafik 2025-02-01 18:22:38 +00:00
  • cfd74c86db
    sync: minja (418a2364b5) (#11574) Olivier Chafik 2025-02-01 12:24:51 +00:00
  • ecef206ccb
    Implement s3:// protocol (#11511) Eric Curtin 2025-02-01 11:30:54 +01:00
  • 5bbc7362cb
    ci: simplify cmake build commands (#11548) Olivier Chafik 2025-02-01 00:01:20 +00:00
  • aa6fb13213
    ci: use sccache on windows instead of ccache (#11545) Olivier Chafik 2025-01-31 17:12:40 +00:00
  • a83f528688
    tool-call: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539) Olivier Chafik 2025-01-31 14:15:25 +00:00
  • b1bcd309fc
    fix stop regression (#11543) Olivier Chafik 2025-01-31 13:48:31 +00:00
  • 5783575c9d
    Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533) Olivier Chafik 2025-01-31 08:24:29 +00:00
  • 4a2b196d03
    server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531) Olivier Chafik 2025-01-31 08:12:40 +00:00
  • 1bd3047a93
    common: Add missing va_end (#11529) Steve Grubb 2025-01-31 00:58:55 -05:00
  • a2df2787b3
    server : update help metrics processing/deferred (#11512) Daniel Bevenius 2025-01-31 06:04:53 +01:00
  • 553f1e46e9
    ci: ccache for all github workflows (#11516) Olivier Chafik 2025-01-30 22:01:06 +00:00
  • 8b576b6c55
    Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) Olivier Chafik 2025-01-30 19:13:58 +00:00
  • 27d135c970
    HIP: require at least HIP 5.5 uvos 2025-01-29 19:36:00 +01:00
  • 6af1ca48cb
    HIP: Prepare reduction operators for wave 64 uvos 2025-01-29 19:12:42 +01:00
  • c300e68ef4
    CUDA/HIP: add warp_size to cuda_device_info uvos 2025-01-29 17:46:23 +01:00
  • 3d804dec76
    sync: minja (#11499) Olivier Chafik 2025-01-30 10:30:27 +00:00
  • ffd0821c57
    vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496) mgroeber9110 2025-01-30 11:10:59 +01:00