Commit Graph

  • 70680c48e5
    ggml : upgrade init_tensor API to return a ggml_status (#11854) William Tambellini 2025-02-28 05:41:47 -08:00
  • c43a3e7996
    llama : add Phi-4-mini support (supersede #12099) (#12108) Xuan-Son Nguyen 2025-02-28 12:44:11 +01:00
  • 84d5f4bc19
    Update granite vision docs for 3.2 model (#12105) Alex Brooks 2025-02-28 04:31:47 -07:00
  • 438a83926a
    vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (#11595) Rémy O 2025-02-28 09:42:52 +01:00
  • 9c42b1718c
    CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (#12098) Johannes Gäßler 2025-02-28 09:26:43 +01:00
  • 05e6f5aad0
    ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (#12064) Prashant Vithule 2025-02-28 13:06:12 +05:30
  • 673cfef9aa
    CANN: Fix build error with GCC 13 (#11990) hipudding 2025-02-28 15:23:47 +08:00
  • fbeda9002d
    vulkan: matmul dequantization improvements (#12015) Eve 2025-02-28 07:20:08 +00:00
  • 581650b7ca
    vulkan: improve im2col (#11826) Daniele 2025-02-28 06:52:51 +00:00 (see sketch after this list)
  • b95c8af37c
    cmake: Fix ggml backend dependencies and installation (#11818) Vladimir Vuksanovic 2025-02-27 08:42:48 +01:00
  • a800ae46da
    llava : add struct for FFI bindgen (#12079) Ting Lou 2025-02-26 22:26:52 +08:00
  • 69050a11be
    Refactor gguf scripts to improve metadata handling (#11909) Sigbjørn Skjæret 2025-02-26 14:04:48 +01:00
  • 3567ee3a94
    gguf-py: enable reading non-native endian files (#12081) Aleksei Nikiforov 2025-02-26 12:39:27 +01:00
  • 53e4db1012
    readme : update infra list (#9096) Kante Yin 2025-02-26 15:49:36 +08:00
  • d7cfe1ffe0
    docs: add docs/function-calling.md to lighten server/README.md's plight (#12069) Olivier Chafik 2025-02-25 18:52:56 +00:00
  • a82c9e7c23
    vulkan: fix assertion when qy_needs_dequant (#12068) Jeff Bolz 2025-02-25 09:30:21 -06:00
  • 401af80b54
    server: handle echo=false on /v1/completions (#12060) rhjdvsgsgks 2025-02-25 11:52:52 +00:00
  • c132239bfb
    add OP sigmoid (#12056) Judd 2025-02-25 19:32:20 +08:00 (see sketch after this list)
  • 393fca629e
    ggml-cpu: Fix build with sve (#12059) Molly Sophia 2025-02-25 19:28:22 +08:00
  • 61d4f39dfe
    vulkan: implement more backpropagation operators (#11914) Rémy O 2025-02-25 12:04:45 +01:00
  • 0b52745649
    server: support add_generation_prompt query param (#12062) Olivier Chafik 2025-02-25 10:40:22 +00:00
  • 4d1051a40f
    Add Doc for Converting Granite Vision -> GGUF (#12006) Alex Brooks 2025-02-25 02:46:05 -07:00
  • 3e9a2860e9
    llama : expose llama_model_n_head_kv in the API (#11997) Vitali Lovich 2025-02-25 01:29:33 -08:00
  • 58d07a8043
    metal : copy kernels for quant to F32/F16 conversions (#12017) Gian-Carlo Pascutto 2025-02-25 10:27:58 +01:00
  • 34a846b584
    opencl: fix for small models (#11950) lhez 2025-02-24 13:47:07 -08:00
  • 7a2c913e66
    llava : Add Granite Vision Support (#11794) Alex Brooks 2025-02-24 09:09:51 -07:00
  • 08d5986290
    [SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035) Neo Zhang Jianyu 2025-02-24 22:33:23 +08:00
  • 651adf4b66
    gguf_convert_endian.py: implement byteswapping for q4_k and q6_k (#11349) Aleksei Nikiforov 2025-02-24 12:27:01 +01:00
  • 8303e8b0fb
    SYCL: Fix GGML_SYCL_DEBUG macro (#11995) Akarshan Biswas 2025-02-24 15:48:25 +05:30
  • 7ad0779f5d
    run: allow to customize prompt by env var LLAMA_PROMPT_PREFIX (#12041) Florent BENOIT 2025-02-23 18:15:51 +01:00
  • f777a73e18
    Some llama-run cleanups (#11973) Eric Curtin 2025-02-23 13:14:32 +00:00
  • af7747c95a
    ggml-cpu: Support s390x SIMD Instruction Set (#12019) Aaron Teo 2025-02-23 05:39:24 +08:00
  • a28e0d5eb1
    CUDA: add option to compile without FlashAttention (#12025) Johannes Gäßler 2025-02-22 20:44:34 +01:00
  • 36c258ee92
    llava: build clip image from pixels (#11999) Ting Lou 2025-02-22 22:28:28 +08:00
  • f3e64859ed
    ci : fix arm upload artifacts (#12024) Georgi Gerganov 2025-02-22 15:03:00 +02:00
  • 5fa07c2f93
    CUDA: optimize FA for GQA + large batches (#12014) Johannes Gäßler 2025-02-22 12:20:17 +01:00
  • 335eb04a91
    ci : Build on GitHub-hosted arm64 runners (#12009) Rohanjames1997 2025-02-22 04:48:57 -06:00
  • cf756d6e0a
    server : disable Nagle's algorithm (#12020) Georgi Gerganov 2025-02-22 12:46:31 +02:00 (see sketch after this list)
  • d70908421f
    cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (#12000) Gian-Carlo Pascutto 2025-02-22 09:43:24 +01:00
  • de8b5a3624
    llama.swiftui : add "Done" dismiss button to help view (#11998) Daniel Bevenius 2025-02-22 06:33:29 +01:00
  • 51f311e057
    llama : skip loading unused tensors (#12004) Georgi Gerganov 2025-02-21 18:33:18 +02:00
  • 586d5fe6eb
    doc: update contributing guidelines [no ci] (#11969) Johannes Gäßler 2025-02-21 12:51:25 +01:00
  • ecc8e3aeff
    CUDA: correct the lowest Maxwell supported by CUDA 12 (#11984) PureJourney 2025-02-21 19:21:05 +08:00
  • 0b3863ff95
    MUSA: support ARM64 and enable dp4a etc. (#11843) Bodhi 2025-02-21 15:46:23 +08:00
  • ee02ad02c5
    clip : fix visual encoders with no CLS (#11982) Alex Brooks 2025-02-20 23:11:03 -07:00
  • c392e5094d
    server (webui): Fix Premature Submission During IME Conversion (#11971) momonga 2025-02-21 03:43:22 +09:00
  • c5d91a7400
    ggml-cpu: Add CPU backend support for KleidiAI library (#11390) Charles Xu 2025-02-20 14:06:51 +01:00
  • 4806498bf1
    ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917) Prashant Vithule 2025-02-20 15:38:32 +05:30
  • 0d559580a0
    run : add --chat-template-file (#11961) Michael Engel 2025-02-20 09:35:11 +01:00
  • d04e7163c8
    doc: add links to ggml examples [no ci] (#11958) Johannes Gäßler 2025-02-19 20:45:17 +01:00
  • d07c621393
    common : add llama.vim preset for Qwen2.5 Coder (#11945) Daniel Bevenius 2025-02-19 12:29:52 +01:00
  • abd4d0bc4f
    speculative : update default params (#11954) Georgi Gerganov 2025-02-19 13:29:42 +02:00
  • 9626d9351a
    llama : fix indentation in llama-grammar [no ci] (#11943) Daniel Bevenius 2025-02-19 06:16:23 +01:00
  • b58934c183
    server : (webui) Enable communication with parent html (if webui is in iframe) (#11940) igardev 2025-02-19 00:01:44 +02:00
  • 63e489c025
    tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900) Olivier Chafik 2025-02-18 18:03:23 +00:00
  • 63ac128563
    server : add TEI API format for /rerank endpoint (#11942) Xuan-Son Nguyen 2025-02-18 14:21:41 +01:00
  • 5137da7b8c
    scripts: corrected encoding when getting chat template (#11866) (#11907) MoonRide303 2025-02-18 10:30:16 +01:00
  • 09aaf4f1f5
    docs : Fix duplicated file extension in test command (#11935) xiaobing318 2025-02-18 17:12:49 +08:00
  • 73e2ed3ce3
    CUDA: use async data loading for FlashAttention (#11894) Johannes Gäßler 2025-02-17 14:03:24 +01:00
  • f7b1116af1
    update release requirements (#11897) Eve 2025-02-17 11:20:23 +00:00
  • c4d29baf32
    server : fix divide-by-zero in metrics reporting (#11915) Antoine Viallon 2025-02-17 11:25:12 +01:00
  • 2eea03d86a
    vulkan: implement several ops relevant for ggml_opt (#11769) Rémy O 2025-02-17 07:55:57 +01:00
  • 0f2bbe6564
    server : bump httplib to 0.19.0 (#11908) Xuan-Son Nguyen 2025-02-16 18:11:22 +01:00
  • fe163d5bf3
    common : Fix a typo in help (#11899) standby24x7 2025-02-16 18:51:13 +09:00
  • 818a340ea8
    ci : fix (again) arm64 build fails (#11895) Xuan-Son Nguyen 2025-02-16 10:36:39 +01:00
  • bf42a23d0a
    vulkan: support multi/vision rope, and noncontiguous rope (#11902) Jeff Bolz 2025-02-16 01:52:23 -06:00
  • c2ea16f260
    metal : fix the crash caused by the lack of residency set support on Intel Macs. (#11904) Hale Chan 2025-02-16 14:50:26 +08:00
  • 6dde178248
    scripts: fix compare-llama-bench commit hash logic (#11891) Johannes Gäßler 2025-02-15 20:23:22 +01:00
  • fc10c38ded
    examples: fix typo in imatrix/README.md (#11884) 708-145 2025-02-15 20:03:30 +01:00
  • 22885105a6
    metal : optimize dequant q6_K kernel (#11892) Adrian Kretz 2025-02-15 19:39:20 +01:00
  • c2cd24fbfd
    readme : add notice about new package registry (#11890) Georgi Gerganov 2025-02-15 20:29:56 +02:00
  • 68ff663a04
    repo : update links to new url (#11886) Georgi Gerganov 2025-02-15 16:40:57 +02:00
  • f355229692
    server: fix type promotion typo causing crashes w/ --jinja w/o tools (#11880) Olivier Chafik 2025-02-15 10:11:36 +00:00
  • fc1b0d0936
    vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528) Rémy O 2025-02-15 09:01:40 +01:00
  • 89daa2564f
    llguidance build fixes for Windows (#11664) Michał Moskal 2025-02-14 12:46:08 -08:00
  • 300907b211
    opencl: Fix rope and softmax (#11833) lhez 2025-02-14 11:12:23 -08:00
  • 94b87f87b5
    cuda : add ampere to the list of default architectures (#11870) Diego Devesa 2025-02-14 15:33:52 +01:00
  • dbc2ec59b5
    docker : drop to CUDA 12.4 (#11869) Georgi Gerganov 2025-02-14 14:48:40 +02:00
  • 3d68f034da
    llama : add completion for --chat-template-file (#11860) Daniel Bevenius 2025-02-14 11:16:56 +01:00
  • 38e32eb6a0
    ggml: optimize some vec dot functions for LoongArch ASX (#11842) Jinyang He 2025-02-14 16:54:27 +08:00
  • a4f011e8d0
    vulkan: linux builds + small subgroup size fixes (#11767) Eve 2025-02-14 02:59:40 +00:00
  • a7b8ce2260
    llama-bench : fix unexpected global variable initialization order issue (#11832) theraininsky 2025-02-14 09:13:43 +08:00
  • 04045bb842
    readme : minor Georgi Gerganov 2025-02-14 00:16:56 +02:00
  • 8a8c4ceb60
    llamafile: use member variable instead of constant for iq4nlt (#11780) Jeffrey Morgan 2025-02-13 09:05:04 -08:00
  • c1f958c038
    server : (docs) Update wrong tool calling example (#11809) Reza Rahemtola 2025-02-13 17:22:44 +01:00
  • c48f630d1c
    llama : add --completion-bash option (#11846) Daniel Bevenius 2025-02-13 14:46:59 +01:00
  • bd6e55bfd3
    musa: bump MUSA SDK version to rc3.1.1 (#11822) R0CKSTAR 2025-02-13 20:28:18 +08:00
  • c7f460ab88
    server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command 7RB & DeepSeek R1) unless --reasoning-format none (#11607) Olivier Chafik 2025-02-13 10:05:16 +00:00
  • 27e8a23300
    sampling: add Top-nσ sampler (#11223) Vinesh Janarthanan 2025-02-13 00:45:57 -06:00 (see sketch after this list)
  • e4376270d9
    llama.cpp: fix warning message (#11839) Oleksandr Kuvshynov 2025-02-13 01:25:34 -05:00
  • 3e69319772
    llama : update llama_decode_internal ref [no ci] (#11840) Daniel Bevenius 2025-02-13 07:07:51 +01:00
  • a394039db0
    ggml-cpu : add chunking support to mul_mat_id (#11666) Diego Devesa 2025-02-13 01:02:38 +01:00
  • be3bbd6215
    ggml : x2 speed for WASM by optimizing SIMD (#11453) Xuan-Son Nguyen 2025-02-13 00:33:45 +01:00
  • 31afcbee0e
    server : (webui) Give copy button back to all message bubbles (#11814) Woof Dog 2025-02-12 22:47:11 +00:00
  • 5c4284d57b
    HIP: Remove GCN from list of devices that avoid MMQ (#11831) uvos 2025-02-12 22:25:28 +01:00
  • bfd11a2344
    Fix: Compile failure due to Microsoft STL breaking change (#11836) JC 2025-02-12 20:36:11 +00:00
  • 0fb77f821f
    sync : ggml Georgi Gerganov 2025-02-12 21:46:02 +02:00
  • e598697d63
    HIP: Switch to std::vector in rocblas version check (#11820) uvos 2025-02-12 17:25:03 +01:00
  • fef0cbeadf
    cleanup: fix compile warnings associated with gnu_printf (#11811) bandoti 2025-02-12 10:06:53 -04:00
  • 748ee9fe93
    ggml : fix multi-threaded clamp_f32 (#11824) Richard 2025-02-12 13:57:33 +00:00 (see sketch after this list)
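
Sketches for the commits flagged above follow; each is a hedged illustration, not code from the repository.

For 581650b7ca (vulkan: improve im2col, #11826): im2col unrolls each convolution receptive field into a row so the convolution becomes a plain matrix multiply. A minimal scalar sketch of the transform, assuming a single channel, stride 1, and no padding; this is illustrative only, not the Vulkan shader:

```cpp
#include <cstddef>
#include <vector>

// Scalar reference for im2col on a single-channel H x W input with a
// kh x kw kernel, stride 1, no padding: each output row is the
// flattened receptive field of one output pixel, so the convolution
// becomes a matrix multiply against the flattened kernel.
static std::vector<float> im2col(const std::vector<float> & src,
                                 int H, int W, int kh, int kw) {
    const int OH = H - kh + 1;
    const int OW = W - kw + 1;
    std::vector<float> dst((std::size_t) OH * OW * kh * kw);
    std::size_t idx = 0;
    for (int oy = 0; oy < OH; ++oy)
        for (int ox = 0; ox < OW; ++ox)
            for (int ky = 0; ky < kh; ++ky)
                for (int kx = 0; kx < kw; ++kx)
                    dst[idx++] = src[(std::size_t) (oy + ky) * W + (ox + kx)];
    return dst;
}
```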
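
For c132239bfb (add OP sigmoid, #12056): the op computes the logistic function y = 1 / (1 + e^(-x)) elementwise. A scalar reference of what such a kernel evaluates, not ggml's per-backend dispatch code:

```cpp
#include <cmath>
#include <cstddef>

// Elementwise sigmoid over a contiguous float buffer:
// y[i] = 1 / (1 + exp(-x[i])).
static void sigmoid_f32(const float * x, float * y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = 1.0f / (1.0f + std::exp(-x[i]));
    }
}
```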
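
For cf756d6e0a (server : disable Nagle's algorithm, #12020): Nagle's algorithm coalesces small TCP writes, which adds latency to token-by-token streaming; disabling it is the standard TCP_NODELAY socket option. A POSIX sketch of the underlying call (the server sets this through its HTTP library rather than on raw sockets like this):

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Disable Nagle's algorithm on a connected TCP socket so that small
// writes (e.g. per-token SSE chunks) go out immediately instead of
// being coalesced. Returns 0 on success, -1 on error.
static int disable_nagle(int sockfd) {
    int flag = 1;
    return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}
```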
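
For 27e8a23300 (sampling: add Top-nσ sampler, #11223): Top-nσ keeps only tokens whose raw logit lies within n standard deviations of the maximum logit, masking the rest before softmax. A self-contained sketch of the filtering rule; the function name and signature are assumptions, not llama.cpp's sampler API:

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Top-n-sigma filtering: keep tokens whose logit is within n standard
// deviations of the maximum logit; mask the rest to -inf so softmax
// assigns them zero probability.
static void top_n_sigma(std::vector<float> & logits, float n) {
    const float max_l = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0, sum2 = 0.0;
    for (float l : logits) { sum += l; sum2 += (double) l * l; }
    const double mean = sum / logits.size();
    const double var  = std::max(0.0, sum2 / logits.size() - mean * mean);
    const float  thresh = max_l - n * (float) std::sqrt(var);
    for (float & l : logits) {
        if (l < thresh) l = -std::numeric_limits<float>::infinity();
    }
}
```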
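
For 748ee9fe93 (ggml : fix multi-threaded clamp_f32, #11824): ggml-cpu ops split work across threads by thread index ith of nth, and an elementwise op must apply that split consistently or rows get skipped or processed twice. A hedged sketch of the row-partitioning pattern, not the fixed code itself:

```cpp
#include <algorithm>
#include <cstddef>

// ggml-cpu style row partitioning: thread ith of nth handles rows
// [ir0, ir1). An elementwise op such as clamp must use this split
// consistently across threads.
static void clamp_rows(float * data, int nrows, int ncols,
                       float lo, float hi, int ith, int nth) {
    const int dr  = (nrows + nth - 1) / nth; // rows per thread
    const int ir0 = dr * ith;
    const int ir1 = std::min(ir0 + dr, nrows);
    for (int r = ir0; r < ir1; ++r) {
        for (int c = 0; c < ncols; ++c) {
            float & x = data[(std::size_t) r * ncols + c];
            x = std::max(lo, std::min(x, hi));
        }
    }
}
```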