Commit Graph

  • 4314e56c4f
    server : use lambda instead of std::bind (#11507) Daniel Bevenius 2025-01-30 11:05:00 +01:00
  • 496e5bf46b
    server : (docs) added response format for /apply-template [no ci] (#11503) Isaac McFadyen 2025-01-30 04:11:53 -05:00
  • 7919256c57
    readme : reference examples relative links (#11505) Guspan Tanadi 2025-01-30 12:58:02 +07:00
  • e0449763a4
    server : update json snippets in README.md [no ci] (#11492) Daniel Bevenius 2025-01-30 05:48:14 +01:00
  • eb7cf15a80
    server : add /apply-template endpoint for additional use cases of Minja functionality (#11489) Nigel Bosch 2025-01-29 12:45:44 -06:00
  • 66ee4f297c
    vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360) Rémy Oudompheng 2025-01-29 18:29:39 +01:00
  • e51c47b401
    server : update auto gen files comments [no ci] (#11484) Daniel Bevenius 2025-01-29 16:34:18 +01:00
  • 2711d0215f
    vulkan: Catch pipeline creation failure and print an error message (#11436) Jeff Bolz 2025-01-29 09:26:50 -06:00
  • f0d4b29edf
    Parse https://ollama.com/library/ syntax (#11480) Eric Curtin 2025-01-29 12:23:10 +01:00
  • 815857791d
    sync : ggml Georgi Gerganov 2025-01-29 11:25:29 +02:00
  • 1a0e87d291
    ggml : add option to not print stack on abort (ggml/1081) William Tambellini 2025-01-23 11:59:08 -08:00
  • d2e518e9b4
    ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) issixx 2025-01-17 21:29:08 +09:00
  • b636228c0a
    embedding : enable --no-warmup option (#11475) Daniel Bevenius 2025-01-29 09:38:54 +01:00
  • 325afb370a
    llama: fix missing k_cache store for rwkv6qwen2 (#11445) Molly Sophia 2025-01-29 12:07:21 +08:00
  • 794fe23f29
    cmake: add hints for locating ggml on Windows using Llama find-package (#11466) Emreerdog 2025-01-29 02:22:06 +03:00
  • cf8cc856d7
    server : Fixed wrong function name in llama.cpp server unit test (#11473) peidaqi 2025-01-28 16:03:42 -07:00
  • d0c08040b6
    ci : fix build CPU arm64 (#11472) Xuan-Son Nguyen 2025-01-29 00:02:56 +01:00
  • be5ef7963f
    HIP: Suppress transformation warning in softmax.cu uvos 2025-01-28 23:06:32 +01:00
  • cae9fb4361
    HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (#11080) Nikita Sarychev 2025-01-28 07:42:20 -08:00
  • 7fee2889e6
    Add GitHub protocol pulling and http:// (#11465) Eric Curtin 2025-01-28 15:45:41 +01:00
  • d7d1eccacc
    docker: allow installing pip packages system-wide (#11437) Nuno 2025-01-28 15:17:25 +01:00
  • 4bf3119d61
    cmake : don't fail on GGML_CPU=OFF (#11457) someone13574 2025-01-28 09:15:34 -05:00
  • f643120bad
    docker: add perplexity and bench commands to full image (#11438) Nuno 2025-01-28 11:42:32 +01:00
  • 6e84b0ab8e
    SYCL : SOFTMAX F16 mask support and other fixes (#11261) Akarshan Biswas 2025-01-28 15:26:58 +05:30
  • 2b8525d5c8
    Handle missing model in CLI parameters for llama-run (#11399) Michael Engel 2025-01-28 09:32:40 +01:00
  • a4417ddda9
    Add new hf protocol for ollama (#11449) Eric Curtin 2025-01-27 19:36:10 +01:00
  • d6d24cd9ed
    AMD: parse the architecture as supplied by gcnArchName (#11244) Haus1 2025-01-27 08:58:17 -05:00
  • a5203b4465
    llama : minor fixes to speed up llama model loading (#11448) lexasub 2025-01-27 17:42:09 +04:00
  • df984e0147
    llama: refactor llama_decode_impl (#11381) Johannes Gäßler 2025-01-27 12:07:12 +01:00
  • acd38efee3
    metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441) Ihar Hrachyshka 2025-01-27 02:41:59 -05:00
  • caf773f249
    docker : fix ARM build and Vulkan build (#11434) Xuan Son Nguyen 2025-01-26 22:45:32 +01:00
  • 178a7eb952
    metal : use residency sets (#11427) Georgi Gerganov 2025-01-26 20:06:16 +02:00
  • 6f53d8a6b4
    docker: add missing vulkan library to base layer and update to 24.04 (#11422) Nuno 2025-01-26 18:22:43 +01:00
  • 19f65187cb
    cmake: add ggml find package (#11369) bandoti 2025-01-26 12:07:48 -04:00
  • 1d8ee06000
    rpc: fix register position (#11424) Frank Mai 2025-01-26 23:20:34 +08:00
  • 2cc9b8c32c
    readme : update hot topics Georgi Gerganov 2025-01-26 14:30:15 +02:00
  • f35726c2fb
    build: apply MSVC /bigobj option to c/cpp files only (#11423) Jeff Bolz 2025-01-25 20:10:03 -06:00
  • 4a75d19376
    vulkan: compile shaders on-demand (#11406) Jeff Bolz 2025-01-25 15:29:57 -06:00
  • 26771a1491
    HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (#11420) uvos 2025-01-25 21:01:12 +01:00
  • ca6baf76c1
    build: add /bigobj to MSVC build (#11407) Jeff Bolz 2025-01-25 11:26:37 -06:00
  • 6e264a905b
    docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419) Diego Devesa 2025-01-25 17:22:41 +01:00
  • 49b0e3cec4
    server : fix cleaning up stream task (#11418) Xuan Son Nguyen 2025-01-25 16:36:44 +01:00
  • 20a758155b
    docker : fix CPU ARM build (#11403) Diego Devesa 2025-01-25 15:22:29 +01:00
  • 00c24acb2a
    ci : fix line breaks on windows builds (#11409) Georgi Gerganov 2025-01-25 13:36:48 +02:00
  • 466ea66f33
    CANN: Add Ascend CANN build ci (#10217) jiahao su 2025-01-25 07:26:01 +08:00
  • 5f0db9522f
    hip : Add hipGraph and VMM support to ROCM (#11362) uvos 2025-01-25 00:02:23 +01:00
  • c5d9effb49
    CUDA: fix FP16 cuBLAS GEMM (#11396) Johannes Gäßler 2025-01-24 21:02:43 +01:00
  • 9fbadaef4f
    rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356) uvos 2025-01-24 17:50:49 +01:00
  • 9755129c27
    release : pack /lib in the packages (#11392) Georgi Gerganov 2025-01-24 18:41:30 +02:00
  • a07c2c8a52
    docs : Update readme to build targets for local docker build (#11368) Jafar Uruç 2025-01-24 13:30:13 +00:00
  • 8137b4bb2b
    CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380) Johannes Gäßler 2025-01-24 12:38:31 +01:00
  • 1af6945eb0
    cmake : avoid -march=native when reproducible build is wanted (#11366) Bernhard M. Wiedemann 2025-01-24 12:21:35 +01:00
  • 01f37edf1a
    Update llama-run README.md (#11386) Eric Curtin 2025-01-24 09:39:24 +00:00
  • c07e87f38b
    server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364) stduhpf 2025-01-24 09:02:38 +01:00
  • 564804b79b
    tests: fix some mul_mat test gaps (#11375) Jeff Bolz 2025-01-23 14:51:24 -06:00
  • 05f63cc9ee
    Update documentation (#11373) Eric Curtin 2025-01-23 20:04:31 +00:00
  • f7fb43cd0b
    Add -ngl (#11372) Eric Curtin 2025-01-23 16:16:18 +00:00
  • 5845661640
    server : add more clean up when cancel_tasks is called (#11340) Xuan Son Nguyen 2025-01-23 13:56:05 +01:00
  • f211d1dc10
    Treat hf.co/ prefix the same as hf:// (#11350) Eric Curtin 2025-01-23 10:38:20 +00:00
  • 955a6c2d91
    Vulkan-run-test: fix mmq_wg_denoms (#11343) amd-dwang 2025-01-23 15:14:28 +08:00
  • 1971adf55e
    vulkan: sort shaders for more deterministic binary (#11315) Jeff Bolz 2025-01-23 01:07:50 -06:00
  • 5245729e33
    vulkan: fix diag_mask_inf (#11323) Jeff Bolz 2025-01-23 01:01:17 -06:00
  • 6152129d05
    main : update README documentation for batch size (#11353) Diego Devesa 2025-01-22 19:22:20 +01:00
  • 16d3df7ab0
    readme : add plugin links (#11355) Georgi Gerganov 2025-01-22 19:44:26 +02:00
  • 12c2bdf2de
    server : fix draft context not being released (#11354) Diego Devesa 2025-01-22 17:44:40 +01:00
  • c64d2becb1
    minja: sync at 0f5f7f2b37 (#11352) Olivier Chafik 2025-01-22 16:16:27 +00:00
  • 96f4053934
    Adding logprobs to /v1/completions (#11344) Jiří Podivín 2025-01-22 12:51:32 +01:00
  • a94f3b2727
    common: utils to split / join / repeat strings (from json converter) (#11342) Olivier Chafik 2025-01-22 09:51:44 +00:00
  • 3e3357fd77
    llava : support Minicpm-omni (#11289) tc-mb 2025-01-22 15:35:48 +08:00
  • 6171c9d258
    Add Jinja template support (#11016) Olivier Chafik 2025-01-21 13:18:51 +00:00
  • e28245f35f
    export-lora : fix tok_embd tensor (#11330) Xuan Son Nguyen 2025-01-21 14:07:12 +01:00
  • 6da5bec81c
    rpc : better caching of the base buffer pointer (#11331) Radoslav Gerganov 2025-01-21 15:06:41 +02:00
  • 2e2f8f093c
    linenoise.cpp refactoring (#11301) Eric Curtin 2025-01-21 09:32:35 +00:00
  • 2139667ec4
    metal : fix out-of-bounds write (#11314) Georgi Gerganov 2025-01-21 08:48:13 +02:00
  • 80d0d6b4b7
    common : add -hfd option for the draft model (#11318) Georgi Gerganov 2025-01-20 22:29:43 +02:00
  • aea8ddd516
    vulkan: fix coopmat2 validation failures (#11284) Jeff Bolz 2025-01-20 10:38:32 -06:00
  • 9f7add1cde
    examples : fix add_special conditions (#11311) Georgi Gerganov 2025-01-20 16:36:08 +02:00
  • 90d987b105
    mmap: add include for cerrno (#11296) Christopher Nielsen 2025-01-20 09:02:43 -05:00
  • a4251edd6f
    cmake: fix shell command quoting in build-info script (#11309) Michael Podvitskiy 2025-01-20 15:02:15 +01:00
  • ec7f3ac9ab
    llama : add support for Deepseek-R1-Qwen distill model (#11310) Xuan Son Nguyen 2025-01-20 14:35:07 +01:00
  • ef6dada60c
    cont : fix whitespaces (#11305) Georgi Gerganov 2025-01-20 09:29:32 +02:00
  • ae3c1db2f9
    llama : re-add LLM_ARCH_PHIMOE (#11305) Kyle Bruene 2025-01-20 01:21:01 -06:00
  • 92bc493917
    tests : increase timeout when sanitizers are enabled (#11300) Georgi Gerganov 2025-01-19 20:22:30 +02:00
  • b9daaffe02
    simple-chat : fix BOS being added to each message (#11278) Georgi Gerganov 2025-01-19 18:12:09 +02:00
  • 99487b57d4
    SYCL: Introducing memory host pool (#11251) Nicolò Scipione 2025-01-19 14:33:34 +01:00
  • a1649cc13f
    Adding linenoise.cpp to llama-run (#11252) Eric Curtin 2025-01-18 14:42:31 +00:00
  • 4dd34ff831
    cmake : add sanitizer flags for llama.cpp (#11279) Georgi Gerganov 2025-01-18 16:18:15 +02:00
  • f30f099228
    server : implement cancellable request (#11285) Xuan Son Nguyen 2025-01-18 14:12:05 +01:00
  • f26c874179
    scripts : restore hf.sh (#11288) Georgi Gerganov 2025-01-18 13:18:32 +02:00
  • 6390a998bf
    tts : add guide tokens support (#11186) LostRuins Concedo 2025-01-18 18:20:57 +08:00
  • 44e18ef939
    vulkan: fix coopmat2 flash attention for non-contiguous inputs (#11281) Jeff Bolz 2025-01-18 02:26:50 -06:00
  • 3edfa7d375
    llama.android: add field formatChat to control whether to parse special tokens when send message (#11270) codezjx 2025-01-17 20:57:56 +08:00
  • 667d72846c
    rpc : early register backend devices (#11262) Radoslav Gerganov 2025-01-17 10:57:09 +02:00
  • a133566d34
    vocab : fix double-eos check (#11273) Georgi Gerganov 2025-01-17 09:28:00 +02:00
  • 960ec65273
    llama : fix deprecation message: vocabable -> vocab (#11269) David Renshaw 2025-01-17 02:12:01 -05:00
  • 7a689c415e
    README : added kalavai to infrastructure list (#11216) musoles 2025-01-17 00:10:49 +00:00
  • bd38ddea01
    vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166) Jeff Bolz 2025-01-16 15:47:10 -06:00
  • 466300fe14
    vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206) Jeff Bolz 2025-01-16 15:23:49 -06:00
  • 206bc53422
    vulkan: optimize coopmat2 q2_k dequant function (#11130) Jeff Bolz 2025-01-16 15:16:39 -06:00
  • 4dbc8b9cb7
    llama : add internlm3 support (#11233) RunningLeon 2025-01-17 02:10:38 +08:00