Commit Graph

  • 08c5ee87e4
    llama : remove deprecated API (#5770) Georgi Gerganov 2024-02-28 18:43:38 +02:00
  • 78aacf3634
    awq-py : remove (#5768) Georgi Gerganov 2024-02-28 17:36:53 +02:00
  • 8c0e8f4e73
    sync : ggml Georgi Gerganov 2024-02-28 11:17:32 +02:00
  • 2774b0c974
    add Google Magika inference example (ggml/748) slaren 2024-02-25 20:41:35 +01:00
  • 5f70671856
    Introduce backend GUIDs (ggml/743) UEXTM.com 2024-02-24 11:27:36 -05:00
  • a693bea1e6
    server : hit Ctrl+C twice to exit (#5734) Xuan Son Nguyen 2024-02-28 09:55:37 +01:00
  • adcb12a9ba
    llama : fix non-quantization of expert gating tensors (#5754) compilade 2024-02-28 03:52:56 -05:00
  • 177628bfd8
    llama : improve BERT tokenization (#5740) Douglas Hanley 2024-02-28 02:51:11 -06:00
  • 6c4416868d
    readme : add link to LLaVA 1.6 models (#5758) Daniel Bevenius 2024-02-28 09:39:39 +01:00
  • efc72253f7
    server : add "/chat/completions" alias for "/v1/...` (#5722) Jorge A 2024-02-28 01:39:15 -07:00
  • 7c4263d426
    ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (#5760) Kawrakow 2024-02-28 10:37:02 +02:00
  • cb49e0f8c9
    Attempt to fix Android build (#5752) Kawrakow 2024-02-27 19:16:49 +02:00
  • 0becb22ac0
    IQ4_XS: a 4.25 bpw quantization (#5747) Kawrakow 2024-02-27 16:34:24 +02:00
  • c24a2a6e60
    cuda : replace remaining shfl_xor with calls to warp_reduce functions (#5744) Engininja2 2024-02-27 07:22:45 -06:00
  • 1f30b7a9f1
    ggml-quants : fix avx2 iq1_s vec_dot when compiled with gcc (#5742) Engininja2 2024-02-27 06:50:18 -06:00
  • 9d533a77d0
    llama : fix defrag bugs + add parameter (#5735) Georgi Gerganov 2024-02-27 14:35:51 +02:00
  • cbbd1efa06
    Makefile: use variables for cublas (#5689) le.chang 2024-02-27 10:03:06 +08:00
  • b11a93df41
    fix server hangs on empty prompt (#5733) Xuan Son Nguyen 2024-02-26 23:15:48 +01:00
  • a33e6a0d2a
    Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range (#5721) Kawrakow 2024-02-26 18:28:38 +02:00
  • 47bb7b48c7
    CUDA: fix DEBUG_CUDA_MALLOC (#5729) Johannes Gäßler 2024-02-26 15:36:38 +01:00
  • c4d7f81786
    readme : update ui list (#5731) Artem 2024-02-26 17:15:28 +03:00
  • e849078c6e
    [SYCL] Add support for soft_max ALiBi (#5639) AidanBeltonS 2024-02-26 14:02:11 +00:00
  • 67fd33132f
    unicode : reuse iterator (#5726) Georgi Gerganov 2024-02-26 14:02:12 +02:00
  • 4804215cb8
    server: CI fix trailing space (#5728) Pierrick Hymbert 2024-02-26 11:41:34 +01:00
  • 8a533f0d90
    server: CI tests reduce build matrix (#5725) Pierrick Hymbert 2024-02-26 09:56:10 +01:00
  • 269de86ba0
    llama : fix Gemma rope type (#5691) Georgi Gerganov 2024-02-26 08:30:17 +02:00
  • c393733988
    flake.lock: Update github-actions[bot] 2024-02-25 00:17:11 +00:00
  • e3965cf35a
    server: tests - slow inference causes timeout on the CI (#5715) Pierrick Hymbert 2024-02-25 22:48:33 +01:00
  • 8b350356b2
    server: docs - refresh and tease the HTTP server a little more (#5718) Pierrick Hymbert 2024-02-25 21:46:29 +01:00
  • bf08e00643
    llama : refactor k-shift implementation + KV defragmentation (#5691) Georgi Gerganov 2024-02-25 22:12:24 +02:00
  • f7625019c5
    server : fix crash when system prompt is bigger than batch size (#5714) compilade 2024-02-25 13:43:50 -05:00
  • abbabc5e51
    ggml-quants : provide ggml_vqtbl1q_u8 for 64bit compatibility (#5711) Radosław Gryta 2024-02-25 19:43:00 +01:00
  • f1a98c5254
    make : fix handling of empty nvcc version (#5713) kwin1412 2024-02-26 00:46:49 +08:00
  • 7d548a1827
    readme : add Msty to UI list (#5618) Ashok Gelal 2024-02-25 10:57:34 -05:00
  • 930b178026
    server: logs - unified format and --log-format option (#5700) Pierrick Hymbert 2024-02-25 13:50:32 +01:00
  • d52d7819b8
    server: concurrency fix + monitoring - add /metrics prometheus compatible endpoint (#5708) Pierrick Hymbert 2024-02-25 13:49:43 +01:00
  • 1289408817
    cmake : fix compilation for Android armeabi-v7a (#5702) Radosław Gryta 2024-02-25 11:53:11 +01:00
  • ab336a9d5e
    code : normalize enum names (#5697) Georgi Gerganov 2024-02-25 12:09:09 +02:00
  • 69917dfa55
    py : fix StableLM conversion after config.json changes (#5703) Anas Ahouzi 2024-02-25 10:54:04 +01:00
  • 9e359a4f47
    server: continue updating other slots during a concurrent embedding request (#5699) Pierrick Hymbert 2024-02-24 19:16:04 +01:00
  • 4c4cb30736
    IQ3_S: a much better alternative to Q3_K (#5676) Kawrakow 2024-02-24 16:23:52 +02:00
  • 525213d2f5
    server: init functional tests (#5566) Pierrick Hymbert 2024-02-24 12:28:55 +01:00
  • fd43d66f46
    server : add KV cache quantization options (#5684) AlpinDale 2024-02-23 19:31:54 +00:00
  • 54fbcd2ce6
    convert : fix missing ftype for gemma (#5690) Jared Van Bortel 2024-02-23 13:39:14 -05:00
  • 15499eb942
    mpt : do not duplicate token_embd.weight on disk (#5670) Jared Van Bortel 2024-02-22 17:05:23 -05:00
  • 96633eeca1
    gemma : use more bits for the token_embd.weight tensor (#5650) Georgi Gerganov 2024-02-22 23:23:46 +02:00
  • 847eedbdb2
    py : add Gemma conversion from HF models (#5647) Georgi Gerganov 2024-02-22 23:22:48 +02:00
  • 7e4f339c40
    ggml : always define ggml_fp16_t as uint16_t (#5666) Georgi Gerganov 2024-02-22 23:21:39 +02:00
  • 334f76fa38
    sync : ggml Georgi Gerganov 2024-02-22 23:21:05 +02:00
  • efd56b1c21
    ggml : 32-bit arm compat (whisper/1891) Georgi Gerganov 2024-02-22 18:31:40 +02:00
  • 201294ae17
    nix: init singularity and docker images (#5056) Someone 2024-02-22 19:44:10 +00:00
  • 5a9e2f60ba
    py : minor fixes (#5668) Georgi Gerganov 2024-02-22 20:13:25 +02:00
  • 373ee3fbba
    Add Gemma chat template (#5665) Xuan Son Nguyen 2024-02-22 19:10:21 +01:00
  • 4cb4d8b22d
    workflows: nix: hardcode cachix ids, build unconditionally (#5663) Someone 2024-02-22 16:32:09 +00:00
  • 3a03541ced
    minor : fix trailing whitespace (#5638) Georgi Gerganov 2024-02-22 13:54:03 +02:00
  • 56d03d92be
    readme : update hot topics Georgi Gerganov 2024-02-22 10:35:54 +02:00
  • a46f50747b
    server : fallback to chatml, add AlphaMonarch chat template (#5628) Xuan Son Nguyen 2024-02-22 09:33:24 +01:00
  • c5688c6250
    server : clarify some params in the docs (#5640) Alexey Parfenov 2024-02-22 08:27:32 +00:00
  • 4ef245a92a
    mpt : add optional bias tensors (#5638) Dat Quoc Nguyen 2024-02-22 18:15:13 +10:00
  • 973053d8b0
    llama : fix loading models with shared tok_embd and output (#5651) slaren 2024-02-22 00:42:09 +01:00
  • 7c8bcc11dc
    Add docs for llama_chat_apply_template (#5645) Xuan Son Nguyen 2024-02-22 00:31:00 +01:00
  • 7fe4678b02
    llama : fix session save/load with quantized KV (#5649) slaren 2024-02-21 22:52:39 +01:00
  • ba2135ccae
    gemma : allow offloading the output tensor (#5646) slaren 2024-02-21 22:18:23 +01:00
  • 89febfed93
    examples : do not assume BOS when shifting context (#5622) Jared Van Bortel 2024-02-21 10:33:54 -05:00
  • 5022cf242d
    sync : ggml Georgi Gerganov 2024-02-21 16:52:39 +02:00
  • 1ecea255eb
    server: health: fix race condition on slots data using tasks queue (#5634) Pierrick Hymbert 2024-02-21 15:47:48 +01:00
  • a00a35cef9
    readme : add LocalAI to the available UIs (#5629) Ettore Di Giacinto 2024-02-21 15:39:10 +01:00
  • eccd7a26dd
    sync : ggml (#5633) Georgi Gerganov 2024-02-21 16:17:10 +02:00
  • c14f72db9c
    readme : update hot topics Georgi Gerganov 2024-02-21 15:39:54 +02:00
  • cc6cac08e3
    llava : add --skip-unknown to 1.6 convert.py (#5632) Daniel Bevenius 2024-02-21 14:36:57 +01:00
  • 580111d42b
    llama : add gemma model (#5631) postmasters 2024-02-21 05:08:22 -08:00
  • 88c46cbdac
    [SYCL] add name to context (#5624) Meng, Hengyu 2024-02-21 17:52:06 +08:00
  • a14679cc30
    IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590) Kawrakow 2024-02-21 11:39:52 +02:00
  • 6560bed3f0
    server : support llava 1.6 (#5553) CJ Pais 2024-02-20 11:07:22 -08:00
  • 06bf2cf8c4
    make : fix debug build with CUDA (#5616) slaren 2024-02-20 20:06:17 +01:00
  • 4ed8e4fbef
    llava : add explicit instructions for llava-1.6 (#5611) Daniel Bevenius 2024-02-20 18:30:27 +01:00
  • 9c405c9f9a
    Server: use llama_chat_apply_template (#5593) Xuan Son Nguyen 2024-02-20 15:58:27 +01:00
  • 5207b3fbc5
    readme : update UI list (#5605) Dane Madsen 2024-02-20 21:00:23 +11:00
  • 8dbbd75754
    metal : add build system support for embedded metal library (#5604) Haoxiang Fei 2024-02-19 22:58:36 -11:00
  • c0a8c6db37
    server : health endpoint configurable failure on no slot (#5594) Pierrick Hymbert 2024-02-20 08:48:19 +01:00
  • b9111bd209
    Update ggml_sycl_op_mul_mat_vec_q (#5502) AidanBeltonS 2024-02-20 07:01:25 +00:00
  • 633782b8d9
    nix: now that we can do so, allow macOS to build Vulkan binaries Mathijs de Bruin 2024-02-13 20:28:02 +00:00
  • 22f83f0c38
    Enable Vulkan macOS CI 0cc4m 2024-02-10 22:18:33 +01:00
  • bb9dcd560a
    Refactor validation and enumeration platform checks into functions to clean up ggml_vk_instance_init() 0cc4m 2024-02-14 20:57:17 +01:00
  • f50db6ae0b
    Add check for VK_KHR_portability_enumeration for MoltenVK support 0cc4m 2024-02-10 22:14:52 +01:00
  • d8c054517d
    Add preprocessor checks for Apple devices. Mathijs de Bruin 2024-02-06 14:39:22 +00:00
  • 42f664a382
    Resolve ErrorIncompatibleDriver with Vulkan on macOS. Mathijs de Bruin 2024-02-03 18:00:11 +00:00
  • 5dde540897
    Allow for Vulkan build with Accelerate. Mathijs de Bruin 2024-02-03 17:56:46 +00:00
  • 40c3a6c1e1
    cuda : ignore peer access already enabled errors (#5597) slaren 2024-02-19 23:40:26 +01:00
  • f24ed14ee0
    make : pass CPPFLAGS directly to nvcc, not via -Xcompiler (#5598) Jared Van Bortel 2024-02-19 15:54:12 -05:00
  • 9d679f0fcc
    examples : support minItems/maxItems in JSON grammar converter (#5039) nopperl 2024-02-19 14:14:07 +00:00
  • 1387cf60f7
    llava : remove extra cont (#5587) Georgi Gerganov 2024-02-19 15:23:17 +02:00
  • 6fd413791a
    llava : replace ggml_cpy with ggml_cont slaren 2024-02-19 14:02:36 +01:00
  • 337c9cbd52
    sync : ggml Georgi Gerganov 2024-02-19 14:54:21 +02:00
  • a3145bdc30
    ggml-alloc : apply ggml/731 Georgi Gerganov 2024-02-19 14:53:48 +02:00
  • 890559ab28
    metal : option to embed MSL source into compiled binary (whisper/1842) Didzis Gosko 2024-02-11 16:41:41 +02:00
  • d0e3ce51f4
    ci : enable -Werror for CUDA builds (#5579) Georgi Gerganov 2024-02-19 14:45:41 +02:00
  • 68a6b98b3c
    make : fix CUDA build (#5580) Georgi Gerganov 2024-02-19 13:41:51 +02:00
  • 70d45af0ef
    readme : fix typo in README-sycl.md (#5353) valiray 2024-02-19 02:37:10 -08:00
  • 13e2c771aa
    cmake : remove obsolete sycl compile flags (#5581) Abhilash Majumder 2024-02-19 14:45:18 +05:30