Commit Graph

  • 0e64591e82
    swiftui : enable stream updating (#7754) Shuichi Tsutsumi 2024-06-21 14:30:58 +09:00
  • b1ef562bc1
    requirements : Bump torch and numpy for python3.12 (#8041) Hamdoud Hakem 2024-06-20 21:01:15 +01:00
  • 17b291a6a5
    convert-hf : Fix the encoding in the convert-hf-to-gguf-update.py (#8040) Hamdoud Hakem 2024-06-20 20:59:59 +01:00
  • abd894ad96
    common: fix warning (#8036) Johannes Gäßler 2024-06-20 16:40:13 +02:00
  • de391e4c80
    [SYCL] Fix windows build and inference (#8003) luoyu-intel 2024-06-20 13:19:05 +00:00
  • d50f8897a7
    CUDA: stream-k decomposition for MMQ (#8018) Johannes Gäßler 2024-06-20 14:39:21 +02:00
  • 2075a66a96
    metal : fix ggml_metal_supports_op for BF16 (#8021) Michael de Gans 2024-06-19 22:32:01 -07:00
  • ba58993152
    server : fix smart slot selection (#8020) sasha0552 2024-06-19 23:57:10 +00:00
  • a7854743c5
    un-ignore build-info.cmake and build-info.sh (#7996) Michael de Gans 2024-06-19 13:10:42 -07:00
  • 9c77ec1d74
    ggml : synchronize threads using barriers (#7993) slaren 2024-06-19 15:04:15 +02:00
  • a04a953cab
    codecov : remove (#8004) Georgi Gerganov 2024-06-19 13:04:36 +03:00
  • 623494a478
    [SYCL] refactor (#6408) Meng, Hengyu 2024-06-19 09:11:51 +08:00
  • 37bef89433
    tokenizer : BPE fixes (#7530) jaime-m-p 2024-06-18 18:40:52 +02:00
  • 91c188d6c2
    Only use FIM middle token if it exists (#7648) Sigbjørn Skjæret 2024-06-18 14:19:45 +02:00
  • 84f6de17f6
    Fix no gcc pragma on Windows (#7751) jojorne 2024-06-18 09:18:32 -03:00
  • 61665277af
    Allow compiling with CUDA without CUDA runtime installed (#7989) Ulrich Drepper 2024-06-18 14:00:14 +02:00
  • b96f9afb0d
    chore: clean useless beam search param (#7985) Frank Mai 2024-06-18 15:11:40 +08:00
  • 1193778105
    readme : update UI list (#7943) Abheek Gulati 2024-06-17 23:57:41 -07:00
  • 5326bcceeb
    ggml : sync Georgi Gerganov 2024-06-18 09:50:45 +03:00
  • e6ecc2be47
    whisper : use ggml_backend_sched (whisper/2239) Georgi Gerganov 2024-06-18 09:37:20 +03:00
  • a94e6ff877
    update: support Qwen2-57B-A14B (#7835) Ștefan-Gabriel Muscalu 2024-06-17 22:08:46 +03:00
  • 5b6da18750
    Make updates to type cast based on compiler instead of OS (#7851) Srihari-mcw 2024-06-17 23:53:17 +05:30
  • 7c26775adb
    llama : disable FA if KV head size do not match (#7982) Georgi Gerganov 2024-06-17 19:40:01 +03:00
  • b473e95084
    Add Nix and Flox install instructions (#7899) Bryan Honof 2024-06-17 17:37:55 +02:00
  • 99052cd227
    sched : offload_op also requires supports_op (#7977) slaren 2024-06-17 16:51:42 +02:00
  • c637fcd34d
    fix: divide 0 exception in mamba (#7932) Frank Mai 2024-06-17 22:11:08 +08:00
  • 6a2f0b3474
    Implement non-mapped async IO for CUDA on Windows. (#7896) Markus Tavenrath 2024-06-17 16:10:15 +02:00
  • 21be9cab94
    rpc : fix load/store misaligned addresses (#7948) Georgi Gerganov 2024-06-17 11:09:20 +03:00
  • 006167aaf6
    gguf-dump.py: add --markdown dump output (#7853) Brian 2024-06-17 15:25:20 +10:00
  • df68d4fa5d
    [SYCL] Update README-sycl.md for Chapter "Recommended release" and "News" (#7946) Neo Zhang 2024-06-17 11:17:07 +08:00
  • 43b35e38ba
    Add support for sqrt on CUDA (#7953) Calvin Laurenson 2024-06-16 15:23:04 -07:00
  • 19b7a836f6
    cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231) Georgi Gerganov 2024-06-11 17:39:01 +03:00
  • b5fcf8ef5c
    ggml : fix and optimize ppc64le (ggml/849) Hong Bo PENG 2024-06-16 16:53:11 +08:00
  • 398105ff43
    ggml : remove duplicate include of ggml-common.h (ggml/853) Daniel Bevenius 2024-06-16 10:51:18 +02:00
  • bc6c457fa3
    flake.lock: Update (#7951) Georgi Gerganov 2024-06-16 19:16:21 +03:00
  • 52399254b3
    unicode : avoid char32_t (#7957) Georgi Gerganov 2024-06-16 14:51:40 +03:00
  • 6fe1c62741
    readme : update UI list [no ci] (#7958) hopkins385 2024-06-16 13:51:18 +02:00
  • cddaf028ad
    ggml : fix handling of zero blocks in IQ quants (#7955) Georgi Gerganov 2024-06-16 14:50:12 +03:00
  • c8a82194a8
    github : update pr template Georgi Gerganov 2024-06-16 10:46:51 +03:00
  • 7c7836d9d4
    Vulkan Shader Refactor, Memory Debugging Option (#7947) 0cc4m 2024-06-16 07:17:31 +02:00
  • 0c7b3595b9
    Add cvector-generator example (#7514) Xuan Son Nguyen 2024-06-15 18:53:40 +02:00
  • 7b2f4a7d19
    [SYCL] remove global variables (#7710) Meng, Hengyu 2024-06-15 14:05:10 +08:00
  • f8ec8877b7
    ci : fix macos x86 build (#7940) olexiyb 2024-06-14 20:28:34 +03:00
  • 76d66ee0be
    CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921) Johannes Gäßler 2024-06-14 18:41:49 +02:00
  • 66ef1ceedf
    metal : utilize max shared memory for mul_mat_id (#7935) Georgi Gerganov 2024-06-14 17:14:09 +03:00
  • e65bbf606c
    llama-bench : fix RPC indication (#7936) Radoslav Gerganov 2024-06-14 16:47:41 +03:00
  • 6fcd1331ef
    llama : more checks before assuming FIM tokens (#7644) Sigbjørn Skjæret 2024-06-14 12:20:04 +02:00
  • 41b9260f18
    convert : add Poro-34B-chat tokenizer support (#7713) Elaine 2024-06-14 13:16:49 +03:00
  • 172c825684
    rpc : fix ggml_backend_rpc_supports_buft() (#7918) Radoslav Gerganov 2024-06-13 15:18:44 +03:00
  • a55eb1bf0f
    readme : Remove outdated instructions from README.md (#7914) [no ci] Galunid 2024-06-13 09:42:41 +02:00
  • f578b86b21
    move BLAS to a separate backend (#6210) slaren 2024-06-13 03:11:35 +02:00
  • 1c641e6aac
    build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) Olivier Chafik 2024-06-13 00:41:52 +01:00
  • 963552903f
    CUDA: fix broken oob check for FA vec f32 kernel (#7904) Johannes Gäßler 2024-06-12 17:41:51 +02:00
  • a9cae48003
    tests : add non-cont unary tests (#7857) Georgi Gerganov 2024-06-12 16:00:22 +03:00
  • bfaa676b08
    ggml : improve ggml_is_contiguous logic (#7856) Georgi Gerganov 2024-06-12 15:24:20 +03:00
  • 704a35b183
    server : restore numeric prompts (#7883) Georgi Gerganov 2024-06-12 14:42:29 +03:00
  • dcf752707d
    update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894) Meng, Hengyu 2024-06-12 17:05:35 +08:00
  • f2b5764beb
    Fix a typo and add Fedora 40 pacakge to install for Vulkan (#7794) [no ci] Patrice Ferlet 2024-06-12 03:18:16 +02:00
  • 73bac2b11d
    vulkan: select only one device for single gpu with multiple drivers (#7582) k.h.lai 2024-06-12 03:26:05 +08:00
  • ef52d1d16a
    Update Vulkan RoPE implementation (#7818) 0cc4m 2024-06-11 21:20:29 +02:00
  • 14f83526cd
    fix broken link in pr template (#7880) [no ci] Deven Mistry 2024-06-11 12:18:58 -04:00
  • 6fe42d073f
    github: move PR template to .github/ root (#7868) Brian 2024-06-12 00:43:41 +10:00
  • 148995e5e5
    llama-bench: more compact markdown tables (#7879) Johannes Gäßler 2024-06-11 14:45:40 +02:00
  • 4bfe50f741
    tests : check the Python version (#7872) Georgi Gerganov 2024-06-11 10:10:20 +03:00
  • bdcb8f4222
    CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860) Johannes Gäßler 2024-06-11 08:26:07 +02:00
  • c2ce6c47e4
    fix CUDA CI by using a windows-2019 image (#7861) slaren 2024-06-11 07:59:20 +02:00
  • b61eb9644d
    json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866) Olivier Chafik 2024-06-11 02:22:57 +01:00
  • 396b18dfec
    json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841) Olivier Chafik 2024-06-11 01:00:30 +01:00
  • 864a99e7a0
    cmake : fix CMake requirement for CUDA (#7821) Jared Van Bortel 2024-06-10 18:32:10 -04:00
  • fd5ea0f897
    ci : try win-2019 on server windows test (#7854) slaren 2024-06-10 14:18:41 +02:00
  • c28a83902c
    examples : remove --instruct remnants (#7846) Georgi Gerganov 2024-06-10 15:00:15 +03:00
  • d9da0e4986
    server : improve "prompt" handling (#7847) Georgi Gerganov 2024-06-10 14:59:55 +03:00
  • 1f0dabda8d
    CUDA: use tensor cores for MMQ (#7676) Johannes Gäßler 2024-06-10 11:45:13 +02:00
  • af4ae502dd
    use the correct SYCL context for host USM allocations (#7777) Ben Ashbaugh 2024-06-10 02:21:31 -07:00
  • 10ceba354a
    flake.lock: Update (#7838) Georgi Gerganov 2024-06-10 02:04:50 +03:00
  • e95beeb1fc
    imatrix : handle partial entries (#7833) Georgi Gerganov 2024-06-09 20:19:35 +03:00
  • 57bf62ce7c
    docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700) Nicolás Pérez 2024-06-09 11:24:29 -04:00
  • 3e2ee44315
    server: do not remove whitespace at the start of a completion chunk (#7830) mgroeber9110 2024-06-09 12:50:35 +02:00
  • 42b53d192f
    CUDA: revise q8_1 data layout for mul_mat_q (#7824) Johannes Gäßler 2024-06-09 09:42:25 +02:00
  • 2decf57bc6
    convert-hf : set the model name based on cli arg, if present (#7693) sasha0552 2024-06-09 06:39:25 +00:00
  • 5795b94182
    convert-hf : match model part name prefix and suffix (#7687) compilade 2024-06-08 22:47:25 -04:00
  • ed9f252118
    gguf-py : decouple adding metadata from writing in GGUFWriter (#7827) compilade 2024-06-08 22:34:29 -04:00
  • fe1e3917cf
    Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) slaren 2024-06-09 01:43:39 +02:00
  • d4d915d351
    url: save -mu downloads to new cache location (#7826) Olivier Chafik 2024-06-08 20:21:08 +01:00
  • 7a16ce7db2
    server : smart slot selection using Longest Common Prefix (#7728) sasha0552 2024-06-08 07:50:31 +00:00
  • da799b4189
    vulkan : reuse parent extra for views (#7806) slaren 2024-06-07 19:47:49 +02:00
  • c00fad71e5
    gguf-split : change binary multi-byte units to decimal (#7803) Christian Zhou-Zheng 2024-06-07 08:56:01 -04:00
  • 27615f5ab2
    cmake : fix BUILD_SHARED_LIBS=ON build (#7784) intelmatt 2024-06-07 05:15:07 -07:00
  • 7027b27d76
    server: update cache_prompt documentation [no ci] (#7745) Johannes Gäßler 2024-06-07 11:15:49 +02:00
  • a5cabd7649
    server : do not get prompt in infill mode (#7286) woodx 2024-06-07 15:09:45 +08:00
  • d5c938cd77
    [SYCL] fix softmax r2r result wrong issue (#7811) pengxin99 2024-06-07 14:28:26 +08:00
  • c9ee7118d5
    check for nans in imatrix and quantize (#7807) slaren 2024-06-07 08:01:29 +02:00
  • ee459f40f6
    server : fix --threads-http arg (#7801) Georgi Gerganov 2024-06-06 19:19:59 +03:00
  • f83351f9a6
    imatrix : migrate to gpt_params (#7771) Georgi Gerganov 2024-06-06 16:30:58 +03:00
  • ad675e1c67
    Added support for . (any character) token in grammar engine. (#6467) Clint Herron 2024-06-06 06:08:52 -07:00
  • a143c04375
    README minor fixes (#7798) [no ci] Mattheus Chediak 2024-06-06 09:17:54 -03:00
  • 55b2d0849d
    grammars: x{min,max} repetition operator (#6640) Olivier Chafik 2024-06-06 10:07:06 +01:00
  • f5d7b268ec
    llama : add jina v2 base code (#7596) Joan Fontanals 2024-06-06 09:22:41 +02:00
  • 2d08b7fbb4
    docker : build only main and server in their images (#7782) slaren 2024-06-06 07:19:49 +02:00
  • d67caea0d6
    docker : add openmp lib (#7780) slaren 2024-06-06 07:17:21 +02:00