Commit Graph

  • dbbebcab33 ggml: fix ggml_graph_cpy undefined behavior (ggml/943) Johannes Gäßler 2024-08-31 14:35:42 +02:00
  • ba1cf846ed cann : fix doxy (ggml/0) Georgi Gerganov 2024-08-28 18:45:01 +03:00
  • d2d3200b38 cann : add Ascend NPU support (whisper/2336) Mengqing Cao 2024-08-09 20:21:56 +08:00
  • 51d964a4ef cuda : mark BF16 CONT as unsupported Georgi Gerganov 2024-08-28 17:08:03 +03:00
  • efe6a83e30 ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934) Salvatore Mesoraca 2024-08-28 10:23:02 +02:00
  • fbb7fcffbc llama : set attrs of mislabelled EOT/EOM tokens (#9348) Kevin Gibbons 2024-09-07 22:51:00 -07:00
  • a5b5d9a101 llama.android : fix build (#9350) Georgi Gerganov 2024-09-08 00:33:50 +03:00
  • f12295b8a9 llama : fix empty ring buffer push (#9358) Georgi Gerganov 2024-09-08 00:33:33 +03:00
  • faf69d4237 llama : sanitize invalid tokens (#9357) Georgi Gerganov 2024-09-08 00:33:13 +03:00
  • e536426ded llamafile : disable sgemm for batch-size 1 (#9330) Eve 2024-09-07 19:02:26 +00:00
  • 1b9ae5189c common : refactor arg parser (#9308) Xuan Son Nguyen 2024-09-07 20:43:51 +02:00
  • e32d0816ed ggml : always check bounds on get_rows operations (#9354) slaren 2024-09-07 20:23:07 +02:00
  • df270ef745 llama : refactor sampling v2 (#9294) Georgi Gerganov 2024-09-07 15:16:19 +03:00
  • 947538acb8 ggml : fix missing cpu_set_t on emscripten (#9336) Xuan Son Nguyen 2024-09-07 12:01:34 +02:00
  • 6c89eb0b47 ci : disable rocm image creation (#9340) slaren 2024-09-07 09:48:54 +02:00
  • 9b2c24c099 server : simplify state machine for slot (#9283) Xuan Son Nguyen 2024-09-06 23:21:29 +02:00
  • 134bc38ecf llama-bench : log benchmark progress (#9287) Aarni Koskela 2024-09-07 00:03:01 +03:00
  • 815b1fb20a batched-bench : add --output-format jsonl option (#9293) Aarni Koskela 2024-09-06 18:59:58 +03:00
  • 409dc4f8bb ggml : fix build break for the vulkan-debug (#9265) Changyeon Kim 2024-09-06 21:54:50 +09:00
  • 4a1411b4f1 server : fix missing lock (#9334) Xuan Son Nguyen 2024-09-06 14:06:04 +02:00
  • 8ebe8ddebd Improve Vulkan shader build system (#9239) Markus Tavenrath 2024-09-06 08:56:17 +02:00
  • 9bc6db28d0 ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151) compilade 2024-09-05 21:48:47 -04:00
  • 32b2ec88bc Update build.yml (#9184) awatuna 2024-09-06 06:34:36 +08:00
  • 1031771faa CMake fix: host for msvc compiler can only be x86 or x64 (#8624) Michael Podvitskiy 2024-09-06 00:14:12 +02:00
  • 4db04784f9 cuda : fix defrag with quantized KV (#9319) slaren 2024-09-05 11:13:11 +02:00
  • bdf314f38a llama-bench : fix NUL terminators in CPU name (#9313) slaren 2024-09-05 02:19:39 +02:00
  • 581c305186 ggml : AVX2 support for Q4_0_8_8 (#8713) Srihari-mcw 2024-09-04 22:21:22 +05:30
  • 5910ea9427 [SYCL] Fix DMMV dequantization (#9279) Ouadie EL FAROUKI 2024-09-04 16:26:33 +01:00
  • c8671ae282 Fix broken links in docker.md (#9306) 杨朱 · Kiki 2024-09-04 19:45:28 +08:00
  • 82e3b03c11 rpc : make RPC servers come first in the device list (#9296) Radoslav Gerganov 2024-09-04 11:08:32 +03:00
  • 9379d3cc17 readme : rename result_format to response_format (#9300) Pascal Patry 2024-09-04 02:45:40 -04:00
  • 7605ae7daf flake.lock: Update (#9261) Georgi Gerganov 2024-09-04 02:36:43 +03:00
  • 8962422b1c llama-bench : add JSONL (NDJSON) output mode (#9288) Aarni Koskela 2024-09-03 20:58:54 +03:00
  • b69a480af4 readme : refactor API section + remove old hot topics Georgi Gerganov 2024-09-03 10:00:36 +03:00
  • 48baa61ecc server : test script : add timeout for all requests (#9282) Xuan Son Nguyen 2024-09-02 22:08:38 +02:00
  • f1485161e5 src: make tail invalid when kv cell is intersection for mamba (#9249) Zhenwei Jin 2024-09-03 01:53:23 +08:00
  • 048de848ee docker : fix missing binaries in full-cuda image (#9278) slaren 2024-09-02 18:11:13 +02:00
  • f771d064a9 ggml : add pthread includes on FreeBSD (#9258) yuri@FreeBSD 2024-09-02 08:25:30 -07:00
  • 6e7d133a5f server : refactor multitask handling (#9274) Xuan Son Nguyen 2024-09-02 17:11:51 +02:00
  • b60074f1c2 llama-cli : remove duplicated log message (#9275) Guoliang Hua 2024-09-02 20:36:43 +08:00
  • 9c1ba55733 build(nix): Package gguf-py (#5664) Tushar 2024-09-02 16:51:01 +05:30
  • c6d4cb4655 llama : minor style Georgi Gerganov 2024-09-02 11:52:04 +03:00
  • 8f1d81a0b6 llama : support RWKV v6 models (#8980) Molly Sophia 2024-09-01 22:38:17 +08:00
  • a47667cff4 nix: fix CUDA build - replace deprecated autoAddOpenGLRunpathHook Echo Nolan 2024-08-22 17:19:14 -04:00
  • ea5d7478b1 sgemm : improved Q4_0 and Q8_0 performance via 4xN and Mx4 gemm (#8908) Srihari-mcw 2024-08-31 13:50:35 +05:30
  • 49271efbaf llama : fix typo in xcda_array_view comment [no ci] (#9132) Daniel Bevenius 2024-08-31 09:50:22 +02:00
  • 0ab30f8d82 llama : fix llama_split_mode enum values in main_gpu document (#9057) Sutou Kouhei 2024-08-31 03:08:10 +09:00
  • cddae4884c Correct typo run_llama2.sh > run-llama2.sh (#9149) 蕭澧邦 2024-08-30 20:10:01 +08:00
  • 7ea8d80d53 llava : the function "clip" should be int (#9237) tc-mb 2024-08-30 13:21:57 +08:00
  • 42c76d1358 Threadpool: take 2 (#8672) Faisal Zaghloul 2024-08-29 19:20:53 -04:00
  • 9f7d4bcf5c server : fix crash when error handler dumps invalid utf-8 json (#9195) Jan Boon 2024-08-27 18:28:06 +08:00
  • 1d1ccce676 flake.lock: Update (#9162) Georgi Gerganov 2024-08-29 07:28:14 +03:00
  • 9fe94ccac9 docker : build images only once (#9225) slaren 2024-08-28 17:28:00 +02:00
  • 66b039a501 docker : update CUDA images (#9213) slaren 2024-08-28 13:20:36 +02:00
  • 20f1789dfb vulkan : fix build (#0) Georgi Gerganov 2024-08-27 22:10:58 +03:00
  • 231cff5f6f sync : ggml Georgi Gerganov 2024-08-27 22:01:45 +03:00
  • 3246fe84d7 Fix minicpm example directory (#9111) Xie Yanbo 2024-08-27 20:33:08 +08:00
  • 78eb487bb0 llama : fix qs.n_attention_wv for DeepSeek-V2 (#9156) compilade 2024-08-27 06:09:23 -04:00
  • a77feb5d71 server : add some missing env variables (#9116) Xuan Son Nguyen 2024-08-27 11:07:01 +02:00
  • 2e59d61c1b llama : fix ChatGLM4 wrong shape (#9194) CausalLM 2024-08-27 14:58:22 +08:00
  • 75e1dbbaab llama : fix llama3.1 rope_freqs not respecting custom head_dim (#9141) Carsten Kragelund Jørgensen 2024-08-27 08:53:40 +02:00
  • ad76569f8e common : Update stb_image.h to latest version (#9161) arch-btw 2024-08-26 22:58:50 -07:00
  • 7d787ed96c ggml : do not crash when quantizing q4_x_x with an imatrix (#9192) slaren 2024-08-26 19:44:43 +02:00
  • 06658ad7c3 metal : separate scale and mask from QKT in FA kernel (#9189) Georgi Gerganov 2024-08-26 18:31:02 +03:00
  • fc18425b6a ggml : add SSM Metal kernels (#8546) Georgi Gerganov 2024-08-26 17:55:36 +03:00
  • 879275ac98 tests : fix compile warnings for unreachable code (#9185) Georgi Gerganov 2024-08-26 16:30:25 +03:00
  • 7a3df798fc ci : add VULKAN support to ggml-ci (#9055) Georgi Gerganov 2024-08-26 12:19:39 +03:00
  • e5edb210cd server : update deps (#9183) Georgi Gerganov 2024-08-26 12:16:57 +03:00
  • 0c41e03ceb metal : gemma2 flash attention support (#9159) slaren 2024-08-26 11:08:59 +02:00
  • f12ceaca0c ggml-ci : try to improve build time (#9160) slaren 2024-08-26 11:03:30 +02:00
  • 436787f170 llama : fix time complexity of string replacement (#9163) Justine Tunney 2024-08-25 23:09:53 -07:00
  • 93bc3839f9 common: fixed not working find argument --n-gpu-layers-draft (#9175) Herman Semenov 2024-08-25 22:54:37 +00:00
  • f91fc5639b CUDA: fix Gemma 2 numerical issues for FA (#9166) Johannes Gäßler 2024-08-25 22:11:48 +02:00
  • e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542) Johannes Gäßler 2024-08-24 21:34:59 +02:00
  • 8f824ffe8e quantize : fix typo in usage help of quantize.cpp (#9145) João Dinis Ferreira 2024-08-24 07:22:45 +01:00
  • 3ba780e2a8 lora : fix llama conversion script with ROPE_FREQS (#9117) Xuan Son Nguyen 2024-08-23 12:58:53 +02:00
  • a07c32ea54 llama : use F32 precision in GLM4 attention and no FA (#9130) piDack 2024-08-23 15:27:17 +08:00
  • 11b84eb457 [SYCL] Add a space to supress a cmake warning (#9133) Akarshan Biswas 2024-08-22 19:39:47 +05:30
  • 1731d4238f [SYCL] Add oneDNN primitive support (#9091) luoyu-intel 2024-08-22 12:50:10 +08:00
  • a1631e53f6 llama : simplify Mamba with advanced batch splits (#8526) compilade 2024-08-21 17:58:11 -04:00
  • fc54ef0d1c server : support reading arguments from environment variables (#9105) Xuan Son Nguyen 2024-08-21 11:04:34 +02:00
  • b40eb84895 llama : support for falcon-mamba architecture (#9074) Younes Belkada 2024-08-21 12:06:36 +04:00
  • f63f603c87 llava : zero-initialize clip_ctx structure fields with aggregate initialization 908) fairydreaming 2024-08-21 09:45:49 +02:00
  • 8455340b87 llama : std::move llm_bigram_bpe from work_queue (#9062) Daniel Bevenius 2024-08-21 09:32:58 +02:00
  • 2f3c1466ff llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. (#8984) Changyeon Kim 2024-08-21 04:00:00 +09:00
  • 50addec9a5 [SYCL] fallback mmvq (#9088) Meng, Hengyu 2024-08-20 23:50:17 +08:00
  • 4f8d19ff17 [SYCL] Fix SYCL im2col and convert Overflow with Large Dims (#9052) zhentaoyu 2024-08-20 23:06:51 +08:00
  • 90db8146d5 tests : add missing comma in grammar integration tests (#9099) fairydreaming 2024-08-20 11:09:55 +02:00
  • cfac111e2b cann: add doc for cann backend (#8867) wangshuai09 2024-08-19 16:46:38 +08:00
  • 1b6ff90ff8 rpc : print error message when failed to connect endpoint (#9042) Radoslav Gerganov 2024-08-19 10:11:45 +03:00
  • 18eaf29f4c rpc : prevent crashes on invalid input (#9040) Radoslav Gerganov 2024-08-19 10:10:21 +03:00
  • 554b049068 flake.lock: Update (#9068) Georgi Gerganov 2024-08-18 17:43:32 +03:00
  • 2339a0be1c tests : add integration test for lora adapters (#8957) ltoniazzi 2024-08-18 10:58:04 +01:00
  • 2fb9267887 Fix incorrect use of ctx_split for bias tensors (#9063) Yoshi Suhara 2024-08-17 06:34:21 -07:00
  • 8b3befc0e2 server : refactor middleware and /health endpoint (#9056) Xuan Son Nguyen 2024-08-16 17:19:05 +02:00
  • d565bb2fd5 llava : support MiniCPM-V-2.6 (#8967) tc-mb 2024-08-16 21:34:41 +08:00
  • ee2984bdaf py : fix wrong input type for raw_dtype in ggml to gguf scripts (#8928) Farbod Bijary 2024-08-16 14:06:30 +03:30
  • c8ddce8560 Fix inference example lacks required parameters (#9035) Aisuko 2024-08-16 19:08:59 +10:00
  • 23fd453544 gguf-py : bump version from 0.9.1 to 0.10.0 (#9051) compilade 2024-08-16 02:36:11 -04:00
  • c679e0cb5c llama : add EXAONE model support (#9025) Minsoo Cheong 2024-08-16 15:35:18 +09:00