Commit Graph

  • 7672adeec7
    Fix encoding in python scripts (#7733) Galunid 2024-06-05 19:07:24 +02:00
  • 7d1a378b8f
    CUDA: refactor mmq, dmmv, mmvq (#7716) Johannes Gäßler 2024-06-05 16:53:00 +02:00
  • 2b3389677a
    ggml : refactor rope norm/neox (#7634) Georgi Gerganov 2024-06-05 11:29:20 +03:00
  • 9973e81c5c
    readme : remove -ins (#7759) arch-btw 2024-06-04 23:40:49 -07:00
  • c90dbe026b
    Fix per token attributes bits (#7749) jaime-m-p 2024-06-05 01:26:14 +02:00
  • b90dc566c1
    Allow number of nodes in CUDA graph to change (#7738) agray3 2024-06-04 21:06:49 +01:00
  • 1442677f92
    common : refactor cli arg parsing (#7675) Georgi Gerganov 2024-06-04 21:23:39 +03:00
  • 554c247caf
    ggml : remove OpenCL (#7735) Georgi Gerganov 2024-06-04 21:23:20 +03:00
  • 0cd6bd3483
    llama : remove beam search (#7736) Georgi Gerganov 2024-06-04 21:23:05 +03:00
  • 5ca0944a15
    readme : remove obsolete Zig instructions (#7471) Georgi Gerganov 2024-06-04 19:43:01 +03:00
  • adc9ff3841
    llama-bench : allow using a different printer for stderr with -oe (#7722) slaren 2024-06-04 14:32:42 +02:00
  • 987d743d6b
    Improve hipBLAS support in CMake (#7696) Daniele 2024-06-04 12:09:15 +00:00
  • b226c1227b
    refine .gitignore (#7688) zhouwg 2024-06-04 19:21:26 +08:00
  • 3b38d48609
    Per token attributes (#7685) jaime-m-p 2024-06-04 09:17:17 +02:00
  • 6d1616944d
    ggml : prevent builds with -ffinite-math-only (#7726) Georgi Gerganov 2024-06-04 10:01:09 +03:00
  • bde7cd3cd9
    llama : offload to RPC in addition to other backends (#7640) Radoslav Gerganov 2024-06-03 20:03:26 +03:00
  • a5735e4426
    ggml : use OpenMP as a thread pool (#7606) Masaya, Kato 2024-06-04 00:14:15 +09:00
  • 0b832d53ba
    make: fix debug options not being applied to NVCC (#7714) Johannes Gäßler 2024-06-03 16:28:58 +02:00
  • 3d7ebf6312
    Vulkan Mixture of Experts (MoE) support (#7628) 0cc4m 2024-06-03 10:59:14 +02:00
  • a10cda58d3
    cmake : add pkg-config spec file for llama.cpp (#7702) Andy Tai 2024-06-03 01:06:24 -07:00
  • 6f28a333c1
    llama : MiniCPM support tied embeddings (#7664) zhangkaihuo 2024-06-03 15:49:30 +08:00
  • 549279d804
    llama : avoid double token-to-piece cache (#7654) Georgi Gerganov 2024-06-03 08:34:43 +03:00
  • 9e405b6e2e
    kompute : implement op_getrows_f32 (#6403) woachk 2024-06-03 07:32:16 +02:00
  • 3413ae2193
    fix bug introduced in using calloc (#7701) Dave Airlie 2024-06-03 07:59:54 +10:00
  • 1669810d7c
    flake.lock: Update (#7686) Georgi Gerganov 2024-06-03 00:13:12 +03:00
  • 7c4e5b7eae
    chore : add ignore rule for generated server themes (#7689) Austin 2024-06-02 13:39:08 -04:00
  • 9422c5e34b
    [SYCL] Update rpc-server.cpp to include SYCL backend (#7682) nickp27 2024-06-02 19:13:54 +10:00
  • e141ce624a
    Fix FlashAttention debug test, FP32 assert (#7684) Johannes Gäßler 2024-06-01 23:26:10 +02:00
  • 2e666832e6
    server : new UI (#7633) Yazan Agha-Schrader 2024-06-01 21:31:48 +02:00
  • 2ac95c9d56
    SimpleChat: Simple histogram/repeatMatching driven garbageTrimming, Settings UI, Streaming mode, OpenAi Compat (Model, Authorization Bearer), Save/Restore session, Auto Settings UI (#7548) HanishKVC 2024-06-01 21:50:18 +05:30
  • 750f60c03e
    CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681) Johannes Gäßler 2024-06-01 15:47:04 +02:00
  • 9b596417af
    CUDA: quantized KV support for FA vec (#7527) Johannes Gäßler 2024-06-01 08:44:14 +02:00
  • a323ec60af
    server : update js (#7670) Georgi Gerganov 2024-05-31 22:23:04 +03:00
  • 0515ad93f4
    convert-hf : Handle NotImplementedError in convert-hf-to-gguf (#7660) Galunid 2024-05-31 17:42:33 +02:00
  • c8047d538f
    scripts: update compare_llama_bench.py [no ci] (#7673) Johannes Gäßler 2024-05-31 16:26:21 +02:00
  • 30e238b246
    Improve HIP compatibility (#7672) Daniele 2024-05-31 14:00:29 +00:00
  • 16926dff92
    readme : link homebrew discussion Georgi Gerganov 2024-05-31 15:04:58 +03:00
  • 0c27e6f62e
    ggml : fix loongson compile warnings (#7537) Georgi Gerganov 2024-05-31 14:17:10 +03:00
  • 2e32f874e6
    Somehow '**' got lost (#7663) Galunid 2024-05-31 10:24:41 +02:00
  • 1af511fc22
    Add convert.py removal to hot topics (#7662) Galunid 2024-05-31 10:09:20 +02:00
  • 0541f06296
    [no ci] docs: add aikit to readme (#7650) Sertaç Özercan 2024-05-30 16:57:16 -07:00
  • 9022c33646
    Fixed painfully slow single process builds. (#7326) JohnnyB 2024-05-30 21:32:38 +01:00
  • 5921b8f089
    llama : cache llama_token_to_piece (#7587) Georgi Gerganov 2024-05-30 19:01:41 +03:00
  • 5dcdf94676
    Fix conan badge display [no ci] (#7645) Martin Delille 2024-05-30 17:07:39 +02:00
  • 2e2340de17
    Add brew installation instruction to README [no ci] (#7616) Manuel 2024-05-30 16:58:15 +02:00
  • 7846540bd2
    readme : add Conan badge (#7638) Martin Delille 2024-05-30 14:52:50 +02:00
  • e6157f94c8
    github: add contact links to issues and convert question into research [no ci] (#7612) Brian 2024-05-30 21:55:36 +10:00
  • 9c4c9cc83f
    Move convert.py to examples/convert-legacy-llama.py (#7430) Galunid 2024-05-30 13:40:00 +02:00
  • 59b0d07766
    faster avx512 exp implementation (#7551) Chris Elrod 2024-05-30 07:32:55 -04:00
  • d5c05821f3
    ggml : fix loongarch build (O2 issue) (#7636) junchao-loongson 2024-05-30 17:30:10 +08:00
  • 972b555ab9
    README: explain parallel build [no ci] (#7618) Johannes Gäßler 2024-05-30 09:52:39 +02:00
  • 3854c9d07f
    [SYCL] fix intel docker (#7630) Meng, Hengyu 2024-05-30 14:19:08 +08:00
  • eb57fee51f
    gguf-py : Add tokenizer.ggml.pre to gguf-new-metadata.py (#7627) Galunid 2024-05-30 02:10:40 +02:00
  • 55d62262a9
    metal : remove invalid asserts (#7617) Georgi Gerganov 2024-05-29 22:20:40 +03:00
  • 975ec63ff2
    metal : add missing asserts (#7617) Georgi Gerganov 2024-05-29 20:45:25 +03:00
  • fb76ec31a9
    ggml : fix YARN + add tests + add asserts (#7617) Georgi Gerganov 2024-05-29 20:17:31 +03:00
  • cce3dcffc5
    cuda : non-cont concat support (#7610) Georgi Gerganov 2024-05-29 15:38:26 +03:00
  • 210d99173d
    llama-bench : add support for the RPC backend (#7435) Radoslav Gerganov 2024-05-29 14:45:44 +03:00
  • 87bdf2a199
    ggml : use atomic_flag for critical section (#7598) slaren 2024-05-29 13:36:39 +02:00
  • 00281b7be3
    scripts : remove mpi remnants Georgi Gerganov 2024-05-29 14:31:18 +03:00
  • 2ab977282b
    sync : ggml Georgi Gerganov 2024-05-29 14:29:52 +03:00
  • 72de268bec
    ggml : restore ggml_rope_xpos_inplace (ggml/0) Georgi Gerganov 2024-05-26 18:35:23 +03:00
  • 0e8d8bfd6c
    Add Arc A750 and Arch linux to readme-sycl.md as verified GPU model and Linux distro (#7605) Akarshan Biswas 2024-05-29 12:23:47 +05:30
  • 504f0c340f
    ggml : fix typo in ggml.c (#7603) zhouwg 2024-05-29 10:09:31 +08:00
  • b864b50ce5
    [SYCL] Align GEMM dispatch (#7566) Meng, Hengyu 2024-05-29 07:00:24 +08:00
  • 02c1ecad07
    Tokenizer WPM fixes (#7500) jaime-m-p 2024-05-28 21:46:34 +02:00
  • 6bd12ce409
    sycl : fix assert (#7563) Georgi Gerganov 2024-05-28 22:22:50 +03:00
  • 5442939fcc
    llama : support small Granite models (#7481) Giuseppe Scrivano 2024-05-28 20:49:49 +02:00
  • 56411a950f
    vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552) k.h.lai 2024-05-29 01:25:08 +08:00
  • 2b737caae1
    rpc : resource management rework (#7562) Radoslav Gerganov 2024-05-28 18:13:36 +03:00
  • ee3dff6b8e
    Add support for DeepseekV2ForCausalLM (#7519) fairydreaming 2024-05-28 17:07:05 +02:00
  • edc29433fa
    tests : fix test-tokenizer-0.sh Georgi Gerganov 2024-05-28 15:04:09 +03:00
  • 8b99e2aa66
    llama : handle unknown utf8 bytes (#7588) Georgi Gerganov 2024-05-28 13:55:35 +03:00
  • 271ff3fc44
    github: add refactor to issue template (#7561) Brian 2024-05-28 20:27:27 +10:00
  • e2b065071c
    [SYCL]fix ggml_sycl_mul_mat_id() to match the change of api (#7436) Neo Zhang 2024-05-28 17:53:37 +08:00
  • 0548a4187f
    ggml : generalize GGML_OP_CONCAT (#7563) Georgi Gerganov 2024-05-28 11:04:19 +03:00
  • 9335b969e8
    server: do not remove whitespace at the start of a completion chunk (#7524) mgroeber9110 2024-05-28 06:55:51 +02:00
  • c41767154e
    Markdownish code block fix (#7571) Nathan Epstein 2024-05-28 00:41:14 -04:00
  • 74b239b3d5
    llava : update clip.h (#7580) Ikko Eltociear Ashimine 2024-05-28 11:48:16 +09:00
  • 852aafb163
    update HIP_UMA #7399 (#7414) Djip007 2024-05-28 01:40:47 +02:00
  • 0136966daf
    adding in x64 targets to cmake presets (#7574) kunnis 2024-05-27 18:40:12 -05:00
  • 10b1e45876
    make: add --device-debug to NVCC debug flags (#7542) Johannes Gäßler 2024-05-27 19:34:40 +02:00
  • 197c00681b
    Allow multiple copy function pointers for CUDA graph kernel param updates (#7565) agray3 2024-05-27 18:33:42 +01:00
  • 95f84d5ce8
    Fix q_xxs using mul_mat_q (#7459) AidanBeltonS 2024-05-27 17:34:51 +01:00
  • 5487593bc7
    Add freq factors (#7495) AidanBeltonS 2024-05-27 13:34:09 +01:00
  • 1d8fca72ae
    metal : add GGML_OP_REPEAT kernels (#7557) Georgi Gerganov 2024-05-27 12:10:19 +03:00
  • 62bfef5194
    metal : disable FA kernel for HS=256 (#7556) Georgi Gerganov 2024-05-27 10:38:39 +03:00
  • eaf6e03174
    llama : add comments about experimental flags (#7544) Georgi Gerganov 2024-05-27 09:24:13 +03:00
  • d6ef0e77dd
    github: add self sorted issue ticket forms (#7543) Brian 2024-05-27 10:54:30 +10:00
  • dff451cfa1
    flake.lock: Update (#7540) Georgi Gerganov 2024-05-26 18:54:56 +03:00
  • d298382ad9
    main: replace --no-special with --special (#7534) Brian 2024-05-27 00:10:17 +10:00
  • 32a28217f4
    Fix aya-23 conversion scripts (#7539) Galunid 2024-05-26 16:02:34 +02:00
  • c429b33beb
    llama : add Smaug 70B support (#7402) Bartowski 2024-05-26 08:28:35 -04:00
  • 9146d36fe7
    Readme: add akx/ggify to tools (#1484) Aarni Koskela 2024-05-26 15:09:42 +03:00
  • b9adcbbf92
    SimpleChat Completion Mode flexibility and cleanup, Settings gMe, Optional sliding window (#7480) HanishKVC 2024-05-26 06:26:34 +05:30
  • 9588f196b1
    train : change default FA argument (#7528) Georgi Gerganov 2024-05-25 15:21:30 +03:00
  • 3cbd23ed88
    labeler: added Apple Metal detector (+Kompute) (#7529) Brian 2024-05-25 19:30:42 +10:00
  • 00c6390793
    main : don't print special tokens with --grammar (#6923) Justine Tunney 2024-05-25 05:04:03 -04:00
  • faa0e6979a
    ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (#7433) Masaya, Kato 2024-05-25 17:42:31 +09:00
  • 9791f40258
    android : module (#7502) Elton Kola 2024-05-25 04:11:33 -04:00