Commit Graph

  • 3dfda05956
    llama : de-duplicate deepseek2 norm Georgi Gerganov 2024-07-15 14:10:39 +03:00
  • bda62d7999
    Vulkan MMQ Fix (#8479) 0cc4m 2024-07-15 09:38:52 +02:00
  • 090fca7a07
    pydantic : replace uses of __annotations__ with get_type_hints (#8474) compilade 2024-07-14 19:51:21 -04:00
  • aaab2419ea
    flake.lock: Update (#8475) Georgi Gerganov 2024-07-14 18:54:02 +03:00
  • 73cf442e7b
    llama : fix Gemma-2 Query scaling factors (#8473) Georgi Gerganov 2024-07-14 14:05:09 +03:00
  • e236528e76
    gguf_hash.py: Add sha256 (#8470) Brian 2024-07-14 16:47:14 +10:00
  • fa79495bb4
    llama : fix pre-tokenization of non-special added tokens (#8228) compilade 2024-07-13 23:35:10 -04:00
  • 17eb6aa8a9
    vulkan : cmake integration (#8119) bandoti 2024-07-13 13:12:39 -03:00
  • c917b67f06
    metal : template-ify some of the kernels (#8447) Georgi Gerganov 2024-07-13 18:32:33 +03:00
  • 4e24cffd8c
    server : handle content array in chat API (#8449) Georgi Gerganov 2024-07-12 14:48:15 +03:00
  • 6af51c0d96
    main : print error on empty input (#8456) Georgi Gerganov 2024-07-12 14:48:04 +03:00
  • f53226245f
    llama : suppress unary minus operator warning (#8448) Daniel Bevenius 2024-07-12 11:05:21 +02:00
  • c3ebcfa148
    server : ensure batches are either all embed or all completion (#8420) Douglas Hanley 2024-07-12 03:14:12 -05:00
  • 8a4441ea1a
    docker : fix filename for convert-hf-to-gguf.py in tools.sh (#8441) Armen Kaleshian 2024-07-12 04:08:19 -04:00
  • 5aefbce27a
    convert : remove fsep token from GPTRefactForCausalLM (#8237) Jiří Podivín 2024-07-12 10:06:33 +02:00
  • 71c1121d11
    examples : sprintf -> snprintf (#8434) Georgi Gerganov 2024-07-12 10:46:14 +03:00
  • 370b1f7e7a
    ggml : minor naming changes (#8433) Georgi Gerganov 2024-07-12 10:46:02 +03:00
  • b549a1bbef
    [SYCL] fix the mul_mat_id ut issues (#8427) Chen Xi 2024-07-12 00:52:04 +00:00
  • 368645698a
    ggml : add NVPL BLAS support (#8329) (#8425) Nicholai Tukanov 2024-07-11 11:49:15 -05:00
  • b078c619aa
    cuda : suppress 'noreturn' warn in no_device_code (#8414) Daniel Bevenius 2024-07-11 17:53:42 +02:00
  • 808aba3916
    CUDA: optimize and refactor MMQ (#8416) Johannes Gäßler 2024-07-11 16:47:47 +02:00
  • a977c11544
    gitignore : deprecated binaries Georgi Gerganov 2024-07-11 11:20:40 +03:00
  • 9a55ffe6fb
    tokenize : add --no-parse-special option (#8423) compilade 2024-07-11 03:41:48 -04:00
  • 7a221b672e
    llama : use F32 precision in Qwen2 attention and no FA (#8412) Georgi Gerganov 2024-07-11 10:21:30 +03:00
  • 278d0e1846
    Initialize default slot sampling parameters from the global context. (#8418) Clint Herron 2024-07-10 20:08:17 -04:00
  • dd07a123b7
    Name Migration: Build the deprecation-warning 'main' binary every time (#8404) Clint Herron 2024-07-10 12:35:18 -04:00
  • f4444d992c
    [SYCL] Use multi_ptr to clean up deprecated warnings (#8256) AidanBeltonS 2024-07-10 16:10:49 +01:00
  • 6b2a849d1f
    ggml : move sgemm sources to llamafile subfolder (#8394) Georgi Gerganov 2024-07-10 15:23:29 +03:00
  • 0f1a39f343
    ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) Dibakar Gope 2024-07-10 07:14:51 -05:00
  • 83321c6958
    gguf-py rel pipeline (#8410) M. Yusuf Sarıgöz 2024-07-10 15:12:35 +03:00
  • cc61948b1f
    llama : C++20 compatibility for u8 strings (#8408) Borislav Stanimirov 2024-07-10 14:45:44 +03:00
  • 7a80710d93
    msvc : silence codecvt c++17 deprecation warnings (#8395) Borislav Stanimirov 2024-07-10 14:40:53 +03:00
  • a8be1e6f59
    llama : add assert about missing llama_encode() call (#8400) fairydreaming 2024-07-10 13:38:58 +02:00
  • e4dd31ff89
    py : fix converter for internlm2 (#8321) RunningLeon 2024-07-10 19:26:40 +08:00
  • 8f0fad42b9
    py : fix extra space in convert_hf_to_gguf.py (#8407) laik 2024-07-10 19:19:10 +08:00
  • a59f8fdc85
    Server: Enable setting default sampling parameters via command-line (#8402) Clint Herron 2024-07-09 18:26:40 -04:00
  • fd560fe680
    Update README.md to fix broken link to docs (#8399) Andy Salerno 2024-07-09 11:58:44 -07:00
  • e500d6135a
    Deprecation warning to assist with migration to new binary names (#8283) Clint Herron 2024-07-09 11:54:43 -04:00
  • a03e8dd99d
    make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392) Johannes Gäßler 2024-07-09 17:11:07 +02:00
  • 5b0b8d8cfb
    sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372) Alberto Cabrera Pérez 2024-07-09 15:03:15 +01:00
  • 9925ca4087
    cmake : allow external ggml (#8370) Borislav Stanimirov 2024-07-09 11:38:00 +03:00
  • 9beb2dda03
    readme : fix typo [no ci] (#8389) daghanerdonmez 2024-07-09 09:16:00 +03:00
  • 7d0e23d72e
    gguf-py : do not use internal numpy types (#7472) compilade 2024-07-09 01:04:49 -04:00
  • 7fdb6f73e3
    flake.lock: Update (#8342) Georgi Gerganov 2024-07-09 01:36:38 +03:00
  • a130eccef4
    labeler : updated sycl to match docs and code refactor (#8373) Alberto Cabrera Pérez 2024-07-08 21:35:17 +01:00
  • c4dd11d1d3
    readme : fix web link error [no ci] (#8347) b4b4o 2024-07-08 22:19:24 +08:00
  • 2ec846d558
    sycl : fix powf call in device code (#8368) Alberto Cabrera Pérez 2024-07-08 14:22:41 +01:00
  • 3f2d538b81
    scripts : fix sync for sycl Georgi Gerganov 2024-07-08 13:51:31 +03:00
  • 2ee44c9a18
    sync : ggml Georgi Gerganov 2024-07-08 10:39:50 +03:00
  • 6847d54c4f
    tests : fix whitespace (#0) Georgi Gerganov 2024-07-08 10:39:36 +03:00
  • fde13b3bb9
    feat: cuda implementation for ggml_conv_transpose_1d (ggml/854) John Balis 2024-07-02 11:09:52 -05:00
  • 470939d483
    common : preallocate sampling token data vector (#8363) Kevin Wang 2024-07-08 03:26:53 -04:00
  • 6f0dbf6ab0
    infill : assert prefix/suffix tokens + remove old space logic (#8351) Georgi Gerganov 2024-07-08 09:34:35 +03:00
  • ffd00797d8
    common : avoid unnecessary logits fetch (#8358) Kevin Wang 2024-07-08 02:31:55 -04:00
  • 04ce3a8b19
    readme : add supported glm models (#8360) toyer 2024-07-08 13:57:19 +08:00
  • 3fd62a6b1c
    py : type-check all Python scripts with Pyright (#8341) compilade 2024-07-07 15:04:39 -04:00
  • a8db2a9ce6
    Update llama-cli documentation (#8315) Denis Spasyuk 2024-07-07 09:08:28 -06:00
  • 4090ea5501
    ci : add checks for cmake,make and ctest in ci/run.sh (#8200) Alex Tuddenham 2024-07-07 15:59:14 +01:00
  • f1948f1e10
    readme : update bindings list (#8222) Andy Tai 2024-07-07 06:21:37 -07:00
  • f7cab35ef9
    gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#8048) Brian 2024-07-07 22:58:43 +10:00
  • 905942abdb
    llama : support glm3 and glm4 (#8031) toyer 2024-07-07 20:52:10 +08:00
  • b5040086d4
    llama : fix n_rot default (#8348) Georgi Gerganov 2024-07-07 14:59:02 +03:00
  • d39130a398
    py : use cpu-only torch in requirements.txt (#8335) compilade 2024-07-07 07:23:38 -04:00
  • b81ba1f96b
    finetune: Rename command name in README.md (#8343) standby24x7 2024-07-07 19:38:02 +09:00
  • 210eb9ed0a
    finetune: Rename an old command name in finetune.sh (#8344) standby24x7 2024-07-07 19:37:47 +09:00
  • cb4d86c4d7
    server: Retrieve prompt template in /props (#8337) Bjarke Viksøe 2024-07-07 11:10:38 +02:00
  • 86e7299ef5
    added support for Authorization Bearer tokens when downloading model (#8307) Derrick T. Woolworth 2024-07-06 15:32:04 -05:00
  • 60d83a0149
    update main readme (#8333) Xuan Son Nguyen 2024-07-06 19:01:23 +02:00
  • 87e25a1d1b
    llama : add early return for empty range (#8327) Daniel Bevenius 2024-07-06 09:22:16 +02:00
  • 213701b51a
    Detokenizer fixes (#8039) jaime-m-p 2024-07-05 19:01:35 +02:00
  • be20e7f49d
    Reorganize documentation pages (#8325) Xuan Son Nguyen 2024-07-05 18:08:32 +02:00
  • 7ed03b8974
    llama : fix compile warning (#8304) Georgi Gerganov 2024-07-05 17:32:09 +03:00
  • 1d894a790e
    cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281) Natsu 2024-07-05 22:29:35 +08:00
  • 1f3e1b66e2
    Enabled more data types for oneMKL gemm_batch (#8236) Ouadie EL FAROUKI 2024-07-05 13:23:25 +01:00
  • 148ec970b6
    convert : remove AWQ remnants (#8320) Georgi Gerganov 2024-07-05 10:15:36 +03:00
  • 2cccbaa008
    llama : minor indentation during tensor loading (#8304) Georgi Gerganov 2024-07-05 10:15:24 +03:00
  • 8e558309dc
    CUDA: MMQ support for iq4_nl, iq4_xs (#8278) Johannes Gäßler 2024-07-05 09:06:31 +02:00
  • 0a423800ff
    CUDA: revert part of the RDNA1 optimizations (#8309) Daniele 2024-07-05 07:06:09 +00:00
  • d12f781074
    llama : streamline embeddings from "non-embedding" models (#8087) Douglas Hanley 2024-07-05 02:05:56 -05:00
  • bcefa03bc0
    CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311) Johannes Gäßler 2024-07-05 09:05:34 +02:00
  • 5a7447c569
    readme : fix minor typos [no ci] (#8314) Pieter Ouwerkerk 2024-07-05 02:58:41 -04:00
  • 61ecafa390
    passkey : add short intro to README.md [no-ci] (#8317) Daniel Bevenius 2024-07-05 08:14:24 +02:00
  • aa5898dc53
    llama : prefer n_ over num_ prefix (#8308) Georgi Gerganov 2024-07-05 09:10:03 +03:00
  • 6c05752c50
    contributing : update guidelines (#8316) Georgi Gerganov 2024-07-05 09:09:47 +03:00
  • a9554e20b6
    [SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) luoyu-intel 2024-07-05 05:06:13 +00:00
  • e235b267a2
    py : switch to snake_case (#8305) Georgi Gerganov 2024-07-05 07:53:33 +03:00
  • f09b7cb609
    rm get_work_group_size() by local cache for performance (#8286) Neo Zhang Jianyu 2024-07-05 10:32:29 +08:00
  • a38b884c6c
    cli: add EOT when user hit Ctrl+C (#8296) Xuan Son Nguyen 2024-07-04 20:55:03 +02:00
  • d7fd29fff1
    llama : add OpenELM support (#7359) Icecream95 2024-07-05 05:14:21 +12:00
  • 6f63d646c1
    tokenize : add --show-count (token) option (#8299) Daniel Bevenius 2024-07-04 18:38:58 +02:00
  • 51d2ebadbb
    build: Export hf-to-gguf as snakecase ditsuke 2024-07-04 20:54:35 +05:30
  • 1e920018d3
    doc: Add context for why we add an explicit pytorch source ditsuke 2024-07-03 01:02:56 +05:30
  • 01a5f06550
    chore: Remove rebase artifacts ditsuke 2024-07-02 15:48:13 +05:30
  • 07786a61a2
    chore: Fixup requirements and build ditsuke 2024-07-02 15:35:43 +05:30
  • de14e2ea2b
    chore: ignore all __pychache__ ditsuke 2024-07-02 15:18:13 +05:30
  • 821922916f
    fix: Update script paths in CI scripts ditsuke 2024-03-10 23:21:46 +05:30
  • b1c3f26e5e
    fix: Actually include scripts in build ditsuke 2024-02-29 01:47:15 +05:30
  • b0a46993df
    build(python): Package scripts with pip-0517 compliance ditsuke 2024-02-27 12:01:02 +05:30
  • 807b0c49ff
    Inference support for T5 and FLAN-T5 model families (#5763) fairydreaming 2024-07-04 15:46:11 +02:00
  • f8c4c0738d
    tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231) Daniel Bevenius 2024-07-04 12:53:42 +02:00