Commit Graph

  • 402d6feffa
    llama : suppress unref var in Windows MSVC (#8150) Daniel Bevenius 2024-07-04 12:50:57 +02:00
  • 20fc3804bf
    convert : fix gemma v1 tokenizer convert (#8248) Georgi Gerganov 2024-07-04 10:41:03 +03:00
  • f619024764
    [SYCL] Remove unneeded semicolons (#8280) AidanBeltonS 2024-07-04 02:07:19 +01:00
  • d23287f122
    Define and optimize RDNA1 (#8085) Daniele 2024-07-03 23:02:58 +00:00
  • 5f2d4e60e2
    ppl : fix n_seq_max for perplexity (#8277) slaren 2024-07-03 19:33:31 +02:00
  • 916248af1f
    fix phi 3 conversion (#8262) Xuan Son Nguyen 2024-07-03 16:01:54 +02:00
  • f8d6a23804
    fix typo (#8267) Judd 2024-07-03 20:40:16 +08:00
  • fadde67135
    Dequant improvements rebase (#8255) AidanBeltonS 2024-07-03 02:55:34 +01:00
  • a27152b602
    fix: add missing short command line argument -mli for multiline-input (#8261) MistApproach 2024-07-02 22:56:46 +02:00
  • 3e2618bc7b
    Adding step to clean target to remove legacy binary names to reduce upgrade / migration confusion arising from #7809. (#8257) Clint Herron 2024-07-02 13:19:56 -04:00
  • 07a3fc0608
    Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) Clint Herron 2024-07-02 12:18:10 -04:00
  • 968967376d
    Add JAIS model(s) (#8118) Faisal Zaghloul 2024-07-02 10:36:00 -04:00
  • 023b8807e1
    convert-hf : print output file name when completed (#8181) Daniel Bevenius 2024-07-02 08:40:49 +02:00
  • 0e0590adab
    cuda : update supports_op for matrix multiplication (#8245) slaren 2024-07-02 08:39:38 +02:00
  • a9f3b10215
    [SYCL] Fix win build conflict of math library (#8230) luoyu-intel 2024-07-02 04:50:07 +00:00
  • d08c20edde
    [SYCL] Fix the sub group size of Intel (#8106) luoyu-intel 2024-07-02 02:16:00 +00:00
  • 5fac350b9c
    Fix gemma2 tokenizer convert (#8244) Xuan Son Nguyen 2024-07-02 01:07:23 +02:00
  • cb5fad4c6c
    CUDA: refactor and optimize IQ MMVQ (#8215) Johannes Gäßler 2024-07-01 20:39:06 +02:00
  • dae57a1ebc
    readme: add Paddler to the list of projects (#8239) Mateusz Charytoniuk 2024-07-01 19:13:22 +02:00
  • 49122a873f
    gemma2: add sliding window mask (#8227) Xuan Son Nguyen 2024-07-01 18:48:34 +02:00
  • 0ddeff1023
    readme : update tool list (#8209) Roni 2024-07-01 14:48:16 +02:00
  • 3840b6f593
    nix : enable curl (#8043) Michael Francis 2024-07-01 07:47:04 -04:00
  • 257f8e41e2
    nix : remove OpenCL remnants (#8235) Georgi Gerganov 2024-07-01 14:46:18 +03:00
  • 694c59cb42
    Document BERT support. (#8205) iacore 2024-07-01 11:40:58 +00:00
  • 197fe6c1d7
    [SYCL] Update SYCL-Rope op and Refactor (#8157) zhentaoyu 2024-07-01 19:39:06 +08:00
  • d0a7145ba9
    flake.lock: Update (#8218) Georgi Gerganov 2024-07-01 02:09:34 +03:00
  • 9ef0780062
    Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203) Xuan Son Nguyen 2024-06-30 20:27:13 +02:00
  • 1c5eba6f8e
    llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197) Andrei 2024-06-29 20:44:08 -07:00
  • 72272b83a3
    fix code typo in llama-cli (#8198) Xuan Son Nguyen 2024-06-29 00:14:20 +02:00
  • 8748d8ac6f
    json: attempt to skip slow tests when running under emulator (#8189) Olivier Chafik 2024-06-28 18:02:05 +01:00
  • 26a39bbd6b
    Add MiniCPM, Deepseek V2 chat template + clean up llama_chat_apply_template_internal (#8172) Xuan Son Nguyen 2024-06-28 15:11:44 +02:00
  • 38373cfbab
    Add SPM infill support (#8016) Sigbjørn Skjæret 2024-06-28 12:53:43 +02:00
  • b851b3fba0
    cmake : allow user to override default options (#8178) slaren 2024-06-28 12:37:45 +02:00
  • 139cc621e9
    json: restore default additionalProperties to false, fix some pattern escapes (#8180) Olivier Chafik 2024-06-28 09:26:45 +01:00
  • e57dc62057
    llama: Add support for Gemma2ForCausalLM (#8156) pculliton 2024-06-28 00:00:43 -04:00
  • a27aa50ab7
    Add missing items in makefile (#8177) Xuan Son Nguyen 2024-06-28 02:19:11 +02:00
  • cb0b06a8a6
    json: update grammars/README w/ examples & note about additionalProperties (#8132) Olivier Chafik 2024-06-27 22:08:42 +01:00
  • 558f44bf83
    CI: fix release build (Ubuntu+Mac) (#8170) loonerin 2024-06-27 15:01:23 -04:00
  • 8172ee9da9
    cmake : fix deprecated option names not working (#8171) slaren 2024-06-27 20:04:39 +02:00
  • 16791b8f0b
    Add chatml fallback for cpp llama_chat_apply_template (#8160) Xuan Son Nguyen 2024-06-27 18:14:19 +02:00
  • ab3679112d
    flake.lock: Update (#8071) Georgi Gerganov 2024-06-27 18:37:29 +03:00
  • 97877eb10b
    Control vector loading fixes (#8137) jukofyork 2024-06-27 15:48:07 +01:00
  • 387952651a
    Delete examples/llama.android/llama/CMakeLists.txt (#8165) Raj Hammeer Singh Hada 2024-06-27 20:09:29 +05:30
  • 6030c61281
    Add Qwen2MoE 57B-A14B model identifier (#8158) Sigbjørn Skjæret 2024-06-27 16:27:41 +02:00
  • 85a267daaa
    CUDA: fix MMQ stream-k for --split-mode row (#8167) Johannes Gäßler 2024-06-27 16:26:05 +02:00
  • f675b20a3b
    Added support for Viking pre-tokenizer (#8135) kustaaya 2024-06-27 11:58:54 +03:00
  • 911e35bb8b
    llama : fix CodeLlama FIM token checks (#8144) Sigbjørn Skjæret 2024-06-27 09:46:41 +02:00
  • ac146628e4
    Fix llama-android.cpp for error - "common/common.h not found" (#8145) Raj Hammeer Singh Hada 2024-06-27 07:27:57 +05:30
  • 9b31a40c6d
    clip : suppress unused variable warnings (#8105) Daniel Bevenius 2024-06-27 01:50:09 +02:00
  • c70d117c37
    scripts : fix filename sync Georgi Gerganov 2024-06-26 23:25:22 +03:00
  • ae5d0f4b89
    ci : publish new docker images only when the files change (#8142) slaren 2024-06-26 21:59:28 +02:00
  • 31ec3993f6
    ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (#8140) slaren 2024-06-26 21:34:14 +02:00
  • c7ab7b612c
    make : fix missing -O3 (#8143) slaren 2024-06-26 20:20:22 +02:00
  • f2d48fffde
    sync : ggml Georgi Gerganov 2024-06-26 19:39:19 +03:00
  • 4713bf3093
    authors : regen Georgi Gerganov 2024-06-26 19:36:44 +03:00
  • 0e814dfc42
    devops : remove clblast + LLAMA_CUDA -> GGML_CUDA (#8139) Georgi Gerganov 2024-06-26 19:32:07 +03:00
  • a95631ee97
    readme : update API notes Georgi Gerganov 2024-06-26 19:26:13 +03:00
  • f3f65429c4
    llama : reorganize source code + improve CMake (#8006) Georgi Gerganov 2024-06-26 18:33:02 +03:00
  • 8854044561
    Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag (#8115) Isaac McFadyen 2024-06-26 02:29:28 -04:00
  • c8771ab5f8
    CUDA: fix misaligned shared memory read (#8123) Johannes Gäßler 2024-06-26 08:28:02 +02:00
  • 494165f3b6
    llama : extend llm_build_ffn() to support _scale tensors (#8103) Eddie-Wang 2024-06-26 14:27:46 +08:00
  • 9b2f16f805
    json: better support for "type" unions (e.g. nullable arrays w/ typed items) (#7863) Olivier Chafik 2024-06-26 01:46:35 +01:00
  • 6777c544bd
    json: fix additionalProperties, allow space after enum/const (#7840) Olivier Chafik 2024-06-26 01:45:58 +01:00
  • 163d50adaf
    fixes #7999 (adds control vectors to all build_XXX() functions in llama.cpp [needs testing] (#8060) jukofyork 2024-06-25 21:47:40 +01:00
  • 6fcbf68235
    llama : implement Unigram tokenizer needed by T5 and FLAN-T5 model families (#5763) fairydreaming 2024-06-25 21:14:35 +02:00
  • e6bf007744
    llama : return nullptr from llama_grammar_init (#8093) Daniel Bevenius 2024-06-25 21:07:28 +02:00
  • 84631fe150
    json: support integer minimum, maximum, exclusiveMinimum, exclusiveMaximum (#7797) Olivier Chafik 2024-06-25 20:06:20 +01:00
  • dd047b476c
    disable docker CI on pull requests (#8110) slaren 2024-06-25 19:20:06 +02:00
  • 925c30956d
    Add healthchecks to llama-server containers (#8081) joecryptotoo 2024-06-25 08:13:27 -07:00
  • c8ad35955a
    Gguf dump start data offset via --data-offset and some extra refactor (#8054) Brian 2024-06-25 22:03:25 +10:00
  • 49c03c79cd
    cvector: better prompt handling, add "mean vector" method (#8069) Xuan Son Nguyen 2024-06-25 13:59:54 +02:00
  • 48e6b92cc3
    Add chat template support for llama-cli (#8068) Xuan Son Nguyen 2024-06-25 13:56:49 +02:00
  • 3791ad2193
    SimpleChat v3.1: Boolean chat request options in Settings UI, cache_prompt (#7950) HanishKVC 2024-06-25 16:57:35 +05:30
  • f702a90e24
    Update control vector help (#8104) HatsuneMikuUwU33 2024-06-25 10:44:48 +02:00
  • 083bacce14
    [SYCL] Re-enabled mul_mat_batched_sycl (#8095) Meng, Hengyu 2024-06-25 10:19:20 +08:00
  • 2df373ac40
    CUDA: fix matrix multiplication algorithm choice (#8102) Johannes Gäßler 2024-06-25 01:22:33 +02:00
  • 3b099bcd9c
    CUDA: fix MMQ writeback for int8 tensor cores (#8100) Johannes Gäßler 2024-06-24 22:15:33 +02:00
  • a818f3028d
    CUDA: use MMQ instead of cuBLAS by default (#8075) Johannes Gäßler 2024-06-24 17:43:42 +02:00
  • d62e4aaa02
    gguf-py : fix tensor groups for encoder-decoder models in gguf-dump.py (#8090) fairydreaming 2024-06-24 14:13:39 +02:00
  • 9a590c8226
    CUDA: optimize MMQ int8 tensor core performance (#8062) Johannes Gäßler 2024-06-24 12:41:23 +02:00
  • 52fc8705a0
    Option to split during conversion (#6942) Christian Zhou-Zheng 2024-06-24 05:42:03 -04:00
  • 8cb508d0d5
    disable publishing the full-rocm docker image (#8083) slaren 2024-06-24 07:36:11 +02:00
  • 646ef4a9cf
    embedding : more cli arguments (#7458) Yann Follet 2024-06-24 13:30:24 +08:00
  • de0d6a68ac
    gguf-py, convert-hf : model conversion support for T5 and FLAN-T5 model variants (#5763) fairydreaming 2024-06-24 07:06:05 +02:00
  • 95f57bb5d5
    ggml : remove ggml_task_type and GGML_PERF (#8017) slaren 2024-06-24 03:07:59 +02:00
  • e112b610a1
    llama : add support for BitnetForCausalLM (#7931) Eddie-Wang 2024-06-24 02:27:57 +08:00
  • 6a2f298bd7
    server : fix JSON-Scheme typo (#7975) Aarni Koskela 2024-06-23 18:03:08 +03:00
  • 11318d9aa1
    Fix typo in llama_set_embeddings comment (#8077) Daniel Bevenius 2024-06-23 15:39:45 +02:00
  • b6b9a8e606
    fix CI failures (#8066) slaren 2024-06-23 13:14:45 +02:00
  • 45c0e2e4c1
    Refactor Vulkan backend to allow multiple contexts (#7961) 0cc4m 2024-06-23 10:21:25 +02:00
  • b5a5f34efa
    Removing extra blank lines that were breaking Lint. (#8067) Clint Herron 2024-06-22 14:28:18 -04:00
  • 3e58b0ee35
    cvector: fix CI + correct help message (#8064) Xuan Son Nguyen 2024-06-22 18:11:30 +02:00
  • adf480c3ab
    cvector-generator: Moe Moe Fixie-Fixie for Lots of Formats~! ♡(ᐢ ᴥ ᐢ)♡ (#8052) HatsuneMikuUwU33 2024-06-22 17:19:37 +02:00
  • 3aa184a8c7
    convert-hf : change assert to exception (#8015) 0xspringtime 2024-06-22 09:37:41 -04:00
  • 5b48cd53a8
    Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 values (#8058) ddh0 2024-06-22 07:16:10 -06:00
  • c5a8d4b749
    JSON Schema to GBNF integration tests (#7790) Clint Herron 2024-06-21 23:18:36 -04:00
  • 557b653dc9
    vulkan: detect multiple devices by deviceUUID instead of deviceID (#8022) k.h.lai 2024-06-21 16:28:20 +08:00
  • 7d5e8777ae
    ggml : AVX IQ quants (#7845) Eve 2024-06-21 05:57:36 +00:00
  • a927b0f3dd
    llama : optimize long word tokenization with WPM (#8034) Georgi Gerganov 2024-06-21 08:51:28 +03:00
  • 80ea089d77
    llama : allow pooled embeddings on any model (#7477) Douglas Hanley 2024-06-21 00:38:22 -05:00