Commit Graph

  • 902184dd3a
    fix missing slash in fs_get_cache_directory() (#7503) Xuan Son Nguyen 2024-05-25 05:30:59 +02:00
  • 57684331fc
    Make the tokenize CLI tool have nicer command-line arguments (#6188) Mikko Juola 2024-05-24 18:14:42 -07:00
  • b83bab15a5
    gguf-py : fix and simplify quantized shape round-trip (#7483) compilade 2024-05-24 21:11:48 -04:00
  • d041d2ceaa
    flake.lock: Update (#7232) Georgi Gerganov 2024-05-24 18:59:06 +03:00
  • 27891f6db0
    docker.yml: disable light-intel and server-intel test (#7515) Brian 2024-05-24 23:47:56 +10:00
  • fbca2f27fc
    Add support for ArcticForCausalLM (#7020) fairydreaming 2024-05-24 14:31:13 +02:00
  • 0df0aa8e43
    add shared lib build to the Windows release package (#7438) Neo Zhang 2024-05-24 10:06:56 +08:00
  • 74f33adf5f
    readme : remove trailing space (#7469) Georgi Gerganov 2024-05-23 17:43:18 +03:00
  • 1debe72737
    ggml : silence UB sanitizer error during iq2_xxs quantization (#0) Georgi Gerganov 2024-05-23 17:17:43 +03:00
  • 007489e895
    Fix phi3 chat template confusion with zephyr (#7449) Tristan Druyen 2024-05-23 16:15:15 +02:00
  • 8b94e799df
    readme : add Bunny to supported models [no ci] (#7469) Raj Hammeer Singh Hada 2024-05-23 18:00:13 +05:30
  • 3015851c5a
    llama : add getters for n_threads/n_threads_batch (#7464) Daniel Bevenius 2024-05-23 14:29:26 +02:00
  • 55ac3b7aea
    ci : use Pythia models instead of OpenLlama (#7470) Georgi Gerganov 2024-05-23 15:28:14 +03:00
  • dacfcebd60
    readme : add GPT-NeoX + Pythia to the list of supported models (#7491) Victor Nogueira 2024-05-23 15:12:43 +03:00
  • 9b82476ee9
    Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models) (#7461) fairydreaming 2024-05-23 11:49:53 +02:00
  • a61a94e543
    llama : rename n_ctx -> cache.size, less confusing (#0) Georgi Gerganov 2024-05-23 12:38:18 +03:00
  • 152da28ae5
    labeler.yml: add embedding label detector [no ci] (#7482) Brian 2024-05-23 17:40:43 +10:00
  • d48c88cbd5
    ggml : remove ggml_flash_attn and ggml_flash_ff (#7463) Georgi Gerganov 2024-05-23 10:00:44 +03:00
  • e84b71c2c6
    ggml : drop support for QK_K=64 (#7473) Georgi Gerganov 2024-05-23 10:00:21 +03:00
  • 1b1e27cb49
    Update vulkan rope implementation to support frequency factors (#7475) 0cc4m 2024-05-23 08:59:59 +02:00
  • fbf777d2b9
    main : minor (#7462) Georgi Gerganov 2024-05-23 09:43:24 +03:00
  • cd93a28cb1
    CUDA: fix FA out-of-bounds reads (#7479) Johannes Gäßler 2024-05-23 00:31:20 +02:00
  • 1e374365d1
    SimpleChat: a simple and dumb web front end for testing the /chat/completions and /completions endpoints and trying chat (#7350) HanishKVC 2024-05-22 23:23:21 +05:30
  • 197ff91462
    build : remove zig (#7471) Georgi Gerganov 2024-05-22 20:05:38 +03:00
  • 6ff13987ad
    common : normalize naming style (#7462) Georgi Gerganov 2024-05-22 20:04:20 +03:00
  • 38c03478a3
    CUDA: fix FA out-of-bounds writes (#7465) Johannes Gäßler 2024-05-22 17:58:25 +02:00
  • b18532a4ef
    phi3 : duplicate rope factors in each layer (#7447) slaren 2024-05-22 16:10:46 +02:00
  • fcda1128bc
    vulkan: add workaround for iterator boundary check to fix clang-cl debug build (#7426) k.h.lai 2024-05-22 20:53:21 +08:00
  • 03d8900ebe
    llama : add missing model type names (#7445) Justine Tunney 2024-05-22 07:08:18 -04:00
  • 9b3d833189
    cuda : fix compile warning (#7454) Georgi Gerganov 2024-05-22 12:36:37 +03:00
  • 95fb0aefab
    CUDA: remove incorrect precision check (#7454) Johannes Gäßler 2024-05-22 10:24:29 +02:00
  • 3e5faa8503
    cuda : fix rope + add tests (#7452) Georgi Gerganov 2024-05-22 11:01:35 +03:00
  • 201cc11afa
    llama : add phi3 128K model support (#7225) liuwei-git 2024-05-22 04:28:32 +08:00
  • 6369bf0433
    metal : handle F16 inf values, fix FA partial offload (#7434) Georgi Gerganov 2024-05-21 23:03:42 +03:00
  • e402de364b
    grammars: fix resampling logic regression (#7424) Olivier Chafik 2024-05-21 20:40:00 +01:00
  • fcf6538ba6
    CUDA: fix unused warning in mmq.cu (#7442) Johannes Gäßler 2024-05-21 19:27:12 +02:00
  • c3f8d58356
    tests : test-tokenizer-0.sh print more info (#7402) Georgi Gerganov 2024-05-21 19:53:48 +03:00
  • 11474e756d
    examples: cache hf model when --model not provided (#7353) Amir 2024-05-21 17:13:12 +03:00
  • d8ee902227
    CUDA: deduplicate mmq code (#7397) Johannes Gäßler 2024-05-21 16:02:12 +02:00
  • d7e852c1bc
    Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425) jaime-m-p 2024-05-21 14:39:48 +02:00
  • 917dc8cfa6
    Tokenizer SPM fixes for phi-3 and llama-spm (#7375) jaime-m-p 2024-05-20 20:15:57 +02:00
  • fabf30b4c4
    llama : remove Persimmon (#7408) Georgi Gerganov 2024-05-20 19:35:28 +03:00
  • 20385cebcc
    perplexity: update README FP16 results [no ci] (#7413) Johannes Gäßler 2024-05-20 18:15:38 +02:00
  • db10f01310
    rpc : track allocated buffers (#7411) Radoslav Gerganov 2024-05-20 16:36:55 +03:00
  • 3bc10cb485
    server : fix temperature + disable some tests (#7409) Georgi Gerganov 2024-05-20 15:10:03 +03:00
  • 6bf9b66fa3
    [SYCL] Update SYCL upscale operation (#7321) AidanBeltonS 2024-05-20 12:08:23 +01:00
  • 26cd4237bc
    Update README.md (#7410) Bingan 2024-05-20 17:55:34 +08:00
  • 213e90ed73
    ggml-opencl, llama: use reserve() when the count is already known (#7272) Herman Semenov 2024-05-20 07:33:21 +00:00
  • 65c58207ec
    ggml : add LoongArch LSX and LASX support (#6454) junchao-loongson 2024-05-20 15:19:21 +08:00
  • 1cc0155d04
    server : tuning tests (#7388) Georgi Gerganov 2024-05-20 10:16:41 +03:00
  • e932094d58
    server : return error on too large embedding input (#7389) Georgi Gerganov 2024-05-20 08:56:05 +03:00
  • 2789baf480
    tests : fix --keep_split -> --keep-split (#7374) Georgi Gerganov 2024-05-20 08:55:09 +03:00
  • 33c8d50acc
    Add provisions for Windows support of BF16 code, including a CMake provision for enabling AVX512_BF16 (#7258) Srihari-mcw 2024-05-19 19:18:39 -07:00
  • d359f30921
    llama : remove MPI backend (#7395) slaren 2024-05-20 01:17:03 +02:00
  • 1ea2a0036e
    quantize : fix --keep-split check (#7374) Fred Douglas 2024-05-19 11:37:04 -05:00
  • f030ec1f7a
    Vulkan Embedding Fix (#7360) 0cc4m 2024-05-19 17:19:53 +02:00
  • e4e6f67be6
    ggml : fix another case of quants nans (#7387) slaren 2024-05-19 17:08:46 +02:00
  • 5ca49cbecd
    ggml: implement quantized KV cache for FA (#7372) Johannes Gäßler 2024-05-19 16:46:13 +02:00
  • 1b01f06db0
    server: add test for token probs (#7347) Johannes Gäßler 2024-05-19 16:26:02 +02:00
  • 41858392e1
    server: fix seed being reported back (#7382) Johannes Gäßler 2024-05-19 16:06:33 +02:00
  • 6aade19ee7
    Add StableLM2 pre-tokenizer (#7349) Anas Ahouzi 2024-05-19 14:46:46 +02:00
  • ab33f7a338
    cuda : clear error after buffer allocation failure (#7376) slaren 2024-05-19 14:19:37 +02:00
  • e23b974f4c
    labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363) Brian 2024-05-19 20:51:03 +10:00
  • 854d365aba
    cmake : update android comments (#7341) Georgi Gerganov 2024-05-19 11:01:01 +03:00
  • f5bf761747
    Capture CUDA logging output (#7298) fraxy-v 2024-05-19 01:44:42 +03:00
  • 059031b8c4
    ci : re-enable sanitizer runs (#7358) Georgi Gerganov 2024-05-18 18:55:54 +03:00
  • 511182eabb
    android : use "ci-android" branch for CI (#7341) Georgi Gerganov 2024-05-18 13:40:39 +03:00
  • 133d99c599
    CUDA: deduplicate FlashAttention code (#7352) Johannes Gäßler 2024-05-18 12:36:25 +02:00
  • cb42c29427
    server: correct --threads documentation [no ci] (#7362) Johannes Gäßler 2024-05-18 11:10:47 +02:00
  • d233b507cd
    cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263) Engininja2 2024-05-18 02:05:17 -06:00
  • 0f98acfac6
    llama : add support for larger Granite Code Models (20B, 34B) (#7324) Steffen Röcker 2024-05-18 10:04:55 +02:00
  • ca57e0f35e
    perplexity : ndot progress and show stats with < 100 tasks (#7348) strawberrymelonpanda 2024-05-18 00:57:08 -07:00
  • c1b295eea5
    Update and fix Vulkan soft_max and argsort implementations (#7237) 0cc4m 2024-05-18 08:10:58 +02:00
  • de73196344
    github-actions-labeler: initial commit (#7330) Brian 2024-05-18 16:04:23 +10:00
  • b49a13dd2f
    convert : fix set_vocab_sentencepiece (#6866) Georgi Gerganov 2024-05-18 08:46:20 +03:00
  • 05834841dc
    ggml : fix quants nans when all the group weights are very close to zero (#7313) slaren 2024-05-18 02:39:54 +02:00
  • ef277de2ad
    cmake : fix typo in AMDGPU_TARGETS (#7356) Engininja2 2024-05-17 18:39:25 -06:00
  • b43272afa2
    Unicode codepoint flags for custom regexes (#7245) jaime-m-p 2024-05-18 01:09:13 +02:00
  • 0fc1e820a9
    CUDA: faster large batch FA without tensor cores (#7314) Johannes Gäßler 2024-05-17 18:54:52 +02:00
  • 82ca83db3c
    ROCm: use native CMake HIP support (#5966) Gavin Zhao 2024-05-17 11:03:03 -04:00
  • f4bd8b3d26
    rpc : set SO_REUSEADDR for the server socket (#7320) Radoslav Gerganov 2024-05-17 17:25:44 +03:00
  • 51e9d02599
    Add a single test function script and make debug-test.sh more robust (#7279) Brian 2024-05-17 22:40:14 +10:00
  • d273c1402b
    py : convert-hf-to-gguf-update improvements (#7340) Aarni Koskela 2024-05-17 15:11:45 +03:00
  • 27b040691c
    llama : use n_embd_head_v when reshaping kqv (#7327) fairydreaming 2024-05-17 13:24:38 +02:00
  • 29c60d8cdd
    tokenization: add warning for double BOS (#7332) Johannes Gäßler 2024-05-17 09:59:57 +02:00
  • 359cbe3f46
    ggml-quants, llama : remove excess checks (#7274) Herman Semenov 2024-05-17 07:08:49 +00:00
  • e18bc6aaf3
    convert : fix Qwen/Qwen-7b conversion (#7308) amd-lalithnc 2024-05-17 12:31:58 +05:30
  • ee94172d33
    server : add support for the RPC backend (#7305) Radoslav Gerganov 2024-05-17 10:00:17 +03:00
  • 934266c0e0
    ggml : rewrite silu and softmax for cpu (#7154) Justine Tunney 2024-05-17 02:58:52 -04:00
  • 9c4fdcbec8
    [Server] Added --verbose option to README [no ci] (#7335) Leon Knauer 2024-05-17 02:11:03 +02:00
  • 24ecb58168
    Revert "server bench: fix bench not waiting for model load (#7284)" (#7334) Pierrick Hymbert 2024-05-16 20:43:45 +02:00
  • 9afdffe70e
    rpc : get available mem for the CPU backend Radoslav Gerganov 2024-05-15 16:04:40 +03:00
  • 3b3963c55c
    rpc : add command line arg for specifying backend memory Radoslav Gerganov 2024-05-15 15:29:07 +03:00
  • dda64fc17c
    convert : get general.name from model dir, not its parent (#5615) Jared Van Bortel 2024-05-16 02:15:23 -04:00
  • 0350f58152
    grammar, json, llama: replace push with emplace where possible (#7273) Herman Semenov 2024-05-16 06:14:24 +00:00
  • ad52d5c259
    doc: add references to the Hugging Face GGUF-my-repo quantisation web tool (#7288) Vaibhav Srivastav 2024-05-16 07:38:43 +02:00
  • 172b78210a
    ci: fix bin/Release path for windows-arm64 builds (#7317) Max Krasnyansky 2024-05-15 22:36:43 -07:00
  • 13ad16af12
    Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (#7191) Max Krasnyansky 2024-05-15 19:47:36 -07:00
  • 8f7080bf48
    readme : remove stray double quote (#7310) Daniel Bevenius 2024-05-15 23:41:03 +02:00
  • e1b40ac3b9
    ggml : use dynamic thread scheduling for matrix multiplication (#6915) kunnis 2024-05-15 12:59:12 -05:00