Commit Graph

  • 381efbf480
    llava : expose as a shared library for downstream projects (#3613) Damian Stewart 2023-11-06 22:36:23 +01:00
  • 2833a6f63c
    ggml-cuda : fix f16 mul mat (#3961) slaren 2023-11-05 18:45:16 +01:00
  • d9ccce2e33
    Allow common process_escapes to handle \x sequences (#3928) Kerfuffle 2023-11-05 10:06:06 -07:00
  • bb60fd0bf6
    server : fix typo for --alias shortcut from -m to -a (#3958) Thái Hoàng Tâm 2023-11-05 23:15:27 +07:00
  • 132d25b8a6
    cuda : fix disabling device with --tensor-split 1,0 (#3951) Jared Van Bortel 2023-11-05 10:08:57 -05:00
  • 3d48f42efc
    llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) Meng Zhang 2023-11-05 04:40:08 -08:00
  • c41ea36eaa
    cmake : MSVC instruction detection (fixed up #809) (#3923) Eve 2023-11-05 08:03:09 +00:00
  • a7fac013cf
    ci : use intel sde when ci cpu doesn't support avx512 (#3949) Eve 2023-11-05 07:46:44 +00:00
  • 48ade94538
    cuda : revert CUDA pool stuff (#3944) slaren 2023-11-05 08:12:13 +01:00
  • f28af0d81a
    gguf-py: Support 01.AI Yi models (#3943) Kerfuffle 2023-11-04 16:20:34 -06:00
  • d9b33fe95b
    metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) Peter Sugihara 2023-11-03 12:18:18 -07:00
  • 5ba3746171
    ggml-metal: fix yarn rope (#3937) Xiao-Yong Jin 2023-11-03 13:00:31 -05:00
  • abb77e7319
    ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) slaren 2023-11-03 12:13:09 +01:00
  • 8f961abdc4
    speculative : change default p_accept to 0.5 + CLI args (#3919) Georgi Gerganov 2023-11-03 09:41:17 +02:00
  • 05816027d6
    common : YAYF (yet another YARN fix) (#3925) Georgi Gerganov 2023-11-03 09:24:00 +02:00
  • 3fdbe6b66b
    llama : change yarn_ext_factor placeholder to -1 (#3922) cebtenzzre 2023-11-03 02:31:58 -04:00
  • 629f917cd6
    cuda : add ROCM aliases for CUDA pool stuff (#3918) Kerfuffle 2023-11-02 13:58:22 -06:00
  • 51b2fc11f7
    cmake : fix relative path to git submodule index (#3915) Andrei 2023-11-02 15:40:31 -04:00
  • 224e7d5b14
    readme : add notice about #3912 Georgi Gerganov 2023-11-02 20:44:12 +02:00
  • c7743fe1c1
    cuda : fix const ptrs warning causing ROCm build issues (#3913) Georgi Gerganov 2023-11-02 20:32:11 +02:00
  • d6069051de
    cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903) Oleksii Maryshchenko 2023-11-02 18:10:39 +01:00
  • 4ff1046d75
    gguf : print error for GGUFv1 files (#3908) Georgi Gerganov 2023-11-02 16:22:30 +02:00
  • 21958bb393
    cmake : disable LLAMA_NATIVE by default (#3906) slaren 2023-11-02 13:10:33 +01:00
  • 2756c4fbff
    gguf : remove special-case code for GGUFv1 (#3901) Georgi Gerganov 2023-11-02 11:20:21 +02:00
  • 1efae9b7dc
    llm : prevent 1-D tensors from being GPU split (#3697) Georgi Gerganov 2023-11-02 09:54:18 +02:00
  • b12fa0d1c1
    build : link against build info instead of compiling against it (#3879) cebtenzzre 2023-11-02 02:50:16 -04:00
  • 4d719a6d4e
    cuda : check if this fixes Pascal card regression (#3882) Georgi Gerganov 2023-11-02 08:35:10 +02:00
  • 183b3fac6c
    metal : fix build errors and kernel sig after #2268 (#3898) Georgi Gerganov 2023-11-02 08:33:37 +02:00
  • 2fffa0d61f
    cuda : fix RoPE after #2268 (#3897) cebtenzzre 2023-11-02 01:49:44 -04:00
  • 0eb332a10f
    llama : fix llama_context_default_params after #2268 (#3893) cebtenzzre 2023-11-01 19:29:14 -04:00
  • d02e98cde0
    ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891) slaren 2023-11-01 23:10:09 +01:00
  • 898aeca90a
    llama : implement YaRN RoPE scaling (#2268) cebtenzzre 2023-11-01 18:04:33 -04:00
  • c43c2da8af
    llm : fix llm_build_kqv taking unused tensor (benign, #3837) Georgi Gerganov 2023-11-01 23:08:30 +02:00
  • 523e49b111
    llm : fix falcon norm after refactoring (#3837) Georgi Gerganov 2023-11-01 23:00:50 +02:00
  • e16b9fa4ba
    metal : multi-simd softmax (#3710) Georgi Gerganov 2023-11-01 21:25:00 +02:00
  • ff8f9a88da
    common : minor (#3715) Georgi Gerganov 2023-11-01 21:15:55 +02:00
  • 50337961a6
    llm : add llm_build_context (#3881) Georgi Gerganov 2023-11-01 20:11:02 +02:00
  • 0e40806c1c
    common : allow caller to handle help/argument exceptions (#3715) bandoti 2023-11-01 14:42:01 -03:00
  • a2758d08e4
    log : make generating separate log files optional (#3787) staviq 2023-11-01 15:18:27 +01:00
  • e75dfdd31b
    sampling : null grammar field after reset (#3885) l3utterfly 2023-11-01 21:40:43 +08:00
  • 9a3b4f6c86
    ggml : fix UNUSED macro (#3762) Georgi Gerganov 2023-11-01 13:50:45 +02:00
  • 73bdcb395e
    finetune : add -ngl parameter (#3762) Andrew Godfrey 2023-11-01 04:49:04 -07:00
  • f0e209324a
    scripts : add server-llm.sh (#3868) Georgi Gerganov 2023-11-01 11:29:07 +02:00
  • ca190bca8e
    server : re-enable completion and embeddings at the same time (#3876) Adrian Hesketh 2023-11-01 09:28:28 +00:00
  • 71e3718abd
    llama : refactor graph build code (#3837) Georgi Gerganov 2023-11-01 08:04:02 +02:00
  • 238657db23
    samplers : Min-P sampler implementation [alternative to Top P/Top K] (#3841) kalomaze 2023-10-31 14:44:49 -05:00 (a sketch of the idea follows this list)
  • 07178c98e1
    flake.nix: fix for rocm 5.7 (#3853) Tungsten842 2023-10-31 18:24:03 +01:00
  • 207b51900e
    ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861) Georgi Gerganov 2023-10-30 19:19:15 +02:00
  • 6e08281e58
    Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843) Kerfuffle 2023-10-29 11:31:40 -06:00
  • 2046eb4345
    make : remove unnecessary dependency on build-info.h (#3842) cebtenzzre 2023-10-29 12:33:47 -04:00
  • 71a09da301
    llama : fix kv shift bug (#3835) Georgi Gerganov 2023-10-29 18:32:51 +02:00
  • d69d777c02
    ggml : quantization refactoring (#3833) Georgi Gerganov 2023-10-29 18:32:28 +02:00
  • ff3bad83e2
    flake : update flake.lock for newer transformers version + provide extra dev shell (#3797) Erik Scholz 2023-10-28 16:41:07 +02:00
  • 82a6646e02
    metal : try cwd for ggml-metal.metal if bundle lookup fails (#3793) Aarni Koskela 2023-10-28 15:43:01 +03:00
  • ba231e8a6d
    issues : change label from bug to bug-unconfirmed (#3748) Georgi Gerganov 2023-10-28 15:25:33 +03:00
  • 8a2f2fea29
    convert : ignore tokens if their IDs are within [0, vocab_size) (#3831) Georgi Gerganov 2023-10-28 15:25:15 +03:00
  • bd6d9e2059
    llama : allow quantizing k-quants to fall back when tensor size incompatible (#3747) Kerfuffle 2023-10-28 05:54:24 -06:00
  • ee1a0ec9cb
    llama : add option for greedy sampling with probs (#3813) Georgi Gerganov 2023-10-28 14:23:11 +03:00
  • 177461104b
    common : print that one line of the syntax help *also* to standard output (#3823) Henk Poley 2023-10-28 12:16:33 +02:00
  • fdee152e4e
    starcoder : add GPU offloading (#3827) Georgi Gerganov 2023-10-28 12:06:08 +03:00
  • 41aee4df82
    speculative : ensure draft and target model vocab matches (#3812) Kerfuffle 2023-10-27 15:40:07 -06:00
  • 6d459cbfbe
    llama : correctly report GGUFv3 format (#3818) cebtenzzre 2023-10-27 17:33:53 -04:00
  • c8d6a1f34a
    simple : fix batch handling (#3803) Thibault Terrasson 2023-10-27 16:37:41 +02:00
  • 2f9ec7e271
    cuda : improve text-generation and batched decoding performance (#3776) Georgi Gerganov 2023-10-27 17:01:23 +03:00
  • 34b2a5e1ee
    server : do not release slot on image input (#3798) Georgi Gerganov 2023-10-26 22:53:37 +03:00
  • 6961c4bd0b
    batched-bench : print params at start Georgi Gerganov 2023-10-25 10:26:27 +03:00
  • cc44877486
    log : disable pid in log filenames Georgi Gerganov 2023-10-25 10:09:16 +03:00
  • ad93962657
    server : add parameter -tb N, --threads-batch N (#3584) (#3768) cebtenzzre 2023-10-24 16:10:43 -04:00
  • 1717521cdb
    server : do not block system prompt update (#3767) Georgi Gerganov 2023-10-24 23:08:20 +03:00
  • b2f7e04bd3
    sync : ggml (conv ops + cuda MSVC fixes) (#3765) Georgi Gerganov 2023-10-24 21:51:20 +03:00
  • abd21fc99f
    cmake : add missed dependencies (#3763) John Smith 2023-10-25 01:48:45 +08:00
  • 2b4ea35e56
    cuda : add batched cuBLAS GEMM for faster attention (#3749) Georgi Gerganov 2023-10-24 16:48:37 +03:00
  • daab3d7f45
    Add more tokenizer tests (#3742) Galunid 2023-10-24 09:17:17 +02:00
  • 469c9addef
    metal : handle ggml_scale for n%4 != 0 (close #3754) Georgi Gerganov 2023-10-24 09:46:50 +03:00
  • e3932593d4
    Revert "make : add optional CUDA_NATIVE_ARCH (#2482)" Georgi Gerganov 2023-10-23 23:46:05 +03:00
  • 9d02956443
    issues : separate bug and enhancement template + no default title (#3748) M. Yusuf Sarıgöz 2023-10-23 22:57:16 +03:00
  • 69a6735087
    Update special token handling in conversion scripts for gpt2 derived tokenizers (#3746) Galunid 2023-10-23 21:46:00 +02:00
  • 5be6c803fa
    llama : remove token functions with context args in favor of model (#3720) Marcus Dunn 2023-10-23 12:40:03 -07:00
  • 6336701c93
    Fix baichuan convert script not detecting model (#3739) Galunid 2023-10-23 17:47:03 +02:00
  • 96981f37b1
    make : add optional CUDA_NATIVE_ARCH (#2482) Alex 2023-10-22 15:56:53 -04:00
  • 438c2ca830
    server : parallel decoding and multimodal (#3677) Georgi Gerganov 2023-10-22 22:53:08 +03:00
  • 9e70cc0322
    Add test for MPT tokenization (#3728) goerch 2023-10-22 21:21:42 +02:00
  • 5a42a5f8e8
    readme : remove unsupported node.js library (#3703) Ian Scrivener 2023-10-23 05:16:43 +11:00
  • a5e7dbd614
    llama : validate special token ids are in range when loading GGUF model (#3635) Kerfuffle 2023-10-22 12:14:56 -06:00
  • d3956aea53
    main : escape prompt for cfg_negative_prompt and consecutive inputs in interactive mode (#3623) vvhg1 2023-10-22 20:09:51 +02:00
  • 22c69a2794
    batched : add len CLI argument Georgi Gerganov 2023-10-22 08:37:20 +03:00
  • 465219b914
    CLBlast: Add outer loops over src0 for broadcasting in mulmat shibe2 2023-10-12 16:01:23 +04:00
  • d1031cf49c
    sampling : refactor init to use llama_sampling_params (#3696) Georgi Gerganov 2023-10-20 21:07:23 +03:00
  • 8cf19d60dc
    gguf : support big endian platform (#3552) Qin Yue Chen 2023-10-20 06:19:40 -05:00
  • a0edf73bda
    server : fix uninitialized sampling context (close #3685) Georgi Gerganov 2023-10-20 13:06:10 +03:00
  • f439e506e8
    ggml : fix rope + llama minor optimizations (#3560) Herman Semenov 2023-10-20 10:02:12 +00:00
  • e78f3ef24a
    convert : restore compat with old Falcon models (#3680) cebtenzzre 2023-10-20 01:32:08 -04:00
  • f3b25e4043
    multimodal : add BakLLaVA conversion support (#3682) M. Yusuf Sarıgöz 2023-10-19 19:40:41 +03:00
  • 60abea9798
    llava : avoid segfault in case of non-existent mmproj file (#3674) M. Yusuf Sarıgöz 2023-10-19 16:59:11 +03:00
  • 004797f6ac
    readme : update hot topics Georgi Gerganov 2023-10-18 21:44:43 +03:00
  • 4e82b2ea3f
    speculative : bug fixes Georgi Gerganov 2023-10-18 18:49:40 +03:00
  • 0e89203b51
    speculative : add tree-based sampling example (#3624) Georgi Gerganov 2023-10-18 16:21:57 +03:00
  • c67fe68e41
    metal : implement q5_0 and q5_1 kernels (#3648) Jhen-Jie Hong 2023-10-18 07:21:48 -05:00
  • 1117d06607
    opencl : fix element-wise multiplication (#3656) shibe2 2023-10-18 16:09:22 +04:00
  • cb33f43a2a
    fix embeddings when using CUDA (#3657) slaren 2023-10-17 22:24:50 +02:00
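
Of the sampling changes above, the Min-P sampler (238657db23, #3841) is worth a closer look: it keeps only the tokens whose probability is at least min_p times the probability of the single most likely token, then renormalizes and samples from the survivors. Below is a minimal C++ sketch of that idea only; the function name, signature, and the min_p value shown are illustrative assumptions, not llama.cpp's actual API or implementation.

```cpp
// Minimal sketch of Min-P sampling (the idea behind #3841), assuming
// 0 <= min_p <= 1. Not the actual llama.cpp implementation.
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

int min_p_sample(const std::vector<float> & logits, float min_p, std::mt19937 & rng) {
    // numerically stable softmax over the logits
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (float & p : probs) { p /= sum; }

    // Min-P filter: discard tokens with p < min_p * p_max
    const float p_max      = *std::max_element(probs.begin(), probs.end());
    const float threshold  = min_p * p_max;
    std::vector<int>   ids;   // surviving token ids
    std::vector<float> kept;  // their (unnormalized) weights
    for (size_t i = 0; i < probs.size(); ++i) {
        if (probs[i] >= threshold) {
            ids.push_back((int) i);
            kept.push_back(probs[i]);
        }
    }

    // sample among the survivors; discrete_distribution renormalizes the weights
    std::discrete_distribution<int> dist(kept.begin(), kept.end());
    return ids[dist(rng)];
}
```

Usage would look like `std::mt19937 rng(42); int tok = min_p_sample(logits, 0.05f, rng);`. Unlike a fixed Top-K or Top-P cutoff, the threshold scales with the model's confidence: a peaked distribution prunes aggressively, while a flat one keeps more candidates in play.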