Commit Graph

  • 463173a6c0
    llama : speedup tokenization (#2831) Kawrakow 2023-08-27 16:50:33 +03:00
  • eaa13a48ff
    falcon : fix CUDA inference by making K and Q contiguous (#2830) Georgi Gerganov 2023-08-27 16:40:48 +03:00
  • da7455d046
    readme : fix headings Georgi Gerganov 2023-08-27 15:52:34 +03:00
  • 25423e9185
    scripts : helper convert script Georgi Gerganov 2023-08-27 15:24:40 +03:00
  • a6d1189fdd
    k_quants tuning for Falcon-7b (#2816) Kawrakow 2023-08-27 15:19:59 +03:00
  • c48c5bb0b0
    readme : update hot topics Georgi Gerganov 2023-08-27 14:44:35 +03:00
  • d0cee0d36d
    gguf : add 64-bit support (GGUF v2) (#2821) Georgi Gerganov 2023-08-27 14:19:54 +03:00
  • edd4c14817
    llama : more tokenizer fixes (#2810) Georgi Gerganov 2023-08-27 14:19:19 +03:00
  • 1591e2e590
    ggml : detect SSSE3 (#2825) Przemysław Pawełczyk 2023-08-27 10:10:25 +02:00
  • 789c8c945a
    ci : add LoRA test to CI (#2650) slaren 2023-08-27 09:03:27 +02:00
  • c1ac54b77a
    server : add /detokenize endpoint (#2802) Bruce MacDonald 2023-08-26 16:11:45 -07:00
  • 730d9c681e
    convert.py : advanced option (#2753) Kerfuffle 2023-08-26 14:13:36 -06:00
  • c7d92e6dfe
    llama : use Unicode Escape Sequence to replace encoded characters (#2814) Tim Miller 2023-08-27 03:27:07 +09:00
  • 61d1a2895e
    flake.nix : add rocm support and cleanup (#2808) Tungsten842 2023-08-26 20:19:44 +02:00
  • 741ca7dd1c
    llama : move #includes out of _GNU_SOURCE conditional (#2817) Cebtenzzre 2023-08-26 14:17:51 -04:00
  • 72f895c923
    main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (#1528) Dr. Tom Murphy VII Ph.D 2023-08-26 14:12:56 -04:00
  • 50526f37eb
    llama : use std::abs in llama_sample_tail_free (#2800) Cebtenzzre 2023-08-26 12:53:52 -04:00
  • 04f4b1eb10
    k-quants : remove unnecessary tensor shape restrictions (#2811) Georgi Gerganov 2023-08-26 17:37:35 +03:00
  • 7592375403
    Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (#2807) Kawrakow 2023-08-26 17:27:49 +03:00
  • 771551a793
    Fix HellaSwag (#2805) Kawrakow 2023-08-26 16:48:53 +03:00
  • f305bad11e
    flake : build llama.cpp on Intel with nix (#2795) Volodymyr Vitvitskyi 2023-08-26 14:25:39 +01:00
  • a2ca4e9de9
    Handle null rope scaling value (#2793) Nigel Bosch 2023-08-26 07:11:17 -05:00
  • 2ba83c8685
    Fix spm whitespaces (#2806) klosax 2023-08-26 13:45:53 +02:00
  • bae5c5f679
    examples : skip unnecessary external lib in server README.md how-to (#2804) lon 2023-08-26 10:07:43 +02:00
  • 232caf3c15
    llama : fix struct decl (#2790) Marcus Dunn 2023-08-25 09:17:15 -07:00
  • d046dcee08
    Faster perplexity computation (#2786) Kawrakow 2023-08-25 19:05:02 +03:00
  • c82742ac9c
    llama : add llama_beam_search() (#2267) Matt Pulver 2023-08-25 11:18:48 -04:00
  • 28b2c996ca
    convert.py : Get rope scale from HuggingFace models (#2772) Nigel Bosch 2023-08-25 09:41:52 -05:00
  • 154725c543
    llama-bench : add model sizes (#2771) slaren 2023-08-25 15:16:19 +02:00
  • 12e2e33a97
    convert.py : export rope freq_base when converting CodeLlama from an HF model (#2773) slaren 2023-08-25 14:08:53 +02:00
  • 29674ab4e8
    server : display token probabilities in the UI (#2489) Jhen-Jie Hong 2023-08-25 18:32:45 +08:00
  • 5439a0ab57
    ci : pip install gguf in editable mode (#2782) Georgi Gerganov 2023-08-25 13:03:25 +03:00
  • 8194cd8772
    gguf : export objects to user code (#2780) M. Yusuf Sarıgöz 2023-08-25 12:43:41 +03:00
  • 6bbc598a63
    ROCm Port (#1087) Henri Vasserman 2023-08-25 12:09:42 +03:00
  • 3f460a2b72
    cuda : add RoPE kernel for mode == 2 (NeoX) (#2760) Georgi Gerganov 2023-08-25 11:55:59 +03:00
  • 87e3733f24
    gguf : make gguf pip-installable M. Yusuf Sarıgöz 2023-08-25 09:26:05 +03:00
  • b91ad7f461
    ggml-alloc : enlarge size of parse_seq (#2776) Shouzheng Liu 2023-08-25 01:58:00 -04:00
  • 2e5f70a25f
    Added enum to llama_token_get_type return type (#2774) Marcus Dunn 2023-08-24 14:49:30 -07:00
  • d0f77b1353
    convert.py : try to determine n_ctx automatically for CodeLlama (#2770) slaren 2023-08-24 21:10:39 +02:00
  • 0d3094f0c7
    gguf : add rope_freq_base parameter for CodeLlama (#2769) slaren 2023-08-24 20:04:05 +02:00
  • 01f2224682
    falcon : write file type Georgi Gerganov 2023-08-24 19:58:30 +03:00
  • 38b16dfca6
    metal : bug-fix when enable ggml-alloc (#2757) Shouzheng Liu 2023-08-24 12:27:25 -04:00
  • 8f8c28e89c
    convert : auto-determine model name based on dir + scripts update Georgi Gerganov 2023-08-24 19:26:19 +03:00
  • 7694adda8d
    Fix for main example getting stuck when -n -2 and --interactive (#2767) Kerfuffle 2023-08-24 10:11:13 -06:00
  • fea95c682d
    fix convert.py for codellama, add llama 34B to the list of recognized models (#2768) slaren 2023-08-24 17:44:11 +02:00
  • ef955fbd23
    Tag release with build number (#2732) DannyDaemonic 2023-08-24 06:58:02 -07:00
  • d67777c202
    metal : add Q8_0 support (#2763) Georgi Gerganov 2023-08-24 16:19:57 +03:00
  • c3e53b421a
    llama : escape all U+2581 in a string (#2750) Georgi Gerganov 2023-08-24 12:26:01 +03:00
  • 6e91a1b070
    llama : fix grammar sometimes generating null char (#2756) Evan Jones 2023-08-24 00:07:13 -04:00
  • 44d5462b5c
    readme : fix link Georgi Gerganov 2023-08-23 23:44:19 +03:00
  • c7868b0753
    minor : fix trailing whitespace Georgi Gerganov 2023-08-23 23:43:00 +03:00
  • 79da24b58c
    readme : update hot topics Georgi Gerganov 2023-08-23 23:41:16 +03:00
  • cf658adc83
    llm : add Falcon support (#2717) Georgi Gerganov 2023-08-23 23:08:04 +03:00
  • a192860cfe
    minor : fix trailing whitespace Georgi Gerganov 2023-08-23 22:37:39 +03:00
  • 95385241a9
    examples : restore the functionality to import llama2.c models (#2685) Olivier Chafik 2023-08-23 20:33:05 +01:00
  • 335acd2ffd
    fix convert-lora-to-ggml.py (#2738) slaren 2023-08-23 16:46:54 +02:00
  • 5290c38e6e
    main : insert bos if no tokens (#2727) klosax 2023-08-23 16:46:03 +02:00
  • cc34dbda96
    gitignore : fix for windows (#2729) akawrykow 2023-08-23 07:31:34 -07:00
  • 7c2227a197
    chmod : make scripts executable (#2675) Cebtenzzre 2023-08-23 10:29:09 -04:00
  • f19dca04ea
    devops : RPM Specs (#2723) JohnnyB 2023-08-23 15:28:22 +01:00
  • 8207214b6a
    Fix values shown in the quantize tool help (#2735) Kawrakow 2023-08-23 12:57:12 +03:00
  • 62959e740e
    Strided perplexity (#2714) Kawrakow 2023-08-23 12:56:42 +03:00
  • 7f7ddd5002
    Fix ggml to gguf conversion on Windows (#2733) IgnacioFDM 2023-08-23 06:31:09 -03:00
  • b8ad1b66b2
    server : allow json array in prompt or content for direct token input (#2306) Xiao-Yong Jin 2023-08-23 02:12:12 -05:00
  • f5fe98d11b
    docs : add grammar docs (#2701) Evan Jones 2023-08-22 21:01:57 -04:00
  • 777f42ba18
    Improve handling of special tokens in GGML to GGUF converter (#2725) Kerfuffle 2023-08-22 17:39:39 -06:00
  • 46ef5b5fcf
    llama : fix whitespace escaping in tokenizer (#2724) goerch 2023-08-22 23:10:42 +02:00
  • c63bb1d16a
    CUDA: use mul_mat_q kernels by default (#2683) Johannes Gäßler 2023-08-22 22:47:05 +02:00
  • 3b6cfe7c92
    convert.py : clarifying error message (#2718) Alex Petenchea 2023-08-22 21:58:16 +03:00
  • 800c9635b4
    Fix CUDA softmax by subtracting max value before exp (#2665) Jiahao Li 2023-08-23 02:27:06 +08:00
  • deb7dfca4b
    gguf : add ftype meta info to the model (#2710) Georgi Gerganov 2023-08-22 20:05:59 +03:00
  • bac66994cf
    Quantization improvements for k_quants (#2707) Kawrakow 2023-08-22 19:14:09 +03:00
  • 519c981f8b
    embedding : evaluate prompt in batches (#2713) slaren 2023-08-22 16:03:12 +02:00
  • 1123f7fbdf
    ggml-cuda : use graph allocator (#2684) slaren 2023-08-22 15:25:19 +02:00
  • ef3f333d37
    ggml : sync latest (SAM + SD operators, CUDA alibi) (#2709) Georgi Gerganov 2023-08-22 14:22:08 +03:00
  • 8e4364f2af
    llama-bench : minor fixes (#2695) slaren 2023-08-22 09:56:03 +02:00
  • 1e3bc523d8
    ggml : support CUDA's half type for aarch64 (#1455) (#2670) Kylin 2023-08-22 15:14:23 +08:00
  • 14b1d7e6f7
    metal : add missing barriers for mul-mat (#2699) Shouzheng Liu 2023-08-22 02:18:40 -04:00
  • 226255b44e
    server : fallback to default if client param is null (#2688) Jhen-Jie Hong 2023-08-22 08:32:00 +08:00
  • 930523c8e1
    Fix convert-llama-ggmlv3-to-gguf.py vocab conversion (#2698) Kerfuffle 2023-08-21 18:01:34 -06:00
  • c8dba409e6
    py : remove obsolete script Georgi Gerganov 2023-08-21 23:40:22 +03:00
  • 6381d4e110
    gguf : new file format with flexible meta data (beta) (#2398) Georgi Gerganov 2023-08-21 23:07:43 +03:00
  • dadbed99e6
    metal : fix synchronization in new matrix multiplication kernel (#2686) Shouzheng Liu 2023-08-21 06:59:29 -04:00
  • cb1c0727bd
    HellaSwag: split token evaluation into batches if needed (#2681) Kawrakow 2023-08-21 11:11:31 +03:00
  • 9e232f0234
    ggml : move all type info to ggml_type_traits (#2663) slaren 2023-08-20 22:17:53 +02:00
  • 5e9ff54a67
    More efficient Hellaswag implementation (#2677) Kawrakow 2023-08-20 16:44:46 +03:00
  • 1f0bccb279
    server : better default prompt (#2646) Georgi Gerganov 2023-08-19 00:45:36 +03:00
  • f63564adfa
    server : update xxd usage for older versions compatibility (#2649) Jhen-Jie Hong 2023-08-19 05:41:32 +08:00
  • 2d8b76a110
    Add link to clojure bindings to Readme. (#2659) Adrian 2023-08-18 12:39:22 -07:00
  • 7af633aec3
    readme : incoming BREAKING CHANGE Georgi Gerganov 2023-08-18 17:48:31 +03:00
  • 097e121e2f
    llama : add benchmark example (#2626) slaren 2023-08-18 12:44:58 +02:00
  • eaf98c2649
    readme : add link to Rust bindings (#2656) mdrokz 2023-08-18 15:47:58 +05:30
  • e9b12c332e
    perplexity : more meaningful ETA number - 2 decimal points Georgi Gerganov 2023-08-18 12:48:55 +03:00
  • 604b8bdfa6
    Fix unicode in grammars (fixes #2501) (#2553) Evan Jones 2023-08-17 19:54:44 -04:00
  • 10151bee2e
    server : support for saving templates in browser LocalStorage (#2486) staviq 2023-08-17 23:34:01 +00:00
  • 0992a7b8b1
    README: fix LLAMA_CUDA_MMV_Y documentation (#2647) Johannes Gäßler 2023-08-17 23:57:59 +02:00
  • 6ddeefad9b
    [Zig] Fixing Zig build and improvements (#2554) Henri Vasserman 2023-08-17 23:11:18 +03:00
  • 8dae7ce684
    Add --cfg-negative-prompt-file option for examples (#2591) Kerfuffle 2023-08-17 07:29:44 -06:00
  • a73ccf1aa3
    llama : replace (permute + reshape + view_1d) with (view_3d) (#2538) Georgi Gerganov 2023-08-17 10:47:09 +03:00
  • 7cf54e1f74
    tests : adds simple llama grammar tests (#2618) drbh 2023-08-17 03:41:01 -04:00
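
Notes on selected commits

The /detokenize endpoint added in c1ac54b77a (#2802) turns token ids back into text, complementing the existing /tokenize route; b8ad1b66b2 (#2306) is its counterpart on the input side, letting a prompt be supplied directly as a token array. A minimal sketch of calling the new endpoint, assuming a server already running on localhost:8080 and the JSON field names used by the server at the time ("tokens" in, "content" out):

```python
import requests

# Token ids are model-specific; these are placeholders for illustration.
resp = requests.post(
    "http://localhost:8080/detokenize",
    json={"tokens": [1, 15043, 3186]},
)
resp.raise_for_status()
print(resp.json()["content"])  # the decoded text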
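Commit 800c9635b4 (#2665) fixes the CUDA softmax by subtracting the row maximum before exponentiating. The reasoning is standard numerics, shown here as a NumPy sketch rather than the CUDA kernel itself: softmax is invariant to shifting all logits by a constant, and shifting by the max keeps every exponent at or below zero, so exp() cannot overflow.

```python
import numpy as np

def softmax_naive(x):
    e = np.exp(x)              # overflows to inf once x exceeds ~709
    return e / e.sum()

def softmax_stable(x):
    e = np.exp(x - np.max(x))  # largest exponent is exp(0) == 1.0
    return e / e.sum()

x = np.array([1000.0, 1001.0, 1002.0])
print(softmax_naive(x))   # [nan nan nan] from inf / inf
print(softmax_stable(x))  # [0.09003057 0.24472847 0.66524096]
```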
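Several entries above touch the perplexity tool (d046dcee08, 62959e740e, e9b12c332e). For reference, the quantity being computed is the exponential of the mean negative log-likelihood of each scored token. The sketch below is a generic illustration, not the repo's implementation; log_probs stands in for per-token log p(token | context) from any model.

```python
import math

def perplexity(log_probs):
    """exp of the mean negative log-likelihood over all scored tokens."""
    nll = -sum(log_probs) / len(log_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token scores ppl = 4.0.
print(perplexity([math.log(0.25)] * 100))
```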
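Commits 87e3733f24 and 8194cd8772 make the gguf Python package pip-installable and export its objects to user code, with 5439a0ab57 wiring it into CI in editable mode. A rough sketch of writing a GGUF file outside the repo's own convert scripts; the method names follow the package as of these commits and should be treated as assumptions against whatever version you have installed.

```python
import numpy as np
import gguf  # pip install gguf

writer = gguf.GGUFWriter("tiny.gguf", "llama")  # output path, architecture
writer.add_name("tiny-example")
# A single placeholder tensor; real models add much more metadata.
writer.add_tensor("token_embd.weight", np.zeros((16, 8), dtype=np.float32))

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```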
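The U+2581 commits (c3e53b421a, 2ba83c8685, 46ef5b5fcf) all revolve around the SentencePiece convention of representing a word-leading space as the lower-one-eighth block character "▁" (U+2581): raw text is escaped into that form before vocabulary pieces are matched, and unescaped on the way out. The idea in a few lines of Python:

```python
text = "Hello world"
escaped = text.replace(" ", "\u2581")      # -> "Hello▁world"
restored = escaped.replace("\u2581", " ")  # round-trips back
assert restored == text
```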
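The CodeLlama entries center on RoPE parameters: 0d3094f0c7 and 12e2e33a97 record rope_freq_base in the GGUF file, since CodeLlama raises the base from LLaMA's 10000 to 1e6 to stretch the usable context, while 28b2c996ca and a2ca4e9de9 handle the separate linear rope-scale factor from the HF config. A toy sketch of the frequencies involved; the formula is the standard RoPE definition, not code from the repo.

```python
import numpy as np

def rope_freqs(head_dim, base):
    # One rotation frequency per pair of dimensions: base^(-2i/d).
    return base ** (-np.arange(0, head_dim, 2) / head_dim)

print(rope_freqs(8, 10_000.0))     # LLaMA default
print(rope_freqs(8, 1_000_000.0))  # CodeLlama's larger base, slower rotation
```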