Commit Graph

  • 463173a6c0
    llama : speedup tokenization (#2831) Kawrakow 2023-08-27 16:50:33 +03:00
  • eaa13a48ff
    falcon : fix CUDA inference by making K and Q contiguous (#2830) Georgi Gerganov 2023-08-27 16:40:48 +03:00
  • da7455d046
    readme : fix headings Georgi Gerganov 2023-08-27 15:52:34 +03:00
  • 25423e9185
    scripts : helper convert script Georgi Gerganov 2023-08-27 15:24:40 +03:00
  • a6d1189fdd
    k_quants tuning for Falcon-7b (#2816) Kawrakow 2023-08-27 15:19:59 +03:00
  • c48c5bb0b0
    readme : update hot topics Georgi Gerganov 2023-08-27 14:44:35 +03:00
  • d0cee0d36d
    gguf : add 64-bit support (GGUF v2) (#2821) Georgi Gerganov 2023-08-27 14:19:54 +03:00
  • edd4c14817
    llama : more tokenizer fixes (#2810) Georgi Gerganov 2023-08-27 14:19:19 +03:00
  • 1591e2e590
    ggml : detect SSSE3 (#2825) Przemysław Pawełczyk 2023-08-27 10:10:25 +02:00
  • 789c8c945a
    ci : add LoRA test to CI (#2650) slaren 2023-08-27 09:03:27 +02:00
  • c1ac54b77a
    server : add /detokenize endpoint (#2802) Bruce MacDonald 2023-08-26 16:11:45 -07:00
  • 730d9c681e
    convert.py : advanced option (#2753) Kerfuffle 2023-08-26 14:13:36 -06:00
  • c7d92e6dfe
    llama : use Unicode Escape Sequence to replace encoded characters (#2814) Tim Miller 2023-08-27 03:27:07 +09:00
  • 61d1a2895e
    flake.nix : add rocm support and cleanup (#2808) Tungsten842 2023-08-26 20:19:44 +02:00
  • 741ca7dd1c
    llama : move #includes out of _GNU_SOURCE conditional (#2817) Cebtenzzre 2023-08-26 14:17:51 -04:00
  • 72f895c923
    main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (#1528) Dr. Tom Murphy VII Ph.D 2023-08-26 14:12:56 -04:00
  • 50526f37eb
    llama : use std::abs in llama_sample_tail_free (#2800) Cebtenzzre 2023-08-26 12:53:52 -04:00
  • 04f4b1eb10
    k-quants : remove unnecessary tensor shape restrictions (#2811) Georgi Gerganov 2023-08-26 17:37:35 +03:00
  • 7592375403
    Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (#2807) Kawrakow 2023-08-26 17:27:49 +03:00
  • 771551a793
    Fix HellaSwag (#2805) Kawrakow 2023-08-26 16:48:53 +03:00
  • f305bad11e
    flake : build llama.cpp on Intel with nix (#2795) Volodymyr Vitvitskyi 2023-08-26 14:25:39 +01:00
  • a2ca4e9de9
    Handle null rope scaling value (#2793) Nigel Bosch 2023-08-26 07:11:17 -05:00
  • 2ba83c8685
    Fix spm whitespaces (#2806) klosax 2023-08-26 13:45:53 +02:00
  • bae5c5f679
    examples : skip unnecessary external lib in server README.md how-to (#2804) lon 2023-08-26 10:07:43 +02:00
  • 232caf3c15
    llama : fix struct decl (#2790) Marcus Dunn 2023-08-25 09:17:15 -07:00
  • d046dcee08
    Faster perplexity computation (#2786) Kawrakow 2023-08-25 19:05:02 +03:00
  • c82742ac9c
    llama : add llama_beam_search() (#2267) Matt Pulver 2023-08-25 11:18:48 -04:00
  • 28b2c996ca
    convert.py : Get rope scale from HuggingFace models (#2772) Nigel Bosch 2023-08-25 09:41:52 -05:00
  • 154725c543
    llama-bench : add model sizes (#2771) slaren 2023-08-25 15:16:19 +02:00
  • 12e2e33a97
    convert.py : export rope freq_base when converting CodeLlama from an HF model (#2773) slaren 2023-08-25 14:08:53 +02:00
  • 29674ab4e8
    server : display token probabilities in the UI (#2489) Jhen-Jie Hong 2023-08-25 18:32:45 +08:00
  • 5439a0ab57
    ci : pip install gguf in editable mode (#2782) Georgi Gerganov 2023-08-25 13:03:25 +03:00
  • 8194cd8772
    gguf : export objects to user code (#2780) M. Yusuf Sarıgöz 2023-08-25 12:43:41 +03:00
  • 6bbc598a63
    ROCm Port (#1087) Henri Vasserman 2023-08-25 12:09:42 +03:00
  • 3f460a2b72
    cuda : add RoPE kernel for mode == 2 (NeoX) (#2760) Georgi Gerganov 2023-08-25 11:55:59 +03:00
  • 87e3733f24
    gguf : make gguf pip-installable M. Yusuf Sarıgöz 2023-08-25 09:26:05 +03:00
  • b91ad7f461
    ggml-alloc : enlarge size of parse_seq (#2776) Shouzheng Liu 2023-08-25 01:58:00 -04:00
  • 2e5f70a25f
    Added enum to llama_token_get_type return type (#2774) Marcus Dunn 2023-08-24 14:49:30 -07:00
  • d0f77b1353
    convert.py : try to determine n_ctx automatically for CodeLlama (#2770) slaren 2023-08-24 21:10:39 +02:00
  • 0d3094f0c7
    gguf : add rope_freq_base parameter for CodeLlama (#2769) slaren 2023-08-24 20:04:05 +02:00
  • 01f2224682
    falcon : write file type Georgi Gerganov 2023-08-24 19:58:30 +03:00
  • 38b16dfca6
    metal : bug-fix when enable ggml-alloc (#2757) Shouzheng Liu 2023-08-24 12:27:25 -04:00
  • 8f8c28e89c
    convert : auto-determine model name based on dir + scripts update Georgi Gerganov 2023-08-24 19:26:19 +03:00
  • 7694adda8d
    Fix for main example getting stuck when -n -2 and --interactive (#2767) Kerfuffle 2023-08-24 10:11:13 -06:00
  • fea95c682d
    fix convert.py for codellama, add llama 34B to the list of recognized models (#2768) slaren 2023-08-24 17:44:11 +02:00
  • ef955fbd23
    Tag release with build number (#2732) DannyDaemonic 2023-08-24 06:58:02 -07:00
  • d67777c202
    metal : add Q8_0 support (#2763) Georgi Gerganov 2023-08-24 16:19:57 +03:00
  • c3e53b421a
    llama : escape all U+2581 in a string (#2750) Georgi Gerganov 2023-08-24 12:26:01 +03:00
  • 6e91a1b070
    llama : fix grammar sometimes generating null char (#2756) Evan Jones 2023-08-24 00:07:13 -04:00
  • 44d5462b5c
    readme : fix link Georgi Gerganov 2023-08-23 23:44:19 +03:00
  • c7868b0753
    minor : fix trailing whitespace Georgi Gerganov 2023-08-23 23:43:00 +03:00
  • 79da24b58c
    readme : update hot topics Georgi Gerganov 2023-08-23 23:41:16 +03:00
  • cf658adc83
    llm : add Falcon support (#2717) Georgi Gerganov 2023-08-23 23:08:04 +03:00
  • a192860cfe
    minor : fix trailing whitespace Georgi Gerganov 2023-08-23 22:37:39 +03:00
  • 95385241a9
    examples : restore the functionality to import llama2.c models (#2685) Olivier Chafik 2023-08-23 20:33:05 +01:00
  • 335acd2ffd
    fix convert-lora-to-ggml.py (#2738) slaren 2023-08-23 16:46:54 +02:00
  • 5290c38e6e
    main : insert bos if no tokens (#2727) klosax 2023-08-23 16:46:03 +02:00
  • cc34dbda96
    gitignore : fix for windows (#2729) akawrykow 2023-08-23 07:31:34 -07:00
  • 7c2227a197
    chmod : make scripts executable (#2675) Cebtenzzre 2023-08-23 10:29:09 -04:00
  • f19dca04ea
    devops : RPM Specs (#2723) JohnnyB 2023-08-23 15:28:22 +01:00
  • 8207214b6a
    Fix values shown in the quantize tool help (#2735) Kawrakow 2023-08-23 12:57:12 +03:00
  • 62959e740e
    Strided perplexity (#2714) Kawrakow 2023-08-23 12:56:42 +03:00
  • 7f7ddd5002
    Fix ggml to gguf conversion on Windows (#2733) IgnacioFDM 2023-08-23 06:31:09 -03:00
  • b8ad1b66b2
    server : allow json array in prompt or content for direct token input (#2306) Xiao-Yong Jin 2023-08-23 02:12:12 -05:00
  • f5fe98d11b
    docs : add grammar docs (#2701) Evan Jones 2023-08-22 21:01:57 -04:00
  • 777f42ba18
    Improve handling of special tokens in GGML to GGUF converter (#2725) Kerfuffle 2023-08-22 17:39:39 -06:00
  • 46ef5b5fcf
    llama : fix whitespace escaping in tokenizer (#2724) goerch 2023-08-22 23:10:42 +02:00
  • c63bb1d16a
    CUDA: use mul_mat_q kernels by default (#2683) Johannes Gäßler 2023-08-22 22:47:05 +02:00
  • 3b6cfe7c92
    convert.py : clarifying error message (#2718) Alex Petenchea 2023-08-22 21:58:16 +03:00
  • 800c9635b4
    Fix CUDA softmax by subtracting max value before exp (#2665) Jiahao Li 2023-08-23 02:27:06 +08:00
  • deb7dfca4b
    gguf : add ftype meta info to the model (#2710) Georgi Gerganov 2023-08-22 20:05:59 +03:00
  • bac66994cf
    Quantization improvements for k_quants (#2707) Kawrakow 2023-08-22 19:14:09 +03:00
  • 519c981f8b
    embedding : evaluate prompt in batches (#2713) slaren 2023-08-22 16:03:12 +02:00
  • 1123f7fbdf
    ggml-cuda : use graph allocator (#2684) slaren 2023-08-22 15:25:19 +02:00
  • ef3f333d37
    ggml : sync latest (SAM + SD operators, CUDA alibi) (#2709) Georgi Gerganov 2023-08-22 14:22:08 +03:00
  • 8e4364f2af
    llama-bench : minor fixes (#2695) slaren 2023-08-22 09:56:03 +02:00
  • 1e3bc523d8
    ggml : support CUDA's half type for aarch64 (#1455) (#2670) Kylin 2023-08-22 15:14:23 +08:00
  • 14b1d7e6f7
    metal : add missing barriers for mul-mat (#2699) Shouzheng Liu 2023-08-22 02:18:40 -04:00
  • 226255b44e
    server : fallback to default if client param is null (#2688) Jhen-Jie Hong 2023-08-22 08:32:00 +08:00
  • 930523c8e1
    Fix convert-llama-ggmlv3-to-gguf.py vocab conversion (#2698) Kerfuffle 2023-08-21 18:01:34 -06:00
  • c8dba409e6
    py : remove obsolete script Georgi Gerganov 2023-08-21 23:40:22 +03:00
  • 6381d4e110
    gguf : new file format with flexible meta data (beta) (#2398) Georgi Gerganov 2023-08-21 23:07:43 +03:00
  • dadbed99e6
    metal : fix synchronization in new matrix multiplication kernel (#2686) Shouzheng Liu 2023-08-21 06:59:29 -04:00
  • cb1c0727bd
    HellaSwag: split token evaluation into batches if needed (#2681) Kawrakow 2023-08-21 11:11:31 +03:00
  • 9e232f0234
    ggml : move all type info to ggml_type_traits (#2663) slaren 2023-08-20 22:17:53 +02:00
  • 5e9ff54a67
    More efficient Hellaswag implementation (#2677) Kawrakow 2023-08-20 16:44:46 +03:00
  • 1f0bccb279
    server : better default prompt (#2646) Georgi Gerganov 2023-08-19 00:45:36 +03:00
  • f63564adfa
    server : update xxd usage for older versions compatibility (#2649) Jhen-Jie Hong 2023-08-19 05:41:32 +08:00
  • 2d8b76a110
    Add link to clojure bindings to Readme. (#2659) Adrian 2023-08-18 12:39:22 -07:00
  • 7af633aec3
    readme : incoming BREAKING CHANGE Georgi Gerganov 2023-08-18 17:48:31 +03:00
  • 097e121e2f
    llama : add benchmark example (#2626) slaren 2023-08-18 12:44:58 +02:00
  • eaf98c2649
    readme : add link to Rust bindings (#2656) mdrokz 2023-08-18 15:47:58 +05:30
  • e9b12c332e
    perplexity : more meaningful ETA number - 2 decimal points Georgi Gerganov 2023-08-18 12:48:55 +03:00
  • 604b8bdfa6
    Fix unicode in grammars (fixes #2501) (#2553) Evan Jones 2023-08-17 19:54:44 -04:00
  • 10151bee2e
    server : support for saving templates in browser LocalStorage (#2486) staviq 2023-08-17 23:34:01 +00:00
  • 0992a7b8b1
    README: fix LLAMA_CUDA_MMV_Y documentation (#2647) Johannes Gäßler 2023-08-17 23:57:59 +02:00
  • 6ddeefad9b
    [Zig] Fixing Zig build and improvements (#2554) Henri Vasserman 2023-08-17 23:11:18 +03:00
  • 8dae7ce684
    Add --cfg-negative-prompt-file option for examples (#2591) Kerfuffle 2023-08-17 07:29:44 -06:00
  • a73ccf1aa3
    llama : replace (permute + reshape + view_1d) with (view_3d) (#2538) Georgi Gerganov 2023-08-17 10:47:09 +03:00
  • 7cf54e1f74
    tests : adds simple llama grammar tests (#2618) drbh 2023-08-17 03:41:01 -04:00
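
Notes on selected commits

The /detokenize endpoint added in c1ac54b77a (#2802) turns token ids back into text, complementing the existing /tokenize route; b8ad1b66b2 (#2306) is its counterpart on the input side, letting a prompt be supplied directly as a token array. A minimal sketch of calling the new endpoint, assuming a server already running on localhost:8080 and the JSON field names used by the server at the time ("tokens" in, "content" out):

```python
import requests

# Token ids are model-specific; these are placeholders for illustration.
resp = requests.post(
    "http://localhost:8080/detokenize",
    json={"tokens": [1, 15043, 3186]},
)
resp.raise_for_status()
print(resp.json()["content"])  # the decoded text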
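Commit 800c9635b4 (#2665) fixes the CUDA softmax by subtracting the row maximum before exponentiating. The reasoning is standard numerics, shown here as a NumPy sketch rather than the CUDA kernel itself: softmax is invariant to shifting all logits by a constant, and shifting by the max keeps every exponent at or below zero, so exp() cannot overflow.

```python
import numpy as np

def softmax_naive(x):
    e = np.exp(x)              # overflows to inf once x exceeds ~709
    return e / e.sum()

def softmax_stable(x):
    e = np.exp(x - np.max(x))  # largest exponent is exp(0) == 1.0
    return e / e.sum()

x = np.array([1000.0, 1001.0, 1002.0])
print(softmax_naive(x))   # [nan nan nan] from inf / inf
print(softmax_stable(x))  # [0.09003057 0.24472847 0.66524096]
```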
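Several entries above touch the perplexity tool (d046dcee08, 62959e740e, e9b12c332e). For reference, the quantity being computed is the exponential of the mean negative log-likelihood of each scored token. The sketch below is a generic illustration, not the repo's implementation; log_probs stands in for per-token log p(token | context) from any model.

```python
import math

def perplexity(log_probs):
    """exp of the mean negative log-likelihood over all scored tokens."""
    nll = -sum(log_probs) / len(log_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token scores ppl = 4.0.
print(perplexity([math.log(0.25)] * 100))
```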
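Commits 87e3733f24 and 8194cd8772 make the gguf Python package pip-installable and export its objects to user code, with 5439a0ab57 wiring it into CI in editable mode. A rough sketch of writing a GGUF file outside the repo's own convert scripts; the method names follow the package as of these commits and should be treated as assumptions against whatever version you have installed.

```python
import numpy as np
import gguf  # pip install gguf

writer = gguf.GGUFWriter("tiny.gguf", "llama")  # output path, architecture
writer.add_name("tiny-example")
# A single placeholder tensor; real models add much more metadata.
writer.add_tensor("token_embd.weight", np.zeros((16, 8), dtype=np.float32))

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```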
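The U+2581 commits (c3e53b421a, 2ba83c8685, 46ef5b5fcf) all revolve around the SentencePiece convention of representing a word-leading space as the lower-one-eighth block character "▁" (U+2581): raw text is escaped into that form before vocabulary pieces are matched, and unescaped on the way out. The idea in a few lines of Python:

```python
text = "Hello world"
escaped = text.replace(" ", "\u2581")      # -> "Hello▁world"
restored = escaped.replace("\u2581", " ")  # round-trips back
assert restored == text
```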
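The CodeLlama entries center on RoPE parameters: 0d3094f0c7 and 12e2e33a97 record rope_freq_base in the GGUF file, since CodeLlama raises the base from LLaMA's 10000 to 1e6 to stretch the usable context, while 28b2c996ca and a2ca4e9de9 handle the separate linear rope-scale factor from the HF config. A toy sketch of the frequencies involved; the formula is the standard RoPE definition, not code from the repo.

```python
import numpy as np

def rope_freqs(head_dim, base):
    # One rotation frequency per pair of dimensions: base^(-2i/d).
    return base ** (-np.arange(0, head_dim, 2) / head_dim)

print(rope_freqs(8, 10_000.0))     # LLaMA default
print(rope_freqs(8, 1_000_000.0))  # CodeLlama's larger base, slower rotation
```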