Commit Graph

  • bee938da74
    nix: remove nixConfig from flake.nix (#4984) Philip Taron 2024-01-16 09:56:21 -08:00
  • cec8a48470
    finetune : add training data file to log message (#4979) Daniel Bevenius 2024-01-16 18:54:24 +01:00
  • 334a835a1c
    ggml : importance matrix support for legacy quants (#4969) Kawrakow 2024-01-16 19:51:26 +02:00
  • 4feb4b33ee
    examples : add complete parallel function calling example (#4974) Maximilian Winter 2024-01-16 18:41:42 +01:00
  • 959ef0c0df
    perplexity : fix kv cache handling for hellaswag (#4981) Georgi Gerganov 2024-01-16 19:34:54 +02:00
  • c37b3474e6
    flake.lock: update flake-parts, flake-parts/nixpkgs-lib, and nixpkgs (#4920) Georgi Gerganov 2024-01-16 19:13:54 +02:00
  • 158f8c9e21
    metal : localized logic in ggml_metal_graph_compute (#4924) Paul Tsochantaris 2024-01-16 17:05:19 +00:00
  • 862f5e41ab
    android : introduce starter project example (#4926) Neuman Vong 2024-01-17 00:47:34 +11:00
  • 3a48d558a6
    metal : replace loop of dispatch_async with dispatch_apply (#4934) Alex Azarov 2024-01-16 14:41:27 +01:00
  • 7c8d3abd1a
    metal : log recommendedMaxWorkingSetSize on iOS 16+ (#4936) Alex Azarov 2024-01-16 14:33:02 +01:00
  • 122ed4840c
    examples : fix and improve docs for the grammar generator (#4909) Maximilian Winter 2024-01-16 13:10:48 +01:00
  • a0b3ac8c48
    ggml : introduce GGML_CALL function annotation (#4850) Justine Tunney 2024-01-16 03:16:33 -08:00
  • d75c232e1d
    finetune : use LLAMA_FILE_MAGIC_GGLA (#4961) Daniel Bevenius 2024-01-16 12:14:19 +01:00
  • e0324285a5
    speculative : threading options (#4959) stduhpf 2024-01-16 12:04:32 +01:00
  • 3e5ca7931c
    pass cpu-architecture arguments only to host code (C;C++) (#4943) ngc92 2024-01-15 20:40:48 +02:00
  • 4483396751
    llama : apply classifier-free guidance to logits directly (#4951) David Friehs 2024-01-15 14:06:52 +01:00
  • d9aa4ffa6e
    awq-py : fix typo in awq-py/README.md (#4947) Victor Z. Peng 2024-01-15 04:41:46 -08:00
  • ddb008d845
    cuda : fix dequantize kernel names (#4938) Georgi Gerganov 2024-01-15 13:27:00 +02:00
  • 2faaef3979
    llama : check for 256 divisibility for IQ2_XS, IQ2_XXS (#4950) Kawrakow 2024-01-15 10:09:38 +02:00
  • 4a3156de2f
    CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938) Kawrakow 2024-01-15 07:48:06 +02:00
  • a836c8f534
    llama : fix missing quotes (#4937) David Pflug 2024-01-14 10:46:00 -05:00
  • 467a882fd2
    Add ability to use importance matrix for all k-quants (#4930) Kawrakow 2024-01-14 16:21:12 +02:00
  • bb0c139247
    llama : check LLAMA_TRACE env for extra logging (#4929) Georgi Gerganov 2024-01-14 13:26:53 +02:00
  • 9408cfdad6
    scripts : sync-ggml-am.sh option to skip commits Georgi Gerganov 2024-01-14 11:08:09 +02:00
  • 03c5267490
    llama : use LLAMA_LOG_ macros for logging Georgi Gerganov 2024-01-14 11:03:19 +02:00
  • a128c38de8
    Fix ffn_down quantization mix for MoE models (#4927) Kawrakow 2024-01-14 10:53:39 +02:00
  • 5f5fe1bd60
    metal : correctly set SIMD support flags on iOS (#4923) Alex Azarov 2024-01-14 09:44:39 +01:00
  • ac32902a87
    llama : support WinXP build with MinGW 8.1.0 (#3419) Karthik Kumar Viswanathan 2024-01-14 00:41:44 -08:00
  • 147b17ac94
    2-bit quantizations (#4897) Kawrakow 2024-01-14 09:45:56 +02:00
  • 807179ec58
    Make Q3_K_S be the same as old Q3_K_L for Mixtral-8x7B (#4906) Kawrakow 2024-01-14 09:44:30 +02:00
  • 76484fbfd3
    sync : ggml Georgi Gerganov 2024-01-14 00:14:46 +02:00
  • c71d608ce7
    ggml: cache sin/cos for RoPE (#4908) Johannes Gäßler 2024-01-13 21:41:37 +01:00
  • 4be5ef556d
    metal : remove old API (#4919) Georgi Gerganov 2024-01-13 20:45:45 +02:00
  • 0ea069b87b
    server : fix prompt caching with system prompt (#4914) Georgi Gerganov 2024-01-13 19:31:26 +02:00
  • f172de03f1
    llama : fix detokenization of non-special added-tokens (#4916) Georgi Gerganov 2024-01-13 18:47:38 +02:00
  • 2d57de5255
    metal : disable log for loaded kernels (#4794) Georgi Gerganov 2024-01-13 18:46:37 +02:00
  • df845cc982
    llama : minimize size used for state save/load (#4820) David Friehs 2024-01-13 17:29:43 +01:00
  • 6b48ed0893
    workflows: unbreak nix-build-aarch64, and split it out (#4915) Someone 2024-01-13 16:29:16 +00:00
  • 722d33f34e
    main : add parameter --no-display-prompt (#4541) Yann Follet 2024-01-14 00:09:08 +08:00
  • c30b1ef39a
    gguf : fix potential infinite for-loop (#4600) texmex76 2024-01-13 17:06:20 +01:00
  • b38b5e93ae
    metal : refactor kernel loading code (#4794) Georgi Gerganov 2024-01-13 18:03:45 +02:00
  • 7dc78764e2
    compare-llama-bench: tweak output format (#4910) Johannes Gäßler 2024-01-13 15:52:53 +01:00
  • 356327feb3
    server : fix deadlock that occurs in multi-prompt scenarios (#4905) Ziad Ben Hadj-Alouane 2024-01-13 09:20:46 -05:00
  • ee8243adaa
    server : fix crash with multimodal models without BOS token (#4904) makomk 2024-01-13 14:16:11 +00:00
  • 15ebe59210
    convert : update phi-2 to latest HF repo (#4903) Georgi Gerganov 2024-01-13 13:44:37 +02:00
  • de473f5f8e
    sync : ggml Georgi Gerganov 2024-01-12 22:02:43 +02:00
  • f238461236
    ggml : fix 32-bit ARM compat for IQ2_XS (whisper/1758) Georgi Gerganov 2024-01-12 14:02:30 +02:00
  • fa5c1fb44a
    backend_sched : fix assignments slaren 2024-01-12 20:38:34 +01:00
  • 52ee4540c0
    examples : add pydantic models to GBNF grammar generator (#4883) Maximilian Winter 2024-01-12 20:46:45 +01:00
  • 3fe81781e3
    CUDA: faster q8_0 -> f16 dequantization (#4895) Johannes Gäßler 2024-01-12 20:38:54 +01:00
  • e7e4df031b
    llama : ggml-backend integration (#4766) slaren 2024-01-12 20:07:38 +01:00
  • 584d674be6
    llama : remove redundant assert for StableLM (#4901) Georgi Gerganov 2024-01-12 20:54:12 +02:00
  • 930f907d3e
    export-lora : use LLAMA_FILE_MAGIC_GGLA (#4894) Daniel Bevenius 2024-01-12 18:54:53 +01:00
  • e790eef21c
    llama.swiftui : update models layout (#4826) Zay 2024-01-12 05:48:00 -07:00
  • 5537d9d36b
    gitignore : imatrix Georgi Gerganov 2024-01-12 14:33:21 +02:00
  • 1b280c9fff
    CUDA: fix softmax compile for old CUDA versions (#4862) Johannes Gäßler 2024-01-12 12:30:41 +01:00
  • 3cabe80630
    llama : fix typo "imp_embd" -> "inp_embd" Georgi Gerganov 2024-01-12 13:10:19 +02:00
  • 4315a94366
    common : streamline the formatting of help (#4890) howlger 2024-01-12 12:05:32 +01:00
  • 2d00741e12
    py : fix lint (#4889) Georgi Gerganov 2024-01-12 13:03:38 +02:00
  • f445c0e68c
    llama : fix llm_build_k_shift to use correct n_rot (#4889) Georgi Gerganov 2024-01-12 13:01:56 +02:00
  • 326b418b59
    Importance Matrix calculation (#4861) Kawrakow 2024-01-12 06:59:57 +01:00
  • 1d118386fe
    server : fix infill when prompt is empty (#4833) Georgi Gerganov 2024-01-11 23:23:49 +02:00
  • 7edefbd79c
    main : better name for variable n_print (#4874) Georgi Gerganov 2024-01-11 22:46:26 +02:00
  • 3ca63b4538
    main : disable token count by default (#4874) Georgi Gerganov 2024-01-11 22:43:05 +02:00
  • b037787548
    swift : track ggml release branch (#4867) Georgi Gerganov 2024-01-11 21:58:28 +02:00
  • 469e75d0a3
    llama : restore intended k-quants mixes for MoE models (#4872) Kawrakow 2024-01-11 20:43:15 +01:00
  • 49662cbed3
    ggml : SOTA 2-bit quants (add IQ2_XS) (#4856) Kawrakow 2024-01-11 20:39:39 +01:00
  • 3ba5b8ca8e
    swift : pin ggml commit + remove ggml.h from spm-headers (#4878) Georgi Gerganov 2024-01-11 21:31:31 +02:00
  • 4330bd83fe
    server : implement credentialed CORS (#4514) Laura 2024-01-11 19:02:48 +01:00
  • 27379455c3
    server : support for multiple api keys (#4864) Michael Coppola 2024-01-11 12:51:17 -05:00
  • eab6795006
    server : add LOG_INFO when model is successfully loaded (#4881) Behnam M 2024-01-11 12:41:39 -05:00
  • d8d90aa343
    ci: nix-flake-update: new token with pr permissions (#4879) Someone 2024-01-11 17:22:34 +00:00
  • 43f76bf1c3
    main : print total token count and tokens consumed so far (#4874) pudepiedj 2024-01-11 16:14:52 +00:00
  • 2f043328e3
    server : fix typo in model name (#4876) Isaac McFadyen 2024-01-11 09:33:26 -05:00
  • 2a7c94db5f
    metal : put encoder debug group behind a define (#4873) Paul Tsochantaris 2024-01-11 14:31:52 +00:00
  • 64802ec00d
    sync : ggml Georgi Gerganov 2024-01-11 09:39:08 +02:00
  • 3267c2abc7
    metal : fix deprecation warning (ggml/690) Georgi Gerganov 2024-01-11 09:34:59 +02:00
  • f85a973aa1
    ggml : remove ggml_cpy_inplace and ggml_cont_inplace (ggml/693) Timothy Cronin 2024-01-11 02:27:48 -05:00
  • 5362e43962
    metal : wrap each operation in debug group (ggml/690) Jack Mousseau 2024-01-10 06:19:19 -08:00
  • e739de7909
    ggml : change GGML_MAX_NAME at compile time (ggml/682) leejet 2024-01-10 21:13:42 +08:00
  • c910e3c28a
    Fix execlp call (ggml/689) Halalaluyafail3 2024-01-09 11:16:37 -05:00
  • f34432ca1e
    fix : cuda order of synchronization when setting a buffer (ggml/679) Erik Scholz 2024-01-05 16:00:00 +01:00
  • 7a9f75c38b
    server : update readme to document the new /health endpoint (#4866) Behnam M 2024-01-11 02:12:05 -05:00
  • 5c1980d8d4
    server : fix build + rename enums (#4870) Georgi Gerganov 2024-01-11 09:10:34 +02:00
  • cd108e641d
    server : add a /health endpoint (#4860) Behnam M 2024-01-10 14:56:05 -05:00
  • 57d016ba2d
    llama : add additional suffixes for model params (#4834) Brian 2024-01-11 01:09:53 +11:00
  • 329ff61569
    llama : recognize 1B phi models (#4847) Austin 2024-01-10 08:39:09 -05:00
  • d34633d8db
    clip : support more quantization types (#4846) John 2024-01-10 14:37:09 +01:00
  • 4f56458d34
    Python script to compare commits with llama-bench (#4844) Johannes Gäßler 2024-01-10 01:04:33 +01:00
  • 6efb8eb30e
    convert.py : fix vanilla LLaMA model conversion (#4818) Austin 2024-01-09 13:46:46 -05:00
  • 36e5a08b20
    llava-cli : don't crash if --image flag is invalid (#4835) Justine Tunney 2024-01-09 09:59:14 -08:00
  • 4dccb38d9a
    metal : improve dequantize precision to match CPU (#4836) Georgi Gerganov 2024-01-09 19:37:08 +02:00
  • 9a818f7c42
    scripts : improve get-pg.sh (#4838) Georgi Gerganov 2024-01-09 19:20:45 +02:00
  • 18adb4e9bb
    readme : add 3rd party collama reference to UI list (#4840) iohub 2024-01-10 00:45:54 +08:00
  • d9653894df
    scripts : script to get Paul Graham essays in txt format (#4838) Georgi Gerganov 2024-01-09 16:23:05 +02:00
  • 128de3585b
    server : update readme about token probs (#4777) Behnam M 2024-01-09 05:02:05 -05:00
  • 8c58330318
    server : add api-key flag to documentation (#4832) Zsapi 2024-01-09 10:12:43 +01:00
  • 18c2e1752c
    ggml : fix vld1q_s8_x4 32-bit compat (#4828) Georgi Gerganov 2024-01-09 10:42:06 +02:00
  • 8f900abfc0
    CUDA: faster softmax via shared memory + fp16 math (#4742) Johannes Gäßler 2024-01-09 08:58:55 +01:00
  • 1fc2f265ff
    common : fix the short form of --grp-attn-w, not -gat (#4825) howlger 2024-01-08 20:05:53 +01:00