Commit Graph

  • 95d576b48e
    metal : pad n_ctx by 32 (#6177) Georgi Gerganov 2024-03-22 09:36:03 +02:00
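    (see the padding sketch in the Notes below)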
  • 59c17f02de
    add blog link (#6222) Neo Zhang Jianyu 2024-03-22 15:19:37 +08:00
  • fa046eafbc
    Fix conversion of underscores to dashes in params. (#6203) DAN™ 2024-03-21 21:32:42 -04:00
  • be07a03217
    server : update readme doc from slot_id to id_slot (#6213) Jan Boon 2024-03-22 06:41:24 +08:00
  • d0a71233fb
    cuda : disable host register by default (#6206) slaren 2024-03-21 19:54:28 +01:00
  • f372c49ccd
    Corrected typo to wrong file (#6199) semidark 2024-03-21 11:52:35 -06:00
  • 924ce1dce7
    tests : disable system() calls (#6198) Georgi Gerganov 2024-03-21 16:20:05 +02:00
  • 03a8f8fafe
    cuda : fix LLAMA_CUDA_F16 build (#6197) slaren 2024-03-21 13:59:53 +01:00
  • cfd3be76e3
    ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196) Kawrakow 2024-03-21 13:59:38 +01:00
  • 5b7b0ac8df
    json-schema-to-grammar improvements (+ added to server) (#5978) Olivier Chafik 2024-03-21 11:50:43 +00:00
  • 1943c01981
    ci : fix indentation error (#6195) Vaibhav Srivastav 2024-03-21 10:30:40 +01:00
  • 5e43ba8742
    build : add mac pre-build binaries (#6182) Vaibhav Srivastav 2024-03-21 10:13:12 +01:00
  • 76aa30a263
    Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183) Kawrakow 2024-03-21 08:27:57 +01:00
  • c5b8595e3f
    Add NVIDIA and AMD backends (#6157) AidanBeltonS 2024-03-21 06:10:52 +00:00
  • 42e21c6882
    cuda : fix conflict with std::swap (#6186) slaren 2024-03-21 01:47:46 +01:00
  • 1c51f98adc
    cuda : print the returned error when CUDA initialization fails (#6185) slaren 2024-03-20 21:03:26 +01:00
  • f9c7ba3447
    llava : update MobileVLM-README.md (#6180) Ziang Wu 2024-03-20 23:29:51 +08:00
  • 272935b281
    llava : add MobileVLM_V2 backup (#6175) Ziang Wu 2024-03-20 23:02:32 +08:00
  • ccf58aa3ec
    cuda : refactor to remove global resources (#6170) slaren 2024-03-20 14:42:59 +01:00
  • 91f8ad167d
    Server: version bump for httplib and json (#6169) Xuan Son Nguyen 2024-03-20 13:30:36 +01:00
  • 6b7e76d28c
    gitignore : ignore curl-related files Georgi Gerganov 2024-03-20 14:17:34 +02:00
  • bc0baab2ea
    server : allow overriding -ngl in tests (#6170) Georgi Gerganov 2024-03-20 14:14:32 +02:00
  • d795988d9e
    Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)" Georgi Gerganov 2024-03-20 13:29:49 +02:00
  • f8c4e745e1
    llava : add a MobileVLM_V2-1.7B backup (#6152) Ziang Wu 2024-03-20 19:20:37 +08:00
  • 47cc7a7bf9
    Server: Handle n_keep parameter in the request (#6174) Karthick 2024-03-20 16:32:34 +05:30
  • bd60d82d0c
    server tests : more pythonic process management; fix bare except: (#6146) Jared Van Bortel 2024-03-20 01:33:49 -04:00
  • 6c0b287748
    update SYCL readme for recent changes (#6151) Neo Zhang Jianyu 2024-03-20 11:21:41 +08:00
  • d26e8b669d
    increase igpu cluster limit (#6159) Abhilash Majumder 2024-03-20 08:28:49 +05:30
  • d8b009a945
    Remove unneeded header file. (#6158) DAN™ 2024-03-19 12:16:09 -04:00
  • d0d5de42e5
    gguf-split: split and merge gguf per batch of tensors (#6135) Pierrick Hymbert 2024-03-19 12:05:44 +01:00
  • b80cf3b2d1
    common : disable repeat penalties by default (#6127) Georgi Gerganov 2024-03-19 10:21:54 +02:00
  • 970a48060a
    ci : exempt some labels from being tagged as stale (#6140) slaren 2024-03-19 09:06:54 +01:00
  • 4c28b82529
    common : print usage on '-h' and '--help' (#6145) DAN™ 2024-03-19 01:59:36 -04:00
  • 2d15886bb0
    flake.lock: Update github-actions[bot] 2024-03-17 06:37:44 +00:00
  • d199ca79f2
    mpt : implement backwards compatibility with duped output tensor (#6139) Jared Van Bortel 2024-03-18 12:49:02 -04:00
  • 104f5e0fc1
    clip : fix memory leak (#6138) Felix 2024-03-18 16:40:22 +01:00
  • 5e1b7f94a0
    backend : set max split inputs to GGML_MAX_SRC (#6137) slaren 2024-03-18 16:33:44 +01:00
  • ac9ee6a4ad
    ci : disable stale issue messages (#6126) Georgi Gerganov 2024-03-18 13:45:38 +02:00
  • 4f6d1337ca
    ci : temporary disable sanitizer builds (#6128) Georgi Gerganov 2024-03-18 13:45:27 +02:00
  • 2bf8d0f7c4
    backend : offload large batches to GPU (#6083) slaren 2024-03-18 11:03:04 +01:00
  • 496bc79bc2
    common : tidy-up argument parsing (#6105) DAN™ 2024-03-18 04:27:44 -04:00
  • 9b03719ad7
    convert : add support for CamembertModel architecture (#6119) Thérence 2024-03-18 09:17:00 +01:00
  • 3a6efdd03c
    convert : use f32 outtype for bf16 tensors (#6106) Romain D 2024-03-18 09:04:41 +01:00
  • d01b3c4c32
    common: llama_load_model_from_url using --model-url (#6098) Pierrick Hymbert 2024-03-17 19:12:37 +01:00
  • cd776c37c9
    ci : close all stale issues at once (#6115) Georgi Gerganov 2024-03-17 19:51:57 +02:00
  • dc0f612548
    ggml : fix error when finding transfer queue family index (#6094) GainLee 2024-03-18 01:12:22 +08:00
  • c47cf414ef
    ggml : add AVX512F SIMD (#6088) AmirAli Mirian 2024-03-16 11:52:02 -04:00
  • b5f4ae09c3
    gritlm : add initial README.md (#6086) Daniel Bevenius 2024-03-16 16:46:29 +01:00
  • dfbfdd60f9
    readme : add wllama as a wasm binding (#6100) Xuan Son Nguyen 2024-03-16 16:42:08 +01:00
  • 15961ec04d
    common : refactor nested if causing error C1061 on MSVC (#6101) DAN™ 2024-03-16 11:39:15 -04:00
  • a56d09a440
    ci : close inactive issue with workflow (#6053) Pierrick Hymbert 2024-03-16 13:20:53 +01:00
  • d84c48505f
    llama : fix Baichuan2 13B (#6092) slaren 2024-03-15 22:14:16 +01:00
  • 877b4d0c62
    llama : add support for control vectors (#5970) Theia Vogel 2024-03-15 13:43:02 -07:00
  • 12247f4c69
    llama : add Command-R support (#6033) Andrew Canis 2024-03-15 16:41:22 -04:00
  • 4e9a7f7f7f
    llava : change API to pure C style for Rust FFI bindgen (#6079) Ting Lou 2024-03-15 22:31:05 +08:00
  • 3020327f6c
    cuda : disable unused cudaLaunchHostFunc code (#6078) slaren 2024-03-15 13:24:03 +01:00
  • 46acb36767
    fix error when setting main GPU (#6073) Neo Zhang Jianyu 2024-03-15 18:53:53 +08:00
  • 131b058409
    make : ggml-metal.o depends on ggml.h Georgi Gerganov 2024-03-15 11:36:50 +02:00
  • 753e36f650
    [SYCL] Fix non-intel device selection (#6042) AidanBeltonS 2024-03-15 09:26:20 +00:00
  • 7ce2c77f88
    gguf : add support for I64 and F64 arrays (#6062) Ondřej Čertík 2024-03-15 02:46:51 -06:00
  • aab606a11f
    llama : add Orion chat template (#6066) Xuan Son Nguyen 2024-03-15 09:44:57 +01:00
  • b0bc9f4a9d
    llama-bench : use random tokens to improve accuracy with mixtral (#6069) slaren 2024-03-15 09:22:24 +01:00
  • 4755afd1cb
    llama : fix integer overflow during quantization (#6063) Georgi Gerganov 2024-03-14 22:58:41 +02:00
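    (see the overflow sketch in the Notes below)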
  • 6e0438da3c
    gguf : fix resource leaks (#6061) Steve Grubb 2024-03-14 14:29:32 -04:00
  • 727107707a
    gguf-py : bump version to 0.8.0 (#6060) Ondřej Čertík 2024-03-14 11:57:31 -06:00
  • 69ff61397d
    llama : support models without vocabulary (#5798) Michael Podvitskiy 2024-03-14 17:21:56 +01:00
  • 044ec4b2a5
    embedding : add EOS token if not present (#899) Georgi Gerganov 2024-03-14 15:14:14 +02:00
  • 77178eedc8
    gguf-py : fix dtype check (#6045) Georgi Gerganov 2024-03-14 13:32:14 +02:00
  • 15a333260a
    readme : improve readme for the LLaVA-1.6 example (#6044) Jian Liao 2024-03-14 04:18:23 -07:00
  • 43241adf22
    server: disable debug release type sanitizer, simplify trigger (#6047) Pierrick Hymbert 2024-03-14 12:15:39 +01:00
  • a44bc969e4
    llama : fix typo Georgi Gerganov 2024-03-14 13:13:06 +02:00
  • 2c4fb69246
    llama : optimize defrag moves + fix fragmentation calculation (#6037) Michael Podvitskiy 2024-03-14 11:56:48 +01:00
  • 3ca23481dd
    gguf-py : add support for I8, I16 and I32 (#6045) Ondřej Čertík 2024-03-14 04:40:14 -06:00
  • 3fe8d7a17f
    ggml : designate enum vals for integer types (#6050) Georgi Gerganov 2024-03-14 12:38:37 +02:00
  • 68265ebfc6
    embedding : print all resulting embeddings (#899) Georgi Gerganov 2024-03-14 12:37:20 +02:00
  • 381da2d9f0
    metal : build metallib + fix embed path (#6015) Georgi Gerganov 2024-03-14 11:55:23 +02:00
  • 0fd6c1f015
    embedding : print cosine similarity (#899) Georgi Gerganov 2024-03-14 10:12:29 +02:00
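    (see the cosine-similarity sketch in the Notes below)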
  • 19885d205e
    readme : update details about running llama in Termux on Android (#6039) Linwei Wang 2024-03-14 02:34:40 +08:00
  • 76a936c893
    readme : update API changes and hot topics Georgi Gerganov 2024-03-13 20:33:56 +02:00
  • 463628372d
    grammar : handle missing "root" node (#6004) Clint Herron 2024-03-13 14:10:40 -04:00
  • f30ea47a87
    llama : add pipeline parallelism support (#6017) slaren 2024-03-13 18:54:21 +01:00
  • d8fd0ccf6a
    test-backend-ops : skip CPU backend by default (#6028) slaren 2024-03-13 14:58:30 +01:00
  • b3d978600f
    Update get version (#6025) AidanBeltonS 2024-03-13 13:17:54 +00:00
  • 99b71c068f
    Server: Use multi-task for embeddings endpoint (#6001) Xuan Son Nguyen 2024-03-13 11:39:11 +01:00
  • 306d34be7a
    ci : remove tidy-review (#6021) slaren 2024-03-12 16:55:19 +01:00
  • 8030da7afe
    ggml : reuse quantum structs across backends (#5943) Georgi Gerganov 2024-03-12 14:27:20 +02:00
  • 184215e783
    ggml : fix UB in IQ2_S and IQ3_S (#6012) Georgi Gerganov 2024-03-12 13:49:55 +02:00
  • 48358b2e5b
    sycl : update IQ1_S kernels (WIP - not working!) (#5995) Georgi Gerganov 2024-03-12 11:15:05 +02:00
  • 5cdb371731
    grammar : fix unnecessarily retained pointer to rules (#6003) gliptic 2024-03-11 20:59:03 +01:00
  • 44ca159faf
    1.5 bit: we can do even better (#5999) Kawrakow 2024-03-11 16:53:15 +01:00
  • 05b06210c9
    llama : more consistent names of count variables (#5994) Georgi Gerganov 2024-03-11 17:49:47 +02:00
  • 83796e62bc
    llama : refactor unicode stuff (#5992) Georgi Gerganov 2024-03-11 17:47:47 +02:00
  • 828defefb6
    Update server docker image URLs (#5997) Jakub N 2024-03-11 14:40:42 +01:00
  • caa106d4e0
    Server: format errors as JSON (#5961) Xuan Son Nguyen 2024-03-11 10:56:41 +01:00
  • 3202361c5b
    ggml, ci : Windows ARM runner and build fixes (#5979) Michael Podvitskiy 2024-03-11 10:28:51 +01:00
  • 332bdfd798
    server : maintain chat completion id for streaming responses (#5988) Minsoo Cheong 2024-03-11 17:09:32 +09:00
  • ecab1c75de
    cmake : fix subdir for LLAMA_METAL_EMBED_LIBRARY (#5985) Gilad S 2024-03-11 10:00:08 +02:00
  • ee35600b90
    llama : fix F16/F32 downcast + improve names (#5980) Georgi Gerganov 2024-03-11 09:56:47 +02:00
  • be858f6205
    Better 1.5 bit quantization (#5971) Kawrakow 2024-03-11 07:51:49 +01:00
  • ef3ced26a3
    [SYCL] Add q3_s and q1_s (#5886) Abhilash Majumder 2024-03-11 10:27:56 +05:30
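
Notes

  On 95d576b48e (metal : pad n_ctx by 32): rounding a size up to a multiple of a
  power-of-two alignment is the standard round-up bit trick; ggml.h exposes the
  same arithmetic as its GGML_PAD(x, n) macro. Below is a minimal sketch of the
  idea, not the literal change from #6177; pad_to_multiple is an illustrative
  name.

      #include <cassert>
      #include <cstdint>

      // Round x up to the next multiple of n, where n is a power of two.
      static int64_t pad_to_multiple(int64_t x, int64_t n) {
          assert(n > 0 && (n & (n - 1)) == 0); // alignment must be a power of two
          return (x + n - 1) & ~(n - 1);
      }

      // e.g. pad_to_multiple(4097, 32) == 4128: a 4097-token context
      // is padded to the next multiple of 32.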
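
  On 4755afd1cb (llama : fix integer overflow during quantization): tensors can
  hold more than INT32_MAX elements, so products of 32-bit dimensions must be
  widened before the multiply. A hedged illustration of that failure class, not
  the literal diff from #6063:

      #include <cstdint>

      int64_t count_elements(int32_t ne0, int32_t ne1) {
          // buggy: the multiply happens in 32-bit and can wrap around:
          //     int64_t n = ne0 * ne1;
          // fixed: widen one operand so the multiply happens in 64-bit:
          return (int64_t) ne0 * ne1;
      }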
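
  On 0fd6c1f015 (embedding : print cosine similarity): the similarity between
  two embedding vectors a and b is cos(a, b) = (a · b) / (|a| |b|). Below is a
  self-contained sketch; cosine_similarity is an illustrative name, not
  necessarily the helper used by the embedding example.

      #include <cmath>
      #include <cstddef>
      #include <vector>

      static double cosine_similarity(const std::vector<float> & a,
                                      const std::vector<float> & b) {
          double dot = 0.0, na = 0.0, nb = 0.0;
          const size_t n = a.size() < b.size() ? a.size() : b.size();
          for (size_t i = 0; i < n; ++i) {
              dot += (double) a[i] * b[i];
              na  += (double) a[i] * a[i];
              nb  += (double) b[i] * b[i];
          }
          if (na == 0.0 || nb == 0.0) {
              return 0.0; // degenerate zero-length vector
          }
          return dot / (std::sqrt(na) * std::sqrt(nb));
      }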