Commit Graph

  • 6763f713bb
    readme : more lora detail in main example readme (#10064) Rich Dougherty 2024-10-31 01:22:39 +13:00
  • 79a2bc042d
    convert : more detailed convert lora usage docs (#10065) Rich Dougherty 2024-10-31 01:22:21 +13:00
  • fc83a9e584
    ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029) xctan 2024-10-30 15:00:40 +08:00
  • c5b0f4b5d9
    llama : refactor model loader with backend registry (#10026) Diego Devesa 2024-10-30 02:01:23 +01:00
  • 8f275a7c45
    ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763) Changyeon Kim 2024-10-29 17:52:56 +09:00
  • 8d8ff71536
    llama : remove Tail-Free sampling (#10071) Georgi Gerganov 2024-10-29 10:42:05 +02:00
  • 61715d5cc8
    llama : Add IBM granite template (#10013) arch-btw 2024-10-28 10:45:33 -07:00
  • 07028f9d74
    flake.lock: Update (#10063) Georgi Gerganov 2024-10-28 17:41:24 +02:00
  • 524afeec9d
    musa: workaround for Guilty Lockup in cleaning src0 (#10042) R0CKSTAR 2024-10-28 17:02:48 +08:00
  • 8125e6cbfc
    server : don't overfill the batch during infill (#10018) Georgi Gerganov 2024-10-28 08:49:32 +02:00
  • 8841ce3f43
    llama : switch KQ multiplication to F32 precision by default (#10015) Georgi Gerganov 2024-10-27 20:59:58 +02:00
  • cc2983d375
    sync : ggml Georgi Gerganov 2024-10-26 10:34:08 +03:00
  • 8c60a8a462
    increase cuda_cpy block size (ggml/996) bssrdf 2024-10-23 14:34:00 -04:00
  • 9e4a2563ea
    scripts : fix amx sync [no ci] Georgi Gerganov 2024-10-26 10:33:31 +03:00
  • 668750357e
    metal : support permuted matrix multiplications (#10033) Georgi Gerganov 2024-10-25 22:26:15 +03:00
  • ff252ea48e
    llama : add DRY sampler (#9702) wwoodsTM 2024-10-25 10:07:34 -06:00
  • d80fb71f8b
    llama: string_split fix (#10022) Michael Podvitskiy 2024-10-25 17:57:54 +02:00
  • 2f8bd2b901
    llamafile : extend sgemm.cpp support for Q5_0 models (#10010) Srihari-mcw 2024-10-25 12:57:41 +05:30
  • bc5ba007b2
    server : check that the prompt fits in the slot's context (#10030) Georgi Gerganov 2024-10-25 10:13:46 +03:00
  • 958367bf53
    server : refactor slot input data, move tokenizer to HTTP thread (#10023) Xuan Son Nguyen 2024-10-24 21:51:22 +02:00
  • 40f2555797
    ci : fix cmake flags for SYCL Georgi Gerganov 2024-10-24 21:23:33 +03:00
  • 167a515651
    CUDA: fix insufficient buffer clearing for MMQ (#10032) Johannes Gäßler 2024-10-24 14:40:23 +02:00
  • c39665f589
    CUDA: fix MMQ for non-contiguous src0, add tests (#10021) Johannes Gäßler 2024-10-24 11:09:36 +02:00
  • 0a1c750c80
    server : samplers accept the prompt correctly (#10019) wwoodsTM 2024-10-23 13:27:51 -06:00
  • 190a37d797
    sync : ggml Georgi Gerganov 2024-10-23 17:23:55 +03:00
  • 2d3aba9ee8
    llama.vim : bump generation time limit to 3s [no ci] Georgi Gerganov 2024-10-23 17:16:56 +03:00
  • 80273a306d
    CUDA: fix 1D im2col, add tests (ggml/993) Johannes Gäßler 2024-10-18 09:24:44 +02:00
  • c19af0acb1
    ggml : remove redundant set of contexts used field (ggml/978) Daniel Bevenius 2024-10-16 20:10:01 +02:00
  • ac113a0fee
    llama.vim : add classic vim support (#9995) Michael Coppola 2024-10-23 07:09:26 -04:00
  • 4c9388fb96
    metal : add POOL2D and fix IM2COL (#9943) Jun Hee Yoo 2024-10-23 19:33:45 +09:00
  • 873279b159
    flake.lock: Update github-actions[bot] 2024-10-20 00:22:59 +00:00
  • c8c07d658a
    llama : fix empty batch causing llama_batch_allocr to crash (#9966) Xuan Son Nguyen 2024-10-22 16:59:02 +02:00
  • 19d900a756
    llama : rename batch to ubatch (#9950) Daniel Bevenius 2024-10-22 15:31:06 +02:00
  • 11d47057a5
    Rwkv chat template fix (#10001) Molly Sophia 2024-10-22 21:22:26 +08:00
  • c421ac072d
    lora : warn user if new token is added in the adapter (#9948) Xuan Son Nguyen 2024-10-22 13:08:41 +02:00
  • 4ff7fe1fb3
    llama : add chat template for RWKV-World + fix EOT (#9968) Molly Sophia 2024-10-22 18:33:37 +08:00
  • 6b8447352d
    [CANN] Adapt to dynamically loadable backends mechanism (#9970) leo-pony 2024-10-22 16:16:01 +08:00
  • 674804a996
    arg : fix typo in embeddings argument help [no ci] (#9994) Daniel Bevenius 2024-10-22 09:40:02 +02:00
  • e94a138d64
    llama.vim : fix info text display [no ci] (#9787) Georgi Gerganov 2024-10-22 00:35:25 +03:00
  • e01c67affe
    llama.vim : move info to the right of screen [no ci] (#9787) Georgi Gerganov 2024-10-21 22:52:22 +03:00
  • 994cfb1acb
    readme : update UI list (#9972) Asghar Ghorbani 2024-10-21 20:20:59 +02:00
  • 94008cc760
    arg : fix attention non-causal arg value hint (#9985) Daniel Bevenius 2024-10-21 20:12:52 +02:00
  • dbd5f2f573
    llama.vim : plugin for Neovim (#9787) Georgi Gerganov 2024-10-21 20:25:02 +03:00
  • f594bc80ba
    ggml : add asserts for type conversion in fattn kernels (#9971) Georgi Gerganov 2024-10-21 16:20:46 +03:00
  • d5ebd79c76
    rpc : pack only RPC structs (#9959) Radoslav Gerganov 2024-10-21 13:35:40 +03:00
  • 55e47786e3
    llama : default sampling changes + greedy update (#9897) Georgi Gerganov 2024-10-21 09:46:40 +03:00
  • bc21975084
    speculative : fix handling of some input params (#9963) Georgi Gerganov 2024-10-21 09:37:12 +03:00
  • 1db8c84fc6
    fix mul_mat_vec_q and *_vec_q error (#9939) Neo Zhang Jianyu 2024-10-21 14:26:09 +08:00
  • 45f097645e
    readme : update bindings list (#9951) Loïc Carrère 2024-10-20 18:25:41 +02:00
  • 7cab2083c7
    readme : update infra list (#9942) icppWorld 2024-10-20 12:01:34 -04:00
  • cda0e4b648
    llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745) Xuan Son Nguyen 2024-10-18 23:18:01 +02:00
  • afd9909a64
    rpc : backend refactoring (#9912) Radoslav Gerganov 2024-10-18 14:33:58 +03:00
  • 87421a23e8
    [SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705) Ouadie EL FAROUKI 2024-10-18 06:46:16 +01:00
  • 60ce97c9d8
    add amx kernel for gemm (#8998) Ma Mingfei 2024-10-18 13:34:36 +08:00
  • 8901755ba3
    server : add n_indent parameter for line indentation requirement (#9929) Georgi Gerganov 2024-10-18 07:32:19 +03:00
  • 6f55bccbb8
    llama : rename batch_all to batch (#8881) Daniel Bevenius 2024-10-18 01:41:51 +02:00
  • 17bb928080
    readme : remove --memory-f32 references (#9925) Georgi Gerganov 2024-10-17 23:43:05 +03:00
  • 9f45fc1e99
    llama : change warning to debug log Georgi Gerganov 2024-10-17 23:26:32 +03:00
  • 99bd4ac28c
    llama : infill sampling handle very long tokens (#9924) Georgi Gerganov 2024-10-17 22:32:47 +03:00
  • 3752217ed5
    readme : update bindings list (#9918) Tim Wang 2024-10-17 17:57:14 +11:00
  • f010b77a37
    vulkan : add backend registry / device interfaces (#9721) Diego Devesa 2024-10-17 02:46:58 +02:00
  • 2194200278
    fix: allocating CPU buffer with size 0 (#9917) Gilad S. 2024-10-17 02:34:22 +03:00
  • 73afe681aa
    fix: use vm_allocate to allocate CPU backend buffer on macOS (#9875) Gilad S. 2024-10-17 01:36:51 +03:00
  • 9e04102448
    llama : suppress conversion from 'size_t' to 'int' (#9046) Daniel Bevenius 2024-10-16 19:34:28 +02:00
  • dbf18e4de9
    llava : fix typo in error message [no ci] (#9884) Daniel Bevenius 2024-10-16 19:24:05 +02:00
  • 66c2c93082
    grammar : fix JSON Schema for string regex with top-level alt. (#9903) Joe Eli McIlvain 2024-10-16 09:03:24 -07:00
  • 10433e8b45
    llama : add tensor name for "result_norm" (#9907) Molly Sophia 2024-10-16 18:10:21 +08:00
  • 1f66b699c4
    server : fix the disappearance of the end of the text (#9867) Alexey Parfenov 2024-10-16 08:35:53 +00:00
  • 0e41b300ed
    sync : ggml Georgi Gerganov 2024-10-16 11:28:14 +03:00
  • cd60b88bf7
    ggml-alloc : remove buffer_id from leaf_alloc (ggml/987) Daniel Bevenius 2024-10-09 16:40:35 +02:00
  • becfd387f6
    [CANN] Fix cann compilation error (#9891) leo-pony 2024-10-16 08:51:46 +08:00
  • 755a9b2bf0
    llama : add infill sampler (#9896) Georgi Gerganov 2024-10-15 16:35:33 +03:00
  • 223c25a72f
    server : improve infill context reuse (#9894) Georgi Gerganov 2024-10-15 16:28:55 +03:00
  • fbc98b748e
    sampling : add XTC sampler (#9742) MaggotHATE 2024-10-15 15:54:55 +05:00
  • dcdd535302
    server : update preact (#9895) Georgi Gerganov 2024-10-15 12:48:44 +03:00
  • 4c42f93b22
    readme : update bindings list (#9889) Michał Tuszyński 2024-10-15 10:20:34 +02:00
  • a89f75e1b7
    server : handle "logprobs" field with false value (#9871) VoidIsVoid 2024-10-14 15:04:36 +08:00
  • 13dca2a54a
    Vectorize load instructions in dmmv f16 CUDA kernel (#9816) agray3 2024-10-14 01:49:08 +01:00
  • d4c19c0f5c
    server : accept extra_context for the infill endpoint (#9874) Georgi Gerganov 2024-10-13 21:31:35 +03:00
  • c7181bd294
    server : reuse cached context chunks (#9866) Georgi Gerganov 2024-10-13 18:52:48 +03:00
  • 92be9f1216
    flake.lock: Update (#9870) Georgi Gerganov 2024-10-13 06:11:26 +03:00
  • edc265661c
    server : add option to time limit the generation phase (#9865) Georgi Gerganov 2024-10-12 16:14:27 +03:00
  • 1bde94dd02
    server : remove self-extend features (#9860) Georgi Gerganov 2024-10-12 16:06:31 +03:00
  • 95c76e8e92
    server : remove legacy system_prompt feature (#9857) Georgi Gerganov 2024-10-12 14:51:54 +03:00
  • 11ac9800af
    llama : improve infill support and special token detection (#9798) Georgi Gerganov 2024-10-12 08:21:51 +03:00
  • 943d20b411
    musa : update doc (#9856) R0CKSTAR 2024-10-12 13:09:53 +08:00
  • 96776405a1
    ggml : move more prints to the ggml log system (#9839) Diego Devesa 2024-10-11 15:34:45 +02:00
  • 7eee341bee
    common : use common_ prefix for common library functions (#9805) Diego Devesa 2024-10-10 22:57:42 +02:00
  • 0e9f760eb1
    rpc : add backend registry / device interfaces (#9812) Diego Devesa 2024-10-10 20:14:55 +02:00
  • cf8e0a3bb9
    musa: add docker image support (#9685) R0CKSTAR 2024-10-11 02:10:37 +08:00
  • c7499c557c
    examples : do not use common library in simple example (#9803) Diego Devesa 2024-10-10 19:50:49 +02:00
  • c81f3bbb05
    cmake : do not build common library by default when standalone (#9804) Diego Devesa 2024-10-09 18:49:52 +02:00
  • e7022064ab
    perplexity : fix integer overflow (#9783) Georgi Gerganov 2024-10-09 17:00:18 +03:00
  • 3dc48fe75a
    examples : remove llama.vim Georgi Gerganov 2024-10-09 10:55:42 +03:00
  • dca1d4b58a
    ggml : fix BLAS with unsupported types (#9775) Diego Devesa 2024-10-08 14:21:43 +02:00
  • 458367a906
    server : better security control for public deployments (#9776) Xuan Son Nguyen 2024-10-08 13:27:04 +02:00
  • fa42aa6d89
    scripts : fix spelling typo in messages and comments (#9782) standby24x7 2024-10-08 15:19:53 +09:00
  • 6374743747
    ggml : add backend registry / device interfaces to BLAS backend (#9752) Diego Devesa 2024-10-07 21:55:08 +02:00
  • f1af42fa8c
    Update building for Android (#9672) Andrew Minh Nguyen 2024-10-07 09:37:31 -07:00
  • 6279dac039
    flake.lock: Update (#9753) Georgi Gerganov 2024-10-07 19:35:42 +03:00