Commit Graph

  • 198b1ec611
    ggml-cpu: Fix duplicate MATMUL_INT8 (#11817) Weizhao Ouyang 2025-02-12 20:22:58 +08:00
  • c3d6af7cd2
    CUDA: fix CUDART_VERSION checks (#11821) Johannes Gäßler 2025-02-12 13:16:39 +01:00
  • 369be5598a
    llama : fix typo in llama-grammar.h [no ci] (#11816) Daniel Bevenius 2025-02-12 08:40:01 +01:00
  • 4078c77f98
    docs: add OpenCL (#11697) lhez 2025-02-11 14:04:13 -08:00
  • 90e4dba461
    Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (#11803) Sheldon Robinson 2025-02-11 10:55:45 -05:00
  • a18f481f99
    server : use common_token_to_piece instead of common_detokenize (#11740) Daniel Bevenius 2025-02-11 14:06:45 +01:00
  • b9ab0a4d0b
    CUDA: use arch list for compatibility check (#11775) Johannes Gäßler 2025-02-11 00:17:22 +01:00
  • 7b891bdc86
    fix: typos in documentation files (#11791) Maxim Evtush 2025-02-10 23:21:31 +01:00
  • 81732619fd
    docs: utilize the forward slash (/) as the path separator for Unix-like systems (#11770) jason_w 2025-02-11 06:17:48 +08:00
  • 507f9174fe
    server : (webui) introduce conversation branching + idb storage (#11792) Xuan-Son Nguyen 2025-02-10 21:23:17 +01:00
  • 19b392d58d
    llama-mmap: fix missing include (#11796) Wilken Gottwalt 2025-02-10 19:58:18 +01:00
  • 0893e0114e
    server : correct signal handler (#11795) Xuan-Son Nguyen 2025-02-10 18:03:28 +01:00
  • d7b31a9d84
    sync: minja (a72057e519) (#11774) Olivier Chafik 2025-02-10 09:34:09 +00:00
  • 9ac3457b39
    Update README.md [no ci] (#11781) pascal-lc 2025-02-10 16:05:57 +08:00
  • c2a67efe38
    vulkan: Make Vulkan optional at runtime (#11493). (#11494) Danny Milosavljevic 2025-02-10 07:17:21 +01:00
  • b044a0fe3c
    vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (#11592) Wagner Bruna 2025-02-10 03:08:22 -03:00
  • 19d3c8293b
    There's a better way of clearing lines (#11756) Eric Curtin 2025-02-09 10:34:49 +00:00
  • 98f6b0fd1e
    vulkan: account for lookup tables when checking shared memory size (#11502) Jeff Bolz 2025-02-09 01:43:51 -06:00
  • 55ac8c7791
    server : (webui) revamp Settings dialog, add Pyodide interpreter (#11759) Xuan-Son Nguyen 2025-02-08 21:54:50 +01:00
  • e6e6583199
    server : (webui) increase edit textarea size (#11763) Woof Dog 2025-02-08 19:09:55 +00:00
  • aaa5505307
    server : minor log updates (#11760) Georgi Gerganov 2025-02-08 18:08:43 +02:00
  • bdcf8b6a56
    cont : fix mmap flag print (#11699) Georgi Gerganov 2025-02-08 16:49:38 +02:00
  • 4d3465c5ae
    ggml: Fix data race in ggml threadpool (#11736) Karol Kontny 2025-02-08 15:30:53 +01:00
  • d80be897ac
    CUDA: fix min. version for movmatrix (#11751) Johannes Gäßler 2025-02-08 10:46:07 +01:00
  • 3ab410f55f
    readme : update front-end framework (#11753) Nikolaos Pothitos 2025-02-08 11:43:04 +02:00
  • 0cf867160c
    server : (webui) fix numeric settings being saved as string (#11739) Xuan-Son Nguyen 2025-02-08 10:42:34 +01:00
  • d2fe216fb2
    Make logging more verbose (#11714) Eric Curtin 2025-02-07 14:42:46 +00:00
  • ed926d8833
    llama : fix defrag logic (#11707) Georgi Gerganov 2025-02-07 16:05:34 +02:00
  • 2d219b389e
    vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729) Christian Fillion 2025-02-07 08:55:47 -05:00
  • 333820d749
    llama : fix progress dots (#11730) magicse 2025-02-07 15:48:47 +02:00
  • c026ba3c23
    vulkan: print shared memory size (#11719) Jeff Bolz 2025-02-07 04:26:03 -06:00
  • 7ee953a64a
    llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727) Christian Fillion 2025-02-07 04:33:27 -05:00
  • ec3bc8270b
    SYCL: remove XMX info from print devices (#11712) Akarshan Biswas 2025-02-07 14:57:53 +05:30
  • b7552cfcbc
    common : add default embeddings presets (#11677) Daniel Bevenius 2025-02-07 09:15:22 +01:00
  • 225bbbfa39
    ggml : optimize and build warning fix for LoongArch (#11709) Jinyang He 2025-02-07 15:38:31 +08:00
  • 855cd0734a
    llama : fix old glm4 models (#11670) tv1wnd 2025-02-06 22:48:51 +01:00
  • 8a59053f63
    sync : ggml Georgi Gerganov 2025-02-06 21:23:03 +02:00
  • 1d20e53c40
    rpc: fix known RCE in rpc-server (ggml/1103) Patrick Peng 2025-02-06 09:29:13 -05:00
  • 2fb3c32a16
    server : (webui) migrate project to ReactJS with typescript (#11688) Xuan-Son Nguyen 2025-02-06 17:32:29 +01:00
  • 9ab42dc722
    docs: update fedora cuda guide for 12.8 release (#11393) Tei Home 2025-02-06 20:16:15 +08:00
  • 194b2e69f8
    SYCL: Adjust support condition for norm operators (#11674) Akarshan Biswas 2025-02-06 17:12:35 +05:30
  • 9dd7a0390f
    llama : add log about loading model tensors (#11699) Georgi Gerganov 2025-02-06 13:41:37 +02:00
  • c0d4843225
    build : fix llama.pc (#11658) Adrien Gallouët 2025-02-06 12:08:13 +01:00
  • 8d4d2be143
    ggml : fix LoongArch compile error with 128-bit SIMD (#11701) junchao-zhao 2025-02-06 17:20:00 +08:00
  • 2c6c8df56d
    vulkan: optimize coopmat2 iq2/iq3 callbacks (#11521) Jeff Bolz 2025-02-06 00:15:30 -06:00
  • 8a7e3bf17a
    vulkan: initial support for IQ4_XS quantization (#11501) Rémy O 2025-02-06 07:09:59 +01:00
  • 1b598b3058
    vulkan: use smaller combined allocations to avoid fragmentation (#11551) Jeff Bolz 2025-02-06 00:02:18 -06:00
  • 902368a06b
    metal : avoid breaking build when metal API predates TARGET_OS_VISION (#11690) Charles Duffy 2025-02-05 19:52:31 -06:00
  • c3db0480bb
    readme : add link to Autopen under UIs (#11684) Matvey Soloviev 2025-02-06 01:55:25 +01:00
  • d774ab3acc
    metal : adjust support conditions for norm operators (#11671) Georgi Gerganov 2025-02-05 10:57:42 +02:00
  • fa62da9b2d
    CUDA: support for mat. mul. with ne03 != ne13 (#11656) Johannes Gäßler 2025-02-05 08:58:31 +01:00
  • 1ec208083c
    llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644) SAMI 2025-02-05 14:45:40 +07:00
  • 9f4cc8f8d3
    sync: minja (#11641) Olivier Chafik 2025-02-05 01:00:12 +00:00
  • fd08255d0d
    CUDA: non-contiguous (RMS) norm support (#11659) Johannes Gäßler 2025-02-04 22:21:42 +01:00
  • 3ec9fd4b77
    HIP: force max threads per block to be 1024 (#11621) fxzjshm 2025-02-05 02:18:38 +08:00
  • 3962fc1a79
    server : add try..catch to places not covered by set_exception_handler (#11620) Xuan-Son Nguyen 2025-02-04 18:25:42 +01:00
  • 1bef571f6a
    arg : list RPC devices first when using --list-devices (#11655) Radoslav Gerganov 2025-02-04 18:16:20 +02:00
  • db288b60cb
    tool-call: command r7b fix for normal responses (#11608) Olivier Chafik 2025-02-04 15:48:53 +00:00
  • 106045e7bb
    readme : add llm_client Rust crate to readme bindings (#11628) Shelby Jenkins 2025-02-04 05:20:55 -06:00
  • f117d84b48
    swift : fix llama-vocab api usage (#11645) Jhen-Jie Hong 2025-02-04 19:15:24 +08:00
  • 534c46b53c
    metal : use residency set for other platforms (#11648) Jhen-Jie Hong 2025-02-04 19:07:18 +08:00
  • 387a1598ca
    authors : update Georgi Gerganov 2025-02-04 13:04:10 +02:00
  • 7c9e0ca520
    sync : ggml Georgi Gerganov 2025-02-04 12:59:21 +02:00
  • 8f8290ada9
    cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096) Christian Kastner 2025-02-04 00:17:15 +01:00
  • b34aedd558
    ci : do not stale-close roadmap issues Georgi Gerganov 2025-02-04 09:30:42 +02:00
  • cde3833239
    tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616) Olivier Chafik 2025-02-03 23:49:27 +00:00
  • b3451785ac
    server : (webui) revert hacky solution from #11626 (#11634) Xuan-Son Nguyen 2025-02-04 00:10:52 +01:00
  • 1d1e6a90bc
    server : (webui) allow typing and submitting during llm response (#11626) Woof Dog 2025-02-03 22:16:27 +00:00
  • 5598f475be
    server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622) Daniel Bevenius 2025-02-03 16:45:38 +01:00
  • 8ec05832fa
    sync : ggml Georgi Gerganov 2025-02-03 14:57:08 +02:00
  • 21c84b5d2d
    CUDA: fix Volta FlashAttention logic (#11615) Johannes Gäßler 2025-02-03 13:25:56 +01:00
  • d92cb67e37
    server : (webui) Fix Shift+Enter handling (#11609) mashdragon 2025-02-03 09:42:55 +00:00
  • 6eecde3cc8
    HIP: fix flash_attn_stream_k_fixup warning (#11604) Johannes Gäßler 2025-02-02 23:48:29 +01:00
  • 396856b400
    CUDA/HIP: add support for selectable warp size to mmv (#11519) uvos 2025-02-02 22:40:09 +01:00
  • 4d0598e144
    HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (#11601) uvos 2025-02-02 22:08:05 +01:00
  • 90f9b88afb
    nit: more informative crash when grammar sampler fails (#11593) Olivier Chafik 2025-02-02 19:58:34 +00:00
  • 864a0b67a6
    CUDA: use mma PTX instructions for FlashAttention (#11583) Johannes Gäßler 2025-02-02 19:31:09 +01:00
  • 84ec8a58f7
    Name colors (#11573) Eric Curtin 2025-02-02 16:14:48 +01:00
  • bfcce4d693
    tool-call: support Command R7B (+ return tool_plan "thoughts" in API) (#11585) Olivier Chafik 2025-02-02 09:25:38 +00:00
  • 69804487e0
    Fix exotic ci env that lacks ostringstream::str (#11581) Olivier Chafik 2025-02-02 09:10:15 +00:00
  • ff227703d6
    sampling : support for llguidance grammars (#10224) Michał Moskal 2025-02-01 23:55:32 -08:00
  • 0cec062a63
    llama : add support for GLM-Edge and GLM-Edge-V series models (#10573) piDack 2025-02-02 15:48:46 +08:00
  • 53debe6f3c
    ci: use sccache on windows HIP jobs (#11553) Olivier Chafik 2025-02-01 18:22:38 +00:00
  • cfd74c86db
    sync: minja (418a2364b5) (#11574) Olivier Chafik 2025-02-01 12:24:51 +00:00
  • ecef206ccb
    Implement s3:// protocol (#11511) Eric Curtin 2025-02-01 11:30:54 +01:00
  • 5bbc7362cb
    ci: simplify cmake build commands (#11548) Olivier Chafik 2025-02-01 00:01:20 +00:00
  • aa6fb13213
    ci: use sccache on windows instead of ccache (#11545) Olivier Chafik 2025-01-31 17:12:40 +00:00
  • a83f528688
    tool-call: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539) Olivier Chafik 2025-01-31 14:15:25 +00:00
  • b1bcd309fc
    fix stop regression (#11543) Olivier Chafik 2025-01-31 13:48:31 +00:00
  • 5783575c9d
    Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533) Olivier Chafik 2025-01-31 08:24:29 +00:00
  • 4a2b196d03
    server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531) Olivier Chafik 2025-01-31 08:12:40 +00:00
  • 1bd3047a93
    common: Add missing va_end (#11529) Steve Grubb 2025-01-31 00:58:55 -05:00
  • a2df2787b3
    server : update help metrics processing/deferred (#11512) Daniel Bevenius 2025-01-31 06:04:53 +01:00
  • 553f1e46e9
    ci: ccache for all github workflows (#11516) Olivier Chafik 2025-01-30 22:01:06 +00:00
  • 8b576b6c55
    Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) Olivier Chafik 2025-01-30 19:13:58 +00:00
  • 27d135c970
    HIP: require at least HIP 5.5 uvos 2025-01-29 19:36:00 +01:00
  • 6af1ca48cb
    HIP: Prepare reduction operators for wave 64 uvos 2025-01-29 19:12:42 +01:00
  • c300e68ef4
    CUDA/HIP: add warp_size to cuda_device_info uvos 2025-01-29 17:46:23 +01:00
  • 3d804dec76
    sync: minja (#11499) Olivier Chafik 2025-01-30 10:30:27 +00:00
  • ffd0821c57
    vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496) mgroeber9110 2025-01-30 11:10:59 +01:00