Commit Graph

  • 9c8dcefe17
    CUDA: backwards pass for misc. ops, add tests (#11257) Johannes Gäßler 2025-01-16 16:43:38 +01:00
  • 681149ced2
    llama : add llama_model_load_from_splits (#11255) Xuan Son Nguyen 2025-01-16 13:54:08 +01:00
  • c67cc9837d
    ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227) fj-y-saito 2025-01-16 18:11:49 +09:00
  • adc5dd92e8
    vulkan: scale caching for k quants + misc fixes (#11081) Eve 2025-01-15 19:50:13 +00:00
  • f11cfdfd7f
    ci : use -no-cnv in gguf-split tests (#11254) Georgi Gerganov 2025-01-15 18:28:35 +02:00
  • 1d8504338e
    fix: ggml: fix vulkan-shaders-gen build (#10448) Junil Kim 2025-01-15 22:17:42 +09:00
  • 432df2d5f9
    RoPE: fix back, CUDA support for back + noncont. (#11240) Johannes Gäßler 2025-01-15 12:51:37 +01:00
  • 0ccd7f3eb2
    examples : add embd_to_audio to tts-outetts.py [no ci] (#11235) Daniel Bevenius 2025-01-15 05:44:38 +01:00
  • f446c2cf6a
    SYCL: Add gated linear attention kernel (#11175) Akarshan Biswas 2025-01-15 08:50:17 +05:30
  • b4d92a59a2
    ci : add -no-cnv for tests (#11238) Xuan Son Nguyen 2025-01-14 15:42:23 +01:00
  • bbf3e55e35
    vocab : add dummy tokens for "no_vocab" type (#11231) Georgi Gerganov 2025-01-14 12:54:58 +02:00
  • c5bf0d1bd7
    server : Improve code snippets direction between RTL text (#11221) ebraminio 2025-01-14 14:09:33 +03:30
  • 091592d758
    Refactor test-chat-template.cpp (#11224) Olivier Chafik 2025-01-14 10:16:41 +00:00
  • 44d1e796d0
    sync : ggml Georgi Gerganov 2025-01-14 10:39:42 +02:00
  • a4f3f5d8e6
    scripts : sync gguf (cont) Georgi Gerganov 2025-01-14 09:40:15 +02:00
  • 48e1ae0e61
    scripts : sync gguf Georgi Gerganov 2025-01-14 09:36:58 +02:00
  • d00a80e89d
    scripts : sync opencl Georgi Gerganov 2025-01-14 09:19:58 +02:00
  • 504af20ee4
    server : (UI) Improve messages bubble shape in RTL (#11220) ebraminio 2025-01-13 22:53:31 +03:30
  • 84a44815f7
    cli : auto activate conversation mode if chat template is available (#11214) Xuan Son Nguyen 2025-01-13 20:18:12 +01:00
  • 39509fb082
    cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (#11042) Andreas Kieslinger 2025-01-13 16:45:53 +01:00
  • a29f0870d4
    contrib : add naming guidelines (cont) (#11177) Georgi Gerganov 2025-01-13 15:59:26 +02:00
  • 437e05f714
    server : (UI) Support for RTL text as models input or output (#11208) ebraminio 2025-01-13 17:16:39 +03:30
  • ca001f6656
    contrib : add naming guidelines (cont) (#11177) Georgi Gerganov 2025-01-13 15:08:44 +02:00
  • 00b4c3da62
    common : support tag-based --hf-repo like on ollama (#11195) Xuan Son Nguyen 2025-01-13 13:56:23 +01:00
  • 7426a26b24
    contrib : add naming guidelines (#11177) Georgi Gerganov 2025-01-13 14:46:36 +02:00
  • 8f70fc3d1b
    llama : remove 'd' from bad special token log (#11212) Daniel Bevenius 2025-01-13 13:38:20 +01:00
  • 1244cdcf14
    ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (#11211) Radoslav Gerganov 2025-01-13 13:31:41 +02:00
  • 924518e2e5
    Reset color before we exit (#11205) Eric Curtin 2025-01-12 18:23:10 +00:00
  • 9a483999a6
    llama : fix chat template gguf key (#11201) Xuan Son Nguyen 2025-01-12 13:45:14 +01:00
  • 08f10f69c3
    llama : remove notion of CLS token (#11064) Georgi Gerganov 2025-01-12 12:15:53 +02:00
  • afa8a9ec9b
    llama : add llama_vocab, functions -> methods, naming (#11110) Georgi Gerganov 2025-01-12 11:32:42 +02:00
  • c05e8c9934
    gguf-py: fixed local detection of gguf package (#11180) Vinesh Janarthanan 2025-01-11 03:42:31 -06:00
  • 2739a71e4b
    convert : sort print supported models [no ci] (#11179) Daniel Bevenius 2025-01-11 05:50:33 +01:00
  • ba8a1f9c5b
    examples : add README.md to tts example [no ci] (#11155) Daniel Bevenius 2025-01-10 13:16:16 +01:00
  • ff3fcabc72
    convert : add --print-supported-models option (#11172) Daniel Bevenius 2025-01-10 11:30:53 +01:00
  • c3f9d25706
    Vulkan: Fix float16 use on devices without float16 support + fix subgroup_size_control validation error (#11161) 0cc4m 2025-01-10 06:39:33 +01:00
  • ee7136c6d1
    llama: add support for QRWKV6 model architecture (#11001) Molly Sophia 2025-01-10 09:58:08 +08:00
  • c6860cc734
    SYCL: Refactor ggml_sycl_compute_forward (#11121) Akarshan Biswas 2025-01-10 05:43:03 +05:30
  • 1204f97270
    doc: add cuda guide for fedora (#11135) Tei Home 2025-01-09 19:32:06 +08:00
  • 8eceb888d7
    server : add tooltips to settings and themes btn (#11154) Daniel Bevenius 2025-01-09 11:28:29 +01:00
  • f8feb4b01a
    model: Add support for PhiMoE arch (#11003) Pierrick Hymbert 2025-01-09 11:21:41 +01:00
  • be0e950c91
    media : remove old img [no ci] Georgi Gerganov 2025-01-09 11:15:15 +02:00
  • d9feae1c06
    llama-chat : add phi 4 template (#11148) Xuan Son Nguyen 2025-01-09 10:07:33 +01:00
  • 8d59d91171
    fix: add missing msg in static_assert (#11143) hydai 2025-01-09 04:03:28 +08:00
  • 8a1d9c25fa
    gguf-py : move scripts directory (#11116) Vinesh Janarthanan 2025-01-08 12:54:58 -06:00
  • 1bf839b1e8
    Enhance user input handling for llama-run (#11138) Eric Curtin 2025-01-08 18:47:05 +00:00
  • f7cd13301c
    ci : use actions from ggml-org (#11140) Xuan Son Nguyen 2025-01-08 16:09:20 +01:00
  • 4d2b3d8804
    lora : improve compat with mergekit-extract-lora (#11131) Xuan Son Nguyen 2025-01-08 15:59:53 +01:00
  • c07d437bbd
    llama : avoid hardcoded QK_K (#11061) Georgi Gerganov 2025-01-08 16:19:36 +02:00
  • 99a3755a3c
    sync : ggml Georgi Gerganov 2025-01-08 13:40:30 +02:00
  • c792dcf488
    ggml : allow loading backend with env variable (ggml/1059) Radoslav Gerganov 2025-01-05 09:50:37 +02:00
  • 80ccf5d725
    ci : pin dependency to specific version (#11137) Xuan Son Nguyen 2025-01-08 12:07:20 +01:00
  • a3c1232c3f
    arg : option to exclude arguments from specific examples (#11136) Georgi Gerganov 2025-01-08 12:55:36 +02:00
  • 8cef75c743
    llamafile : ppc64le MMA INT8 implementation (#10912) amritahs-ibm 2025-01-08 16:24:19 +05:30
  • 0d52a69e4b
    ci : fix cmake option (#11125) Georgi Gerganov 2025-01-08 11:29:34 +02:00
  • 02f0430141
    Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117) Mathieu Baudier 2025-01-08 09:18:13 +01:00
  • bec2183f2c
    fix: Vulkan shader gen binary path when Cross-compiling (#11096) ag2s20150909 2025-01-08 16:17:29 +08:00
  • 53ff6b9b9f
    GGUF: C++ refactor, backend support, misc fixes (#11030) Johannes Gäßler 2025-01-07 18:01:58 +01:00
  • 017cc5f446
    ggml-backend : only offload from host buffers (fix) (#11124) Diego Devesa 2025-01-07 16:11:57 +01:00
  • a3d50bc022
    ggml-backend : only offload from host buffers (#11120) Diego Devesa 2025-01-07 12:38:05 +01:00
  • a4dd490069
    rpc : code cleanup (#11107) Radoslav Gerganov 2025-01-07 08:37:02 +02:00
  • c0d6f790d0
    SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087) Akarshan Biswas 2025-01-07 11:56:07 +05:30
  • dc7cef9f37
    llama-run : fix context size (#11094) Eric Curtin 2025-01-06 22:45:28 +00:00
  • ecebbd292d
    llama : remove unused headers (#11109) Georgi Gerganov 2025-01-06 17:52:35 +02:00
  • 96be8c3264
    github : add cmd line field to bug report (#11090) Xuan Son Nguyen 2025-01-06 16:34:49 +01:00
  • e6e7c75d94
    server : fix extra BOS in infill endpoint (#11106) Georgi Gerganov 2025-01-06 15:36:08 +02:00
  • 09186fabbe
    llama : remove check flash_attn with lora (#11104) Xuan Son Nguyen 2025-01-06 13:41:12 +01:00
  • 96a1dc27c3
    llama : prevent system info string accumulation across calls (#11101) Asghar Ghorbani 2025-01-06 12:21:46 +01:00
  • 6369f867a4
    llama : rename missed batch params/vars to ubatch (#10059) Daniel Bevenius 2025-01-06 10:28:17 +01:00
  • 47182dd03f
    llama : update llama_model API names (#11063) Georgi Gerganov 2025-01-06 10:55:18 +02:00
  • 3e6e7a6bc2
    tokenize : escape the prompt (#11058) Georgi Gerganov 2025-01-06 10:54:25 +02:00
  • ae2f606bb5
    mmap : fix fileno macro clash (#11076) Georgi Gerganov 2025-01-06 10:52:38 +02:00
  • 727368c60f
    llama : use LLAMA_TOKEN_NULL (#11062) Georgi Gerganov 2025-01-06 10:52:15 +02:00
  • 5047dd3546
    llama : use _impl suffix instead of _internal (#11060) Georgi Gerganov 2025-01-06 10:52:01 +02:00
  • 46e3556e01
    CUDA: add BF16 support (#11093) Johannes Gäßler 2025-01-06 02:33:52 +01:00
  • b56f079e28
    Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (#11074) 0cc4m 2025-01-04 21:09:59 +01:00
  • 9394bbd484
    llama : Add support for DeepSeek V3 (#11049) fairydreaming 2025-01-04 21:06:11 +01:00
  • f922a9c542
    [GGML][RPC] Support for models with non-512-aligned tensors over RPC. (#11047) matt23654 2025-01-04 16:10:30 +00:00
  • 46be942214
    llama : add support for the cohere2 model architecture (#10900) DAN™ 2025-01-04 09:33:31 -05:00
  • 78c6785175
    sync : ggml Georgi Gerganov 2025-01-04 10:54:01 +02:00
  • 5e3b08d606
    ggml : do not install metal source when embed library (ggml/1054) Georgi Gerganov 2025-01-04 10:53:54 +02:00
  • db68c93b57
    ggml : improve inputs log sched_print_assignments (ggml/1053) Daniel Bevenius 2024-12-19 03:50:12 +01:00
  • c31fc8b966
    fix: Vulkan shader gen binary path (#11037) Gilad S. 2025-01-04 10:17:31 +02:00
  • 4b0c638b9a
    common : disable KV cache shifting automatically for unsupported models (#11053) Molly Sophia 2025-01-03 20:13:18 +08:00
  • e7da954ecc
    metal : avoid uint (#11019) Georgi Gerganov 2025-01-03 11:26:14 +02:00
  • f66f582927
    llama : refactor src/llama.cpp (#10902) Georgi Gerganov 2025-01-03 10:18:53 +02:00
  • 2f0ee84b9b
    server: bench: minor fixes (#10765) Pierrick Hymbert 2025-01-02 18:06:12 +01:00
  • 0da5d86026
    server : allow using LoRA adapters per-request (#10994) Xuan Son Nguyen 2025-01-02 15:05:18 +01:00
  • a45433ba20
    readme : add llama-swap to infrastructure section (#11032) Benson Wong 2025-01-01 23:14:54 -08:00
  • 0827b2c1da
    ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027) Srihari-mcw 2024-12-31 19:53:33 +05:30
  • 45095a61bf
    server : clean up built-in template detection (#11026) Xuan Son Nguyen 2024-12-31 15:22:01 +01:00
  • 5896c65232
    server : add OAI compat for /v1/completions (#10974) Xuan Son Nguyen 2024-12-31 12:34:13 +01:00
  • bc7b1f8632
    convert : fix Llama-3_1-Nemotron-51B rope settings (#11008) ymcki 2024-12-31 19:04:48 +08:00
  • 6e1531aca5
    common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013) Peter 2024-12-31 11:46:06 +11:00
  • 716bd6dec3
    vulkan: optimize mul_mat for small values of N (#10991) Jeff Bolz 2024-12-30 11:27:11 -06:00
  • c250ecb315
    android : fix llama_batch free (#11014) ag2s20150909 2024-12-30 20:35:13 +08:00
  • a813badbbd
    vulkan: im2col and matmul optimizations for stable diffusion (#10942) Jeff Bolz 2024-12-29 03:16:34 -06:00
  • fdd2188912
    vulkan: Use push constant offset to handle misaligned descriptors (#10987) Jeff Bolz 2024-12-29 02:35:11 -06:00
  • f865ea149d
    server: added more docs for response_fields field (#10995) Isaac McFadyen 2024-12-28 10:09:19 -05:00
  • 16cdce7b68
    server : fix token duplication when streaming with stop strings (#10997) Alexey Parfenov 2024-12-28 15:08:54 +00:00