llama.cpp

RYDE-WORK/llama.cpp

mirror of https://github.com/RYDE-WORK/llama.cpp.git synced 2026-02-02 21:53:14 +08:00

4047be74da scripts: update compare-llama-bench.py (#10319) Johannes Gäßler 2024-11-15 21:19:03 +01:00
883d206fbd ggml : fix some build issues slaren 2024-11-15 20:20:54 +01:00
09ecbcb596 cmake : fix ppc64 check (whisper/0) Georgi Gerganov 2024-11-15 15:35:22 +02:00
3225008973 ggml : vulkan logs (whisper/2547) thewh1teagle 2024-11-15 15:33:53 +02:00
cbf5541a82 sync : ggml Georgi Gerganov 2024-11-15 15:31:16 +02:00
18429220bd AVX BF16 and single scale quant optimizations (#10212) Eve 2024-11-15 11:47:58 +00:00
f0204a0ec7 ci: build test musa with cmake (#10298) R0CKSTAR 2024-11-15 19:47:25 +08:00
57f8355b29 sycl: Update Intel docker images to use DPC++ 2025.0 (#10305) Romain Biessy 2024-11-15 12:10:45 +01:00
9901068ac7 server : (web UI) add copy button for code block, fix api key (#10242) Xuan Son Nguyen 2024-11-15 05:48:49 -04:00
231f9360d9 cann: dockerfile and doc adjustment (#10302) Chenguang Li 2024-11-15 15:09:35 +08:00
4802ad350b scripts : fix regex in sync [no ci] Georgi Gerganov 2024-11-15 08:38:43 +02:00
5a54af4d4f sycl: Use syclcompat::dp4a (#10267) Romain Biessy 2024-11-15 04:09:12 +01:00
1607a5e5b0 backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921) Charles Xu 2024-11-15 01:28:50 +01:00
ae8de6d50a ggml : build backends as libraries (#10256) Diego Devesa 2024-11-14 18:04:35 +01:00
4a8ccb37ad CUDA: no -sm row for very small matrices (#10185) Johannes Gäßler 2024-11-14 13:00:15 +01:00
2a82891a85 speculative : fix out-of-bounds access (#10289) Georgi Gerganov 2024-11-14 11:44:15 +02:00
af148c9386 vulkan: Optimize binary ops (#10270) Jeff Bolz 2024-11-13 23:22:55 -06:00
66798e42fb vulkan: Use macros to make the mat mul pipeline creation more concise (#10259) Jeff Bolz 2024-11-13 14:59:47 -06:00
fb4a0ec083 llama : propagate the results of graph_compute (#9525) Michael Podvitskiy 2024-11-13 20:00:35 +02:00
5ea926dad7 sync : ggml Georgi Gerganov 2024-11-13 18:11:54 +02:00
1ee9eea094 docs : update bindings list (#10261) Small Grass Forest 2024-11-13 19:17:10 +08:00
ff7fb670d0 server : add missing docs (#10269) Alexey Parfenov 2024-11-13 11:16:30 +00:00
0e712a5acb server : fix incorrect res in validate_model_chat_template (#10272) Jhen-Jie Hong 2024-11-13 19:15:23 +08:00
a0ec17b32e metadata: Detailed Dataset Authorship Metadata (#8875) Brian 2024-11-13 21:10:38 +11:00
2e82ffa4af sycl : Fixes to broken builds and test-backend-ops (#10257) Alberto Cabrera Pérez 2024-11-13 09:40:57 +00:00
80dd7ff22f vulkan: Optimize contiguous copies (#10254) Jeff Bolz 2024-11-13 00:58:57 -06:00
54ef9cfc72 vulkan: Throttle the number of shader compiles during the build step. (#10222) Jeff Bolz 2024-11-11 11:13:51 -06:00
b0cefea58a metal : more precise Q*K in FA vec kernel (#10247) Georgi Gerganov 2024-11-11 08:39:13 +02:00
b141e5f6ef server : enable KV cache defrag by default (#10233) Georgi Gerganov 2024-11-11 08:38:43 +02:00
4b3a9212b6 flake.lock: Update (#10243) Georgi Gerganov 2024-11-10 21:45:25 +02:00
505f33274d server : (web UI) Add back sampler settings (#10239) MaggotHATE 2024-11-11 00:42:25 +05:00
160687b3ed vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10226) Jeff Bolz 2024-11-10 05:37:56 -06:00
6423c65aa8 metal : reorder write loop in mul mat kernel + style (#10231) Georgi Gerganov 2024-11-09 11:53:13 +02:00
39a334a9aa metal : fix build and some more comments (#10229) Georgi Gerganov 2024-11-09 11:53:02 +02:00
bb38cdd8ba metal : fix F32 accumulation in FA vec kernel (#10232) Georgi Gerganov 2024-11-09 11:52:45 +02:00
f018acba22 llama : fix Qwen model type strings Georgi Gerganov 2024-11-09 11:26:34 +02:00
46323fa9ef metal : hide debug messages from normal log Georgi Gerganov 2024-11-09 11:21:49 +02:00
5b359bb1e3 ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213) SXX 2024-11-09 15:35:46 +08:00
e89213492d ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156) amritahs-ibm 2024-11-09 12:47:50 +05:30
8fc393f246 scripts : fix pattern and get n_tokens in one go (#10221) haopeng 2024-11-09 15:06:54 +08:00
ec450d3bbf metal : opt-in compile flag for BF16 (#10218) Georgi Gerganov 2024-11-08 21:59:46 +02:00
695ad752b2 metal : improve clarity (minor) (#10171) Georgi Gerganov 2024-11-08 18:37:41 +02:00
841f27abdb metal : optimize FA kernels (#10171) Georgi Gerganov 2024-11-08 13:47:22 +02:00
d05b3127bd swift : exclude ggml-metal-embed.metal (#10211) Jhen-Jie Hong 2024-11-08 17:34:06 +08:00
76c6e7f105 server : minor UI fix (#10207) Xuan Son Nguyen 2024-11-07 18:44:38 -04:00
a71d81cf8c server : revamp chat UI with vuejs and daisyui (#10175) Xuan Son Nguyen 2024-11-07 17:31:10 -04:00
eec4d71737 scripts : add amx to sync-ggml.sh [no ci] Georgi Gerganov 2024-11-07 23:11:36 +02:00
3b08828674 sync : ggml Georgi Gerganov 2024-11-07 23:08:24 +02:00
a2c6fd747c scripts : sync update Georgi Gerganov 2024-11-07 23:07:55 +02:00
97404c4a03 ggml : add ggml-cpu.h to the public headers (#10204) Diego Devesa 2024-11-07 18:16:08 +01:00
60e17ce23c Remove identical wte/etw logic for jais (#10203) Faisal Zaghloul 2024-11-07 11:46:12 -05:00
5107e8cea3 DRY: Fixes clone functionality (#10192) wwoodsTM 2024-11-07 08:20:25 -07:00
2319126a70 fix q4_0_8_8 format for corrupted tokens issue (#10198) snadampal 2024-11-07 02:02:08 -06:00
3bcd40b3c5 Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) Zhiyuan Li 2024-11-07 18:19:10 +11:00
5c333e0140 metal : add BF16 support (#8439) Georgi Gerganov 2024-11-06 19:53:51 +02:00
b11f9ba9b8 server : remove hack for extra parallel slot (#10187) Georgi Gerganov 2024-11-06 13:29:01 +02:00
94d8cb8be1 metal : fix from ptr buffer name (#10189) Diego Devesa 2024-11-06 12:10:07 +01:00
1dc04b2dee ggml : adjust is_first_call init value (#10193) Georgi Gerganov 2024-11-06 11:20:10 +02:00
a1eaf6a960 metal : add quantized FA support (#10149) Georgi Gerganov 2024-11-06 10:24:23 +02:00
b8deef0ec0 llama : add <|tool_call|> formatting to Granite template (#10177) Gabe Goodhart 2024-11-05 05:23:04 -07:00
a9e8a9a030 ggml : fix arch check in bf16_to_fp32 (#10164) Diego Devesa 2024-11-04 23:17:01 +01:00
3407364776 Q6_K AVX improvements (#10118) Eve 2024-11-04 22:06:31 +00:00
d5a409e57f ggml : fix gelu tables initialization (#10172) Diego Devesa 2024-11-04 20:06:58 +01:00
401558b7ba ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167) Diego Devesa 2024-11-04 17:34:08 +01:00
9e0ecfb697 server : clarify /slots endpoint, add is_processing (#10162) Xuan Son Nguyen 2024-11-04 16:33:29 +01:00
6a066b9978 fix build break on arm64 linux (#10166) snadampal 2024-11-04 09:08:33 -06:00
ea02c753eb cuda : clear error after changing peer access (#10153) Diego Devesa 2024-11-04 13:10:23 +01:00
05697f670b metal : simplify f16 and f32 dequant kernels (#0) Georgi Gerganov 2024-11-04 13:49:34 +02:00
f8e58135cf metal : move dequantize templates to beginning of MSL source (#0) Georgi Gerganov 2024-11-04 13:43:32 +02:00
329ed914c9 CANN: adjust backend registry refactor. (#10158) leo-pony 2024-11-04 19:08:22 +08:00
ce027adfb3 sync : ggml Georgi Gerganov 2024-11-04 10:33:37 +02:00
284e5b0275 cmake : make it possible linking ggml as external lib (ggml/1003) Yuri Khrustalev 2024-11-02 05:09:12 -04:00
e2292aaa17 metal : fix minor string leaks (ggml/1004) Plamen Minev 2024-11-01 16:55:10 +02:00
9f40989351 ggml : move CPU backend to a separate file (#10144) Diego Devesa 2024-11-03 19:34:08 +01:00
08828a6d7d metal : minor fixup in FA kernel (#10143) Georgi Gerganov 2024-11-03 15:18:40 +02:00
1839f69130 flake.lock: Update (#10146) Georgi Gerganov 2024-11-03 15:14:15 +02:00
9830b6923b Add apple arm to presets (#10134) Christian Köhnenkamp 2024-11-02 23:35:31 +01:00
42cadc74bd server : fix slot selection by lru (#10126) sasha0552 2024-11-02 16:34:56 +00:00
45950415ed server : fix endpoint checks (#10135) Georgi Gerganov 2024-11-02 18:34:00 +02:00
1926d6e39d llama : adjust default context size + print warnings (#10136) Georgi Gerganov 2024-11-02 15:18:56 +02:00
b634f8a26f simple-chat : only add bos on first prompt (#10129) Diego Devesa 2024-11-02 13:08:53 +01:00
7554aa4655 convert-lora : make --base optional (#10110) Xuan Son Nguyen 2024-11-02 12:53:17 +01:00
a6744e43e8 llama : add simple-chat example (#10124) Diego Devesa 2024-11-01 23:50:59 +01:00
e991e3127f llama : use smart pointers for ggml resources (#10117) Diego Devesa 2024-11-01 23:48:26 +01:00
418f5eef26 vulkan : improve ggml_vk_create_buffer error handling (#9898) Shupei Fan 2024-11-02 02:33:14 +08:00
ba6f62eb79 readme : update hot topics Georgi Gerganov 2024-11-01 17:31:51 +02:00
d865d1478c server : fix smart selection of available slot (#10120) sasha0552 2024-11-01 13:33:14 +00:00
1804adb0cf ggml : remove ggml_scratch (#10121) Georgi Gerganov 2024-11-01 12:58:45 +02:00
815fe72adc sync : ggml Georgi Gerganov 2024-11-01 10:28:24 +02:00
f221d56220 ggml : alloc ggml_contexts on the heap (whisper/2525) Georgi Gerganov 2024-11-01 10:23:05 +02:00
e597e50794 build: fix build error in Windows env with OneAPI setup (#10107) Zhenwei Jin 2024-11-01 11:09:59 +08:00
85679d37f3 llama : improve output buffer type selection (#10098) Diego Devesa 2024-11-01 00:49:53 +01:00
1e9f94994e quantize : fix --keep-split (#10114) Diego Devesa 2024-11-01 00:45:34 +01:00
c02e5ab2a6 llama : fix buffer checks for mamba and rwk (#10111) Diego Devesa 2024-10-31 22:54:23 +01:00
ab3d71f97f loader: refactor tensor weights storage (#9935) Zhenwei Jin 2024-11-01 02:50:39 +08:00
0a683e8088 server : include scheme when printing URL (#10106) Kevin Gibbons 2024-10-31 06:02:35 -07:00
dea5e86051 ggml : check tensor name lengths in gguf files (#10100) Diego Devesa 2024-10-31 11:40:59 +01:00
1329c0a75e kompute: add mul_mat_q4_k shader (#10097) Sergio López 2024-10-31 10:09:52 +01:00
61408e7fad kompute: add backend registry / device interfaces (#10045) Sergio López 2024-10-30 17:01:52 +01:00
b9e02e8184 ggml : fix memory leaks when loading invalid gguf files (#10094) Diego Devesa 2024-10-30 14:51:21 +01:00
