Commit Graph

  • dbbebcab33 ggml: fix ggml_graph_cpy undefined behavior (ggml/943) Johannes Gäßler 2024-08-31 14:35:42 +02:00
  • ba1cf846ed cann : fix doxy (ggml/0) Georgi Gerganov 2024-08-28 18:45:01 +03:00
  • d2d3200b38 cann : add Ascend NPU support (whisper/2336) Mengqing Cao 2024-08-09 20:21:56 +08:00
  • 51d964a4ef cuda : mark BF16 CONT as unsupported Georgi Gerganov 2024-08-28 17:08:03 +03:00
  • efe6a83e30 ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934) Salvatore Mesoraca 2024-08-28 10:23:02 +02:00
  • fbb7fcffbc llama : set attrs of mislabelled EOT/EOM tokens (#9348) Kevin Gibbons 2024-09-07 22:51:00 -07:00
  • a5b5d9a101 llama.android : fix build (#9350) Georgi Gerganov 2024-09-08 00:33:50 +03:00
  • f12295b8a9 llama : fix empty ring buffer push (#9358) Georgi Gerganov 2024-09-08 00:33:33 +03:00
  • faf69d4237 llama : sanitize invalid tokens (#9357) Georgi Gerganov 2024-09-08 00:33:13 +03:00
  • e536426ded llamafile : disable sgemm for batch-size 1 (#9330) Eve 2024-09-07 19:02:26 +00:00
  • 1b9ae5189c common : refactor arg parser (#9308) Xuan Son Nguyen 2024-09-07 20:43:51 +02:00
  • e32d0816ed ggml : always check bounds on get_rows operations (#9354) slaren 2024-09-07 20:23:07 +02:00
  • df270ef745 llama : refactor sampling v2 (#9294) Georgi Gerganov 2024-09-07 15:16:19 +03:00
  • 947538acb8 ggml : fix missing cpu_set_t on emscripten (#9336) Xuan Son Nguyen 2024-09-07 12:01:34 +02:00
  • 6c89eb0b47 ci : disable rocm image creation (#9340) slaren 2024-09-07 09:48:54 +02:00
  • 9b2c24c099 server : simplify state machine for slot (#9283) Xuan Son Nguyen 2024-09-06 23:21:29 +02:00
  • 134bc38ecf llama-bench : log benchmark progress (#9287) Aarni Koskela 2024-09-07 00:03:01 +03:00
  • 815b1fb20a batched-bench : add --output-format jsonl option (#9293) Aarni Koskela 2024-09-06 18:59:58 +03:00
  • 409dc4f8bb ggml : fix build break for the vulkan-debug (#9265) Changyeon Kim 2024-09-06 21:54:50 +09:00
  • 4a1411b4f1 server : fix missing lock (#9334) Xuan Son Nguyen 2024-09-06 14:06:04 +02:00
  • 8ebe8ddebd Improve Vulkan shader build system (#9239) Markus Tavenrath 2024-09-06 08:56:17 +02:00
  • 9bc6db28d0 ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151) compilade 2024-09-05 21:48:47 -04:00
  • 32b2ec88bc Update build.yml (#9184) awatuna 2024-09-06 06:34:36 +08:00
  • 1031771faa CMake fix: host for msvc compiler can only be x86 or x64 (#8624) Michael Podvitskiy 2024-09-06 00:14:12 +02:00
  • 4db04784f9 cuda : fix defrag with quantized KV (#9319) slaren 2024-09-05 11:13:11 +02:00
  • bdf314f38a llama-bench : fix NUL terminators in CPU name (#9313) slaren 2024-09-05 02:19:39 +02:00
  • 581c305186 ggml : AVX2 support for Q4_0_8_8 (#8713) Srihari-mcw 2024-09-04 22:21:22 +05:30
  • 5910ea9427 [SYCL] Fix DMMV dequantization (#9279) Ouadie EL FAROUKI 2024-09-04 16:26:33 +01:00
  • c8671ae282 Fix broken links in docker.md (#9306) 杨朱 · Kiki 2024-09-04 19:45:28 +08:00
  • 82e3b03c11 rpc : make RPC servers come first in the device list (#9296) Radoslav Gerganov 2024-09-04 11:08:32 +03:00
  • 9379d3cc17 readme : rename result_format to response_format (#9300) Pascal Patry 2024-09-04 02:45:40 -04:00
  • 7605ae7daf flake.lock: Update (#9261) Georgi Gerganov 2024-09-04 02:36:43 +03:00
  • 8962422b1c llama-bench : add JSONL (NDJSON) output mode (#9288) Aarni Koskela 2024-09-03 20:58:54 +03:00
  • b69a480af4 readme : refactor API section + remove old hot topics Georgi Gerganov 2024-09-03 10:00:36 +03:00
  • 48baa61ecc server : test script : add timeout for all requests (#9282) Xuan Son Nguyen 2024-09-02 22:08:38 +02:00
  • f1485161e5 src: make tail invalid when kv cell is intersection for mamba (#9249) Zhenwei Jin 2024-09-03 01:53:23 +08:00
  • 048de848ee docker : fix missing binaries in full-cuda image (#9278) slaren 2024-09-02 18:11:13 +02:00
  • f771d064a9 ggml : add pthread includes on FreeBSD (#9258) yuri@FreeBSD 2024-09-02 08:25:30 -07:00
  • 6e7d133a5f server : refactor multitask handling (#9274) Xuan Son Nguyen 2024-09-02 17:11:51 +02:00
  • b60074f1c2 llama-cli : remove duplicated log message (#9275) Guoliang Hua 2024-09-02 20:36:43 +08:00
  • 9c1ba55733 build(nix): Package gguf-py (#5664) Tushar 2024-09-02 16:51:01 +05:30
  • c6d4cb4655 llama : minor style Georgi Gerganov 2024-09-02 11:52:04 +03:00
  • 8f1d81a0b6 llama : support RWKV v6 models (#8980) Molly Sophia 2024-09-01 22:38:17 +08:00
  • a47667cff4 nix: fix CUDA build - replace deprecated autoAddOpenGLRunpathHook Echo Nolan 2024-08-22 17:19:14 -04:00
  • ea5d7478b1 sgemm : improved Q4_0 and Q8_0 performance via 4xN and Mx4 gemm (#8908) Srihari-mcw 2024-08-31 13:50:35 +05:30
  • 49271efbaf llama : fix typo in xcda_array_view comment [no ci] (#9132) Daniel Bevenius 2024-08-31 09:50:22 +02:00
  • 0ab30f8d82 llama : fix llama_split_mode enum values in main_gpu document (#9057) Sutou Kouhei 2024-08-31 03:08:10 +09:00
  • cddae4884c Correct typo run_llama2.sh > run-llama2.sh (#9149) 蕭澧邦 2024-08-30 20:10:01 +08:00
  • 7ea8d80d53 llava : the function "clip" should be int (#9237) tc-mb 2024-08-30 13:21:57 +08:00
  • 42c76d1358 Threadpool: take 2 (#8672) Faisal Zaghloul 2024-08-29 19:20:53 -04:00
  • 9f7d4bcf5c server : fix crash when error handler dumps invalid utf-8 json (#9195) Jan Boon 2024-08-27 18:28:06 +08:00
  • 1d1ccce676 flake.lock: Update (#9162) Georgi Gerganov 2024-08-29 07:28:14 +03:00
  • 9fe94ccac9 docker : build images only once (#9225) slaren 2024-08-28 17:28:00 +02:00
  • 66b039a501 docker : update CUDA images (#9213) slaren 2024-08-28 13:20:36 +02:00
  • 20f1789dfb vulkan : fix build (#0) Georgi Gerganov 2024-08-27 22:10:58 +03:00
  • 231cff5f6f sync : ggml Georgi Gerganov 2024-08-27 22:01:45 +03:00
  • 3246fe84d7 Fix minicpm example directory (#9111) Xie Yanbo 2024-08-27 20:33:08 +08:00
  • 78eb487bb0 llama : fix qs.n_attention_wv for DeepSeek-V2 (#9156) compilade 2024-08-27 06:09:23 -04:00
  • a77feb5d71 server : add some missing env variables (#9116) Xuan Son Nguyen 2024-08-27 11:07:01 +02:00
  • 2e59d61c1b llama : fix ChatGLM4 wrong shape (#9194) CausalLM 2024-08-27 14:58:22 +08:00
  • 75e1dbbaab llama : fix llama3.1 rope_freqs not respecting custom head_dim (#9141) Carsten Kragelund Jørgensen 2024-08-27 08:53:40 +02:00
  • ad76569f8e common : Update stb_image.h to latest version (#9161) arch-btw 2024-08-26 22:58:50 -07:00
  • 7d787ed96c ggml : do not crash when quantizing q4_x_x with an imatrix (#9192) slaren 2024-08-26 19:44:43 +02:00
  • 06658ad7c3 metal : separate scale and mask from QKT in FA kernel (#9189) Georgi Gerganov 2024-08-26 18:31:02 +03:00
  • fc18425b6a ggml : add SSM Metal kernels (#8546) Georgi Gerganov 2024-08-26 17:55:36 +03:00
  • 879275ac98 tests : fix compile warnings for unreachable code (#9185) Georgi Gerganov 2024-08-26 16:30:25 +03:00
  • 7a3df798fc ci : add VULKAN support to ggml-ci (#9055) Georgi Gerganov 2024-08-26 12:19:39 +03:00
  • e5edb210cd server : update deps (#9183) Georgi Gerganov 2024-08-26 12:16:57 +03:00
  • 0c41e03ceb metal : gemma2 flash attention support (#9159) slaren 2024-08-26 11:08:59 +02:00
  • f12ceaca0c ggml-ci : try to improve build time (#9160) slaren 2024-08-26 11:03:30 +02:00
  • 436787f170 llama : fix time complexity of string replacement (#9163) Justine Tunney 2024-08-25 23:09:53 -07:00
  • 93bc3839f9 common: fixed not working find argument --n-gpu-layers-draft (#9175) Herman Semenov 2024-08-25 22:54:37 +00:00
  • f91fc5639b CUDA: fix Gemma 2 numerical issues for FA (#9166) Johannes Gäßler 2024-08-25 22:11:48 +02:00
  • e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542) Johannes Gäßler 2024-08-24 21:34:59 +02:00
  • 8f824ffe8e quantize : fix typo in usage help of quantize.cpp (#9145) João Dinis Ferreira 2024-08-24 07:22:45 +01:00
  • 3ba780e2a8 lora : fix llama conversion script with ROPE_FREQS (#9117) Xuan Son Nguyen 2024-08-23 12:58:53 +02:00
  • a07c32ea54 llama : use F32 precision in GLM4 attention and no FA (#9130) piDack 2024-08-23 15:27:17 +08:00
  • 11b84eb457 [SYCL] Add a space to supress a cmake warning (#9133) Akarshan Biswas 2024-08-22 19:39:47 +05:30
  • 1731d4238f [SYCL] Add oneDNN primitive support (#9091) luoyu-intel 2024-08-22 12:50:10 +08:00
  • a1631e53f6 llama : simplify Mamba with advanced batch splits (#8526) compilade 2024-08-21 17:58:11 -04:00
  • fc54ef0d1c server : support reading arguments from environment variables (#9105) Xuan Son Nguyen 2024-08-21 11:04:34 +02:00
  • b40eb84895 llama : support for falcon-mamba architecture (#9074) Younes Belkada 2024-08-21 12:06:36 +04:00
  • f63f603c87 llava : zero-initialize clip_ctx structure fields with aggregate initialization 908) fairydreaming 2024-08-21 09:45:49 +02:00
  • 8455340b87 llama : std::move llm_bigram_bpe from work_queue (#9062) Daniel Bevenius 2024-08-21 09:32:58 +02:00
  • 2f3c1466ff llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. (#8984) Changyeon Kim 2024-08-21 04:00:00 +09:00
  • 50addec9a5 [SYCL] fallback mmvq (#9088) Meng, Hengyu 2024-08-20 23:50:17 +08:00
  • 4f8d19ff17 [SYCL] Fix SYCL im2col and convert Overflow with Large Dims (#9052) zhentaoyu 2024-08-20 23:06:51 +08:00
  • 90db8146d5 tests : add missing comma in grammar integration tests (#9099) fairydreaming 2024-08-20 11:09:55 +02:00
  • cfac111e2b cann: add doc for cann backend (#8867) wangshuai09 2024-08-19 16:46:38 +08:00
  • 1b6ff90ff8 rpc : print error message when failed to connect endpoint (#9042) Radoslav Gerganov 2024-08-19 10:11:45 +03:00
  • 18eaf29f4c rpc : prevent crashes on invalid input (#9040) Radoslav Gerganov 2024-08-19 10:10:21 +03:00
  • 554b049068 flake.lock: Update (#9068) Georgi Gerganov 2024-08-18 17:43:32 +03:00
  • 2339a0be1c tests : add integration test for lora adapters (#8957) ltoniazzi 2024-08-18 10:58:04 +01:00
  • 2fb9267887 Fix incorrect use of ctx_split for bias tensors (#9063) Yoshi Suhara 2024-08-17 06:34:21 -07:00
  • 8b3befc0e2 server : refactor middleware and /health endpoint (#9056) Xuan Son Nguyen 2024-08-16 17:19:05 +02:00
  • d565bb2fd5 llava : support MiniCPM-V-2.6 (#8967) tc-mb 2024-08-16 21:34:41 +08:00
  • ee2984bdaf py : fix wrong input type for raw_dtype in ggml to gguf scripts (#8928) Farbod Bijary 2024-08-16 14:06:30 +03:30
  • c8ddce8560 Fix inference example lacks required parameters (#9035) Aisuko 2024-08-16 19:08:59 +10:00
  • 23fd453544 gguf-py : bump version from 0.9.1 to 0.10.0 (#9051) compilade 2024-08-16 02:36:11 -04:00
  • c679e0cb5c llama : add EXAONE model support (#9025) Minsoo Cheong 2024-08-16 15:35:18 +09:00