Commit Graph

  • 4314e56c4f
    server : use lambda instead of std::bind (#11507) Daniel Bevenius 2025-01-30 11:05:00 +01:00
  • 496e5bf46b
    server : (docs) added response format for /apply-template [no ci] (#11503) Isaac McFadyen 2025-01-30 04:11:53 -05:00
  • 7919256c57
    readme : reference examples relative links (#11505) Guspan Tanadi 2025-01-30 12:58:02 +07:00
  • e0449763a4
    server : update json snippets in README.md [no ci] (#11492) Daniel Bevenius 2025-01-30 05:48:14 +01:00
  • eb7cf15a80
    server : add /apply-template endpoint for additional use cases of Minja functionality (#11489) Nigel Bosch 2025-01-29 12:45:44 -06:00
  • 66ee4f297c
    vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360) Rémy Oudompheng 2025-01-29 18:29:39 +01:00
  • e51c47b401
    server : update auto gen files comments [no ci] (#11484) Daniel Bevenius 2025-01-29 16:34:18 +01:00
  • 2711d0215f
    vulkan: Catch pipeline creation failure and print an error message (#11436) Jeff Bolz 2025-01-29 09:26:50 -06:00
  • f0d4b29edf
    Parse https://ollama.com/library/ syntax (#11480) Eric Curtin 2025-01-29 12:23:10 +01:00
  • 815857791d
    sync : ggml Georgi Gerganov 2025-01-29 11:25:29 +02:00
  • 1a0e87d291
    ggml : add option to not print stack on abort (ggml/1081) William Tambellini 2025-01-23 11:59:08 -08:00
  • d2e518e9b4
    ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) issixx 2025-01-17 21:29:08 +09:00
  • b636228c0a
    embedding : enable --no-warmup option (#11475) Daniel Bevenius 2025-01-29 09:38:54 +01:00
  • 325afb370a
    llama: fix missing k_cache store for rwkv6qwen2 (#11445) Molly Sophia 2025-01-29 12:07:21 +08:00
  • 794fe23f29
    cmake: add hints for locating ggml on Windows using Llama find-package (#11466) Emreerdog 2025-01-29 02:22:06 +03:00
  • cf8cc856d7
    server : Fixed wrong function name in llama.cpp server unit test (#11473) peidaqi 2025-01-28 16:03:42 -07:00
  • d0c08040b6
    ci : fix build CPU arm64 (#11472) Xuan-Son Nguyen 2025-01-29 00:02:56 +01:00
  • be5ef7963f
    HIP: Suppress transformation warning in softmax.cu uvos 2025-01-28 23:06:32 +01:00
  • cae9fb4361
    HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (#11080) Nikita Sarychev 2025-01-28 07:42:20 -08:00
  • 7fee2889e6
    Add GitHub protocol pulling and http:// (#11465) Eric Curtin 2025-01-28 15:45:41 +01:00
  • d7d1eccacc
    docker: allow installing pip packages system-wide (#11437) Nuno 2025-01-28 15:17:25 +01:00
  • 4bf3119d61
    cmake : don't fail on GGML_CPU=OFF (#11457) someone13574 2025-01-28 09:15:34 -05:00
  • f643120bad
    docker: add perplexity and bench commands to full image (#11438) Nuno 2025-01-28 11:42:32 +01:00
  • 6e84b0ab8e
    SYCL : SOFTMAX F16 mask support and other fixes (#11261) Akarshan Biswas 2025-01-28 15:26:58 +05:30
  • 2b8525d5c8
    Handle missing model in CLI parameters for llama-run (#11399) Michael Engel 2025-01-28 09:32:40 +01:00
  • a4417ddda9
    Add new hf protocol for ollama (#11449) Eric Curtin 2025-01-27 19:36:10 +01:00
  • d6d24cd9ed
    AMD: parse the architecture as supplied by gcnArchName (#11244) Haus1 2025-01-27 08:58:17 -05:00
  • a5203b4465
    llama : minor fixes to speed up llama model loading (#11448) lexasub 2025-01-27 17:42:09 +04:00
  • df984e0147
    llama: refactor llama_decode_impl (#11381) Johannes Gäßler 2025-01-27 12:07:12 +01:00
  • acd38efee3
    metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441) Ihar Hrachyshka 2025-01-27 02:41:59 -05:00
  • caf773f249
    docker : fix ARM build and Vulkan build (#11434) Xuan Son Nguyen 2025-01-26 22:45:32 +01:00
  • 178a7eb952
    metal : use residency sets (#11427) Georgi Gerganov 2025-01-26 20:06:16 +02:00
  • 6f53d8a6b4
    docker: add missing vulkan library to base layer and update to 24.04 (#11422) Nuno 2025-01-26 18:22:43 +01:00
  • 19f65187cb
    cmake: add ggml find package (#11369) bandoti 2025-01-26 12:07:48 -04:00
  • 1d8ee06000
    rpc: fix register position (#11424) Frank Mai 2025-01-26 23:20:34 +08:00
  • 2cc9b8c32c
    readme : update hot topics Georgi Gerganov 2025-01-26 14:30:15 +02:00
  • f35726c2fb
    build: apply MSVC /bigobj option to c/cpp files only (#11423) Jeff Bolz 2025-01-25 20:10:03 -06:00
  • 4a75d19376
    vulkan: compile shaders on-demand (#11406) Jeff Bolz 2025-01-25 15:29:57 -06:00
  • 26771a1491
    HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (#11420) uvos 2025-01-25 21:01:12 +01:00
  • ca6baf76c1
    build: add /bigobj to MSVC build (#11407) Jeff Bolz 2025-01-25 11:26:37 -06:00
  • 6e264a905b
    docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419) Diego Devesa 2025-01-25 17:22:41 +01:00
  • 49b0e3cec4
    server : fix cleaning up stream task (#11418) Xuan Son Nguyen 2025-01-25 16:36:44 +01:00
  • 20a758155b
    docker : fix CPU ARM build (#11403) Diego Devesa 2025-01-25 15:22:29 +01:00
  • 00c24acb2a
    ci : fix line breaks on windows builds (#11409) Georgi Gerganov 2025-01-25 13:36:48 +02:00
  • 466ea66f33
    CANN: Add Ascend CANN build ci (#10217) jiahao su 2025-01-25 07:26:01 +08:00
  • 5f0db9522f
    hip : Add hipGraph and VMM support to ROCM (#11362) uvos 2025-01-25 00:02:23 +01:00
  • c5d9effb49
    CUDA: fix FP16 cuBLAS GEMM (#11396) Johannes Gäßler 2025-01-24 21:02:43 +01:00
  • 9fbadaef4f
    rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356) uvos 2025-01-24 17:50:49 +01:00
  • 9755129c27
    release : pack /lib in the packages (#11392) Georgi Gerganov 2025-01-24 18:41:30 +02:00
  • a07c2c8a52
    docs : Update readme to build targets for local docker build (#11368) Jafar Uruç 2025-01-24 13:30:13 +00:00
  • 8137b4bb2b
    CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380) Johannes Gäßler 2025-01-24 12:38:31 +01:00
  • 1af6945eb0
    cmake : avoid -march=native when reproducible build is wanted (#11366) Bernhard M. Wiedemann 2025-01-24 12:21:35 +01:00
  • 01f37edf1a
    Update llama-run README.md (#11386) Eric Curtin 2025-01-24 09:39:24 +00:00
  • c07e87f38b
    server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364) stduhpf 2025-01-24 09:02:38 +01:00
  • 564804b79b
    tests: fix some mul_mat test gaps (#11375) Jeff Bolz 2025-01-23 14:51:24 -06:00
  • 05f63cc9ee
    Update documentation (#11373) Eric Curtin 2025-01-23 20:04:31 +00:00
  • f7fb43cd0b
    Add -ngl (#11372) Eric Curtin 2025-01-23 16:16:18 +00:00
  • 5845661640
    server : add more clean up when cancel_tasks is called (#11340) Xuan Son Nguyen 2025-01-23 13:56:05 +01:00
  • f211d1dc10
    Treat hf.co/ prefix the same as hf:// (#11350) Eric Curtin 2025-01-23 10:38:20 +00:00
  • 955a6c2d91
    Vulkan-run-test: fix mmq_wg_denoms (#11343) amd-dwang 2025-01-23 15:14:28 +08:00
  • 1971adf55e
    vulkan: sort shaders for more deterministic binary (#11315) Jeff Bolz 2025-01-23 01:07:50 -06:00
  • 5245729e33
    vulkan: fix diag_mask_inf (#11323) Jeff Bolz 2025-01-23 01:01:17 -06:00
  • 6152129d05
    main : update README documentation for batch size (#11353) Diego Devesa 2025-01-22 19:22:20 +01:00
  • 16d3df7ab0
    readme : add plugin links (#11355) Georgi Gerganov 2025-01-22 19:44:26 +02:00
  • 12c2bdf2de
    server : fix draft context not being released (#11354) Diego Devesa 2025-01-22 17:44:40 +01:00
  • c64d2becb1
    minja: sync at 0f5f7f2b37 (#11352) Olivier Chafik 2025-01-22 16:16:27 +00:00
  • 96f4053934
    Adding logprobs to /v1/completions (#11344) Jiří Podivín 2025-01-22 12:51:32 +01:00
  • a94f3b2727
    common: utils to split / join / repeat strings (from json converter) (#11342) Olivier Chafik 2025-01-22 09:51:44 +00:00
  • 3e3357fd77
    llava : support Minicpm-omni (#11289) tc-mb 2025-01-22 15:35:48 +08:00
  • 6171c9d258
    Add Jinja template support (#11016) Olivier Chafik 2025-01-21 13:18:51 +00:00
  • e28245f35f
    export-lora : fix tok_embd tensor (#11330) Xuan Son Nguyen 2025-01-21 14:07:12 +01:00
  • 6da5bec81c
    rpc : better caching of the base buffer pointer (#11331) Radoslav Gerganov 2025-01-21 15:06:41 +02:00
  • 2e2f8f093c
    linenoise.cpp refactoring (#11301) Eric Curtin 2025-01-21 09:32:35 +00:00
  • 2139667ec4
    metal : fix out-of-bounds write (#11314) Georgi Gerganov 2025-01-21 08:48:13 +02:00
  • 80d0d6b4b7
    common : add -hfd option for the draft model (#11318) Georgi Gerganov 2025-01-20 22:29:43 +02:00
  • aea8ddd516
    vulkan: fix coopmat2 validation failures (#11284) Jeff Bolz 2025-01-20 10:38:32 -06:00
  • 9f7add1cde
    examples : fix add_special conditions (#11311) Georgi Gerganov 2025-01-20 16:36:08 +02:00
  • 90d987b105
    mmap: add include for cerrno (#11296) Christopher Nielsen 2025-01-20 09:02:43 -05:00
  • a4251edd6f
    cmake: fix shell command quoting in build-info script (#11309) Michael Podvitskiy 2025-01-20 15:02:15 +01:00
  • ec7f3ac9ab
    llama : add support for Deepseek-R1-Qwen distill model (#11310) Xuan Son Nguyen 2025-01-20 14:35:07 +01:00
  • ef6dada60c
    cont : fix whitespaces (#11305) Georgi Gerganov 2025-01-20 09:29:32 +02:00
  • ae3c1db2f9
    llama : re-add LLM_ARCH_PHIMOE (#11305) Kyle Bruene 2025-01-20 01:21:01 -06:00
  • 92bc493917
    tests : increase timeout when sanitizers are enabled (#11300) Georgi Gerganov 2025-01-19 20:22:30 +02:00
  • b9daaffe02
    simple-chat : fix BOS being added to each message (#11278) Georgi Gerganov 2025-01-19 18:12:09 +02:00
  • 99487b57d4
    SYCL: Introducing memory host pool (#11251) Nicolò Scipione 2025-01-19 14:33:34 +01:00
  • a1649cc13f
    Adding linenoise.cpp to llama-run (#11252) Eric Curtin 2025-01-18 14:42:31 +00:00
  • 4dd34ff831
    cmake : add sanitizer flags for llama.cpp (#11279) Georgi Gerganov 2025-01-18 16:18:15 +02:00
  • f30f099228
    server : implement cancellable request (#11285) Xuan Son Nguyen 2025-01-18 14:12:05 +01:00
  • f26c874179
    scripts : restore hf.sh (#11288) Georgi Gerganov 2025-01-18 13:18:32 +02:00
  • 6390a998bf
    tts : add guide tokens support (#11186) LostRuins Concedo 2025-01-18 18:20:57 +08:00
  • 44e18ef939
    vulkan: fix coopmat2 flash attention for non-contiguous inputs (#11281) Jeff Bolz 2025-01-18 02:26:50 -06:00
  • 3edfa7d375
    llama.android: add field formatChat to control whether to parse special tokens when send message (#11270) codezjx 2025-01-17 20:57:56 +08:00
  • 667d72846c
    rpc : early register backend devices (#11262) Radoslav Gerganov 2025-01-17 10:57:09 +02:00
  • a133566d34
    vocab : fix double-eos check (#11273) Georgi Gerganov 2025-01-17 09:28:00 +02:00
  • 960ec65273
    llama : fix deprecation message: vocabable -> vocab (#11269) David Renshaw 2025-01-17 02:12:01 -05:00
  • 7a689c415e
    README : added kalavai to infrastructure list (#11216) musoles 2025-01-17 00:10:49 +00:00
  • bd38ddea01
    vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166) Jeff Bolz 2025-01-16 15:47:10 -06:00
  • 466300fe14
    vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206) Jeff Bolz 2025-01-16 15:23:49 -06:00
  • 206bc53422
    vulkan: optimize coopmat2 q2_k dequant function (#11130) Jeff Bolz 2025-01-16 15:16:39 -06:00
  • 4dbc8b9cb7
    llama : add internlm3 support (#11233) RunningLeon 2025-01-17 02:10:38 +08:00