Commit Graph

  • 71a64989a5
    vulkan: skip integer div/mod in get_offsets for batch_idx==0 (#10506) Jeff Bolz 2024-11-27 01:08:54 -06:00
  • 4a57d362e1
    vulkan: optimize Q2_K and Q3_K mul_mat_vec (#10459) Jeff Bolz 2024-11-27 01:00:50 -06:00
  • c9b00a70b0
    ci : fix cuda releases (#10532) Diego Devesa 2024-11-26 22:12:10 +01:00
  • de5097351c
    Add OLMo 2 model in docs (#10530) Shane A 2024-11-26 12:55:29 -08:00
  • 5a349f2809
    ci : remove nix workflows (#10526) Diego Devesa 2024-11-26 21:13:54 +01:00
  • 30ec398321
    llama : disable warnings for 3rd party sha1 dependency (#10527) Diego Devesa 2024-11-26 21:01:47 +01:00
  • be0e350c8b
    Fix HIP flag inconsistency & build docs (#10524) Tristan Druyen 2024-11-26 19:27:28 +01:00
  • 249cd93da3
    mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (#10516) R0CKSTAR 2024-11-27 00:00:41 +08:00
  • 904109ed0d
    vulkan: fix group_norm (#10496) Jeff Bolz 2024-11-26 09:45:05 -06:00
  • 45abe0f74e
    server : replace behave with pytest (#10416) Xuan Son Nguyen 2024-11-26 16:20:18 +01:00
  • 0bbd2262a3
    restore the condistion to build & update pacakge when merge (#10507) Neo Zhang Jianyu 2024-11-26 21:43:47 +08:00
  • ab96610b1e
    cmake : enable warnings in llama (#10474) Georgi Gerganov 2024-11-26 14:18:08 +02:00
  • 7db3846a94
    ci : publish the docker images created during scheduled runs (#10515) Diego Devesa 2024-11-26 13:05:20 +01:00
  • c6807b3f28
    ci : add ubuntu cuda build, build with one arch on windows (#10456) Diego Devesa 2024-11-26 13:05:07 +01:00
  • 25669aa92c
    ggml-cpu: cmake add arm64 cpu feature check for macos (#10487) Charles Xu 2024-11-26 12:37:05 +01:00
  • 84e1c33cde
    server : fix parallel speculative decoding (#10513) Georgi Gerganov 2024-11-26 13:36:40 +02:00
  • 811872a59d
    speculative : simplify the implementation (#10504) Georgi Gerganov 2024-11-26 12:29:38 +02:00
  • 9a4b79bcfa
    CANN: Improve the Inferencing Performance for Ascend NPU Device (#10454) Shanshan Shen 2024-11-26 18:08:37 +08:00
  • 7066b4cce2
    CANN: RoPE and CANCAT operator optimization (#10488) Chenguang Li 2024-11-26 17:31:05 +08:00
  • 0eb4e12bee
    vulkan: Fix a vulkan-shaders-gen arugment parsing error (#10484) Junil Kim 2024-11-26 10:47:20 +09:00
  • 0cc63754b8
    Introduce llama-run (#10291) Eric Curtin 2024-11-25 16:56:24 -05:00
  • 50d5cecbda
    ci : build docker images only once daily (#10503) Diego Devesa 2024-11-25 22:05:39 +01:00
  • 9fd8c2687f
    server : add more information about error (#10455) Georgi Gerganov 2024-11-25 22:28:27 +02:00
  • 47f931c8f9
    server : enable cache_prompt by default (#10501) Georgi Gerganov 2024-11-25 21:50:07 +02:00
  • 106964e3d2
    metal : enable mat-vec kernels for bs <= 4 (#10491) Georgi Gerganov 2024-11-25 21:49:31 +02:00
  • 80acb7b430
    Rename Olmo1124 to Olmo2 (#10500) Shane A 2024-11-25 10:36:09 -08:00
  • 10bce0450f
    llama : accept a list of devices to use to offload a model (#10497) Diego Devesa 2024-11-25 19:30:06 +01:00
  • 1f922254f0
    Github: update issue templates [no ci] (#10489) Johannes Gäßler 2024-11-25 19:18:37 +01:00
  • a9a678a6b2
    Add download chat feature to server chat (#10481) brucepro 2024-11-25 08:11:55 -08:00
  • 9ca2e67762
    server : add speculative decoding support (#10455) Georgi Gerganov 2024-11-25 16:31:38 +02:00
  • 5931c1f233
    ggml : add support for dynamic loading of backends (#10469) Diego Devesa 2024-11-25 15:13:39 +01:00
  • f6d12e7df8
    tests : fix compile warning Georgi Gerganov 2024-11-25 15:17:32 +02:00
  • b756441104
    metal : minor code formatting Georgi Gerganov 2024-11-25 15:08:04 +02:00
  • 5a8987793f
    [SYCL] Fix building Win package for oneAPI 2025.0 update (#10483) Neo Zhang Jianyu 2024-11-25 17:31:10 +08:00
  • d9d54e498d
    speculative : refactor and add a simpler example (#10362) Georgi Gerganov 2024-11-25 09:58:41 +02:00
  • cce5a90075
    flake.lock: Update (#10470) Georgi Gerganov 2024-11-24 18:03:25 +02:00
  • dc39012cba
    llama : fix op mul check with command-r-plus (#10476) Diego Devesa 2024-11-24 16:10:26 +01:00
  • 9336db462c
    convert : XLMRoberta Type Vocab Size (#10458) Gabe Goodhart 2024-11-24 02:02:34 -07:00
  • 96fa2c5e2d
    fix gguf-py: Conversion error when multiple licenses are configured (#9807) momonga 2024-11-24 09:09:22 +09:00
  • 55ed008b2d
    ggml : do not use ARM features not included in the build (#10457) Diego Devesa 2024-11-23 14:41:12 +01:00
  • 6dfcfef078
    ci: Update oneAPI runtime dll packaging (#10428) 蕭澧邦 2024-11-22 17:44:08 +08:00
  • 599b3e0cd4
    GitHub: ask for more info in issue templates (#10426) Johannes Gäßler 2024-11-22 08:32:40 +01:00
  • c18610b4ee
    CANN: Support Ascend310P to accelerate F32 and F16 Model (#10216) leo-pony 2024-11-22 14:07:20 +08:00
  • a5e47592b6
    cuda : optimize argmax (#10441) Diego Devesa 2024-11-21 18:18:50 +01:00
  • 1bb30bf28c
    llama : handle KV shift for recurrent models (#10402) Georgi Gerganov 2024-11-21 10:22:47 +02:00
  • 87a533be57
    sync : ggml Georgi Gerganov 2024-11-21 09:22:11 +02:00
  • 59b9172822
    ggml/sched : do not skip views in pre-assignments slaren 2024-11-20 13:25:08 +01:00
  • 02e4eaf22f
    ggml-opt: fix data corruption (ggml/1022) Johannes Gäßler 2024-11-20 14:56:04 +01:00
  • 9abe9eeae9
    vulkan: predicate max operation in soft_max shaders/soft_max (#10437) Jeff Bolz 2024-11-20 13:47:36 -06:00
  • f95caa7954
    cmake: add link dependencies to cmake find pkg (#10433) bandoti 2024-11-20 12:22:19 -04:00
  • fab5d30ff6
    llama : add .clang-format file (#10415) Diego Devesa 2024-11-20 12:57:53 +01:00
  • 8fd4b7fa29
    vulkan: copy iq4_nl LUT into shared memory (#10409) Jeff Bolz 2024-11-20 01:40:18 -06:00
  • 1bacb9f625
    vulkan: further optimize mul_mat_vec using larger loads (#10387) Jeff Bolz 2024-11-20 01:11:00 -06:00
  • ad21c9e1f1
    update rel to 4040 (#10395) Neo Zhang Jianyu 2024-11-20 13:54:25 +08:00
  • 3952a221af
    Fix missing file renames in Makefile due to changes in commit ae8de6d50a (#10413) Anthony Van de Gejuchte 2024-11-19 23:18:17 +01:00
  • 42ae10bbcd
    add cmake rvv support (#10411) haopeng 2024-11-20 04:10:31 +08:00
  • 9fe0fb0626
    sync : ggml Georgi Gerganov 2024-11-19 19:15:50 +02:00
  • 611fabd792
    metal : fox offset integer overflows in im2col (ggml/1015) Plamen Minev 2024-11-18 15:02:27 +02:00
  • 12b0ad953a
    metal : add GGML_UNARY_OP_ELU kernel (ggml/1018) PAB 2024-11-18 10:02:49 +01:00
  • 342397dc7e
    cmake: force MSVC compiler charset to utf-8 (#9989) 蕭澧邦 2024-11-20 01:42:00 +08:00
  • 2a11b6b094
    Add required ggml-base and backend libs to cmake pkg (#10407) bandoti 2024-11-19 12:10:30 -04:00
  • 3ee6382d48
    cuda : fix CUDA_FLAGS not being applied (#10403) Diego Devesa 2024-11-19 14:29:38 +01:00
  • 8e752a777b
    llama : add check for KV cache shifts (#10401) Georgi Gerganov 2024-11-19 13:29:26 +02:00
  • a88ad007de
    llama : add OLMo November 2024 support (#10394) Shane A 2024-11-19 01:04:08 -08:00
  • 2a1507c162
    sycl : Add option to set the SYCL architecture for all targets (#10266) Romain Biessy 2024-11-19 09:02:23 +01:00
  • b3e585988f
    vulkan: Optimize soft_max (#10301) Jeff Bolz 2024-11-19 01:25:17 -06:00
  • 557924f222
    sycl: Revert MUL_MAT_OP support changes (#10385) Alberto Cabrera Pérez 2024-11-19 00:50:04 +00:00
  • d3481e6316
    cuda : only use native when supported by cmake (#10389) Diego Devesa 2024-11-18 18:43:40 +01:00
  • 531cb1c233
    Skip searching root path for cross-compile builds (#10383) bandoti 2024-11-18 11:23:58 -04:00
  • f139d2ea61
    vulkan: remove use of null initializer (#10372) Jeff Bolz 2024-11-18 08:28:42 -06:00
  • 2eb76b2a5e
    flake.lock: Update (#10346) Georgi Gerganov 2024-11-18 16:08:20 +02:00
  • 9b75f03cd2
    Vulkan: Fix device info output format specifiers (#10366) 0cc4m 2024-11-18 11:02:43 +01:00
  • 75207b3a88
    docker: use GGML_NATIVE=OFF (#10368) Johannes Gäßler 2024-11-18 00:21:53 +01:00
  • 76e9e58b78
    CUDA: fix MMV kernel being used for FP16 src1 (#10357) Johannes Gäßler 2024-11-17 23:20:42 +01:00
  • ce2e59ba10
    CMake: fix typo in comment [no ci] (#10360) Johannes Gäßler 2024-11-17 12:59:38 +01:00
  • be5caccef9
    llama : only use default buffer types for the KV cache (#10358) Diego Devesa 2024-11-17 12:25:45 +01:00
  • 20a780c7b6
    gitignore : ignore local run scripts [no ci] Georgi Gerganov 2024-11-17 13:12:22 +02:00
  • cf32a9b93a
    metal : refactor kernel args into structs (#10238) Georgi Gerganov 2024-11-17 11:23:01 +02:00
  • a43178299c
    ggml : fix undefined reference to 'getcpu' (#10354) FirstTimeEZ 2024-11-17 21:39:22 +13:00
  • c3ea58aca4
    CUDA: remove DMMV, consolidate F16 mult mat vec (#10318) Johannes Gäßler 2024-11-17 09:09:55 +01:00
  • 467576b6cc
    CMake: default to -arch=native for CUDA build (#10320) Johannes Gäßler 2024-11-17 09:06:34 +01:00
  • eda7e1d4f5
    ggml : fix possible buffer use after free in sched reserve (#9930) Diego Devesa 2024-11-17 07:31:17 +01:00
  • 24203e9dd7
    ggml : inttypes.h -> cinttypes (#0) Georgi Gerganov 2024-11-16 23:40:39 +02:00
  • 5d9e59979c
    ggml : adapt AMX to tensor->grad removal (#0) Georgi Gerganov 2024-11-16 21:38:01 +02:00
  • a4200cafad
    make : add ggml-opt (#0) Georgi Gerganov 2024-11-16 21:35:31 +02:00
  • 84274a10c3
    tests : remove test-grad0 Georgi Gerganov 2024-11-16 21:34:03 +02:00
  • 68fcb4759c
    ggml : fix compile warnings (#0) Georgi Gerganov 2024-11-16 21:32:41 +02:00
  • 8a43e940ab
    ggml: new optimization interface (ggml/988) Johannes Gäßler 2024-11-16 22:17:59 +02:00
  • 5c9a8b22b1
    scripts : update sync Georgi Gerganov 2024-11-16 22:16:04 +02:00
  • 0fff7fd798
    docs : vulkan build instructions to use git bash mingw64 (#10303) FirstTimeEZ 2024-11-17 12:29:18 +13:00
  • 4e54be0ec6
    llama/ex: remove --logdir argument (#10339) Johannes Gäßler 2024-11-16 23:00:41 +01:00
  • db4cfd5dbc
    llamafile : fix include path (#0) Georgi Gerganov 2024-11-16 17:58:56 +02:00
  • 8ee0d09ae6
    make : auto-determine dependencies (#0) Georgi Gerganov 2024-11-16 17:58:32 +02:00
  • bcdb7a2386
    server: (web UI) Add samplers sequence customization (#10255) MaggotHATE 2024-11-16 18:26:54 +05:00
  • f245cc28d4
    scripts : fix missing key in compare-llama-bench.py (#10332) Georgi Gerganov 2024-11-16 10:32:50 +02:00
  • 772703c8ff
    vulkan: Optimize some mat-vec mul quant shaders (#10296) Jeff Bolz 2024-11-16 00:26:57 -06:00
  • dd3a6ce9f8
    vulkan : add cmake preset debug/release (#10306) FirstTimeEZ 2024-11-16 14:59:33 +13:00
  • 1e58ee1318
    ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324) Dan Johansson 2024-11-16 01:53:37 +01:00
  • 89e4caaaf0
    llama : save number of parameters and the size in llama_model (#10286) FirstTimeEZ 2024-11-16 13:42:13 +13:00
  • 74d73dc85c
    Make updates to fix issues with clang-cl builds while using AVX512 flags (#10314) Srihari-mcw 2024-11-16 02:57:00 +05:30