Commit Graph

  • 08c5ee87e4
    llama : remove deprecated API (#5770) Georgi Gerganov 2024-02-28 18:43:38 +02:00
  • 78aacf3634
    awq-py : remove (#5768) Georgi Gerganov 2024-02-28 17:36:53 +02:00
  • 8c0e8f4e73
    sync : ggml Georgi Gerganov 2024-02-28 11:17:32 +02:00
  • 2774b0c974
    add Google Magika inference example (ggml/748) slaren 2024-02-25 20:41:35 +01:00
  • 5f70671856
    Introduce backend GUIDs (ggml/743) UEXTM.com 2024-02-24 11:27:36 -05:00
  • a693bea1e6
    server : hit Ctrl+C twice to exit (#5734) Xuan Son Nguyen 2024-02-28 09:55:37 +01:00
  • adcb12a9ba
    llama : fix non-quantization of expert gating tensors (#5754) compilade 2024-02-28 03:52:56 -05:00
  • 177628bfd8
    llama : improve BERT tokenization (#5740) Douglas Hanley 2024-02-28 02:51:11 -06:00
  • 6c4416868d
    readme : add link to LLaVA 1.6 models (#5758) Daniel Bevenius 2024-02-28 09:39:39 +01:00
  • efc72253f7
    server : add "/chat/completions" alias for "/v1/...` (#5722) Jorge A 2024-02-28 01:39:15 -07:00
  • 7c4263d426
    ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (#5760) Kawrakow 2024-02-28 10:37:02 +02:00
  • cb49e0f8c9
    Attempt to fix Android build (#5752) Kawrakow 2024-02-27 19:16:49 +02:00
  • 0becb22ac0
    IQ4_XS: a 4.25 bpw quantization (#5747) Kawrakow 2024-02-27 16:34:24 +02:00
  • c24a2a6e60
    cuda : replace remaining shfl_xor with calls to warp_reduce functions (#5744) Engininja2 2024-02-27 07:22:45 -06:00
  • 1f30b7a9f1
    ggml-quants : fix avx2 iq1_s vec_dot when compiled with gcc (#5742) Engininja2 2024-02-27 06:50:18 -06:00
  • 9d533a77d0
    llama : fix defrag bugs + add parameter (#5735) Georgi Gerganov 2024-02-27 14:35:51 +02:00
  • cbbd1efa06
    Makefile: use variables for cublas (#5689) le.chang 2024-02-27 10:03:06 +08:00
  • b11a93df41
    fix server hangs on empty prompt (#5733) Xuan Son Nguyen 2024-02-26 23:15:48 +01:00
  • a33e6a0d2a
    Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range (#5721) Kawrakow 2024-02-26 18:28:38 +02:00
  • 47bb7b48c7
    CUDA: fix DEBUG_CUDA_MALLOC (#5729) Johannes Gäßler 2024-02-26 15:36:38 +01:00
  • c4d7f81786
    readme : update ui list (#5731) Artem 2024-02-26 17:15:28 +03:00
  • e849078c6e
    [SYCL] Add support for soft_max ALiBi (#5639) AidanBeltonS 2024-02-26 14:02:11 +00:00
  • 67fd33132f
    unicode : reuse iterator (#5726) Georgi Gerganov 2024-02-26 14:02:12 +02:00
  • 4804215cb8
    server: CI fix trailing space (#5728) Pierrick Hymbert 2024-02-26 11:41:34 +01:00
  • 8a533f0d90
    server: CI tests reduce build matrix (#5725) Pierrick Hymbert 2024-02-26 09:56:10 +01:00
  • 269de86ba0
    llama : fix Gemma rope type (#5691) Georgi Gerganov 2024-02-26 08:30:17 +02:00
  • c393733988
    flake.lock: Update github-actions[bot] 2024-02-25 00:17:11 +00:00
  • e3965cf35a
    server: tests - slow inference causes timeout on the CI (#5715) Pierrick Hymbert 2024-02-25 22:48:33 +01:00
  • 8b350356b2
    server: docs - refresh and tease the HTTP server a little more (#5718) Pierrick Hymbert 2024-02-25 21:46:29 +01:00
  • bf08e00643
    llama : refactor k-shift implementation + KV defragmentation (#5691) Georgi Gerganov 2024-02-25 22:12:24 +02:00
  • f7625019c5
    server : fix crash when system prompt is bigger than batch size (#5714) compilade 2024-02-25 13:43:50 -05:00
  • abbabc5e51
    ggml-quants : provide ggml_vqtbl1q_u8 for 64bit compatibility (#5711) Radosław Gryta 2024-02-25 19:43:00 +01:00
  • f1a98c5254
    make : fix handling of empty nvcc version (#5713) kwin1412 2024-02-26 00:46:49 +08:00
  • 7d548a1827
    readme : add Msty to UI list (#5618) Ashok Gelal 2024-02-25 10:57:34 -05:00
  • 930b178026
    server: logs - unified format and --log-format option (#5700) Pierrick Hymbert 2024-02-25 13:50:32 +01:00
  • d52d7819b8
    server: concurrency fix + monitoring - add /metrics prometheus compatible endpoint (#5708) Pierrick Hymbert 2024-02-25 13:49:43 +01:00
  • 1289408817
    cmake : fix compilation for Android armeabi-v7a (#5702) Radosław Gryta 2024-02-25 11:53:11 +01:00
  • ab336a9d5e
    code : normalize enum names (#5697) Georgi Gerganov 2024-02-25 12:09:09 +02:00
  • 69917dfa55
    py : fix StableLM conversion after config.json changes (#5703) Anas Ahouzi 2024-02-25 10:54:04 +01:00
  • 9e359a4f47
    server: continue updating other slots during a concurrent embedding request (#5699) Pierrick Hymbert 2024-02-24 19:16:04 +01:00
  • 4c4cb30736
    IQ3_S: a much better alternative to Q3_K (#5676) Kawrakow 2024-02-24 16:23:52 +02:00
  • 525213d2f5
    server: init functional tests (#5566) Pierrick Hymbert 2024-02-24 12:28:55 +01:00
  • fd43d66f46
    server : add KV cache quantization options (#5684) AlpinDale 2024-02-23 19:31:54 +00:00
  • 54fbcd2ce6
    convert : fix missing ftype for gemma (#5690) Jared Van Bortel 2024-02-23 13:39:14 -05:00
  • 15499eb942
    mpt : do not duplicate token_embd.weight on disk (#5670) Jared Van Bortel 2024-02-22 17:05:23 -05:00
  • 96633eeca1
    gemma : use more bits for the token_embd.weight tensor (#5650) Georgi Gerganov 2024-02-22 23:23:46 +02:00
  • 847eedbdb2
    py : add Gemma conversion from HF models (#5647) Georgi Gerganov 2024-02-22 23:22:48 +02:00
  • 7e4f339c40
    ggml : always define ggml_fp16_t as uint16_t (#5666) Georgi Gerganov 2024-02-22 23:21:39 +02:00
  • 334f76fa38
    sync : ggml Georgi Gerganov 2024-02-22 23:21:05 +02:00
  • efd56b1c21
    ggml : 32-bit arm compat (whisper/1891) Georgi Gerganov 2024-02-22 18:31:40 +02:00
  • 201294ae17
    nix: init singularity and docker images (#5056) Someone 2024-02-22 19:44:10 +00:00
  • 5a9e2f60ba
    py : minor fixes (#5668) Georgi Gerganov 2024-02-22 20:13:25 +02:00
  • 373ee3fbba
    Add Gemma chat template (#5665) Xuan Son Nguyen 2024-02-22 19:10:21 +01:00
  • 4cb4d8b22d
    workflows: nix: hardcode cachix ids, build unconditionally (#5663) Someone 2024-02-22 16:32:09 +00:00
  • 3a03541ced
    minor : fix trailing whitespace (#5638) Georgi Gerganov 2024-02-22 13:54:03 +02:00
  • 56d03d92be
    readme : update hot topics Georgi Gerganov 2024-02-22 10:35:54 +02:00
  • a46f50747b
    server : fallback to chatml, add AlphaMonarch chat template (#5628) Xuan Son Nguyen 2024-02-22 09:33:24 +01:00
  • c5688c6250
    server : clarify some params in the docs (#5640) Alexey Parfenov 2024-02-22 08:27:32 +00:00
  • 4ef245a92a
    mpt : add optional bias tensors (#5638) Dat Quoc Nguyen 2024-02-22 18:15:13 +10:00
  • 973053d8b0
    llama : fix loading models with shared tok_embd and output (#5651) slaren 2024-02-22 00:42:09 +01:00
  • 7c8bcc11dc
    Add docs for llama_chat_apply_template (#5645) Xuan Son Nguyen 2024-02-22 00:31:00 +01:00
  • 7fe4678b02
    llama : fix session save/load with quantized KV (#5649) slaren 2024-02-21 22:52:39 +01:00
  • ba2135ccae
    gemma : allow offloading the output tensor (#5646) slaren 2024-02-21 22:18:23 +01:00
  • 89febfed93
    examples : do not assume BOS when shifting context (#5622) Jared Van Bortel 2024-02-21 10:33:54 -05:00
  • 5022cf242d
    sync : ggml Georgi Gerganov 2024-02-21 16:52:39 +02:00
  • 1ecea255eb
    server: health: fix race condition on slots data using tasks queue (#5634) Pierrick Hymbert 2024-02-21 15:47:48 +01:00
  • a00a35cef9
    readme : add LocalAI to the available UIs (#5629) Ettore Di Giacinto 2024-02-21 15:39:10 +01:00
  • eccd7a26dd
    sync : ggml (#5633) Georgi Gerganov 2024-02-21 16:17:10 +02:00
  • c14f72db9c
    readme : update hot topics Georgi Gerganov 2024-02-21 15:39:54 +02:00
  • cc6cac08e3
    llava : add --skip-unknown to 1.6 convert.py (#5632) Daniel Bevenius 2024-02-21 14:36:57 +01:00
  • 580111d42b
    llama : add gemma model (#5631) postmasters 2024-02-21 05:08:22 -08:00
  • 88c46cbdac
    [SYCL] add name to context (#5624) Meng, Hengyu 2024-02-21 17:52:06 +08:00
  • a14679cc30
    IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590) Kawrakow 2024-02-21 11:39:52 +02:00
  • 6560bed3f0
    server : support llava 1.6 (#5553) CJ Pais 2024-02-20 11:07:22 -08:00
  • 06bf2cf8c4
    make : fix debug build with CUDA (#5616) slaren 2024-02-20 20:06:17 +01:00
  • 4ed8e4fbef
    llava : add explicit instructions for llava-1.6 (#5611) Daniel Bevenius 2024-02-20 18:30:27 +01:00
  • 9c405c9f9a
    Server: use llama_chat_apply_template (#5593) Xuan Son Nguyen 2024-02-20 15:58:27 +01:00
  • 5207b3fbc5
    readme : update UI list (#5605) Dane Madsen 2024-02-20 21:00:23 +11:00
  • 8dbbd75754
    metal : add build system support for embedded metal library (#5604) Haoxiang Fei 2024-02-19 22:58:36 -11:00
  • c0a8c6db37
    server : health endpoint configurable failure on no slot (#5594) Pierrick Hymbert 2024-02-20 08:48:19 +01:00
  • b9111bd209
    Update ggml_sycl_op_mul_mat_vec_q (#5502) AidanBeltonS 2024-02-20 07:01:25 +00:00
  • 633782b8d9
    nix: now that we can do so, allow macOS to build Vulkan binaries Mathijs de Bruin 2024-02-13 20:28:02 +00:00
  • 22f83f0c38
    Enable Vulkan macOS CI 0cc4m 2024-02-10 22:18:33 +01:00
  • bb9dcd560a
    Refactor validation and enumeration platform checks into functions to clean up ggml_vk_instance_init() 0cc4m 2024-02-14 20:57:17 +01:00
  • f50db6ae0b
    Add check for VK_KHR_portability_enumeration for MoltenVK support 0cc4m 2024-02-10 22:14:52 +01:00
  • d8c054517d
    Add preprocessor checks for Apple devices. Mathijs de Bruin 2024-02-06 14:39:22 +00:00
  • 42f664a382
    Resolve ErrorIncompatibleDriver with Vulkan on macOS. Mathijs de Bruin 2024-02-03 18:00:11 +00:00
  • 5dde540897
    Allow for Vulkan build with Accelerate. Mathijs de Bruin 2024-02-03 17:56:46 +00:00
  • 40c3a6c1e1
    cuda : ignore peer access already enabled errors (#5597) slaren 2024-02-19 23:40:26 +01:00
  • f24ed14ee0
    make : pass CPPFLAGS directly to nvcc, not via -Xcompiler (#5598) Jared Van Bortel 2024-02-19 15:54:12 -05:00
  • 9d679f0fcc
    examples : support minItems/maxItems in JSON grammar converter (#5039) nopperl 2024-02-19 14:14:07 +00:00
  • 1387cf60f7
    llava : remove extra cont (#5587) Georgi Gerganov 2024-02-19 15:23:17 +02:00
  • 6fd413791a
    llava : replace ggml_cpy with ggml_cont slaren 2024-02-19 14:02:36 +01:00
  • 337c9cbd52
    sync : ggml Georgi Gerganov 2024-02-19 14:54:21 +02:00
  • a3145bdc30
    ggml-alloc : apply ggml/731 Georgi Gerganov 2024-02-19 14:53:48 +02:00
  • 890559ab28
    metal : option to embed MSL source into compiled binary (whisper/1842) Didzis Gosko 2024-02-11 16:41:41 +02:00
  • d0e3ce51f4
    ci : enable -Werror for CUDA builds (#5579) Georgi Gerganov 2024-02-19 14:45:41 +02:00
  • 68a6b98b3c
    make : fix CUDA build (#5580) Georgi Gerganov 2024-02-19 13:41:51 +02:00
  • 70d45af0ef
    readme : fix typo in README-sycl.md (#5353) valiray 2024-02-19 02:37:10 -08:00
  • 13e2c771aa
    cmake : remove obsolete sycl compile flags (#5581) Abhilash Majumder 2024-02-19 14:45:18 +05:30