Commit Graph

  • 3dfda05956
    llama : de-duplicate deepseek2 norm Georgi Gerganov 2024-07-15 14:10:39 +03:00
  • bda62d7999
    Vulkan MMQ Fix (#8479) 0cc4m 2024-07-15 09:38:52 +02:00
  • 090fca7a07
    pydantic : replace uses of __annotations__ with get_type_hints (#8474) compilade 2024-07-14 19:51:21 -04:00
  • aaab2419ea
    flake.lock: Update (#8475) Georgi Gerganov 2024-07-14 18:54:02 +03:00
  • 73cf442e7b
    llama : fix Gemma-2 Query scaling factors (#8473) Georgi Gerganov 2024-07-14 14:05:09 +03:00
  • e236528e76
    gguf_hash.py: Add sha256 (#8470) Brian 2024-07-14 16:47:14 +10:00
  • fa79495bb4
    llama : fix pre-tokenization of non-special added tokens (#8228) compilade 2024-07-13 23:35:10 -04:00
  • 17eb6aa8a9
    vulkan : cmake integration (#8119) bandoti 2024-07-13 13:12:39 -03:00
  • c917b67f06
    metal : template-ify some of the kernels (#8447) Georgi Gerganov 2024-07-13 18:32:33 +03:00
  • 4e24cffd8c
    server : handle content array in chat API (#8449) Georgi Gerganov 2024-07-12 14:48:15 +03:00
  • 6af51c0d96
    main : print error on empty input (#8456) Georgi Gerganov 2024-07-12 14:48:04 +03:00
  • f53226245f
    llama : suppress unary minus operator warning (#8448) Daniel Bevenius 2024-07-12 11:05:21 +02:00
  • c3ebcfa148
    server : ensure batches are either all embed or all completion (#8420) Douglas Hanley 2024-07-12 03:14:12 -05:00
  • 8a4441ea1a
    docker : fix filename for convert-hf-to-gguf.py in tools.sh (#8441) Armen Kaleshian 2024-07-12 04:08:19 -04:00
  • 5aefbce27a
    convert : remove fsep token from GPTRefactForCausalLM (#8237) Jiří Podivín 2024-07-12 10:06:33 +02:00
  • 71c1121d11
    examples : sprintf -> snprintf (#8434) Georgi Gerganov 2024-07-12 10:46:14 +03:00
  • 370b1f7e7a
    ggml : minor naming changes (#8433) Georgi Gerganov 2024-07-12 10:46:02 +03:00
  • b549a1bbef
    [SYCL] fix the mul_mat_id ut issues (#8427) Chen Xi 2024-07-12 00:52:04 +00:00
  • 368645698a
    ggml : add NVPL BLAS support (#8329) (#8425) Nicholai Tukanov 2024-07-11 11:49:15 -05:00
  • b078c619aa
    cuda : suppress 'noreturn' warn in no_device_code (#8414) Daniel Bevenius 2024-07-11 17:53:42 +02:00
  • 808aba3916
    CUDA: optimize and refactor MMQ (#8416) Johannes Gäßler 2024-07-11 16:47:47 +02:00
  • a977c11544
    gitignore : deprecated binaries Georgi Gerganov 2024-07-11 11:20:40 +03:00
  • 9a55ffe6fb
    tokenize : add --no-parse-special option (#8423) compilade 2024-07-11 03:41:48 -04:00
  • 7a221b672e
    llama : use F32 precision in Qwen2 attention and no FA (#8412) Georgi Gerganov 2024-07-11 10:21:30 +03:00
  • 278d0e1846
    Initialize default slot sampling parameters from the global context. (#8418) Clint Herron 2024-07-10 20:08:17 -04:00
  • dd07a123b7
    Name Migration: Build the deprecation-warning 'main' binary every time (#8404) Clint Herron 2024-07-10 12:35:18 -04:00
  • f4444d992c
    [SYCL] Use multi_ptr to clean up deprecated warnings (#8256) AidanBeltonS 2024-07-10 16:10:49 +01:00
  • 6b2a849d1f
    ggml : move sgemm sources to llamafile subfolder (#8394) Georgi Gerganov 2024-07-10 15:23:29 +03:00
  • 0f1a39f343
    ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) Dibakar Gope 2024-07-10 07:14:51 -05:00
  • 83321c6958
    gguf-py rel pipeline (#8410) M. Yusuf Sarıgöz 2024-07-10 15:12:35 +03:00
  • cc61948b1f
    llama : C++20 compatibility for u8 strings (#8408) Borislav Stanimirov 2024-07-10 14:45:44 +03:00
  • 7a80710d93
    msvc : silence codecvt c++17 deprecation warnings (#8395) Borislav Stanimirov 2024-07-10 14:40:53 +03:00
  • a8be1e6f59
    llama : add assert about missing llama_encode() call (#8400) fairydreaming 2024-07-10 13:38:58 +02:00
  • e4dd31ff89
    py : fix converter for internlm2 (#8321) RunningLeon 2024-07-10 19:26:40 +08:00
  • 8f0fad42b9
    py : fix extra space in convert_hf_to_gguf.py (#8407) laik 2024-07-10 19:19:10 +08:00
  • a59f8fdc85
    Server: Enable setting default sampling parameters via command-line (#8402) Clint Herron 2024-07-09 18:26:40 -04:00
  • fd560fe680
    Update README.md to fix broken link to docs (#8399) Andy Salerno 2024-07-09 11:58:44 -07:00
  • e500d6135a
    Deprecation warning to assist with migration to new binary names (#8283) Clint Herron 2024-07-09 11:54:43 -04:00
  • a03e8dd99d
    make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392) Johannes Gäßler 2024-07-09 17:11:07 +02:00
  • 5b0b8d8cfb
    sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372) Alberto Cabrera Pérez 2024-07-09 15:03:15 +01:00
  • 9925ca4087
    cmake : allow external ggml (#8370) Borislav Stanimirov 2024-07-09 11:38:00 +03:00
  • 9beb2dda03
    readme : fix typo [no ci] (#8389) daghanerdonmez 2024-07-09 09:16:00 +03:00
  • 7d0e23d72e
    gguf-py : do not use internal numpy types (#7472) compilade 2024-07-09 01:04:49 -04:00
  • 7fdb6f73e3
    flake.lock: Update (#8342) Georgi Gerganov 2024-07-09 01:36:38 +03:00
  • a130eccef4
    labeler : updated sycl to match docs and code refactor (#8373) Alberto Cabrera Pérez 2024-07-08 21:35:17 +01:00
  • c4dd11d1d3
    readme : fix web link error [no ci] (#8347) b4b4o 2024-07-08 22:19:24 +08:00
  • 2ec846d558
    sycl : fix powf call in device code (#8368) Alberto Cabrera Pérez 2024-07-08 14:22:41 +01:00
  • 3f2d538b81
    scripts : fix sync for sycl Georgi Gerganov 2024-07-08 13:51:31 +03:00
  • 2ee44c9a18
    sync : ggml Georgi Gerganov 2024-07-08 10:39:50 +03:00
  • 6847d54c4f
    tests : fix whitespace (#0) Georgi Gerganov 2024-07-08 10:39:36 +03:00
  • fde13b3bb9
    feat: cuda implementation for ggml_conv_transpose_1d (ggml/854) John Balis 2024-07-02 11:09:52 -05:00
  • 470939d483
    common : preallocate sampling token data vector (#8363) Kevin Wang 2024-07-08 03:26:53 -04:00
  • 6f0dbf6ab0
    infill : assert prefix/suffix tokens + remove old space logic (#8351) Georgi Gerganov 2024-07-08 09:34:35 +03:00
  • ffd00797d8
    common : avoid unnecessary logits fetch (#8358) Kevin Wang 2024-07-08 02:31:55 -04:00
  • 04ce3a8b19
    readme : add supported glm models (#8360) toyer 2024-07-08 13:57:19 +08:00
  • 3fd62a6b1c
    py : type-check all Python scripts with Pyright (#8341) compilade 2024-07-07 15:04:39 -04:00
  • a8db2a9ce6
    Update llama-cli documentation (#8315) Denis Spasyuk 2024-07-07 09:08:28 -06:00
  • 4090ea5501
    ci : add checks for cmake,make and ctest in ci/run.sh (#8200) Alex Tuddenham 2024-07-07 15:59:14 +01:00
  • f1948f1e10
    readme : update bindings list (#8222) Andy Tai 2024-07-07 06:21:37 -07:00
  • f7cab35ef9
    gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#8048) Brian 2024-07-07 22:58:43 +10:00
  • 905942abdb
    llama : support glm3 and glm4 (#8031) toyer 2024-07-07 20:52:10 +08:00
  • b5040086d4
    llama : fix n_rot default (#8348) Georgi Gerganov 2024-07-07 14:59:02 +03:00
  • d39130a398
    py : use cpu-only torch in requirements.txt (#8335) compilade 2024-07-07 07:23:38 -04:00
  • b81ba1f96b
    finetune: Rename command name in README.md (#8343) standby24x7 2024-07-07 19:38:02 +09:00
  • 210eb9ed0a
    finetune: Rename an old command name in finetune.sh (#8344) standby24x7 2024-07-07 19:37:47 +09:00
  • cb4d86c4d7
    server: Retrieve prompt template in /props (#8337) Bjarke Viksøe 2024-07-07 11:10:38 +02:00
  • 86e7299ef5
    added support for Authorization Bearer tokens when downloading model (#8307) Derrick T. Woolworth 2024-07-06 15:32:04 -05:00
  • 60d83a0149
    update main readme (#8333) Xuan Son Nguyen 2024-07-06 19:01:23 +02:00
  • 87e25a1d1b
    llama : add early return for empty range (#8327) Daniel Bevenius 2024-07-06 09:22:16 +02:00
  • 213701b51a
    Detokenizer fixes (#8039) jaime-m-p 2024-07-05 19:01:35 +02:00
  • be20e7f49d
    Reorganize documentation pages (#8325) Xuan Son Nguyen 2024-07-05 18:08:32 +02:00
  • 7ed03b8974
    llama : fix compile warning (#8304) Georgi Gerganov 2024-07-05 17:32:09 +03:00
  • 1d894a790e
    cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281) Natsu 2024-07-05 22:29:35 +08:00
  • 1f3e1b66e2
    Enabled more data types for oneMKL gemm_batch (#8236) Ouadie EL FAROUKI 2024-07-05 13:23:25 +01:00
  • 148ec970b6
    convert : remove AWQ remnants (#8320) Georgi Gerganov 2024-07-05 10:15:36 +03:00
  • 2cccbaa008
    llama : minor indentation during tensor loading (#8304) Georgi Gerganov 2024-07-05 10:15:24 +03:00
  • 8e558309dc
    CUDA: MMQ support for iq4_nl, iq4_xs (#8278) Johannes Gäßler 2024-07-05 09:06:31 +02:00
  • 0a423800ff
    CUDA: revert part of the RDNA1 optimizations (#8309) Daniele 2024-07-05 07:06:09 +00:00
  • d12f781074
    llama : streamline embeddings from "non-embedding" models (#8087) Douglas Hanley 2024-07-05 02:05:56 -05:00
  • bcefa03bc0
    CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311) Johannes Gäßler 2024-07-05 09:05:34 +02:00
  • 5a7447c569
    readme : fix minor typos [no ci] (#8314) Pieter Ouwerkerk 2024-07-05 02:58:41 -04:00
  • 61ecafa390
    passkey : add short intro to README.md [no-ci] (#8317) Daniel Bevenius 2024-07-05 08:14:24 +02:00
  • aa5898dc53
    llama : prefer n_ over num_ prefix (#8308) Georgi Gerganov 2024-07-05 09:10:03 +03:00
  • 6c05752c50
    contributing : update guidelines (#8316) Georgi Gerganov 2024-07-05 09:09:47 +03:00
  • a9554e20b6
    [SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) luoyu-intel 2024-07-05 05:06:13 +00:00
  • e235b267a2
    py : switch to snake_case (#8305) Georgi Gerganov 2024-07-05 07:53:33 +03:00
  • f09b7cb609
    rm get_work_group_size() by local cache for performance (#8286) Neo Zhang Jianyu 2024-07-05 10:32:29 +08:00
  • a38b884c6c
    cli: add EOT when user hit Ctrl+C (#8296) Xuan Son Nguyen 2024-07-04 20:55:03 +02:00
  • d7fd29fff1
    llama : add OpenELM support (#7359) Icecream95 2024-07-05 05:14:21 +12:00
  • 6f63d646c1
    tokenize : add --show-count (token) option (#8299) Daniel Bevenius 2024-07-04 18:38:58 +02:00
  • 51d2ebadbb
    build: Export hf-to-gguf as snakecase ditsuke 2024-07-04 20:54:35 +05:30
  • 1e920018d3
    doc: Add context for why we add an explicit pytorch source ditsuke 2024-07-03 01:02:56 +05:30
  • 01a5f06550
    chore: Remove rebase artifacts ditsuke 2024-07-02 15:48:13 +05:30
  • 07786a61a2
    chore: Fixup requirements and build ditsuke 2024-07-02 15:35:43 +05:30
  • de14e2ea2b
    chore: ignore all __pychache__ ditsuke 2024-07-02 15:18:13 +05:30
  • 821922916f
    fix: Update script paths in CI scripts ditsuke 2024-03-10 23:21:46 +05:30
  • b1c3f26e5e
    fix: Actually include scripts in build ditsuke 2024-02-29 01:47:15 +05:30
  • b0a46993df
    build(python): Package scripts with pip-0517 compliance ditsuke 2024-02-27 12:01:02 +05:30
  • 807b0c49ff
    Inference support for T5 and FLAN-T5 model families (#5763) fairydreaming 2024-07-04 15:46:11 +02:00
  • f8c4c0738d
    tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231) Daniel Bevenius 2024-07-04 12:53:42 +02:00