Commit Graph

  • 902184dd3a
    fix missing slash in fs_get_cache_directory() (#7503) Xuan Son Nguyen 2024-05-25 05:30:59 +02:00
  • 57684331fc
    Make the tokenize CLI tool have nicer command-line arguments (#6188) Mikko Juola 2024-05-24 18:14:42 -07:00
  • b83bab15a5
    gguf-py : fix and simplify quantized shape round-trip (#7483) compilade 2024-05-24 21:11:48 -04:00
  • d041d2ceaa
    flake.lock: Update (#7232) Georgi Gerganov 2024-05-24 18:59:06 +03:00
  • 27891f6db0
    docker.yml: disable light-intel and server-intel test (#7515) Brian 2024-05-24 23:47:56 +10:00
  • fbca2f27fc
    Add support for ArcticForCausalLM (#7020) fairydreaming 2024-05-24 14:31:13 +02:00
  • 0df0aa8e43
    add shared lib build to the Windows release package (#7438) Neo Zhang 2024-05-24 10:06:56 +08:00
  • 74f33adf5f
    readme : remove trailing space (#7469) Georgi Gerganov 2024-05-23 17:43:18 +03:00
  • 1debe72737
    ggml : silence UB sanitizer error during iq2_xxs quantization (#0) Georgi Gerganov 2024-05-23 17:17:43 +03:00
  • 007489e895
    Fix phi3 chat template confusion with zephyr (#7449) Tristan Druyen 2024-05-23 16:15:15 +02:00
  • 8b94e799df
    readme : add Bunny to supported models [no ci] (#7469) Raj Hammeer Singh Hada 2024-05-23 18:00:13 +05:30
  • 3015851c5a
    llama : add getters for n_threads/n_threads_batch (#7464) Daniel Bevenius 2024-05-23 14:29:26 +02:00
  • 55ac3b7aea
    ci : use Pythia models instead of OpenLlama (#7470) Georgi Gerganov 2024-05-23 15:28:14 +03:00
  • dacfcebd60
    readme : add GPT-NeoX + Pythia to the list of supported models (#7491) Victor Nogueira 2024-05-23 15:12:43 +03:00
  • 9b82476ee9
    Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models) (#7461) fairydreaming 2024-05-23 11:49:53 +02:00
  • a61a94e543
    llama : rename n_ctx -> cache.size, less confusing (#0) Georgi Gerganov 2024-05-23 12:38:18 +03:00
  • 152da28ae5
    labeler.yml: add embedding label detector [no ci] (#7482) Brian 2024-05-23 17:40:43 +10:00
  • d48c88cbd5
    ggml : remove ggml_flash_attn and ggml_flash_ff (#7463) Georgi Gerganov 2024-05-23 10:00:44 +03:00
  • e84b71c2c6
    ggml : drop support for QK_K=64 (#7473) Georgi Gerganov 2024-05-23 10:00:21 +03:00
  • 1b1e27cb49
    Update vulkan rope implementation to support frequency factors (#7475) 0cc4m 2024-05-23 08:59:59 +02:00
  • fbf777d2b9
    main : minor (#7462) Georgi Gerganov 2024-05-23 09:43:24 +03:00
  • cd93a28cb1
    CUDA: fix FA out-of-bounds reads (#7479) Johannes Gäßler 2024-05-23 00:31:20 +02:00
  • 1e374365d1
    SimpleChat: a simple and dumb web front end for testing the /chat/completions and /completions endpoints and trying chat (#7350) HanishKVC 2024-05-22 23:23:21 +05:30
  • 197ff91462
    build : remove zig (#7471) Georgi Gerganov 2024-05-22 20:05:38 +03:00
  • 6ff13987ad
    common : normalize naming style (#7462) Georgi Gerganov 2024-05-22 20:04:20 +03:00
  • 38c03478a3
    CUDA: fix FA out-of-bounds writes (#7465) Johannes Gäßler 2024-05-22 17:58:25 +02:00
  • b18532a4ef
    phi3 : duplicate rope factors in each layer (#7447) slaren 2024-05-22 16:10:46 +02:00
  • fcda1128bc
    vulkan: add workaround for iterator boundary check to fix clang-cl debug build (#7426) k.h.lai 2024-05-22 20:53:21 +08:00
  • 03d8900ebe
    llama : add missing model type names (#7445) Justine Tunney 2024-05-22 07:08:18 -04:00
  • 9b3d833189
    cuda : fix compile warning (#7454) Georgi Gerganov 2024-05-22 12:36:37 +03:00
  • 95fb0aefab
    CUDA: remove incorrect precision check (#7454) Johannes Gäßler 2024-05-22 10:24:29 +02:00
  • 3e5faa8503
    cuda : fix rope + add tests (#7452) Georgi Gerganov 2024-05-22 11:01:35 +03:00
  • 201cc11afa
    llama : add phi3 128K model support (#7225) liuwei-git 2024-05-22 04:28:32 +08:00
  • 6369bf0433
    metal : handle F16 inf values, fix FA partial offload (#7434) Georgi Gerganov 2024-05-21 23:03:42 +03:00
  • e402de364b
    grammars: fix resampling logic regression (#7424) Olivier Chafik 2024-05-21 20:40:00 +01:00
  • fcf6538ba6
    CUDA: fix unused warning in mmq.cu (#7442) Johannes Gäßler 2024-05-21 19:27:12 +02:00
  • c3f8d58356
    tests : test-tokenizer-0.sh print more info (#7402) Georgi Gerganov 2024-05-21 19:53:48 +03:00
  • 11474e756d
    examples: cache hf model when --model not provided (#7353) Amir 2024-05-21 17:13:12 +03:00
  • d8ee902227
    CUDA: deduplicate mmq code (#7397) Johannes Gäßler 2024-05-21 16:02:12 +02:00
  • d7e852c1bc
    Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425) jaime-m-p 2024-05-21 14:39:48 +02:00
  • 917dc8cfa6
    Tokenizer SPM fixes for phi-3 and llama-spm (#7375) jaime-m-p 2024-05-20 20:15:57 +02:00
  • fabf30b4c4
    llama : remove Persimmon (#7408) Georgi Gerganov 2024-05-20 19:35:28 +03:00
  • 20385cebcc
    perplexity: update README FP16 results [no ci] (#7413) Johannes Gäßler 2024-05-20 18:15:38 +02:00
  • db10f01310
    rpc : track allocated buffers (#7411) Radoslav Gerganov 2024-05-20 16:36:55 +03:00
  • 3bc10cb485
    server : fix temperature + disable some tests (#7409) Georgi Gerganov 2024-05-20 15:10:03 +03:00
  • 6bf9b66fa3
    [SYCL] Update SYCL upscale operation (#7321) AidanBeltonS 2024-05-20 12:08:23 +01:00
  • 26cd4237bc
    Update README.md (#7410) Bingan 2024-05-20 17:55:34 +08:00
  • 213e90ed73
    ggml-opencl, llama: use reserve() when the count is already known (#7272) Herman Semenov 2024-05-20 07:33:21 +00:00
  • 65c58207ec
    ggml : add LoongArch LSX and LASX support (#6454) junchao-loongson 2024-05-20 15:19:21 +08:00
  • 1cc0155d04
    server : tuning tests (#7388) Georgi Gerganov 2024-05-20 10:16:41 +03:00
  • e932094d58
    server : return error on too large embedding input (#7389) Georgi Gerganov 2024-05-20 08:56:05 +03:00
  • 2789baf480
    tests : fix --keep_split -> --keep-split (#7374) Georgi Gerganov 2024-05-20 08:55:09 +03:00
  • 33c8d50acc
    Add provisions for Windows support of BF16 code, including a CMake provision for enabling AVX512_BF16 (#7258) Srihari-mcw 2024-05-19 19:18:39 -07:00
  • d359f30921
    llama : remove MPI backend (#7395) slaren 2024-05-20 01:17:03 +02:00
  • 1ea2a0036e
    quantize : fix --keep-split check (#7374) Fred Douglas 2024-05-19 11:37:04 -05:00
  • f030ec1f7a
    Vulkan Embedding Fix (#7360) 0cc4m 2024-05-19 17:19:53 +02:00
  • e4e6f67be6
    ggml : fix another case of quants nans (#7387) slaren 2024-05-19 17:08:46 +02:00
  • 5ca49cbecd
    ggml: implement quantized KV cache for FA (#7372) Johannes Gäßler 2024-05-19 16:46:13 +02:00
  • 1b01f06db0
    server: add test for token probs (#7347) Johannes Gäßler 2024-05-19 16:26:02 +02:00
  • 41858392e1
    server: fix seed being reported back (#7382) Johannes Gäßler 2024-05-19 16:06:33 +02:00
  • 6aade19ee7
    Add StableLM2 pre-tokenizer (#7349) Anas Ahouzi 2024-05-19 14:46:46 +02:00
  • ab33f7a338
    cuda : clear error after buffer allocation failure (#7376) slaren 2024-05-19 14:19:37 +02:00
  • e23b974f4c
    labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363) Brian 2024-05-19 20:51:03 +10:00
  • 854d365aba
    cmake : update android comments (#7341) Georgi Gerganov 2024-05-19 11:01:01 +03:00
  • f5bf761747
    Capture CUDA logging output (#7298) fraxy-v 2024-05-19 01:44:42 +03:00
  • 059031b8c4
    ci : re-enable sanitizer runs (#7358) Georgi Gerganov 2024-05-18 18:55:54 +03:00
  • 511182eabb
    android : use "ci-android" branch for CI (#7341) Georgi Gerganov 2024-05-18 13:40:39 +03:00
  • 133d99c599
    CUDA: deduplicate FlashAttention code (#7352) Johannes Gäßler 2024-05-18 12:36:25 +02:00
  • cb42c29427
    server: correct --threads documentation [no ci] (#7362) Johannes Gäßler 2024-05-18 11:10:47 +02:00
  • d233b507cd
    cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263) Engininja2 2024-05-18 02:05:17 -06:00
  • 0f98acfac6
    llama : add support for larger Granite Code Models (20B, 34B) (#7324) Steffen Röcker 2024-05-18 10:04:55 +02:00
  • ca57e0f35e
    perplexity : ndot progress and show stats with < 100 tasks (#7348) strawberrymelonpanda 2024-05-18 00:57:08 -07:00
  • c1b295eea5
    Update and fix Vulkan soft_max and argsort implementations (#7237) 0cc4m 2024-05-18 08:10:58 +02:00
  • de73196344
    github-actions-labeler: initial commit (#7330) Brian 2024-05-18 16:04:23 +10:00
  • b49a13dd2f
    convert : fix set_vocab_sentencepiece (#6866) Georgi Gerganov 2024-05-18 08:46:20 +03:00
  • 05834841dc
    ggml : fix quants nans when all the group weights are very close to zero (#7313) slaren 2024-05-18 02:39:54 +02:00
  • ef277de2ad
    cmake : fix typo in AMDGPU_TARGETS (#7356) Engininja2 2024-05-17 18:39:25 -06:00
  • b43272afa2
    Unicode codepoint flags for custom regexes (#7245) jaime-m-p 2024-05-18 01:09:13 +02:00
  • 0fc1e820a9
    CUDA: faster large batch FA without tensor cores (#7314) Johannes Gäßler 2024-05-17 18:54:52 +02:00
  • 82ca83db3c
    ROCm: use native CMake HIP support (#5966) Gavin Zhao 2024-05-17 11:03:03 -04:00
  • f4bd8b3d26
    rpc : set SO_REUSEADDR for the server socket (#7320) Radoslav Gerganov 2024-05-17 17:25:44 +03:00
  • 51e9d02599
    Add a single test function script and make debug-test.sh more robust (#7279) Brian 2024-05-17 22:40:14 +10:00
  • d273c1402b
    py : convert-hf-to-gguf-update improvements (#7340) Aarni Koskela 2024-05-17 15:11:45 +03:00
  • 27b040691c
    llama : use n_embd_head_v when reshaping kqv (#7327) fairydreaming 2024-05-17 13:24:38 +02:00
  • 29c60d8cdd
    tokenization: add warning for double BOS (#7332) Johannes Gäßler 2024-05-17 09:59:57 +02:00
  • 359cbe3f46
    ggml-quants, llama : remove excess checks (#7274) Herman Semenov 2024-05-17 07:08:49 +00:00
  • e18bc6aaf3
    convert : fix Qwen/Qwen-7b conversion (#7308) amd-lalithnc 2024-05-17 12:31:58 +05:30
  • ee94172d33
    server : add support for the RPC backend (#7305) Radoslav Gerganov 2024-05-17 10:00:17 +03:00
  • 934266c0e0
    ggml : rewrite silu and softmax for cpu (#7154) Justine Tunney 2024-05-17 02:58:52 -04:00
  • 9c4fdcbec8
    [Server] Added --verbose option to README [no ci] (#7335) Leon Knauer 2024-05-17 02:11:03 +02:00
  • 24ecb58168
    Revert "server bench: fix bench not waiting for model load (#7284)" (#7334) Pierrick Hymbert 2024-05-16 20:43:45 +02:00
  • 9afdffe70e
    rpc : get available mem for the CPU backend Radoslav Gerganov 2024-05-15 16:04:40 +03:00
  • 3b3963c55c
    rpc : add command line arg for specifying backend memory Radoslav Gerganov 2024-05-15 15:29:07 +03:00
  • dda64fc17c
    convert : get general.name from model dir, not its parent (#5615) Jared Van Bortel 2024-05-16 02:15:23 -04:00
  • 0350f58152
    grammar, json, llama: replace push with emplace where possible (#7273) Herman Semenov 2024-05-16 06:14:24 +00:00
  • ad52d5c259
    doc: add references to the Hugging Face GGUF-my-repo quantisation web tool (#7288) Vaibhav Srivastav 2024-05-16 07:38:43 +02:00
  • 172b78210a
    ci: fix bin/Release path for windows-arm64 builds (#7317) Max Krasnyansky 2024-05-15 22:36:43 -07:00
  • 13ad16af12
    Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (#7191) Max Krasnyansky 2024-05-15 19:47:36 -07:00
  • 8f7080bf48
    readme : remove stray double quote (#7310) Daniel Bevenius 2024-05-15 23:41:03 +02:00
  • e1b40ac3b9
    ggml : use dynamic thread scheduling for matrix multiplication (#6915) kunnis 2024-05-15 12:59:12 -05:00