Commit Graph

  • b9ec82d262
    grammar : check the full vocab only if necessary (opt) (#4306) kalomaze 2023-12-23 03:27:07 -06:00
  • e0a4002273
    CUDA: fixed row rounding for 0 tensor splits (#4594) Johannes Gäßler 2023-12-23 09:16:33 +01:00
  • 7082d24cec
    lookup : add prompt lookup decoding example (#4484) LeonEricsson 2023-12-22 17:05:56 +01:00
  • ba66175132
    sync : ggml (fix im2col) (#4591) Georgi Gerganov 2023-12-22 17:53:43 +02:00
  • a55876955b
    cuda : fix jetson compile error (#4560) FantasyGmm 2023-12-22 23:11:12 +08:00
  • 6724ef1657
    Fix CudaMemcpy direction (#4599) Henrik Forstén 2023-12-22 15:34:05 +02:00
  • 48b7ff193e
    llama : fix platforms without mmap (#4578) slaren 2023-12-22 12:12:53 +01:00
  • 48b24b170e
    ggml : add comment about backward GGML_OP_DIAG_MASK_INF (#4203) Herman Semenov 2023-12-22 09:26:49 +00:00
  • 28cb35a0ec
    make : add LLAMA_HIP_UMA option (#4587) Michael Kesper 2023-12-22 09:03:25 +01:00
  • f31b984898
    ci : tag docker image with build number (#4584) rhuddleston 2023-12-21 23:56:34 -07:00
  • 2bb98279c5
    readme : add zig bindings (#4581) Deins 2023-12-22 08:49:54 +02:00
  • 0137ef88ea
    ggml : extend enum ggml_log_level with GGML_LOG_LEVEL_DEBUG (#4579) bobqianic 2023-12-22 06:47:01 +00:00
  • c7e9701f86
    llama : add ability to cancel model loading (#4462) crasm 2023-12-22 01:19:36 -05:00
  • afefa319f1
    ggml : change ggml_scale to take a float instead of tensor (#4573) Georgi Gerganov 2023-12-21 23:20:49 +02:00
  • 769a7bc85e
    gguf-py : fix broken link Georgi Gerganov 2023-12-21 23:20:36 +02:00
  • 32259b2dad
    gguf : simplify example dependencies Georgi Gerganov 2023-12-21 23:07:58 +02:00
  • 4a5f9d629e
    ci : add jlumbroso/free-disk-space to docker workflow (#4150) Samuel Maynard 2023-12-21 22:36:26 +02:00
  • d232aca5a7
    llama : initial ggml-backend integration (#4520) slaren 2023-12-21 21:07:46 +01:00
  • 31f27758fa
    llama : allow getting n_batch from llama_context in c api (#4540) Marcus Dunn 2023-12-21 11:57:48 -08:00
  • 56fa50819f
    metal : fix ggml_metal_log vargs (#4373) Finn Voorhees 2023-12-21 14:55:02 -05:00
  • 0f630fbc92
    cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449) Erik Garrison 2023-12-21 13:45:32 -06:00
  • 562cf222b5
    ggml-cuda: Fix HIP build by adding define for __trap (#4569) arlo-phoenix 2023-12-21 20:13:25 +01:00
  • 8fe03ffdda
    common : remove incorrect --model-draft default (#4568) Jared Van Bortel 2023-12-21 12:55:34 -05:00
  • 9154494808
    CUDA: mul_mat_id always on GPU for batches >= 32 (#4553) Johannes Gäßler 2023-12-21 18:42:59 +01:00
  • c083718c89
    readme : update coding guidelines Georgi Gerganov 2023-12-21 19:27:14 +02:00
  • 880e352277
    py : open merges file as 'utf-8' (#4566) howlger 2023-12-21 18:07:34 +01:00
  • 66f35a2f48
    cuda : better error message for ggml_get_rows (#4561) bobqianic 2023-12-21 17:06:44 +00:00
  • 1398823922
    cuda : replace asserts in wrong architecture checks with __trap (#4556) slaren 2023-12-21 18:02:30 +01:00
  • d3223afdad
    llama : disable per-tensor info prints on model load (#4562) Johannes Gäßler 2023-12-21 17:34:17 +01:00
  • 1d7a1912ce
    Fix access violation in ggml_cuda_free_data if tensor->extra is NULL (#4554) LoganDark 2023-12-21 01:59:27 -08:00
  • 799fc22689
    CUDA: Faster Mixtral prompt processing (#4538) Johannes Gäßler 2023-12-20 15:41:22 +01:00
  • 328b83de23
    ggml : fixed check for _MSC_VER (#4535) Eric Sommerlade 2023-12-19 16:17:01 +00:00
  • a7aee47b98
    ggml-cuda: Fix HIP build (#4528) arlo-phoenix 2023-12-18 22:33:45 +01:00
  • 0e18b2e7d0
    llama.swiftui : add tinyllama 1.1B F16 Georgi Gerganov 2023-12-18 20:17:43 +02:00
  • 6ff39b129d
    llama.swiftui : add more models Georgi Gerganov 2023-12-18 20:05:12 +02:00
  • b9e74f9bca
    llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490) Ebey Abraham 2023-12-18 17:27:47 +00:00
  • 3c04bf6da8
    llama : fix try_override for bool_value which always return true (#4519) hankcs 2023-12-18 05:14:58 -08:00
  • 2994f0c5a2
    decode : fix logits_valid for legacy API (#4516) Jared Van Bortel 2023-12-17 19:39:02 -05:00
  • b1306c4394
    readme : update hot topics Georgi Gerganov 2023-12-17 20:16:23 +02:00
  • 800a489e4a
    llama.swiftui : add bench functionality (#4483) Georgi Gerganov 2023-12-17 19:38:41 +02:00
  • f7f468a97d
    gguf-py : fail fast on nonsensical special token IDs (#4489) Jared Van Bortel 2023-12-17 10:45:46 -05:00
  • 919c40660f
    build : Check the ROCm installation location (#4485) Matheus Gabriel Alves Silva 2023-12-17 12:23:33 -03:00
  • 45668633fd
    finetune : keep allocs alive until all allocations are done (#4486) slaren 2023-12-17 16:05:56 +01:00
  • 0ffc92d2d2
    server : disable llm logs if SERVER_VERBOSE is off (#3792) olexiyb 2023-12-17 17:02:16 +02:00
  • 8edd2b40fd
    server : fix grammar being ignored (#4494) AdithyanI 2023-12-17 15:57:56 +01:00
  • eb16dae7e7
    server : fix possible ambiguity in content type charset (#4501) Alexey Parfenov 2023-12-17 14:56:09 +00:00
  • 62bd52b7bf
    server : allow requests larger than 8K (#4500) mzcu 2023-12-17 15:54:37 +01:00
  • 5daa5f54fd
    Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506) Bach Le 2023-12-17 18:57:33 +08:00
  • c6c4fc081c
    lora : add support for non-llama models (#3333) slaren 2023-12-16 18:58:46 +01:00
  • 8a5be3bd58
    llama : sanity checks for access to logits (#4274) Jared Van Bortel 2023-12-15 22:16:15 -05:00
  • 88ae8952b6
    server : add optional API Key Authentication example (#4441) ShadovvBeast 2023-12-15 13:49:01 +02:00
  • ee4725a686
    ggml : group mul_mat_id rows by matrix (cpu only) (#4480) slaren 2023-12-15 12:45:50 +01:00
  • 6744dbe924
    ggml : use ggml_row_size where possible (#4472) slaren 2023-12-14 20:05:21 +01:00
  • cafcd4f895
    ggml : remove n_dims from ggml_tensor (#4469) slaren 2023-12-14 16:52:08 +01:00
  • c50e400163
    py : add protobuf dependency (#4466) wonjun Jang 2023-12-14 21:44:49 +09:00
  • 20a68a7030
    ggml : add ggml_row_size() (fixes llama out of space) (#4461) LostRuins 2023-12-14 20:13:33 +08:00
  • 55e87c3749
    ggml : fix OpenCL broadcast requirement for ggml_mul (close #4453) Georgi Gerganov 2023-12-14 10:35:29 +02:00
  • 873637afc7
    convert : support loading vocab from fast tokenizer config (#3633) wonjun Jang 2023-12-14 17:09:34 +09:00
  • 0353a18401
    readme : update supported model list (#4457) BarfingLemurs 2023-12-14 02:38:49 -05:00
  • 948ff137ec
    server : fix handling of characters that span multiple tokens when streaming (#4446) shibe2 2023-12-13 23:57:15 +04:00
  • 4d98d9a656
    sync : ggml (SD ops, tests, kernels) (#4444) Georgi Gerganov 2023-12-13 21:54:54 +02:00
  • 70f806b821
    build : detect host compiler and cuda compiler separately (#4414) Jared Van Bortel 2023-12-13 12:10:10 -05:00
  • 9fb13f9584
    common : add --version option to show build info in CLI (#4433) Siwen Yu 2023-12-13 20:50:14 +08:00
  • 113f9942fc
    readme : update hot topics Georgi Gerganov 2023-12-13 14:05:38 +02:00
  • 799a1cb13b
    llama : add Mixtral support (#4406) slaren 2023-12-13 13:04:25 +01:00
  • fecac45658
    server : tweak default sampling parameters (#4367) kalomaze 2023-12-12 04:12:35 -06:00
  • 9494d7c477
    english : use typos to fix comments and logs (#4354) Richard Kiss 2023-12-12 01:53:36 -08:00
  • 6138963fb2
    build : target Windows 8 for standard mingw-w64 (#4405) Jared Van Bortel 2023-12-12 04:27:26 -05:00
  • 6391817cd1
    llama : document logits_all deprecation (#4418) crasm 2023-12-12 04:25:57 -05:00
  • d9d4cfef64
    server : fix local model name in server (#4420) Vladimir Zorin 2023-12-12 11:25:29 +02:00
  • 41a11aaf99
    ggml : increased GGML_MAX_PARAMS to allow finetuning of 70b models (#4424) Taikono-Himazin 2023-12-12 18:24:32 +09:00
  • 8a7b2fa528
    Update README.md (#4388) Yueh-Po Peng 2023-12-11 06:27:38 +08:00
  • e18f7345a3
    grammar : revert the replacement of llama_token_to_piece with id_to_token (#4396) Xiang (Kevin) Li 2023-12-09 16:29:27 -05:00
  • fe680e3d10
    sync : ggml (new ops, tests, backend, etc.) (#4359) Georgi Gerganov 2023-12-07 22:26:54 +02:00
  • bcc0eb4591
    llama : per-layer KV cache + quantum K cache (#4309) Georgi Gerganov 2023-12-07 13:03:17 +02:00
  • 81bc9214a3
    train : fix #4227 (double free in examples/train-text-from-scratch/train-text-from-scratch.cpp) (#4351) Hongyu Ouyang 2023-12-07 02:25:22 -08:00
  • 05cd6e5036
    server : recognize cache_prompt parameter in OAI API (#4347) Georgi Gerganov 2023-12-06 20:21:59 +02:00
  • caa9249217
    common : fix compile warning Georgi Gerganov 2023-12-06 10:41:03 +02:00
  • da5eaef1f3
    speculative : support --color (#4343) stduhpf 2023-12-06 09:08:17 +01:00
  • 5f6e0c0dff
    grammar : pre-computed pieces + reserve mem + less string copies (#4330) Marcus Dunn 2023-12-05 10:55:12 -10:00
  • 5aa365d88f
    llama : allow overriding GGUF metadata when loading model (#4092) Kerfuffle 2023-12-05 10:19:18 -07:00
  • 52c8bc3cf3
    sampling : custom samplers order (#4285) MaggotHATE 2023-12-05 15:05:51 +05:00
  • e4b76bbe31
    swift : revert compiler checks for swift package (#4332) kchro3 2023-12-04 23:29:46 -08:00
  • 23b5e12eb5
    simple : update error message for KV cache check (#4324) Daniel Bevenius 2023-12-04 17:04:21 +01:00
  • d208995c6d
    swift : fix concatenation method to avoid invalid UTF8 stringification (#4325) Miwa / Ensan 2023-12-05 01:03:49 +09:00
  • 5c9f90cba1
    swift : fix prompt tokenization logic (#4321) Miwa / Ensan 2023-12-04 22:43:45 +09:00
  • 4fa44e84ad
    grammar-parser : fix typo (#4318) Ikko Eltociear Ashimine 2023-12-04 16:57:35 +09:00
  • fbbc42827b
    ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() (#4308) Georgi Gerganov 2023-12-03 15:56:35 +02:00
  • adf3de4f69
    ggml : fix soft max out-of-bounds access (#4307) Georgi Gerganov 2023-12-03 15:56:22 +02:00
  • 33e171d1e9
    server : fix OpenAI API stop field to be optional (#4299) Ed Lee 2023-12-03 01:10:43 -08:00
  • 6949b50df5
    py : add grammar to oai like api (#4294) Rickard Edén 2023-12-03 10:03:25 +01:00
  • d7b800b8bc
    llama : pad KV cache size (#4280) Georgi Gerganov 2023-12-03 10:58:16 +02:00
  • 5a7d3125e7
    llama : avoid using "optional" keyword (#4283) Georgi Gerganov 2023-12-01 20:39:12 +02:00
  • d5a1cbde60
    llama : support optional tensors (#4283) Georgi Gerganov 2023-12-01 20:35:03 +02:00
  • b220222a64
    swift : fix token_to_piece implementation (#4278) Miwa / Ensan 2023-12-02 03:19:45 +09:00
  • 511f52c334
    build : enable libstdc++ assertions for debug builds (#4275) Jared Van Bortel 2023-12-01 13:18:35 -05:00
  • 03562f3a86
    llama : support attention bias on LLaMA architecture (#4283) CausalLM 2023-12-02 02:17:06 +08:00
  • 37c746d687
    llama : add Qwen support (#4281) Shijie 2023-12-02 02:16:31 +08:00
  • 880f57973b
    llama : fix integer overflow during quantization (#4284) Georgi Gerganov 2023-12-01 18:42:11 +02:00
  • 8d6d9f033b
    py : add requirements file for convert-hf-to-gguf.py (#4277) Daniel Bevenius 2023-12-01 10:41:56 +01:00