Commit Graph

  • b9ec82d262
    grammar : check the full vocab only if necessary (opt) (#4306) kalomaze 2023-12-23 03:27:07 -06:00
  • e0a4002273
    CUDA: fixed row rounding for 0 tensor splits (#4594) Johannes Gäßler 2023-12-23 09:16:33 +01:00
  • 7082d24cec
    lookup : add prompt lookup decoding example (#4484) LeonEricsson 2023-12-22 17:05:56 +01:00
  • ba66175132
    sync : ggml (fix im2col) (#4591) Georgi Gerganov 2023-12-22 17:53:43 +02:00
  • a55876955b
    cuda : fix jetson compile error (#4560) FantasyGmm 2023-12-22 23:11:12 +08:00
  • 6724ef1657
    Fix CudaMemcpy direction (#4599) Henrik Forstén 2023-12-22 15:34:05 +02:00
  • 48b7ff193e
    llama : fix platforms without mmap (#4578) slaren 2023-12-22 12:12:53 +01:00
  • 48b24b170e
    ggml : add comment about backward GGML_OP_DIAG_MASK_INF (#4203) Herman Semenov 2023-12-22 09:26:49 +00:00
  • 28cb35a0ec
    make : add LLAMA_HIP_UMA option (#4587) Michael Kesper 2023-12-22 09:03:25 +01:00
  • f31b984898
    ci : tag docker image with build number (#4584) rhuddleston 2023-12-21 23:56:34 -07:00
  • 2bb98279c5
    readme : add zig bindings (#4581) Deins 2023-12-22 08:49:54 +02:00
  • 0137ef88ea
    ggml : extend enum ggml_log_level with GGML_LOG_LEVEL_DEBUG (#4579) bobqianic 2023-12-22 06:47:01 +00:00
  • c7e9701f86
    llama : add ability to cancel model loading (#4462) crasm 2023-12-22 01:19:36 -05:00
  • afefa319f1
    ggml : change ggml_scale to take a float instead of tensor (#4573) Georgi Gerganov 2023-12-21 23:20:49 +02:00
  • 769a7bc85e
    gguf-py : fix broken link Georgi Gerganov 2023-12-21 23:20:36 +02:00
  • 32259b2dad
    gguf : simplify example dependencies Georgi Gerganov 2023-12-21 23:07:58 +02:00
  • 4a5f9d629e
    ci : add jlumbroso/free-disk-space to docker workflow (#4150) Samuel Maynard 2023-12-21 22:36:26 +02:00
  • d232aca5a7
    llama : initial ggml-backend integration (#4520) slaren 2023-12-21 21:07:46 +01:00
  • 31f27758fa
    llama : allow getting n_batch from llama_context in c api (#4540) Marcus Dunn 2023-12-21 11:57:48 -08:00
  • 56fa50819f
    metal : fix ggml_metal_log vargs (#4373) Finn Voorhees 2023-12-21 14:55:02 -05:00
  • 0f630fbc92
    cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449) Erik Garrison 2023-12-21 13:45:32 -06:00
  • 562cf222b5
    ggml-cuda: Fix HIP build by adding define for __trap (#4569) arlo-phoenix 2023-12-21 20:13:25 +01:00
  • 8fe03ffdda
    common : remove incorrect --model-draft default (#4568) Jared Van Bortel 2023-12-21 12:55:34 -05:00
  • 9154494808
    CUDA: mul_mat_id always on GPU for batches >= 32 (#4553) Johannes Gäßler 2023-12-21 18:42:59 +01:00
  • c083718c89
    readme : update coding guidelines Georgi Gerganov 2023-12-21 19:27:14 +02:00
  • 880e352277
    py : open merges file as 'utf-8' (#4566) howlger 2023-12-21 18:07:34 +01:00
  • 66f35a2f48
    cuda : better error message for ggml_get_rows (#4561) bobqianic 2023-12-21 17:06:44 +00:00
  • 1398823922
    cuda : replace asserts in wrong architecture checks with __trap (#4556) slaren 2023-12-21 18:02:30 +01:00
  • d3223afdad
    llama : disable per-tensor info prints on model load (#4562) Johannes Gäßler 2023-12-21 17:34:17 +01:00
  • 1d7a1912ce
    Fix access violation in ggml_cuda_free_data if tensor->extra is NULL (#4554) LoganDark 2023-12-21 01:59:27 -08:00
  • 799fc22689
    CUDA: Faster Mixtral prompt processing (#4538) Johannes Gäßler 2023-12-20 15:41:22 +01:00
  • 328b83de23
    ggml : fixed check for _MSC_VER (#4535) Eric Sommerlade 2023-12-19 16:17:01 +00:00
  • a7aee47b98
    ggml-cuda: Fix HIP build (#4528) arlo-phoenix 2023-12-18 22:33:45 +01:00
  • 0e18b2e7d0
    llama.swiftui : add tinyllama 1.1B F16 Georgi Gerganov 2023-12-18 20:17:43 +02:00
  • 6ff39b129d
    llama.swiftui : add more models Georgi Gerganov 2023-12-18 20:05:12 +02:00
  • b9e74f9bca
    llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490) Ebey Abraham 2023-12-18 17:27:47 +00:00
  • 3c04bf6da8
    llama : fix try_override for bool_value which always return true (#4519) hankcs 2023-12-18 05:14:58 -08:00
  • 2994f0c5a2
    decode : fix logits_valid for legacy API (#4516) Jared Van Bortel 2023-12-17 19:39:02 -05:00
  • b1306c4394
    readme : update hot topics Georgi Gerganov 2023-12-17 20:16:23 +02:00
  • 800a489e4a
    llama.swiftui : add bench functionality (#4483) Georgi Gerganov 2023-12-17 19:38:41 +02:00
  • f7f468a97d
    gguf-py : fail fast on nonsensical special token IDs (#4489) Jared Van Bortel 2023-12-17 10:45:46 -05:00
  • 919c40660f
    build : Check the ROCm installation location (#4485) Matheus Gabriel Alves Silva 2023-12-17 12:23:33 -03:00
  • 45668633fd
    finetune : keep allocs alive until all allocations are done (#4486) slaren 2023-12-17 16:05:56 +01:00
  • 0ffc92d2d2
    server : disable llm logs if SERVER_VERBOSE is off (#3792) olexiyb 2023-12-17 17:02:16 +02:00
  • 8edd2b40fd
    server : fix grammar being ignored (#4494) AdithyanI 2023-12-17 15:57:56 +01:00
  • eb16dae7e7
    server : fix possible ambiguity in content type charset (#4501) Alexey Parfenov 2023-12-17 14:56:09 +00:00
  • 62bd52b7bf
    server : allow requests larger than 8K (#4500) mzcu 2023-12-17 15:54:37 +01:00
  • 5daa5f54fd
    Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506) Bach Le 2023-12-17 18:57:33 +08:00
  • c6c4fc081c
    lora : add support for non-llama models (#3333) slaren 2023-12-16 18:58:46 +01:00
  • 8a5be3bd58
    llama : sanity checks for access to logits (#4274) Jared Van Bortel 2023-12-15 22:16:15 -05:00
  • 88ae8952b6
    server : add optional API Key Authentication example (#4441) ShadovvBeast 2023-12-15 13:49:01 +02:00
  • ee4725a686
    ggml : group mul_mat_id rows by matrix (cpu only) (#4480) slaren 2023-12-15 12:45:50 +01:00
  • 6744dbe924
    ggml : use ggml_row_size where possible (#4472) slaren 2023-12-14 20:05:21 +01:00
  • cafcd4f895
    ggml : remove n_dims from ggml_tensor (#4469) slaren 2023-12-14 16:52:08 +01:00
  • c50e400163
    py : add protobuf dependency (#4466) wonjun Jang 2023-12-14 21:44:49 +09:00
  • 20a68a7030
    ggml : add ggml_row_size() (fixes llama out of space) (#4461) LostRuins 2023-12-14 20:13:33 +08:00
  • 55e87c3749
    ggml : fix OpenCL broadcast requirement for ggml_mul (close #4453) Georgi Gerganov 2023-12-14 10:35:29 +02:00
  • 873637afc7
    convert : support loading vocab from fast tokenizer config (#3633) wonjun Jang 2023-12-14 17:09:34 +09:00
  • 0353a18401
    readme : update supported model list (#4457) BarfingLemurs 2023-12-14 02:38:49 -05:00
  • 948ff137ec
    server : fix handling of characters that span multiple tokens when streaming (#4446) shibe2 2023-12-13 23:57:15 +04:00
  • 4d98d9a656
    sync : ggml (SD ops, tests, kernels) (#4444) Georgi Gerganov 2023-12-13 21:54:54 +02:00
  • 70f806b821
    build : detect host compiler and cuda compiler separately (#4414) Jared Van Bortel 2023-12-13 12:10:10 -05:00
  • 9fb13f9584
    common : add --version option to show build info in CLI (#4433) Siwen Yu 2023-12-13 20:50:14 +08:00
  • 113f9942fc
    readme : update hot topics Georgi Gerganov 2023-12-13 14:05:38 +02:00
  • 799a1cb13b
    llama : add Mixtral support (#4406) slaren 2023-12-13 13:04:25 +01:00
  • fecac45658
    server : tweak default sampling parameters (#4367) kalomaze 2023-12-12 04:12:35 -06:00
  • 9494d7c477
    english : use typos to fix comments and logs (#4354) Richard Kiss 2023-12-12 01:53:36 -08:00
  • 6138963fb2
    build : target Windows 8 for standard mingw-w64 (#4405) Jared Van Bortel 2023-12-12 04:27:26 -05:00
  • 6391817cd1
    llama : document logits_all deprecation (#4418) crasm 2023-12-12 04:25:57 -05:00
  • d9d4cfef64
    server : fix local model name in server (#4420) Vladimir Zorin 2023-12-12 11:25:29 +02:00
  • 41a11aaf99
    ggml : increased GGML_MAX_PARAMS to allow finetuning of 70b models (#4424) Taikono-Himazin 2023-12-12 18:24:32 +09:00
  • 8a7b2fa528
    Update README.md (#4388) Yueh-Po Peng 2023-12-11 06:27:38 +08:00
  • e18f7345a3
    grammar : revert the replacement of llama_token_to_piece with id_to_token (#4396) Xiang (Kevin) Li 2023-12-09 16:29:27 -05:00
  • fe680e3d10
    sync : ggml (new ops, tests, backend, etc.) (#4359) Georgi Gerganov 2023-12-07 22:26:54 +02:00
  • bcc0eb4591
    llama : per-layer KV cache + quantum K cache (#4309) Georgi Gerganov 2023-12-07 13:03:17 +02:00
  • 81bc9214a3
    train : fix #4227 (double free in examples/train-text-from-scratch/train-text-from-scratch.cpp) (#4351) Hongyu Ouyang 2023-12-07 02:25:22 -08:00
  • 05cd6e5036
    server : recognize cache_prompt parameter in OAI API (#4347) Georgi Gerganov 2023-12-06 20:21:59 +02:00
  • caa9249217
    common : fix compile warning Georgi Gerganov 2023-12-06 10:41:03 +02:00
  • da5eaef1f3
    speculative : support --color (#4343) stduhpf 2023-12-06 09:08:17 +01:00
  • 5f6e0c0dff
    grammar : pre-computed pieces + reserve mem + less string copies (#4330) Marcus Dunn 2023-12-05 10:55:12 -10:00
  • 5aa365d88f
    llama : allow overriding GGUF metadata when loading model (#4092) Kerfuffle 2023-12-05 10:19:18 -07:00
  • 52c8bc3cf3
    sampling : custom samplers order (#4285) MaggotHATE 2023-12-05 15:05:51 +05:00
  • e4b76bbe31
    swift : revert compiler checks for swift package (#4332) kchro3 2023-12-04 23:29:46 -08:00
  • 23b5e12eb5
    simple : update error message for KV cache check (#4324) Daniel Bevenius 2023-12-04 17:04:21 +01:00
  • d208995c6d
    swift : fix concatenation method to avoid invalid UTF8 stringification (#4325) Miwa / Ensan 2023-12-05 01:03:49 +09:00
  • 5c9f90cba1
    swift : fix prompt tokenization logic (#4321) Miwa / Ensan 2023-12-04 22:43:45 +09:00
  • 4fa44e84ad
    grammar-parser : fix typo (#4318) Ikko Eltociear Ashimine 2023-12-04 16:57:35 +09:00
  • fbbc42827b
    ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() (#4308) Georgi Gerganov 2023-12-03 15:56:35 +02:00
  • adf3de4f69
    ggml : fix soft max out-of-bounds access (#4307) Georgi Gerganov 2023-12-03 15:56:22 +02:00
  • 33e171d1e9
    server : fix OpenAI API stop field to be optional (#4299) Ed Lee 2023-12-03 01:10:43 -08:00
  • 6949b50df5
    py : add grammar to oai like api (#4294) Rickard Edén 2023-12-03 10:03:25 +01:00
  • d7b800b8bc
    llama : pad KV cache size (#4280) Georgi Gerganov 2023-12-03 10:58:16 +02:00
  • 5a7d3125e7
    llama : avoid using "optional" keyword (#4283) Georgi Gerganov 2023-12-01 20:39:12 +02:00
  • d5a1cbde60
    llama : support optional tensors (#4283) Georgi Gerganov 2023-12-01 20:35:03 +02:00
  • b220222a64
    swift : fix token_to_piece implementation (#4278) Miwa / Ensan 2023-12-02 03:19:45 +09:00
  • 511f52c334
    build : enable libstdc++ assertions for debug builds (#4275) Jared Van Bortel 2023-12-01 13:18:35 -05:00
  • 03562f3a86
    llama : support attention bias on LLaMA architecture (#4283) CausalLM 2023-12-02 02:17:06 +08:00
  • 37c746d687
    llama : add Qwen support (#4281) Shijie 2023-12-02 02:16:31 +08:00
  • 880f57973b
    llama : fix integer overflow during quantization (#4284) Georgi Gerganov 2023-12-01 18:42:11 +02:00
  • 8d6d9f033b
    py : add requirements file for convert-hf-to-gguf.py (#4277) Daniel Bevenius 2023-12-01 10:41:56 +01:00