Commit Graph

  • ef47ec18da
    ggml : add ggml_soft_max_ext (#4256) Georgi Gerganov 2023-12-01 10:51:24 +02:00 (usage sketch after this list)
  • 1d144112c0
    server : add --log-disable to disable logging to file (#4260) Ziad Ben Hadj-Alouane 2023-11-30 17:25:49 -05:00
  • f43f09366d
    server : add single-client multi-prompt support (#4232) Ziad Ben Hadj-Alouane 2023-11-30 17:25:04 -05:00
  • d2809a3ba2
    make : fix Apple clang determination bug (#4272) WillCorticesAI 2023-11-30 17:23:44 -05:00
  • 15f5d96037
    build : fix build info generation and cleanup Makefile (#3920) Jared Van Bortel 2023-11-30 17:23:08 -05:00
  • 33c9892af5
    llava : ShareGPT4V compatibility (vision encoder only loading) (#4172) John 2023-11-30 23:11:14 +01:00
  • 8efa0f6ebe
    main : pass LOG_TEE callback to llama.cpp log (#4033) Andrew Godfrey 2023-11-30 13:56:19 -08:00
  • 524907aa76
    readme : fix (#4135) vodkaslime 2023-12-01 05:49:21 +08:00
  • 3bd2c7ce1b
    docker : add finetune option (#4211) Juraj Bednar 2023-11-30 22:46:01 +01:00
  • bde629bb53
    batched.swift : update README.md (#4214) Miwa / Ensan 2023-12-01 06:45:17 +09:00
  • f7f9e06212
    cmake : fix the metal file folder path (#4217) Li Tan 2023-11-30 13:44:11 -08:00
  • 74daabae69
    readme : fix typo (#4253) Dawid Wysocki 2023-11-30 22:43:32 +01:00
  • b18c66ca6e
    llama : fix alignment of general.name in print meta (#4254) Daniel Bevenius 2023-11-30 22:43:08 +01:00
  • f4d973cecb
    convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#4258) slaren 2023-11-30 22:42:23 +01:00
  • 954e22858c
    llama : fix typical sampling (#4261) tarcey 2023-11-30 22:40:23 +01:00
  • e2bd725f4b
    py : fix oai proxy (#3972) rhjdvsgsgks 2023-11-30 20:50:40 +00:00
  • 1f5cd83275
    examples : add readme files Georgi Gerganov 2023-11-29 11:00:17 +02:00
  • 4fea3420ee
    readme : add FreeChat (#4248) Peter Sugihara 2023-11-28 23:16:34 -08:00
  • 64e64aa255
    ggml : restore abort() in GGML_ASSERT (#4242) Jared Van Bortel 2023-11-28 04:51:11 -05:00
  • 8406b0924b
    ggml : re-enable BLAS for CPU when src0 != F32 + remove redundant full offload checks in llama.cpp (#4240) Georgi Gerganov 2023-11-28 10:32:03 +02:00
  • b38a16dfcf
    cmake : fix issue with version info not getting baked into LlamaConfig.cmake (#3970) bandoti 2023-11-27 15:25:42 -04:00
  • 0dab8cd7cc
    readme : add Amica to UI list (#4230) Kasumi 2023-11-28 01:39:42 +08:00
  • bb03290c17
    examples : iOS example with swift ui (#4159) Bailey Chittle 2023-11-27 09:56:52 -05:00
  • f3b269813f
    ggml : fix -Warray-bounds warning with gcc (#4231) Jared Van Bortel 2023-11-26 22:58:43 -05:00
  • 3e73d31d9c
    lookahead : support -n -1 infinite generation Georgi Gerganov 2023-11-26 21:51:46 +02:00
  • 9656026b53
    readme : update hot topics Georgi Gerganov 2023-11-26 20:42:51 +02:00
  • 922754a8d6
    lookahead : add example for lookahead decoding (#4207) Georgi Gerganov 2023-11-26 20:33:07 +02:00
  • 22da05536f
    metal : fix yarn (#4220) Xiao-Yong Jin 2023-11-26 02:30:02 -06:00
  • 1ddb52ec38
    scripts : Use mmap in torch load (#4202) Galunid 2023-11-25 22:45:02 +01:00
  • f837c3a992
    llama : grammar reserve space in decode_utf8 (#4210) Marcus Dunn 2023-11-25 08:58:23 -08:00
  • 3014b5415d
    Update docs for yarn_ext_factor <0.0 as unspecified instead of NaN (#4189) crasm 2023-11-25 10:47:07 -05:00
  • 04814e718e
    readme : update hot topics Georgi Gerganov 2023-11-25 12:02:13 +02:00
  • af19d35734
    server : OAI API compatibility (#4198) Georgi Gerganov 2023-11-25 11:29:06 +02:00
  • e9c13ff781
    llama : set metal log callback correctly (#4204) slaren 2023-11-24 18:10:01 +01:00
  • 8a052c131e
    ggml-cuda : support stablelm rope (#4156) slaren 2023-11-24 18:04:31 +01:00
  • 189d68446e
    convert : fix tensors using grad in some models (#4173) Galunid 2023-11-24 15:02:49 +01:00
  • 2568a4bf54
    main.swift : fix eos checking (#4197) eastriver 2023-11-24 18:25:10 +09:00
  • b35f3d0def
    readme : use PATH for Windows ROCm (#4195) Aaryaman Vasishta 2023-11-24 16:52:39 +09:00
  • 55978ce09b
    Fix incorrect format strings and uninitialized variables. (#4133) Haohui Mai 2023-11-23 13:56:53 -08:00
  • 6b0a7420d0
    llama : KV cache view API + better KV cache management (#4170) Georgi Gerganov 2023-11-23 19:07:56 +02:00 (usage sketch after this list)
  • d103d935c0
    readme : update hot topics Georgi Gerganov 2023-11-23 13:51:22 +02:00
  • 9d5949f04b
    examples : fix typo in parallel example doc comment (#4181) Daniel Bevenius 2023-11-23 12:34:20 +01:00
  • ff8238f71d
    docs : add llama-star arch idea Georgi Gerganov 2023-11-23 11:35:04 +02:00
  • 8e672efe63
    stablelm : simplify + speedup generation (#4153) Galunid 2023-11-21 16:22:30 +01:00
  • 0b871f1a04
    finetune - update readme to mention llama support only (#4148) Galunid 2023-11-20 19:30:00 +01:00
  • dfc7cd48b1
    readme : update ROCm Windows instructions (#4122) Aaryaman Vasishta 2023-11-21 00:02:46 +09:00
  • 881800d1f0
    main : Add ChatML functionality to main example (#4046) Seb C 2023-11-21 00:26:59 +10:30
  • f23c0359a3
    ci : add flake8 to github actions (python linting) (#4129) Galunid 2023-11-20 11:35:47 +01:00
  • 40a34fe8d0
    speculative : fix prompt tokenization in speculative example (#4025) Branden Butler 2023-11-20 03:50:04 -06:00
  • dae06c06e5
    Revert "finetune : add --n-gpu-layers flag info to --help (#4128)" Georgi Gerganov 2023-11-19 19:16:07 +02:00
  • 05e8301e45
    finetune : add --n-gpu-layers flag info to --help (#4128) Clark Saben 2023-11-19 11:56:38 -05:00
  • 936c79b227
    server : relay error messages (#4131) SoftwareRenderer 2023-11-19 11:54:10 -05:00
  • 262005ad9d
    common : comma should be semicolon (#4137) kchro3 2023-11-19 08:52:57 -08:00
  • 35985acffa
    gitignore : tokenize Georgi Gerganov 2023-11-19 18:50:49 +02:00
  • e937066420
    gguf-py : export chat templates (#4125) slaren 2023-11-19 11:10:52 +01:00
  • 28a2e6e7d4
    tokenize example: Respect normal add BOS token behavior (#4126) Kerfuffle 2023-11-18 14:48:17 -07:00
  • 0b5c3b0457
    scripts : Remove missed baichuan convert script (#4127) Galunid 2023-11-18 21:08:33 +01:00
  • 2923f17f6f
    Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124) Kerfuffle 2023-11-18 08:11:18 -07:00
  • bbecf3f415
    llama : increase max nodes (#4115) slaren 2023-11-17 20:39:11 +01:00
  • 8e9361089d
    build : support ppc64le build for make and CMake (#3963) Roger Meier 2023-11-17 17:11:23 +01:00
  • 5ad387e994
    tokenize : fix trailing whitespace Georgi Gerganov 2023-11-17 18:01:38 +02:00
  • 2fa02b4b3d
    examples : add tokenize (#4039) zakkor 2023-11-17 17:36:44 +02:00
  • 2ab0707acb
    convert : use 'model' value if it exists. This allows karpathy/tinyllamas to load (#4089) Don Mahurin 2023-11-17 07:32:34 -08:00
  • 11173c92d6
    py : Falcon HF compatibility (#4104) John 2023-11-17 16:24:30 +01:00
  • 9e87ef60e1
    common : improve yaml log escaping (#4080) Jannis Schönleber 2023-11-17 16:24:07 +01:00
  • c7cce1246e
    llava : fix compilation warning that fread return value is not used (#4069) Huawei Lin 2023-11-17 10:22:56 -05:00
  • f7d5e97542
    py : remove superfluous import statements (#4076) Jiří Podivín 2023-11-17 16:20:53 +01:00
  • ba4cf5c0bf
    train : move number of gpu layers argument parsing to common/train.cpp (#4074) Jiří Podivín 2023-11-17 16:19:16 +01:00
  • e85bb1a8e7
    llama : add functions to get the model's metadata (#4013) slaren 2023-11-17 16:17:37 +01:00 (usage sketch after this list)
  • 3e916a07ac
    finetune : speed-up ggml_compute_forward_out_prod_f32 via BLAS (#4079) gwjr 2023-11-17 14:48:19 +00:00
  • 947f64f163
    finetune : zero the loraB initial vectors (#4082) Andrew Godfrey 2023-11-17 02:23:11 -08:00
  • b83e149ec6
    cuda : get_row_rounding F32 (#4095) Andrew Godfrey 2023-11-17 00:01:15 -08:00
  • 4f447a4833
    llama : fix data units (#4101) Georgi Gerganov 2023-11-17 10:00:15 +02:00
  • 91f6499393
    Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) Kerfuffle 2023-11-16 19:14:37 -07:00
  • 8da46278e1
    gguf : fix potential infinite loops while parsing (#4100) texmex76 2023-11-16 16:01:48 +01:00
  • a6fc554e26
    llama : restore prefix space in llama tokenizer (#4081) Jared Van Bortel 2023-11-15 11:34:47 -05:00
  • 1cf2850d52
    ggml-cuda : increase max graph size (#4084) slaren 2023-11-15 13:58:13 +01:00
  • 6bb4908a17
    Fix MacOS Sonoma model quantization (#4052) Michael Potter 2023-11-14 09:34:41 -08:00
  • 36eed0c42c
    stablelm : StableLM support (#3586) Galunid 2023-11-14 11:17:12 +01:00
  • b46d12f86d
    convert.py: also look for plain model.safetensors (#4043) afrideva 2023-11-13 17:03:40 -08:00
  • bd90eca237
    llava : fix regression for square images in #3613 (#4056) M. Yusuf Sarıgöz 2023-11-13 18:20:52 +03:00
  • 3d68f364f1
    ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060) Georgi Gerganov 2023-11-13 16:55:52 +02:00
  • c049b37d7b
    readme : update hot topics Georgi Gerganov 2023-11-13 14:18:08 +02:00
  • 4760e7cc0b
    sync : ggml (backend v2) (#3912) Georgi Gerganov 2023-11-13 14:16:23 +02:00
  • bb50a792ec
    Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041) Kerfuffle 2023-11-13 01:58:15 -07:00
  • 21fd874c8d
    gguf-py: gguf_writer: Use bytearray to build metadata (#4051) Kerfuffle 2023-11-12 16:39:37 -07:00
  • 532dd74e38
    Fix some documentation typos/grammar mistakes (#4032) Richard Kiss 2023-11-11 22:04:58 -08:00
  • e86fc56f75
    Fix gguf-convert-endian script (#4037) M. Yusuf Sarıgöz 2023-11-11 18:35:31 +03:00
  • d96ca7ded7
    server : fix crash when prompt exceeds context size (#3996) Alexey Parfenov 2023-11-11 05:48:21 +00:00
  • 34b0a08207
    gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981) Kerfuffle 2023-11-10 22:04:50 -07:00
  • 4a4fd3eefa
    server : allow continue edit on completion mode (#3950) Jhen-Jie Hong 2023-11-11 06:49:33 +08:00
  • df9d1293de
    Unbreak persimmon after #3837 (#4010) Galunid 2023-11-10 14:24:54 +01:00
  • a75fa576ab
    scripts: Generalize convert scripts (#3838) Galunid 2023-11-09 11:09:29 +01:00
  • 57ad015dc3
    server : add min_p param (#3877) Mihai 2023-11-09 04:00:34 +02:00
  • 875fb42871
    ggml-alloc : fix backend assignments of views (#3982) slaren 2023-11-08 13:15:14 +01:00
  • 0a7c980b6f
    gguf : track writer state, free unneeded tensors, cleanup (#3871) Jared Van Bortel 2023-11-07 12:43:04 -05:00
  • 413503d4b9
    make : do not add linker flags when compiling static llava lib (#3977) Georgi Gerganov 2023-11-07 19:25:32 +02:00
  • e9c1cecb9d
    ggml : fix backward rope after YaRN (#3974) xaedes 2023-11-07 09:04:51 +01:00
  • 54b4df8886
    Use params when loading models in llava-cli (#3976) Matthew Tejo 2023-11-06 23:43:59 -08:00
  • 46876d2a2c
    cuda : support running on CPU for GGML_USE_CUBLAS=ON build (#3946) Meng Zhang 2023-11-06 22:49:08 -08:00
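
Usage sketches for the API-introducing commits flagged above. These are illustrative only: they assume the C declarations in ggml.h/llama.h as of the referenced commits, and all function wrappers, variable names, shapes, and buffer sizes here are invented for the example.

ggml_soft_max_ext (#4256) fuses the attention-score scaling and mask addition into the softmax op. A minimal sketch, assuming the ggml_soft_max_ext(ctx, a, mask, scale) declaration from ggml.h at this commit:

    // attn_scores_softmax is a hypothetical helper; kq/kq_mask names are
    // illustrative. This replaces the older chain of
    // ggml_scale -> ggml_add(mask) -> ggml_soft_max.
    #include <math.h>
    #include "ggml.h"

    static struct ggml_tensor * attn_scores_softmax(
            struct ggml_context * ctx,
            struct ggml_tensor  * kq,       // raw attention scores
            struct ggml_tensor  * kq_mask,  // causal/padding mask, may be NULL
            int                   n_embd_head) {
        // fused: scale by 1/sqrt(head size), apply mask, then softmax
        return ggml_soft_max_ext(ctx, kq, kq_mask, 1.0f/sqrtf((float) n_embd_head));
    }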
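The KV cache view API (#4170) exposes a read-only snapshot of cache occupancy for debugging and monitoring. A sketch assuming the llama_kv_cache_view_init/update/free functions and the token_count/used_cells/max_contiguous fields as declared in llama.h at that commit:

    #include <stdio.h>
    #include "llama.h"

    // report_kv_usage is a hypothetical helper for the example
    static void report_kv_usage(const struct llama_context * ctx, int32_t n_max_seq) {
        struct llama_kv_cache_view view = llama_kv_cache_view_init(ctx, n_max_seq);
        llama_kv_cache_view_update(ctx, &view); // snapshot the current cache state
        printf("kv cache: %d tokens in %d used cells (max contiguous run: %d)\n",
                view.token_count, view.used_cells, view.max_contiguous);
        llama_kv_cache_view_free(&view);
    }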
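The model metadata getters (#4013) let a client enumerate a loaded model's GGUF key/value pairs without reopening the file. A sketch assuming the llama_model_meta_count / llama_model_meta_key_by_index / llama_model_meta_val_str_by_index declarations from that commit, where the by-index getters write into a caller buffer and return the string length (negative on failure):

    #include <stdio.h>
    #include "llama.h"

    // dump_model_meta is a hypothetical helper; the 256-byte buffers are an
    // arbitrary choice for the example
    static void dump_model_meta(const struct llama_model * model) {
        char key[256];
        char val[256];
        const int n = llama_model_meta_count(model);
        for (int i = 0; i < n; i++) {
            if (llama_model_meta_key_by_index(model, i, key, sizeof(key)) < 0) continue;
            if (llama_model_meta_val_str_by_index(model, i, val, sizeof(val)) < 0) continue;
            printf("%s = %s\n", key, val);
        }
    }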