Commit Graph

  • f53119cec4
    minor : fix trailing whitespace (#5538) Georgi Gerganov 2024-02-19 10:34:10 +02:00
  • 7084755396
    llava : avoid changing the original BakLLaVA model (#5577) Daniel Bevenius 2024-02-19 09:31:59 +01:00
  • 4480542b22
    baby-llama : allocate graphs in ggml_context (#5573) NawafAlansari 2024-02-19 03:25:38 -05:00
  • 11b12de39b
    llama : add llama_chat_apply_template() (#5538) Xuan Son Nguyen 2024-02-19 09:23:37 +01:00
  • 3a9cb4ca64
    cuda, metal : fix nans in soft_max (#5574) slaren 2024-02-19 09:04:45 +01:00
  • 769a716e30
    readme : update (#5572) Mirko185 2024-02-19 08:39:31 +01:00
  • f0d1fafc02
    ggml : android and old glibc NUMA incompatibility bugfixes (#5557) bmwl 2024-02-18 23:38:32 -08:00
  • a0c2dad9d4
    build : pass all warning flags to nvcc via -Xcompiler (#5570) Jared Van Bortel 2024-02-18 16:21:52 -05:00
  • 14278f55d2
    ggml : restore vec dot stride arg names (#5453) Georgi Gerganov 2024-02-18 22:58:57 +02:00
  • b1de96824b
    ci : fix wikitext url + compile warnings (#5569) Georgi Gerganov 2024-02-18 22:39:30 +02:00
  • 7ad554f90e
    metal : fix unused warnings (#0) Georgi Gerganov 2024-02-18 21:39:58 +02:00
  • 5ee99c32f5
    common, server : surface min_keep as its own parameter (#5567) Robey Holderith 2024-02-18 11:11:16 -08:00
  • c145f8a132
    server : slots monitoring endpoint (#5550) Pierrick Hymbert 2024-02-18 18:39:57 +01:00
  • 689a091bbe
    sampling : do not set min_keep to n_probs (#5564) Georgi Gerganov 2024-02-18 19:38:06 +02:00
  • f3f28c5395
    cmake : fix GGML_USE_SYCL typo (#5555) Georgi Gerganov 2024-02-18 19:17:00 +02:00
  • e75c6279d1
    server : enhanced health endpoint (#5548) Pierrick Hymbert 2024-02-18 17:31:28 +01:00
  • 36376abe05
    server : --n-predict option document and cap to max value (#5549) Pierrick Hymbert 2024-02-18 17:30:09 +01:00
  • 66c1968f7a
    server : graceful server shutdown (#5244) Daniel Hiltgen 2024-02-18 08:23:16 -08:00
  • 1dcc3fde00
    common : fix ub (#5530) Georgi Gerganov 2024-02-18 18:21:52 +02:00
  • 5d3de51f97
    ggml, common, examples, tests : fixed type arguments in printf (#5528) Herman Semenov 2024-02-18 16:20:12 +00:00
  • fc0c8d286a
    llava : update surgery script to not remove tensors (#5536) Daniel Bevenius 2024-02-18 17:19:23 +01:00
  • bd2d4e393b
    1.5 bit quantization (#5453) Kawrakow 2024-02-18 18:16:55 +02:00
  • c8e0d7efeb
    flake.lock: Update github-actions[bot] 2024-02-18 00:17:07 +00:00
  • 8f1be0d42f
    ggml : add ALiBi support for ggml_soft_max_ext (#5488) Georgi Gerganov 2024-02-17 23:04:16 +02:00
  • 6e4e973b26
    ci : add an option to fail on compile warning (#3952) Ananta Bastola 2024-02-17 16:03:14 -05:00
  • d250c9d61d
    gitignore : update for CLion IDE (#5544) clibdev 2024-02-17 18:28:37 +02:00
  • 5bf2b94dd4
    cmake : fix VULKAN and ROCm builds (#5525) Georgi Gerganov 2024-02-16 19:05:56 +02:00
  • d2819d5577
    scripts : add helpers script for bench comparing commits (#5521) Georgi Gerganov 2024-02-16 15:14:40 +02:00
  • 4cb0727698
    llava : removed excess free(NULL) operation (#5531) Herman Semenov 2024-02-16 12:43:23 +00:00
  • 65085c713e
    llama : minor fixed return int value (#5529) Herman Semenov 2024-02-16 11:45:48 +00:00
  • 6dcc02d244
    server : add "samplers" param to control the samplers order (#5494) Alexey Parfenov 2024-02-16 11:33:25 +00:00
  • 5f5808ca7b
    server : fix system prompt cli (#5516) Rőczey Barnabás 2024-02-16 11:00:56 +01:00
  • f486f6e1e5
    ggml : add numa options (#5377) bmwl 2024-02-16 01:31:07 -08:00
  • 60ed04cf82
    llava : fix clip-model-is-vision flag in README.md (#5509) Daniel Bevenius 2024-02-16 10:24:39 +01:00
  • 594845aab1
    ci : fix BERT model download and convert Georgi Gerganov 2024-02-16 09:57:55 +02:00
  • 4524290e87
    Use correct type of pooling for embedding models (#5500) Douglas Hanley 2024-02-15 11:21:49 -06:00
  • c06e45d729
    clip : fix wrong loop condition Georgi Gerganov 2024-02-15 18:49:08 +02:00
  • 9060a1e9df
    cuda : print message when initialization fails (#5512) slaren 2024-02-15 16:49:01 +01:00
  • 9350a1cf21
    scripts : add hf.sh helper script (#5501) Georgi Gerganov 2024-02-15 15:41:15 +02:00
  • 73122473ff
    fix(gguf-py): special tokens are no longer skipped when add_<token>_token is set to false (#5487) Michaël de Vries 2024-02-15 14:14:37 +01:00
  • 0d4177126b
    llava : fix memory management bug (#5491) Elbios 2024-02-15 09:01:57 +01:00
  • 7930a8a6e8
    llaba : hotfix for llava-1.6 image number (#5495) John 2024-02-15 08:59:18 +01:00
  • 704359e299
    vulkan: Find optimal memory type but with fallback (#5381) Neuman Vong 2024-02-15 17:11:15 +11:00
  • 594fca3fef
    readme : fix typo (#5490) Rune 2024-02-14 16:15:49 +01:00
  • ccbb277f46
    llava : update README.md (#5489) John 2024-02-14 15:49:42 +01:00
  • 8084d55440
    cmake : ARM intrinsics detection for MSVC (#5401) Michael Podvitskiy 2024-02-14 11:49:01 +03:00
  • aa23412989
    llava : support v1.6 (#5267) John 2024-02-14 08:38:35 +01:00
  • f5ca054855
    Early return for zero size calls to get_tensor. (#5482) AT 2024-02-13 15:44:25 -06:00
  • 6c00a06692
    gguf : add python reader example (#5216) John 2024-02-13 18:56:38 +01:00
  • ea9c8e1143
    llama : add support for Nomic Embed (#5468) Jared Van Bortel 2024-02-13 12:03:53 -05:00
  • c4e6dd59e4
    llama : allow raw byte in SPM vocabs; don't crash on nl 404 (#5478) Aarni Koskela 2024-02-13 18:18:16 +02:00
  • 037259be68
    llama : make load error reporting more granular (#5477) Aarni Koskela 2024-02-13 15:24:50 +02:00
  • 263978904c
    finetune : rename feed-forward tensors (w1/w2/w3) (#4839) Daniel Bevenius 2024-02-13 14:15:42 +01:00
  • cf45252a7c
    tests : multi-thread the tokenizer tests (#5474) Georgi Gerganov 2024-02-13 15:14:22 +02:00
  • 03bf161eb6
    llama : support batched embeddings (#5466) Douglas Hanley 2024-02-13 06:06:58 -06:00
  • ad014bba97
    make: add error message for bad CUDA version (#5444) Johannes Gäßler 2024-02-13 12:38:37 +01:00
  • 49cc1f7d67
    bert : add tests + fix quantization (#5475) Georgi Gerganov 2024-02-13 13:01:29 +02:00
  • 99b8b43d7b
    tests : disable moe test (#5473) Georgi Gerganov 2024-02-13 11:20:24 +02:00
  • 895407f31b
    ggml-quants : fix compiler warnings (shadow variable) (#5472) Kawrakow 2024-02-13 09:07:57 +02:00
  • 099afc6274
    llama : fix quantization when tensors are missing (#5423) Georgi Gerganov 2024-02-12 20:14:39 +02:00
  • df334a1125
    swift : package no longer use ggml dependency (#5465) Georgi Gerganov 2024-02-12 19:54:29 +02:00
  • dbd8828eb0
    py : fix persimmon n_rot conversion (#5460) Lee 2024-02-13 01:29:57 +08:00
  • 43fe07c1a4
    ggml-sycl: Replace 3d ops with macro (#5458) Abhilash Majumder 2024-02-12 20:22:05 +05:30
  • 4a46d2b792
    llava : remove prog parameter from ArgumentParser (#5457) Daniel Bevenius 2024-02-12 09:38:44 +01:00
  • 3b169441df
    sync : ggml (#5452) Georgi Gerganov 2024-02-12 09:16:06 +02:00
  • 3bdc4cd0f5
    CUDA: mul_mat_vec_q tiling, refactor mul mat logic (#5434) Johannes Gäßler 2024-02-11 19:08:39 +01:00
  • 2891c8aa9a
    Add support for BERT embedding models (#5423) Douglas Hanley 2024-02-11 10:21:38 -06:00
  • 97a336507e
    flake.lock: Update github-actions[bot] 2024-02-11 00:17:31 +00:00
  • c88c74f967
    vulkan: only use M-sized matmul on Apple GPUs (#5412) Sergio López 2024-02-11 15:12:00 +01:00
  • a803333a4e
    common : use enums for sampler types (#5418) Alexey Parfenov 2024-02-11 13:43:31 +00:00
  • 684780141a
    server : allow to specify tokens as strings in logit_bias (#5003) Alexey Parfenov 2024-02-11 13:38:14 +00:00
  • 85910c5b30
    main : ctrl+C print timing in non-interactive mode (#3873) Georgi Gerganov 2024-02-11 15:35:50 +02:00
  • 139b62a839
    common : fix compile warning Georgi Gerganov 2024-02-11 15:33:43 +02:00
  • 0f2411f154
    ggml : fix compile warnings (unused vars) (#4966) Georgi Gerganov 2024-02-11 15:33:01 +02:00
  • a07d0fee1f
    ggml : add mmla kernels for quantized GEMM (#4966) snadampal 2024-02-11 07:22:33 -06:00
  • e4640d8fdf
    lookup: add print for drafting performance (#5450) Johannes Gäßler 2024-02-11 12:44:51 +01:00
  • 907e08c110
    server : add llama2 chat template (#5425) Xuan Son Nguyen 2024-02-11 11:16:22 +01:00
  • f026f8120f
    metal : use autoreleasepool to avoid memory leaks (#5437) Ian Bull 2024-02-10 02:53:28 -08:00
  • cd9aea63b5
    scripts : update sync scripts with new backends Georgi Gerganov 2024-02-10 09:53:05 +02:00
  • 43b65f5eb8
    sync : ggml Georgi Gerganov 2024-02-10 09:30:36 +02:00
  • 4633d93af0
    ggml : add abort_callback for cpu backend (ggml/725) Michael Podvitskiy 2024-02-09 10:42:27 +01:00
  • 4b7b38bef5
    vulkan: Set limit for task concurrency (#5427) Neuman Vong 2024-02-10 05:30:19 +11:00
  • e00d2a62dd
    llava : add requirements.txt and update README.md (#5428) Daniel Bevenius 2024-02-09 14:00:59 +01:00
  • 7c777fcd5d
    server : fix prompt caching for repeated prompts (#5420) Riley Stewart 2024-02-09 02:49:49 -08:00
  • e5ca3937c6
    llama : do not cap thread count when MoE on CPU (#5419) Paul Tsochantaris 2024-02-09 10:48:06 +00:00
  • e4124c2477
    readme : add JavaScript/Wasm repo (#5415) Marko Tasic 2024-02-09 11:17:00 +01:00
  • b2f87cb64d
    ggml : fix error C2078: too many initializers for MSVC ARM64 (#5404) Michael Podvitskiy 2024-02-09 10:56:43 +01:00
  • 44fbe34360
    Fix Vulkan crash on APUs with very little device memory (#5424) 0cc4m 2024-02-09 06:52:33 +01:00
  • 8e6a9d2de0
    CUDA: more warps for mmvq on NVIDIA (#5394) Johannes Gäßler 2024-02-08 21:56:40 +01:00
  • 41f308f58e
    llama : do not print "offloading layers" message in CPU-only builds (#5416) slaren 2024-02-08 21:33:03 +01:00
  • 6e99f2a04f
    Fix f16_sycl cpy call from Arc (#5411) Abhilash Majumder 2024-02-08 22:39:10 +05:30
  • ff4ff05c5f
    llava : add missing .py, and fix paths in README.md (#5414) Daniel Bevenius 2024-02-08 15:20:03 +01:00
  • b7b74cef36
    fix trailing whitespace (#5407) Johannes Gäßler 2024-02-08 11:36:54 +01:00
  • 4aa43fab56
    llama : fix MiniCPM (#5392) runfuture 2024-02-08 18:36:19 +08:00
  • a6e514a85f
    llava: fix typo/formatting in README.md (#5405) Daniel Bevenius 2024-02-08 09:58:19 +01:00
  • 26d4efd11e
    sampling: fix top_k <= 0 (#5388) Johannes Gäßler 2024-02-08 09:46:30 +01:00
  • 8504d2d0da
    tests : .gitignore obj files Georgi Gerganov 2024-02-08 09:46:47 +02:00
  • c4fbb6717c
    CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393) Michael Podvitskiy 2024-02-07 22:39:23 +01:00
  • 8c933b70c2
    fix typo in readme (#5399) Ebey Abraham 2024-02-07 21:11:30 +00:00
  • b906596bb7
    Add Ava in the list of llama.cpp UIs (#4362) Kamil Tomšík 2024-02-07 19:44:52 +01:00