Commit Graph

  • aa7ab99be2
    CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386) Johannes Gäßler 2024-02-07 12:40:26 +01:00
  • 10afa6f1d1
    [SYCL] update install make by w64devkit (#5297) Neo Zhang Jianyu 2024-02-07 18:16:55 +08:00
  • 0ef46da632
    llava-cli : always tokenize special tokens (#5382) Xiao-Yong Jin 2024-02-07 02:17:25 -06:00
  • ee1628bdfe
    Basic Vulkan Multi-GPU implementation (#5321) 0cc4m 2024-02-07 07:54:50 +01:00
  • ed0bf32290
    readme : modernize (#5379) Eve 2024-02-07 06:21:30 +00:00
  • 9a697d842b
    readme : update ui list (#5354) Ben Williams 2024-02-06 22:16:48 -08:00
  • 316c7faf77
    llama : add MiniCPM support (#5346) runfuture 2024-02-07 14:15:56 +08:00
  • f3e2b4fa3f
    server : update /props with "total_slots" value (#5373) Justin Parker 2024-02-07 01:15:19 -05:00
  • f68664ac24
    convert : fix TypeError on GPT-2 vocab.json (#5288) Sang-Kil Park 2024-02-07 13:28:00 +09:00
  • 213d1439fa
    server : remove model.json endpoint (#5371) Alexey Parfenov 2024-02-06 18:08:38 +00:00
  • 17c97fb062
    CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370) Johannes Gäßler 2024-02-06 18:43:06 +01:00
  • b08f22c882
    Update README.md (#5366) Kawrakow 2024-02-06 19:00:16 +02:00
  • f57fadc009
    Slight quantization improvement for Q4_K and Q5_K (#5361) Kawrakow 2024-02-06 17:28:02 +02:00
  • 2e9c0bd6b3
    readme : add phi, orion 14b, internlm2, and yi-VL to readme (#5362) BarfingLemurs 2024-02-06 09:06:48 -05:00
  • 2c516611f1
    CUDA: mul_mat_vec_q for batch sizes > 1 (#5351) Johannes Gäßler 2024-02-06 14:44:06 +01:00
  • 8a79c591de
    server : include total "num_slots" in props endpoint (#5349) Justin Parker 2024-02-06 04:20:59 -05:00
  • 31e7903221
    server : add dynatemp_range and dynatemp_exponent (#5352) Michael Coppola 2024-02-06 04:20:00 -05:00
  • 4ffc7a17d4
    server : various fixes for the prompt field in /completion (#5300) Niall Coates 2024-02-06 08:16:23 +00:00
  • 906cff55c2
    py : handle byte tokens in get_token_type (#5341) Georgi Gerganov 2024-02-06 07:47:22 +02:00
  • 098f6d737b
    make: Use ccache for faster compilation (#5318) Johannes Gäßler 2024-02-05 19:33:00 +01:00
  • 78b00dda6c
    README: updated introduction (#5343) Johannes Gäßler 2024-02-05 15:55:10 +01:00
  • c6b395535a
    ggml : make use of ggml-quants.h possible in C++ code (#5338) Kawrakow 2024-02-05 14:09:47 +02:00
  • abb61944a5
    ggml : avoid duplicating function calls using MIN/MAX macros (#5325) Dr. Tom Murphy VII Ph.D 2024-02-05 06:13:57 -05:00
  • 89503dcb5f
    iq3_xxs: guards for the no-imatrix situation (#5334) Kawrakow 2024-02-05 12:32:27 +02:00
  • 7e1ae372f3
    py : fix internlm2-hf convert to gguf (#5305) Guoteng 2024-02-05 17:04:06 +08:00
  • 6fdfa2ecc6
    iq2_xxs: tune quantization (#5320) Kawrakow 2024-02-05 10:46:06 +02:00
  • a2d60c9158
    server : allow to get default generation settings for completion (#5307) Alexey Parfenov 2024-02-05 08:10:22 +00:00
  • e6f8177532
    common : add dynamic temperature parameters to main example cli (#5295) l3utterfly 2024-02-05 17:00:47 +09:00
  • 30679d438d
    scripts : fix typos, cleanup (#5303) Georgi Gerganov 2024-02-05 09:48:03 +02:00
  • 4be04c8965
    scripts : add non-interactive server-llm.sh (#5303) Нияз Гарифзянов 2024-02-05 10:43:57 +03:00
  • 5d55b0cd82
    readme : add CodeShell models to the supported models list (#5330) chiranko 2024-02-05 15:41:38 +08:00
  • 4833ac209d
    [SYCL] Fix cpy with dims of 3 (#5289) AidanBeltonS 2024-02-05 07:08:24 +00:00
  • 9392ebd49e
    flake.lock: Update github-actions[bot] 2024-02-04 00:17:24 +00:00
  • 5ed26e1fc9
    Adding some imatrix tools (#5302) Kawrakow 2024-02-04 10:39:58 +02:00
  • 277fad30c6
    cmake : use set() for LLAMA_WIN_VER (#5298) Welby Seely 2024-02-03 23:18:51 -05:00
  • 3c0d25c475
    make: add nvcc info print (#5310) Johannes Gäßler 2024-02-03 20:15:13 +01:00
  • 3cc5ed353c
    make: fix nvcc optimization flags for host code (#5309) Johannes Gäßler 2024-02-03 20:14:59 +01:00
  • 60ecf099ed
    add Vulkan support to Nix flake Martin Schwaighofer 2024-01-28 12:59:43 +01:00
  • e920ed393d
    Vulkan Intel Fixes, Optimizations and Debugging Flags (#5301) 0cc4m 2024-02-03 18:15:00 +01:00
  • 52bb63c708
    refactor : switch to emplace_back to avoid extra object (#5291) Michael Klimenko 2024-02-03 12:23:37 +01:00
  • 1ec3332ade
    YaRN : store rope scaling type as int32_t in memory (#5285) Jared Van Bortel 2024-02-03 06:22:06 -05:00
  • 6a66c5071a
    readme : add tenere in the ui tools list (#5284) BADR 2024-02-03 12:20:26 +01:00
  • a305dba8ff
    Fix im2col with 32fp (#5286) AidanBeltonS 2024-02-03 08:11:37 +00:00
  • 191221178f
    perplexity : fix KL divergence calculations on Windows (#5273) kalomaze 2024-02-02 08:15:30 -06:00
  • e437b37fd0
    scripts : parse wtype in server-llm.sh (#5167) Georgi Gerganov 2024-02-02 14:23:40 +02:00
  • 2d40085c26
    py : add check for '.attn.masked_bias' layers to GPT2model (#5281) Mirror Azure 2024-02-02 14:39:09 +03:00
  • b05102fe8c
    Tidy ggml-sycl (#5261) AidanBeltonS 2024-02-02 08:39:48 +00:00
  • 6b91b1e0a9
    docker : add build for SYCL, Vulkan + update readme (#5228) Xuan Son Nguyen 2024-02-02 08:56:31 +01:00
  • e805f0fa99
    [SYCL] get MAX_MEM_ALLOC from device property (#5270) Meng, Hengyu 2024-02-02 15:54:14 +08:00
  • af3ba5d946
    [SYCL] update guide of SYCL backend (#5254) Neo Zhang Jianyu 2024-02-02 15:53:27 +08:00
  • e1e721094d
    llama : fix memory leak in llama_batch_free (#5252) Ian Bull 2024-02-01 23:20:13 -08:00
  • 128dcbd3c9
    add --no-mmap in llama-bench (#5257) Neo Zhang Jianyu 2024-02-02 03:48:53 +08:00
  • 4d0924a890
    Vulkan Phi Fix for AMD Proprietary Drivers (#5260) 0cc4m 2024-02-01 19:25:24 +01:00
  • 8ca511cade
    cuda : fix LLAMA_CUDA_F16 (#5262) slaren 2024-02-01 18:30:17 +01:00
  • d71ac90985
    make : generate .a library for static linking (#5205) Ali Nehzat 2024-02-02 02:18:53 +11:00
  • ce32060198
    llama : support InternLM2 (#5184) Guoteng 2024-02-01 17:19:51 +08:00
  • 1cfb5372cf
    Fix broken Vulkan Cmake (properly) (#5230) Eve 2024-01-31 19:21:55 +00:00
  • d3bac7d584
    llama : reorder build_orion() at correct place (#5118) Georgi Gerganov 2024-01-31 18:47:10 +02:00
  • 5cb04dbc16
    llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) Georgi Gerganov 2024-01-31 17:30:17 +02:00
  • efb7bdbbd0
    metal : add im2col F32 dst support (#5132) Georgi Gerganov 2024-01-31 15:35:41 +02:00
  • 15606309a0
    llava : add MobileVLM support (#5132) JidongZhang-THU 2024-01-31 21:10:15 +08:00
  • b2b9f025e7
    format license text, restore apache license by legal suggestion (#5233) Neo Zhang Jianyu 2024-01-31 21:04:46 +08:00
  • dabcc5b471
    ggml : limit n_threads to the max n_tasks (#5238) slaren 2024-01-31 13:43:03 +01:00
  • f8e9140cb4
    Vulkan Fixes (#5223) 0cc4m 2024-01-31 11:44:19 +01:00
  • d62520eb2c
    Fix typos of IQ2_XXS and IQ3_XXS in llama.cpp (#5231) Yiming Cui 2024-01-31 11:04:21 +08:00
  • 01684139c3
    support SYCL backend windows build (#5208) Neo Zhang Jianyu 2024-01-31 10:38:07 +08:00
  • e8dc55d006
    kompute : llama-bench support and ggml_cpu_has_kompute() (#5226) Jared Van Bortel 2024-01-30 19:04:37 -05:00
  • e0085fdf7c
    Revert "server : change deps.sh xxd files to string literals (#5221)" Georgi Gerganov 2024-01-30 21:19:26 +02:00
  • e6f291d158
    server : fix context shift (#5195) Georgi Gerganov 2024-01-30 20:17:30 +02:00
  • 4003be0e5f
    server : change deps.sh xxd files to string literals (#5221) JohnnyB 2024-01-30 12:15:05 -06:00
  • fea4fd4ba7
    ggml : fix IQ3_XXS on Metal (#5219) Kawrakow 2024-01-30 19:15:28 +02:00
  • 8f8ddfcfad
    sync : ggml (#0) Georgi Gerganov 2024-01-30 16:21:57 +02:00
  • 6fb50ebbf0
    gguf : fix comparison (ggml/715) Georgi Gerganov 2024-01-29 21:08:18 +02:00
  • 625a699b54
    ggml_cuda_cpy support for 4d tensors and float16->float32 upcasting (ggml/686) John Balis 2024-01-29 06:37:33 -06:00
  • a4b07c057a
    gguf : add input validation, prevent integer overflows (ggml/709) Georgi Gerganov 2024-01-29 14:00:10 +02:00
  • 549a1e6cd5
    ci : fix yolo URLs + fix metal capture (ggml/712) Georgi Gerganov 2024-01-29 13:29:46 +02:00
  • 5f14ee0b0c
    metal : add debug capture backend function (ggml/694) Jack Mousseau 2024-01-29 01:22:23 -08:00
  • 8e14e3ddb3
    Faster AVX2 dot product for IQ2_XS (#5187) Kawrakow 2024-01-30 15:15:07 +02:00
  • f4d7e54974
    SOTA 3-bit quants (#5196) Kawrakow 2024-01-30 15:14:12 +02:00
  • 2256f36b79
    Vulkan Windows APU Memory Handling (#5199) 0cc4m 2024-01-30 13:59:30 +01:00
  • 7359016c7c
    quantize : fix typo (#5211) Vladimir Malyutin 2024-01-30 17:57:07 +07:00
  • 813416991a
    main : allow empty --prompt-cache file (#5176) divinity76 2024-01-30 10:18:02 +01:00
  • 5589921ef8
    readme : minor (#5204) Romain Neutron 2024-01-30 10:16:38 +01:00
  • 49f44b5c55
    readme : update hot topics Georgi Gerganov 2024-01-30 11:14:44 +02:00
  • 6685cc41c2
    server : improve README (#5209) Wu Jian Ping 2024-01-30 17:11:46 +08:00
  • ceebbb5b21
    ggml alloc: Fix for null dereference on alloc failure (#5200) Paul Tsochantaris 2024-01-29 22:19:29 +00:00
  • 6daa69ee81
    kompute : fix fallback to CPU (#5201) Jared Van Bortel 2024-01-29 17:11:27 -05:00
  • fbf1ddec69
    Nomic Vulkan backend (#4456) Jared Van Bortel 2024-01-29 15:50:50 -05:00
  • 2aed77eb06
    fix typo "RLIMIT_MLOCK" (#5175) divinity76 2024-01-29 15:45:41 +01:00
  • c82d18e863
    server : embeddings compatibility for OpenAI (#5190) Wu Jian Ping 2024-01-29 21:48:10 +08:00
  • 14fef85e2d
    py : fix except (#5194) Georgi Gerganov 2024-01-29 15:35:54 +02:00
  • e76627bcce
    py : improve BPE tokenizer support (#5189) Sang-Kil Park 2024-01-29 18:24:19 +09:00
  • fbe7dfa53c
    ggml : add max buffer sizes to opencl and metal backends (#5181) slaren 2024-01-29 09:05:13 +01:00
  • 172ac82629
    cmake : fix Vulkan build (#5182) Eve 2024-01-29 08:04:47 +00:00
  • d2f650cb5b
    metal : free metal objects (#5161) Paul Tsochantaris 2024-01-28 19:50:16 +00:00
  • 35dec26cc2
    sync : ggml Georgi Gerganov 2024-01-28 19:48:05 +02:00
  • d460510c72
    ggml : minor type fix (int64_t -> size_t) Georgi Gerganov 2024-01-28 18:44:58 +02:00
  • 2307523d32
    ggml : add Vulkan backend (#2059) 0cc4m 2024-01-28 18:03:59 +01:00
  • 0f648573dd
    ggml : add unified SYCL backend for Intel GPUs (#2690) Abhilash Majumder 2024-01-28 21:26:23 +05:30
  • b764b8f1d0
    flake.lock: Update (#5162) Georgi Gerganov 2024-01-28 16:54:54 +02:00