Commit Graph

  • 5d4f12e462
    server: add cURL support to server.Dockerfile (#6461) Ed Lepedus 2024-04-03 18:56:37 +01:00
  • 154d4ee39c
    readme : add feature-rich rust bindings (#6465) Francisco Melo 2024-04-03 18:53:37 +01:00
  • e69945d953
    security : create policy (#6354) Joyce 2024-04-03 14:48:07 -03:00
  • db214fa578
    Missing tokenizer.model error during gguf conversion (#6443) Abhishek Gopinath K 2024-04-03 21:12:52 +05:30
  • 1ff4d9f3d6
    Add OpenChat, Alpaca, Vicuna chat templates (#6397) kaizau 2024-04-03 23:24:31 +08:00
  • 076b08649e
    readme : update hot topics Georgi Gerganov 2024-04-03 16:11:15 +03:00
  • 08a0c02060
    ggml : mul_mat_id use the same tensor for all the experts (#6387) slaren 2024-04-03 15:07:05 +02:00
  • 52604860f9
    [SYCL] Disable iqx on windows as WA (#6435) Meng, Hengyu 2024-04-03 10:34:40 +08:00
  • f87f7b8986
    flake.lock: Update (#6402) Georgi Gerganov 2024-04-01 19:05:57 +03:00
  • 33a5244806
    compare-llama-bench.py: fix long hexsha args (#6424) Johannes Gäßler 2024-04-01 13:30:43 +02:00
  • 226e819371
    ci: server: verify deps are coherent with the commit (#6409) Pierrick Hymbert 2024-04-01 12:36:40 +02:00
  • c50a82ce0f
    readme : update hot topics Georgi Gerganov 2024-03-31 11:56:30 +03:00
  • 37e7854c10
    ci: bench: fix Resource not accessible by integration on PR event (#6393) Pierrick Hymbert 2024-03-30 11:36:07 +01:00
  • c342d070c6
    Fedora build update (#6388) Mohammadreza Hendiani 2024-03-30 01:29:56 +03:30
  • f7fc5f6c6f
    split: allow --split-max-size option (#6343) Xuan Son Nguyen 2024-03-29 22:34:44 +01:00
  • ba0c7c70ab
    Vulkan k-quant mmq and ggml-backend offload functionality (#6155) 0cc4m 2024-03-29 17:29:21 +01:00
  • d48ccf3ad4
    sync : ggml (#6351) Georgi Gerganov 2024-03-29 17:45:46 +02:00
  • 069574775c
    [Model] Add support for xverse (#6301) hxer7963 2024-03-29 21:37:03 +08:00
  • cfde806eb9
    ci : fix BGE wget (#6383) Georgi Gerganov 2024-03-29 14:34:28 +02:00
  • b910287954
    readme : add project (#6356) zhouwg 2024-03-29 15:33:46 +08:00
  • 8093987090
    cmake : add explicit metal version options (#6370) Matt Clayton 2024-03-29 03:27:42 -04:00
  • 057400a3fd
    llama : remove redundant reshape in build_kv_store (#6369) Daniel Bevenius 2024-03-29 08:23:22 +01:00
  • b75c38166c
    convert : allow conversion of Mistral HF models (#6144) Pedro Cuenca 2024-03-29 08:15:00 +01:00
  • bfe7dafc9c
    readme : add notice for UI list Georgi Gerganov 2024-03-28 22:56:03 +02:00
  • 5106ef482c
    [SYCL] Revisited & updated SYCL build documentation (#6141) Ouadie EL FAROUKI 2024-03-28 16:01:47 +00:00
  • be55134a53
    convert : refactor vocab selection logic (#6355) Jared Van Bortel 2024-03-28 11:44:36 -04:00
  • 66ba560256
    llava : fix MobileVLM (#6364) Ziang Wu 2024-03-28 22:33:10 +08:00
  • 0308f5e3d7
    llama : fix command-r inference when omitting outputs (#6367) compilade 2024-03-28 08:05:54 -04:00
  • 28cb9a09c4
    ci: bench: fix master not schedule, fix commit status failed on external repo (#6365) Pierrick Hymbert 2024-03-28 11:27:56 +01:00
  • cfc4d75df6
    doc: fix outdated default value of batch size (#6336) Ting Sun 2024-03-28 16:51:06 +08:00
  • 6902cb7f2e
    server : stop gracefully on SIGTERM (#6348) Eric Zhang 2024-03-28 16:50:48 +08:00
  • d2d8f38996
    nix: removed unnessesary indentation hutli 2024-03-27 19:17:30 +01:00
  • d39b308eaf
    nix: moved blas availability check to package inputs so it is still overridable hutli 2024-03-27 19:14:28 +01:00
  • c873976649
    using blas.meta.available to check host platform hutli 2024-03-27 18:10:08 +01:00
  • dbb03e2b9c
    only using explicit blas if hostPlatform is allowed hutli 2024-03-27 17:25:05 +01:00
  • e9f17dc3bf
    nix: .#windows: proper cross-compilation set-up Someone Serge 2024-03-26 16:22:42 +00:00
  • 22a462cc1f
    nix: package: don't introduce the dependency on python Someone Serge 2024-03-26 16:22:07 +00:00
  • f6a0f5c642
    nix: .#widnows: init hutli 2024-02-15 14:25:04 +01:00
  • d0e2f6416b
    doc: fix typo in MobileVLM-README.md (#6181) Ziang Wu 2024-03-28 12:03:30 +08:00
  • 25f4a613c4
    [SYCL] fix set main gpu crash (#6339) Neo Zhang Jianyu 2024-03-28 08:55:24 +08:00
  • a016026a3a
    server: continuous performance monitoring and PR comment (#6283) Pierrick Hymbert 2024-03-27 20:26:49 +01:00
  • 53c7ec53d5
    nix: ci: dont test cuda and rocm (for now) Someone Serge 2024-03-27 16:17:46 +00:00
  • e5b89a441a
    ggml : fix bounds checking of zero size views (#6347) slaren 2024-03-27 15:07:50 +01:00
  • 3a0345970e
    make : whitespace Georgi Gerganov 2024-03-27 15:02:49 +02:00
  • 1e13987fba
    embedding : show full embedding for single prompt (#6342) howlger 2024-03-27 12:15:44 +01:00
  • e82f9e2b83
    [SYCL] Fix batched impl for NVidia GPU (#6164) AidanBeltonS 2024-03-27 08:16:40 +00:00
  • cbc8343619
    Make IQ1_M work for QK_K = 64 (#6327) Kawrakow 2024-03-27 08:44:27 +01:00
  • e562b9714b
    common : change --no-penalize-nl to --penalize-nl (#6334) Sigbjørn Skjæret 2024-03-27 08:23:10 +01:00
  • 2ab4f00d25
    llama2c : open file as binary (#6332) Georgi Gerganov 2024-03-27 09:16:02 +02:00
  • 1740d6dd4e
    readme : add php api bindings (#6326) Mateusz Charytoniuk 2024-03-27 08:08:59 +01:00
  • 0642b22cd1
    server: public: use relative routes for static files (#6325) Eric Zhang 2024-03-27 13:55:29 +08:00
  • a4f569e8a3
    [SYCL] fix no file in win rel (#6314) Neo Zhang Jianyu 2024-03-27 09:47:06 +08:00
  • 32c8486e1f
    wpm : portable unicode tolower (#6305) Jared Van Bortel 2024-03-26 17:46:21 -04:00
  • 557410b8f0
    llama : greatly reduce output buffer memory usage (#6122) compilade 2024-03-26 10:46:41 -04:00
  • 55c1b2a3bb
    IQ1_M: 1.75 bpw quantization (#6302) Kawrakow 2024-03-26 15:21:27 +01:00
  • e097633f63
    convert-hf : fix exception in sentencepiece with added tokens (#6320) Pedro Cuenca 2024-03-26 13:32:19 +01:00
  • d25b1c31b0
    quantize : be able to override metadata by key (#6321) Kawrakow 2024-03-26 13:09:30 +01:00
  • deb7240100
    embedding : adjust n_ubatch value (#6296) Minsoo Cheong 2024-03-26 18:11:46 +09:00
  • 3d032ece8e
    server : add n_discard parameter (#6300) Jan Boon 2024-03-26 16:47:43 +08:00
  • e190f1fca6
    nix: make xcrun visible in Nix sandbox for precompiling Metal shaders (#6118) Joseph Stahl 2024-03-25 20:51:46 -04:00
  • 280345968d
    cuda : rename build flag to LLAMA_CUDA (#6299) slaren 2024-03-26 01:16:01 +01:00
  • b06c16ef9f
    nix: fix blas support (#6281) Christian Kögler 2024-03-25 18:52:45 +01:00
  • 1f2fd4e727
    tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303) Kawrakow 2024-03-25 18:33:15 +01:00
  • 43139cc528
    flake.lock: Update (#6266) Georgi Gerganov 2024-03-25 17:22:27 +02:00
  • 2f34b865b6
    cuda : fix LLAMA_CUDA_F16 build (#6298) slaren 2024-03-25 15:43:22 +01:00
  • ae1f211ce2
    cuda : refactor into multiple files (#6269) slaren 2024-03-25 13:50:23 +01:00
  • ad3a0505e3
    Server: clean up OAI params parsing function (#6284) Xuan Son Nguyen 2024-03-25 09:42:17 +01:00
  • 95ad616cdd
    [SYCL] fix SYCL backend build on windows is break by LOG() error (#6290) Neo Zhang Jianyu 2024-03-25 15:52:41 +08:00
  • 64e7b47c69
    examples : add "retrieval" (#6193) Minsoo Cheong 2024-03-25 16:38:22 +09:00
  • 7733f0c760
    ggml : support AVX512VNNI (#6280) Justine Tunney 2024-03-25 01:39:56 -04:00
  • a32b77c4b2
    Fix heap corruption from wmode out-of-bound writes on windows (#6272) Rick G 2024-03-24 14:45:56 -07:00
  • a0e584defd
    imatrix : fix wname for mul_mat_id ops (#6271) Georgi Gerganov 2024-03-24 16:18:45 +02:00
  • 7aed0ffe68
    Fixed lookup compilation issues on Windows (#6273) Johannes Gäßler 2024-03-24 14:21:17 +01:00
  • ea279d5609
    ci : close inactive issue, increase operations per run (#6270) Pierrick Hymbert 2024-03-24 09:57:06 +01:00
  • 586e7bc561
    sampling : deduplicated code for probability distribution access (#6240) Minsoo Cheong 2024-03-24 17:54:07 +09:00
  • ddf6568510
    [SYCL] offload op (#6217) Meng, Hengyu 2024-03-24 12:04:25 +08:00
  • d03224ac98
    Support build win release for SYCL (#6241) Neo Zhang Jianyu 2024-03-24 09:44:01 +08:00
  • 94d1b3b411
    use _wfopen instead of fopen on Windows (#6248) Jared Van Bortel 2024-03-23 18:48:02 -04:00
  • 95562175f8
    gitignore : gguf-split Georgi Gerganov 2024-03-23 21:35:23 +02:00
  • f482bb2e49
    common: llama_load_model_from_url split support (#6192) Pierrick Hymbert 2024-03-23 18:07:00 +01:00
  • 1997577d5e
    server: docs: --threads and --threads, --ubatch-size, --log-disable (#6254) Pierrick Hymbert 2024-03-23 18:00:38 +01:00
  • 476b0251b2
    llama : add grok-1 support (#6204) Julius Arkenberg 2024-03-23 17:41:53 +01:00
  • 21cad01b6e
    split: add gguf-split in the make build target (#6262) Pierrick Hymbert 2024-03-23 17:18:13 +01:00
  • 1b26aebe4d
    server: flush stdout after logging in both text and json layout (#6253) Pierrick Hymbert 2024-03-23 13:18:45 +01:00
  • 50ccaf5eac
    lookup: complement data from context with general text statistics (#5479) Johannes Gäßler 2024-03-23 01:24:36 +01:00
  • 56a00f0a2f
    common : default --hf-file to --model (#6234) Georgi Gerganov 2024-03-22 21:10:39 +02:00
  • 92397d87a4
    convert-llama2c-to-ggml : enable conversion of GQA models (#6237) fraxy-v 2024-03-22 20:49:06 +02:00
  • 1d0331c12a
    quantize: options for output and token embedding tensors qtype (#6239) Kawrakow 2024-03-22 19:47:14 +01:00
  • dba1af6129
    llama_model_loader: support multiple split/shard GGUFs (#6187) Pierrick Hymbert 2024-03-22 19:00:01 +01:00
  • ee804f6223
    ci: apply concurrency limit for github workflows (#6243) Minsoo Cheong 2024-03-23 02:15:06 +09:00
  • 80bd33bc2c
    common : add HF arg helpers (#6234) Georgi Gerganov 2024-03-22 15:33:38 +02:00
  • e80f06d2a1
    llama : correction of the attn.v.weight quantization for IQ3_XS (#6209) Nexesenex 2024-03-22 14:32:02 +01:00
  • f77a8ffd3b
    tests : conditional python & node json schema tests (#6207) Olivier Chafik 2024-03-22 13:09:07 +00:00
  • 72114edf06
    json-schema-to-grammar : fix order of props + non-str const/enum (#6232) Olivier Chafik 2024-03-22 13:07:44 +00:00
  • 2f0e81e053
    cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208) slaren 2024-03-22 14:05:31 +01:00
  • 29ab270e65
    readme : add RecurseChat to the list of UIs (#6219) Xiaoyi Chen 2024-03-22 04:29:49 -07:00
  • 6b8bb3a31d
    server : fix n_keep always showing as 0 in response (#6211) Jan Boon 2024-03-22 19:12:05 +08:00
  • 68e210b354
    server : enable continuous batching by default (#6231) Georgi Gerganov 2024-03-22 13:08:28 +02:00
  • b3e94f26ba
    metal : proper assert for mat-mat memory alignment (#6225) Georgi Gerganov 2024-03-22 11:35:53 +02:00
  • b2075fd6a5
    ci : add CURL flag for the mac builds (#6214) Vaibhav Srivastav 2024-03-22 08:53:43 +01:00