Commit Graph

  • 3814a07392
    [SYCL] Add support for SYCL Nvidia target (#5738) AidanBeltonS 2024-03-11 01:13:57 +00:00
  • bb6d00bbf9
    metal : move mm_id indices to shared mem (#5982) Georgi Gerganov 2024-03-10 23:12:48 +02:00
  • 7ab7b733bb
    android : fix utf8 decoding error (#5935) Dean 2024-03-11 04:03:17 +08:00
  • d9f65c97c3
    readme : update hot topics Georgi Gerganov 2024-03-10 20:58:26 +02:00
  • b838b53ad6
    sync : ggml Georgi Gerganov 2024-03-10 20:10:46 +02:00
  • df4dc3e7cb
    ggml : try fix 32-bit arm compat (whisper/1938) Georgi Gerganov 2024-03-08 23:45:07 +02:00
  • bf47a5eefc
    ggml : remove __constant__ specifier for CUDA tables (#5940) Georgi Gerganov 2024-03-10 20:09:24 +02:00
  • fa8a809a91
    server: ci: windows build and tests (#5968) Pierrick Hymbert 2024-03-10 18:17:47 +01:00
  • bcebd7dbf6
    llama : add support for GritLM (#5959) DAN™ 2024-03-10 11:56:30 -04:00
  • 2960eae847
    grammar : verify parsed state (#5950) Clint Herron 2024-03-10 11:17:43 -04:00
  • c78541479c
    nix: update flake.lock (#5969) Georgi Gerganov 2024-03-10 16:43:08 +02:00
  • 621e86b331
    server: benchmark: chat/completions scenario and other llm servers comparison (#5941) Pierrick Hymbert 2024-03-09 23:41:49 +01:00
  • 77d1ac7e00
    server : print chat template info Georgi Gerganov 2024-03-09 22:04:00 +02:00
  • d894f352bf
    perplexity : support using multiple sequences to allow larger batch sizes (#5946) slaren 2024-03-09 19:55:54 +01:00
  • 098dbaab44
    readme : update hot topics Georgi Gerganov 2024-03-09 18:14:13 +02:00
  • 8380ecfb21
    ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) (#5951) Georgi Gerganov 2024-03-09 17:36:20 +02:00
  • 58308a0ecc
    server : fix metrics init (#5964) Georgi Gerganov 2024-03-09 17:34:15 +02:00
  • 5b09797321
    ggml : remove old quantization functions (#5942) Georgi Gerganov 2024-03-09 15:53:59 +02:00
  • 97c09585d6
    server : clarify some items in the readme (#5957) Georgi Gerganov 2024-03-09 15:47:47 +02:00
  • fb215c3832
    server : normalize embeddings (#5956) SeungWon Jeong 2024-03-09 21:27:58 +09:00
  • 2c4f566c88
    tests : gitignore ggml-common.h Georgi Gerganov 2024-03-09 14:17:11 +02:00
  • 0db32beaf0
    server : fix passing prompt as tokens (#5955) Alexey Parfenov 2024-03-09 11:16:53 +00:00
  • 8a3012a4ad
    ggml : add ggml-common.h to deduplicate shared code (#5940) Georgi Gerganov 2024-03-09 12:47:57 +02:00
  • 9674aaf35c
    server : simplify logic for empty prompts (#5953) Georgi Gerganov 2024-03-09 12:34:18 +02:00
  • 950ba1ab84
    Server: reorganize some http logic (#5939) Xuan Son Nguyen 2024-03-09 11:27:53 +01:00
  • e1fa9569ba
    server : add SSL support (#5926) Gabe Goodhart 2024-03-09 02:57:09 -07:00
  • fd72d2d2a5
    server: tests: add truncated prompt tests, better kv cache size (#5933) Pierrick Hymbert 2024-03-09 10:30:04 +01:00
  • c2101a2e90
    llama : support Mamba Selective State Space Models (#5328) compilade 2024-03-08 17:31:00 -05:00
  • 515f7d0d4f
    llama : fix quantization of shared token_embd (#5944) compilade 2024-03-08 10:53:37 -05:00
  • 76e868821a
    server: metrics: add llamacpp:prompt_seconds_total and llamacpp:tokens_predicted_seconds_total, reset bucket only on /metrics. Fix values cast to int. Add Process-Start-Time-Unix header. (#5937) Pierrick Hymbert 2024-03-08 12:25:04 +01:00
  • e457fb3540
    llama : assume tied weights if lm_head/output weights is missing (#5824) Don Mahurin 2024-03-08 02:41:50 -08:00
  • af37fd8b30
    server : fix EOS token detection with disabled cache (#5938) Georgi Gerganov 2024-03-08 12:40:02 +02:00
  • 581ed5c4fe
    log : fix MSVC compile errors (#5643) UEXTM.com 2024-03-08 04:35:04 -05:00
  • 6cdabe6526
    llama-bench : add embeddings option (#5924) Georgi Gerganov 2024-03-07 16:32:38 +02:00
  • 89fb735fcf
    Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" (#5918) Neo Zhang Jianyu 2024-03-07 19:14:49 +08:00
  • 55a2a900ff
    server : add /v1/completions endpoint (#5914) Minsoo Cheong 2024-03-07 19:42:39 +09:00
  • 2002bc96bf
    server : refactor (#5882) Georgi Gerganov 2024-03-07 11:41:53 +02:00
  • ceca1aef07
    [SYCL] fix error when set main gpu to non-zero (#5901) Neo Zhang Jianyu 2024-03-07 16:34:31 +08:00
  • e04e04f8fa
    ggml : use SYS_get_cpu if SYS_getcpu is not defined (#5906) Jared Van Bortel 2024-03-06 15:42:23 -05:00
  • e25fb4b18f
    ggml : use uint8x16_t return type for ggml_vqtbl1q_u8 (#5894) bobqianic 2024-03-06 07:35:07 +00:00
  • 1e35d619a6
    convert : remove AWQ remnants (#5768) Georgi Gerganov 2024-03-06 09:12:25 +02:00
  • 8ced9f7e32
    add wait() to make code stable (#5895) Neo Zhang Jianyu 2024-03-06 12:08:32 +08:00
  • 652ca2bded
    compare-llama-bench.py : remove mul_mat_q (#5892) slaren 2024-03-05 22:27:29 +01:00
  • bd836944f8
    quants : use MM256_SET_M128I consistently to fix gcc 7 build (#5889) Jared Van Bortel 2024-03-05 11:56:37 -05:00
  • 3de31677d3
    grammars : blacklists character control set (#5888) ExtReMLapin 2024-03-05 17:33:08 +01:00
  • 82cb31eb93
    Revert "grammars : don't allow to output unescaped new line in string (#5885)" Georgi Gerganov 2024-03-05 15:56:24 +02:00
  • b1a4e994fd
    grammars : don't allow to output unescaped new line in string (#5885) ExtReMLapin 2024-03-05 14:44:29 +01:00
  • 61d1c88e15
    Vulkan Improvements (#5835) 0cc4m 2024-03-05 13:33:42 +01:00
  • 21b0867433
    [SYCL] fix mul_mat fault in CI/unit-test (#5862) Neo Zhang Jianyu 2024-03-05 16:08:35 +08:00
  • 6a87ac3a52
    fix editorconfig check break (#5879) Minsoo Cheong 2024-03-05 15:12:23 +09:00
  • 29eee40474
    fix speculative decoding build on windows (#5874) Jeffrey Quesnelle 2024-03-04 19:23:06 -08:00
  • 1d41d6f7c2
    nix: static build (#5814) hutli 2024-03-05 02:33:08 +01:00
  • 29ae62d2ae
    llama : fix embeddings (#5796) Georgi Gerganov 2024-03-04 22:31:20 +02:00
  • e0843afe1b
    flake : fix Georgi Gerganov 2024-03-04 21:50:50 +02:00
  • a1c6d96ed8
    ggml : fix unknown status (#0) Georgi Gerganov 2024-03-04 20:53:27 +02:00
  • efd8533ef8
    sync : ggml Georgi Gerganov 2024-03-04 11:06:39 +02:00
  • 9fa2627347
    ggml : introduce ggml_status (ggml/750) Michael Podvitskiy 2024-03-04 10:05:42 +01:00
  • fe52be11e3
    cmake : handle cases where git index is not found in .git (#5844) Dane Madsen 2024-03-05 05:26:55 +11:00
  • 6d341ab6c5
    speculative : implement stochastic speculative sampling (#5625) Minsoo Cheong 2024-03-05 03:24:00 +09:00
  • 4ffcdce2ff
    add alias for chat template (#5858) Xuan Son Nguyen 2024-03-04 12:22:08 +01:00
  • a0fc62661f
    sync : ggml Georgi Gerganov 2024-03-04 10:40:04 +02:00
  • 7d43c585dc
    add some new ops, fix some operators and add batch operations to certain operators. (ggml/747) leejet 2024-03-03 20:23:52 +08:00
  • 82f3e668ad
    common : use LLAMA_DEFAULT_SEED (#5855) DAN™ 2024-03-04 03:08:19 -05:00
  • 5a51cc1bb4
    main : support special tokens as reverse/anti prompt (#5847) DAN™ 2024-03-04 02:57:20 -05:00
  • 67be2ce101
    cuda : fix data race in soft max (#5853) slaren 2024-03-03 14:26:18 +01:00
  • 231ae28f07
    readme : add API changes section Georgi Gerganov 2024-03-03 12:44:03 +02:00
  • 475df1d6cf
    llama : allow for user specified embedding pooling type (#5849) Douglas Hanley 2024-03-03 04:40:27 -06:00
  • 87c2e8b279
    gguf-dump : support i-quants (#5841) Nindaleth 2024-03-03 09:43:42 +01:00
  • de9692a7d2
    llama : fix llama_copy_state_data with fragmented KV cache (#5840) compilade 2024-03-03 03:41:55 -05:00
  • e6029348e8
    ci : schedule slow server tests only on Release or on demand (#5839) Pierrick Hymbert 2024-03-03 09:35:23 +01:00
  • 8ef969afce
    server : init http requests thread pool with --parallel if set (#5836) Pierrick Hymbert 2024-03-03 08:48:36 +01:00
  • fa974646e1
    flake.lock: Update (#5842) Georgi Gerganov 2024-03-03 06:11:31 +02:00
  • 9731134296
    server: tests: passkey challenge / self-extend with context shift demo (#5832) Pierrick Hymbert 2024-03-02 22:00:14 +01:00
  • 4a6e2d6142
    llama : add abort_callback to interrupt computation (#5409) Michael Podvitskiy 2024-03-02 20:52:25 +01:00
  • 494c870326
    ggml : fix IQ3_S AVX implementation (#5834) Georgi Gerganov 2024-03-02 20:00:49 +02:00
  • 4d4d2366fc
    convert : automatically fall back to HfVocab if tokenizer.model doesn't exist (#5821) Jared Van Bortel 2024-03-02 12:27:26 -05:00
  • c7a0ad8ec9
    convert-hf : make model class definitions self-contained (#5825) Jared Van Bortel 2024-03-02 12:21:47 -05:00
  • bbde6eb256
    ggml : IQ3_S improvements (#5829) Kawrakow 2024-03-02 17:00:51 +02:00
  • ef2cd694c4
    scripts : add pod-llama.sh Georgi Gerganov 2024-03-02 16:54:08 +02:00
  • 6c32d8c7ad
    llama : refactor internal quantization functions (#5830) Xuan Son Nguyen 2024-03-02 15:19:09 +01:00
  • 802da0091b
    llama : fix segfault from unknown model arch name (#5820) compilade 2024-03-02 08:42:56 -05:00
  • 715641391d
    Support multiple GPUs (split mode) on SYCL backend (#5806) Neo Zhang Jianyu 2024-03-02 19:49:30 +08:00
  • 9bf297a02b
    workflows : remove nocleanup arg for check-requirements.sh (#5826) crasm 2024-03-02 00:11:06 -05:00
  • cb5e8f7fc4
    build(nix): Introduce flake.formatter for nix fmt (#5687) Tushar 2024-03-02 04:48:26 +05:30
  • da3b9ba2b7
    convert-hf-to-gguf : require einops for InternLM2ForCausalLM (#5792) nold 2024-03-01 22:51:12 +01:00
  • c29af7e225
    llama : add StarCoder2 support (#5795) Sourab Mangrulkar 2024-03-02 01:00:46 +05:30
  • 38d16b1426
    server : remove api_like_OAI.py proxy script (#5808) Georgi Gerganov 2024-03-01 20:00:58 +02:00
  • c2224f003b
    ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken (#5813) ddpasa 2024-03-01 18:00:00 +01:00
  • e743386728
    gemma : fix bfloat16 -> float16 conversion issue (#5810) kunal-vaishnavi 2024-03-01 06:08:08 -08:00
  • f49a535686
    common : fix flag --logits-all to --all-logits (#5805) Miwa / Ensan 2024-03-01 22:48:56 +09:00
  • 3ab8b3a92e
    llama : cleanup unused mmq flags (#5772) Pierrick Hymbert 2024-03-01 12:39:06 +01:00
  • 9600d59e01
    unicode : switch to multimap based nfd_map (#5799) Douglas Hanley 2024-03-01 03:15:36 -06:00
  • 5cb02b4a01
    server: allow to override threads server pool with --threads-http (#5794) Pierrick Hymbert 2024-03-01 10:08:08 +01:00
  • 6ea0f010ff
    ci : add Ubuntu 22 Vulkan CI run (#5789) Eve 2024-03-01 08:54:53 +00:00
  • f105471ef6
    server : fix newlines in help (#5785) Georgi Gerganov 2024-03-01 09:59:43 +02:00
  • 38d1521608
    [SYCL] Use batched mul_mat pathway (#5591) AidanBeltonS 2024-03-01 07:36:47 +00:00
  • 052051d8ae
    Server: normalize naming (#5779) Xuan Son Nguyen 2024-02-29 21:42:11 +01:00
  • d5ab29757e
    llama : constified llama_set_state_data's src (#5774) Marcus Dunn 2024-02-29 00:17:23 -08:00
  • 87c91c0766
    ci : reduce 3b ppl chunks to 1 to avoid timeout (#5771) Georgi Gerganov 2024-02-28 21:44:21 +02:00
  • 317709b2a8
    make portability_enumeration_ext apple only (#5757) Eve 2024-02-28 19:33:37 +00:00