Commit Graph

  • 6763f713bb
    readme : more lora detail in main example readme (#10064) Rich Dougherty 2024-10-31 01:22:39 +13:00
  • 79a2bc042d
    convert : more detailed convert lora usage docs (#10065) Rich Dougherty 2024-10-31 01:22:21 +13:00
  • fc83a9e584
    ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029) xctan 2024-10-30 15:00:40 +08:00
  • c5b0f4b5d9
    llama : refactor model loader with backend registry (#10026) Diego Devesa 2024-10-30 02:01:23 +01:00
  • 8f275a7c45
    ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763) Changyeon Kim 2024-10-29 17:52:56 +09:00
  • 8d8ff71536
    llama : remove Tail-Free sampling (#10071) Georgi Gerganov 2024-10-29 10:42:05 +02:00
  • 61715d5cc8
    llama : Add IBM granite template (#10013) arch-btw 2024-10-28 10:45:33 -07:00
  • 07028f9d74
    flake.lock: Update (#10063) Georgi Gerganov 2024-10-28 17:41:24 +02:00
  • 524afeec9d
    musa: workaround for Guilty Lockup in cleaning src0 (#10042) R0CKSTAR 2024-10-28 17:02:48 +08:00
  • 8125e6cbfc
    server : don't overfill the batch during infill (#10018) Georgi Gerganov 2024-10-28 08:49:32 +02:00
  • 8841ce3f43
    llama : switch KQ multiplication to F32 precision by default (#10015) Georgi Gerganov 2024-10-27 20:59:58 +02:00
  • cc2983d375
    sync : ggml Georgi Gerganov 2024-10-26 10:34:08 +03:00
  • 8c60a8a462
    increase cuda_cpy block size (ggml/996) bssrdf 2024-10-23 14:34:00 -04:00
  • 9e4a2563ea
    scripts : fix amx sync [no ci] Georgi Gerganov 2024-10-26 10:33:31 +03:00
  • 668750357e
    metal : support permuted matrix multiplications (#10033) Georgi Gerganov 2024-10-25 22:26:15 +03:00
  • ff252ea48e
    llama : add DRY sampler (#9702) wwoodsTM 2024-10-25 10:07:34 -06:00
  • d80fb71f8b
    llama: string_split fix (#10022) Michael Podvitskiy 2024-10-25 17:57:54 +02:00
  • 2f8bd2b901
    llamafile : extend sgemm.cpp support for Q5_0 models (#10010) Srihari-mcw 2024-10-25 12:57:41 +05:30
  • bc5ba007b2
    server : check that the prompt fits in the slot's context (#10030) Georgi Gerganov 2024-10-25 10:13:46 +03:00
  • 958367bf53
    server : refactor slot input data, move tokenizer to HTTP thread (#10023) Xuan Son Nguyen 2024-10-24 21:51:22 +02:00
  • 40f2555797
    ci : fix cmake flags for SYCL Georgi Gerganov 2024-10-24 21:23:33 +03:00
  • 167a515651
    CUDA: fix insufficient buffer clearing for MMQ (#10032) Johannes Gäßler 2024-10-24 14:40:23 +02:00
  • c39665f589
    CUDA: fix MMQ for non-contiguous src0, add tests (#10021) Johannes Gäßler 2024-10-24 11:09:36 +02:00
  • 0a1c750c80
    server : samplers accept the prompt correctly (#10019) wwoodsTM 2024-10-23 13:27:51 -06:00
  • 190a37d797
    sync : ggml Georgi Gerganov 2024-10-23 17:23:55 +03:00
  • 2d3aba9ee8
    llama.vim : bump generation time limit to 3s [no ci] Georgi Gerganov 2024-10-23 17:16:56 +03:00
  • 80273a306d
    CUDA: fix 1D im2col, add tests (ggml/993) Johannes Gäßler 2024-10-18 09:24:44 +02:00
  • c19af0acb1
    ggml : remove redundant set of contexts used field (ggml/978) Daniel Bevenius 2024-10-16 20:10:01 +02:00
  • ac113a0fee
    llama.vim : add classic vim support (#9995) Michael Coppola 2024-10-23 07:09:26 -04:00
  • 4c9388fb96
    metal : add POOL2D and fix IM2COL (#9943) Jun Hee Yoo 2024-10-23 19:33:45 +09:00
  • 873279b159
    flake.lock: Update github-actions[bot] 2024-10-20 00:22:59 +00:00
  • c8c07d658a
    llama : fix empty batch causing llama_batch_allocr to crash (#9966) Xuan Son Nguyen 2024-10-22 16:59:02 +02:00
  • 19d900a756
    llama : rename batch to ubatch (#9950) Daniel Bevenius 2024-10-22 15:31:06 +02:00
  • 11d47057a5
    Rwkv chat template fix (#10001) Molly Sophia 2024-10-22 21:22:26 +08:00
  • c421ac072d
    lora : warn user if new token is added in the adapter (#9948) Xuan Son Nguyen 2024-10-22 13:08:41 +02:00
  • 4ff7fe1fb3
    llama : add chat template for RWKV-World + fix EOT (#9968) Molly Sophia 2024-10-22 18:33:37 +08:00
  • 6b8447352d
    [CANN] Adapt to dynamically loadable backends mechanism (#9970) leo-pony 2024-10-22 16:16:01 +08:00
  • 674804a996
    arg : fix typo in embeddings argument help [no ci] (#9994) Daniel Bevenius 2024-10-22 09:40:02 +02:00
  • e94a138d64
    llama.vim : fix info text display [no ci] (#9787) Georgi Gerganov 2024-10-22 00:35:25 +03:00
  • e01c67affe
    llama.vim : move info to the right of screen [no ci] (#9787) Georgi Gerganov 2024-10-21 22:52:22 +03:00
  • 994cfb1acb
    readme : update UI list (#9972) Asghar Ghorbani 2024-10-21 20:20:59 +02:00
  • 94008cc760
    arg : fix attention non-causal arg value hint (#9985) Daniel Bevenius 2024-10-21 20:12:52 +02:00
  • dbd5f2f573
    llama.vim : plugin for Neovim (#9787) Georgi Gerganov 2024-10-21 20:25:02 +03:00
  • f594bc80ba
    ggml : add asserts for type conversion in fattn kernels (#9971) Georgi Gerganov 2024-10-21 16:20:46 +03:00
  • d5ebd79c76
    rpc : pack only RPC structs (#9959) Radoslav Gerganov 2024-10-21 13:35:40 +03:00
  • 55e47786e3
    llama : default sampling changes + greedy update (#9897) Georgi Gerganov 2024-10-21 09:46:40 +03:00
  • bc21975084
    speculative : fix handling of some input params (#9963) Georgi Gerganov 2024-10-21 09:37:12 +03:00
  • 1db8c84fc6
    fix mul_mat_vec_q and *_vec_q error (#9939) Neo Zhang Jianyu 2024-10-21 14:26:09 +08:00
  • 45f097645e
    readme : update bindings list (#9951) Loïc Carrère 2024-10-20 18:25:41 +02:00
  • 7cab2083c7
    readme : update infra list (#9942) icppWorld 2024-10-20 12:01:34 -04:00
  • cda0e4b648
    llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745) Xuan Son Nguyen 2024-10-18 23:18:01 +02:00
  • afd9909a64
    rpc : backend refactoring (#9912) Radoslav Gerganov 2024-10-18 14:33:58 +03:00
  • 87421a23e8
    [SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705) Ouadie EL FAROUKI 2024-10-18 06:46:16 +01:00
  • 60ce97c9d8
    add amx kernel for gemm (#8998) Ma Mingfei 2024-10-18 13:34:36 +08:00
  • 8901755ba3
    server : add n_indent parameter for line indentation requirement (#9929) Georgi Gerganov 2024-10-18 07:32:19 +03:00
  • 6f55bccbb8
    llama : rename batch_all to batch (#8881) Daniel Bevenius 2024-10-18 01:41:51 +02:00
  • 17bb928080
    readme : remove --memory-f32 references (#9925) Georgi Gerganov 2024-10-17 23:43:05 +03:00
  • 9f45fc1e99
    llama : change warning to debug log Georgi Gerganov 2024-10-17 23:26:32 +03:00
  • 99bd4ac28c
    llama : infill sampling handle very long tokens (#9924) Georgi Gerganov 2024-10-17 22:32:47 +03:00
  • 3752217ed5
    readme : update bindings list (#9918) Tim Wang 2024-10-17 17:57:14 +11:00
  • f010b77a37
    vulkan : add backend registry / device interfaces (#9721) Diego Devesa 2024-10-17 02:46:58 +02:00
  • 2194200278
    fix: allocating CPU buffer with size 0 (#9917) Gilad S. 2024-10-17 02:34:22 +03:00
  • 73afe681aa
    fix: use vm_allocate to allocate CPU backend buffer on macOS (#9875) Gilad S. 2024-10-17 01:36:51 +03:00
  • 9e04102448
    llama : suppress conversion from 'size_t' to 'int' (#9046) Daniel Bevenius 2024-10-16 19:34:28 +02:00
  • dbf18e4de9
    llava : fix typo in error message [no ci] (#9884) Daniel Bevenius 2024-10-16 19:24:05 +02:00
  • 66c2c93082
    grammar : fix JSON Schema for string regex with top-level alt. (#9903) Joe Eli McIlvain 2024-10-16 09:03:24 -07:00
  • 10433e8b45
    llama : add tensor name for "result_norm" (#9907) Molly Sophia 2024-10-16 18:10:21 +08:00
  • 1f66b699c4
    server : fix the disappearance of the end of the text (#9867) Alexey Parfenov 2024-10-16 08:35:53 +00:00
  • 0e41b300ed
    sync : ggml Georgi Gerganov 2024-10-16 11:28:14 +03:00
  • cd60b88bf7
    ggml-alloc : remove buffer_id from leaf_alloc (ggml/987) Daniel Bevenius 2024-10-09 16:40:35 +02:00
  • becfd387f6
    [CANN] Fix cann compilation error (#9891) leo-pony 2024-10-16 08:51:46 +08:00
  • 755a9b2bf0
    llama : add infill sampler (#9896) Georgi Gerganov 2024-10-15 16:35:33 +03:00
  • 223c25a72f
    server : improve infill context reuse (#9894) Georgi Gerganov 2024-10-15 16:28:55 +03:00
  • fbc98b748e
    sampling : add XTC sampler (#9742) MaggotHATE 2024-10-15 15:54:55 +05:00
  • dcdd535302
    server : update preact (#9895) Georgi Gerganov 2024-10-15 12:48:44 +03:00
  • 4c42f93b22
    readme : update bindings list (#9889) Michał Tuszyński 2024-10-15 10:20:34 +02:00
  • a89f75e1b7
    server : handle "logprobs" field with false value (#9871) VoidIsVoid 2024-10-14 15:04:36 +08:00
  • 13dca2a54a
    Vectorize load instructions in dmmv f16 CUDA kernel (#9816) agray3 2024-10-14 01:49:08 +01:00
  • d4c19c0f5c
    server : accept extra_context for the infill endpoint (#9874) Georgi Gerganov 2024-10-13 21:31:35 +03:00
  • c7181bd294
    server : reuse cached context chunks (#9866) Georgi Gerganov 2024-10-13 18:52:48 +03:00
  • 92be9f1216
    flake.lock: Update (#9870) Georgi Gerganov 2024-10-13 06:11:26 +03:00
  • edc265661c
    server : add option to time limit the generation phase (#9865) Georgi Gerganov 2024-10-12 16:14:27 +03:00
  • 1bde94dd02
    server : remove self-extend features (#9860) Georgi Gerganov 2024-10-12 16:06:31 +03:00
  • 95c76e8e92
    server : remove legacy system_prompt feature (#9857) Georgi Gerganov 2024-10-12 14:51:54 +03:00
  • 11ac9800af
    llama : improve infill support and special token detection (#9798) Georgi Gerganov 2024-10-12 08:21:51 +03:00
  • 943d20b411
    musa : update doc (#9856) R0CKSTAR 2024-10-12 13:09:53 +08:00
  • 96776405a1
    ggml : move more prints to the ggml log system (#9839) Diego Devesa 2024-10-11 15:34:45 +02:00
  • 7eee341bee
    common : use common_ prefix for common library functions (#9805) Diego Devesa 2024-10-10 22:57:42 +02:00
  • 0e9f760eb1
    rpc : add backend registry / device interfaces (#9812) Diego Devesa 2024-10-10 20:14:55 +02:00
  • cf8e0a3bb9
    musa: add docker image support (#9685) R0CKSTAR 2024-10-11 02:10:37 +08:00
  • c7499c557c
    examples : do not use common library in simple example (#9803) Diego Devesa 2024-10-10 19:50:49 +02:00
  • c81f3bbb05
    cmake : do not build common library by default when standalone (#9804) Diego Devesa 2024-10-09 18:49:52 +02:00
  • e7022064ab
    perplexity : fix integer overflow (#9783) Georgi Gerganov 2024-10-09 17:00:18 +03:00
  • 3dc48fe75a
    examples : remove llama.vim Georgi Gerganov 2024-10-09 10:55:42 +03:00
  • dca1d4b58a
    ggml : fix BLAS with unsupported types (#9775) Diego Devesa 2024-10-08 14:21:43 +02:00
  • 458367a906
    server : better security control for public deployments (#9776) Xuan Son Nguyen 2024-10-08 13:27:04 +02:00
  • fa42aa6d89
    scripts : fix spelling typo in messages and comments (#9782) standby24x7 2024-10-08 15:19:53 +09:00
  • 6374743747
    ggml : add backend registry / device interfaces to BLAS backend (#9752) Diego Devesa 2024-10-07 21:55:08 +02:00
  • f1af42fa8c
    Update building for Android (#9672) Andrew Minh Nguyen 2024-10-07 09:37:31 -07:00
  • 6279dac039
    flake.lock: Update (#9753) Georgi Gerganov 2024-10-07 19:35:42 +03:00