Commit Graph

  • fb487bb567
    common : add support for cpu_get_num_physical_cores() on Windows (#8771) Liu Jia 2024-08-16 14:23:12 +08:00
  • 2a24c8caa6
    Add Nemotron/Minitron GGUF Conversion & Inference Support (#8922) Yoshi Suhara 2024-08-15 19:23:33 -07:00
  • e3f6fd56b1
    ggml : dynamic ggml_sched_max_splits based on graph_size (#9047) Nico Bosshard 2024-08-16 04:22:55 +02:00
  • 4b9afbbe90
    retrieval : fix memory leak in retrieval query handling (#8955) gtygo 2024-08-15 15:40:12 +08:00
  • 37501d9c79
    server : fix duplicated n_predict key in the generation_settings (#8994) Riceball LEE 2024-08-15 15:28:05 +08:00
  • 4af8420afb
    common : remove duplicate function llama_should_add_bos_token (#8778) Zhenwei Jin 2024-08-15 15:23:23 +08:00
  • 6bda7ce6c3
    llama : add pre-tokenizer regexes for BLOOM and gpt3-finnish (#8850) Esko Toivonen 2024-08-15 10:17:12 +03:00
  • d5492f0525
    ci : disable bench workflow (#9010) Georgi Gerganov 2024-08-15 10:11:11 +03:00
  • 234b30676a
    server : init stop and error fields of the result struct (#9026) Jiří Podivín 2024-08-15 08:21:57 +02:00
  • 5fd89a70ea
    Vulkan Optimizations and Fixes (#8959) 0cc4m 2024-08-14 18:32:53 +02:00
  • 98a532d474
    server : fix segfault on long system prompt (#8987) compilade 2024-08-14 02:51:02 -04:00
  • 43bdd3ce18
    cmake : remove unused option GGML_CURL (#9011) Georgi Gerganov 2024-08-14 09:14:49 +03:00
  • 06943a69f6
    ggml : move rope type enum to ggml.h (#8949) Daniel Bevenius 2024-08-13 21:13:15 +02:00
  • 828d6ff7d7
    export-lora : throw error if lora is quantized (#9002) Xuan Son Nguyen 2024-08-13 11:41:14 +02:00
  • fc4ca27b25
    ci : fix github workflow vulnerable to script injection (#9008) Diogo Teles Sant'Anna 2024-08-12 13:28:23 -03:00
  • 1f67436c5e
    ci : enable RPC in all of the released builds (#9006) Radoslav Gerganov 2024-08-12 19:17:03 +03:00
  • 0fd93cdef5
    llama : model-based max number of graph nodes calculation (#8970) Nico Bosshard 2024-08-12 17:13:59 +02:00
  • 84eb2f4fad
    docs: introduce gpustack and gguf-parser (#8873) Frank Mai 2024-08-12 20:45:50 +08:00
  • 1262e7ed13
    grammar-parser : fix possible null-deref (#9004) DavidKorczynski 2024-08-12 13:36:41 +01:00
  • df5478fbea
    ggml: fix div-by-zero (#9003) DavidKorczynski 2024-08-12 13:21:41 +01:00
  • 2589292cde
    Fix a spelling mistake (#9001) Liu Jia 2024-08-12 17:46:03 +08:00
  • d3ae0ee8d7
    py : fix requirements check '==' -> '~=' (#8982) Georgi Gerganov 2024-08-12 11:02:01 +03:00
  • 5ef07e25ac
    server : handle models with missing EOS token (#8997) Georgi Gerganov 2024-08-12 10:21:50 +03:00
  • 4134999e01
    gguf-py : Numpy dequantization for most types (#8939) compilade 2024-08-11 14:45:41 -04:00
  • 8cd1bcfd3f
    flake.lock: Update (#8979) Georgi Gerganov 2024-08-11 16:58:58 +03:00
  • a21c6fd450
    update guide (#8909) Neo Zhang 2024-08-11 16:37:43 +08:00
  • 33309f661a
    llama : check all graph nodes when searching for result_embd_pooled (#8956) fairydreaming 2024-08-11 10:35:26 +02:00
  • 7c5bfd57f8
    Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (#8943) Markus Tavenrath 2024-08-11 10:09:09 +02:00
  • 6e02327e8b
    metal : fix uninitialized abort_callback (#8968) slaren 2024-08-10 15:42:10 +02:00
  • 7eb23840ed
    llama : default n_swa for phi-3 (#8931) Xuan Son Nguyen 2024-08-10 13:04:40 +02:00
  • 7c3f55c100
    Add support for encoder-only T5 models (#8900) fairydreaming 2024-08-10 11:43:26 +02:00
  • 911b437f22
    gguf-py : fix double call to add_architecture() (#8952) Matteo Mortari 2024-08-10 07:58:49 +02:00
  • b72942fac9
    Merge commit from fork Georgi Gerganov 2024-08-09 23:03:21 +03:00
  • 6afd1a99dc
    llama : add support for lora adapters in T5 model (#8938) fairydreaming 2024-08-09 18:53:09 +02:00
  • 272e3bd95e
    make : fix llava obj file race (#8946) Georgi Gerganov 2024-08-09 18:24:30 +03:00
  • 45a55b91aa
    llama : better replace_all (cont) (#8926) Georgi Gerganov 2024-08-09 18:23:52 +03:00
  • 3071c0a5f2
    llava : support MiniCPM-V-2.5 (#7599) tc-mb 2024-08-09 18:33:53 +08:00
  • 4305b57c80
    sync : ggml Georgi Gerganov 2024-08-09 10:03:48 +03:00
  • 70c0ea3560
    whisper : use vulkan as gpu backend when available (whisper/2302) Matt Stephenson 2024-07-16 03:21:09 -04:00
  • 5b2c04f492
    embedding : add --pooling option to README.md [no ci] (#8934) Daniel Bevenius 2024-08-09 08:33:30 +02:00
  • 6f6496bb09
    llama : fix typo in llama_tensor_get_type comment [no ci] (#8937) Daniel Bevenius 2024-08-09 08:32:23 +02:00
  • daef3ab233
    server : add one level list nesting for embeddings (#8936) Mathieu Geli 2024-08-09 08:32:02 +02:00
  • 345a686d82
    llama : reduce useless copies when saving session (#8916) compilade 2024-08-08 23:54:00 -04:00
  • 3a14e00366
    gguf-py : simplify support for quant types (#8838) compilade 2024-08-08 13:33:09 -04:00
  • afd27f01fe
    scripts : sync cann files (#0) Georgi Gerganov 2024-08-08 14:56:52 +03:00
  • 366d486c16
    scripts : fix sync filenames (#0) Georgi Gerganov 2024-08-08 14:40:12 +03:00
  • e44a561ab0
    sync : ggml Georgi Gerganov 2024-08-08 13:19:47 +03:00
  • f93d49ab1e
    ggml : ignore more msvc warnings (ggml/906) Borislav Stanimirov 2024-08-07 10:00:56 +03:00
  • 5b33ea1ee7
    metal : fix struct name (ggml/912) Georgi Gerganov 2024-08-07 09:57:00 +03:00
  • 85fca8deb6
    metal : add abort callback (ggml/905) Conrad Kramer 2024-08-07 02:55:49 -04:00
  • ebd541a570
    make : clean llamafile objects (#8923) Pablo Duboue 2024-08-08 04:44:51 -04:00
  • 15fa07a5c5
    make : use C compiler to build metal embed object (#8899) slaren 2024-08-07 18:24:05 +02:00
  • be55695eff
    ggml-backend : fix async copy from CPU (#8897) slaren 2024-08-07 13:29:02 +02:00
  • 0478174d59
    [SYCL] Updated SYCL device filtering (#8901) Ouadie EL FAROUKI 2024-08-07 11:25:36 +01:00
  • a8dbc6f753
    CUDA/HIP: fix tests/test-backend-ops (#8896) Johannes Gäßler 2024-08-07 09:07:52 +02:00
  • 506122d854
    llama-bench : add support for getting cpu info on Windows (#8824) Zhenwei Jin 2024-08-07 09:01:06 +08:00
  • 725e3d9437
    quantize : update usage comment in quantize.cpp (#8889) Daniel Bevenius 2024-08-07 01:43:00 +02:00
  • 31958546c3
    typo correction (#8891) Nexes the Old 2024-08-07 01:41:54 +02:00
  • 1e6f6554aa
    server : add lora hotswap endpoint (WIP) (#8857) Xuan Son Nguyen 2024-08-06 17:33:39 +02:00
  • 641f5dd2a6
    CUDA: fix padding logic for FP16/FP32 (#8884) Johannes Gäßler 2024-08-06 17:13:55 +02:00
  • 5f4dcb1e60
    simple : update name of executable to llama-simple (#8885) Daniel Bevenius 2024-08-06 16:44:35 +02:00
  • db20f50cf4
    cmake : Link vulkan-shaders-gen with pthreads (#8835) Jaeden Amero 2024-08-06 17:21:47 +04:00
  • efda90c93a
    [Vulkan] Fix compilation of vulkan-shaders-gen on w64devkit after e31a4f6 (#8880) MaggotHATE 2024-08-06 16:32:03 +05:00
  • 0bf16de07b
    contributing : add note about write access Georgi Gerganov 2024-08-06 11:48:01 +03:00
  • 2d5dd7bb3f
    ggml : add epsilon as a parameter for group_norm (#8818) Molly Sophia 2024-08-06 15:26:46 +08:00
  • cdd1889de6
    convert : add support for XLMRoberta embedding models (#8658) Douglas Hanley 2024-08-06 02:20:54 -05:00
  • c21a896405
    [CANN]: Fix ggml_backend_cann_buffer_get_tensor (#8871) Mengqing Cao 2024-08-06 12:42:42 +08:00
  • d4ff847153
    [SYCL] correct cmd name (#8877) Neo Zhang 2024-08-06 09:09:12 +08:00
  • 0a4ce78681
    common : Changed tuple to struct (TODO fix) (#8823) Liu Jia 2024-08-06 00:14:10 +08:00
  • bc0f887e15
    cann: fix buffer_num and runtime speed slowly error (#8865) wangshuai09 2024-08-05 21:10:37 +08:00
  • b42978e7e4
    readme : add ramalama to the available UIs (#8811) Eric Curtin 2024-08-05 13:45:01 +01:00
  • b9dfc25ca3
    ggml : fix overflows in elu function (#8866) Justine Tunney 2024-08-05 05:43:40 -07:00
  • 1ef14b3007
    py: Add more authorship metadata from model card (#8810) Brian 2024-08-05 21:15:28 +10:00
  • d3f0c7166a
    Stop the generation when <|eom_id|> token is encountered - needed for Llama 3.1 tool call support (#8858) fairydreaming 2024-08-05 09:38:01 +02:00
  • e31a4f6797
    cmake: fix paths for vulkan shaders compilation on Windows (#8573) stduhpf 2024-08-05 08:18:27 +02:00
  • 400ae6f65f
    readme : update model list (#8851) BarfingLemurs 2024-08-05 01:54:10 -04:00
  • f1ea5146d7
    llama : better replace_all (#8852) Georgi Gerganov 2024-08-05 08:53:39 +03:00
  • 064cdc265f
    vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (#8855) 0cc4m 2024-08-05 07:52:55 +02:00
  • 5587e57a76
    sync : ggml Georgi Gerganov 2024-08-04 19:13:25 +03:00
  • a3738b2fa7
    vulkan : implement Stable Diffusion operators (ggml/904) 0cc4m 2024-08-04 17:28:08 +02:00
  • 655858ace0
    ggml : move c parameter comment to ggml_rope_ext (ggml/901) Daniel Bevenius 2024-07-29 15:06:06 +02:00
  • c02b0a8a4d
    cann: support q4_0 model (#8822) wangshuai09 2024-08-05 12:22:30 +08:00
  • 0d6fb52be0
    Install curl in runtime layer (#8693) Brandon Squizzato 2024-08-04 14:17:16 -04:00
  • 978ba3d83d
    Server: Don't ignore llama.cpp params (#8754) ardfork 2024-08-04 18:16:23 +00:00
  • ecf6b7f23e
    batched-bench : handle empty -npl (#8839) Brian Cunnie 2024-08-04 03:55:03 -07:00
  • 01aae2b497
    baby-llama : remove duplicate vector include Daniel Bevenius 2024-08-03 15:07:47 +02:00
  • 4b77ea95f5
    flake.lock: Update (#8847) Georgi Gerganov 2024-08-04 05:53:20 +03:00
  • 76614f352e
    ggml : reading the runtime sve config of the cpu (#8709) jdomke 2024-08-04 01:34:41 +09:00
  • b72c20b85c
    Fix conversion of unnormalized BF16->BF16 weights (#7843) Sigbjørn Skjæret 2024-08-02 21:11:39 +02:00
  • e09a800f9a
    cann: Fix ggml_cann_im2col for 1D im2col (#8819) Mengqing Cao 2024-08-02 16:50:53 +08:00
  • 0fbbd88458
    [SYCL] Fixing wrong VDR iq4nl value (#8812) Ouadie EL FAROUKI 2024-08-02 01:55:17 +01:00
  • afbb4c1322
    ggml-cuda: Adding support for unified memory (#8035) matteo 2024-08-01 23:28:28 +02:00
  • b7a08fd5e0
    Build: Only include execinfo.h on linux systems that support it (#8783) Alex O'Connell 2024-08-01 12:53:46 -04:00
  • 7a11eb3a26
    cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800) slaren 2024-08-01 15:26:22 +02:00
  • c8a0090922
    cann: support q8_0 for Ascend backend (#8805) wangshuai09 2024-08-01 10:39:05 +08:00
  • afbbcf3c04
    server : update llama-server embedding flag documentation (#8779) Igor Okulist 2024-07-31 18:59:09 -05:00
  • ed9d2854c9
    Build: Fix potential race condition (#8781) Clint Herron 2024-07-31 15:51:06 -04:00
  • 398ede5efe
    Adding Gemma 2 2B configs (#8784) pculliton 2024-07-31 11:12:10 -04:00
  • 44d28ddd5c
    cmake : fix use of external ggml (#8787) Borislav Stanimirov 2024-07-31 16:40:08 +03:00
  • 268c566006
    nix: cuda: rely on propagatedBuildInputs (#8772) Someone 2024-07-30 23:35:30 +03:00