Commit Graph

  • d5ac8cf2f2
    ggml : add metal backend registry / device (#9713) Georgi Gerganov 2024-10-07 18:27:51 +03:00
  • 96b6912103
    metal : single allocation of encode_async block (#9747) Paul Tsochantaris 2024-10-07 13:26:31 +01:00
  • d5cb86844f
    contrib : simplify + minor edits [no ci] Georgi Gerganov 2024-10-06 14:15:27 +03:00
  • f4b2dcdf49
    readme : fix typo [no ci] Georgi Gerganov 2024-10-06 13:49:41 +03:00
  • b6d6c5289f
    sync : llama.cpp Georgi Gerganov 2024-10-06 12:53:28 +03:00
  • b0915d5b51
    vulkan : retry allocation with fallback flags (whisper/2451) SRHMorris 2024-10-06 08:34:20 +01:00
  • 8c475b97b8
    rerank : use [SEP] token instead of [BOS] (#9737) Georgi Gerganov 2024-10-05 15:55:04 +03:00
  • 58b16695e1
    sync : ggml Georgi Gerganov 2024-10-05 15:53:49 +03:00
  • 905f5485b2
    metal : zero-init buffer contexts (whisper/0) Georgi Gerganov 2024-10-05 14:33:54 +03:00
  • 71967c2a6d
    Add Llama Assistant (#9744) Viet-Anh NGUYEN (Andrew) 2024-10-05 01:29:35 +07:00
  • 17880771ad
    sync : ggml Georgi Gerganov 2024-10-04 18:50:25 +03:00
  • 55951c018d
    ggml : fix typo in example usage ggml_gallocr_new (ggml/984) Daniel Bevenius 2024-10-04 15:46:18 +02:00
  • ff565769f2
    ggml : fixes after sync (ggml/983) Diego Devesa 2024-10-04 08:41:40 +02:00
  • f3fdcfaa79
    ci : fine-grant permission (#9710) Xuan Son Nguyen 2024-10-04 11:47:19 +02:00
  • 133c7b46b3
    Fixed RNG seed docs (#9723) Daniel Kleine 2024-10-04 10:54:44 +02:00
  • d5ed2b929d
    metal : remove abort (skip) (ggml/0) Georgi Gerganov 2024-10-03 21:18:19 +03:00
  • 1bb8a64ebf
    sync : ggml Georgi Gerganov 2024-10-03 21:17:49 +03:00
  • fabdc3bda3
    ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) Johannes Gäßler 2024-10-03 17:29:59 +02:00
  • eee39bdc96
    ggml: refactor cross entropy loss CPU impl. (ggml/976) Johannes Gäßler 2024-10-02 15:32:39 +02:00
  • 5d5ab1e5cc
    metal : fix compute pass descriptor autorelease crash (#9718) Jack Mousseau 2024-10-03 11:01:46 -07:00
  • a7ad553513
    ggml-backend : add device description to CPU backend (#9720) Diego Devesa 2024-10-03 17:39:18 +02:00
  • d6fe7abf04
    ggml: unify backend logging mechanism (#9709) bandoti 2024-10-03 12:39:03 -03:00
  • e3c355ba65
    convert : handle tokenizer merges format from transformers 4.45 (#9696) compilade 2024-10-03 10:22:15 -04:00
  • 841713e1e4
    rpc : enable vulkan (#9714) Radoslav Gerganov 2024-10-03 13:00:52 +03:00
  • 5639971466
    Fixed dequant precision issues in Q4_1 and Q5_1 (#9711) Ouadie EL FAROUKI 2024-10-03 07:50:44 +01:00
  • c83ad6d01e
    ggml-backend : add device and backend reg interfaces (#9707) Diego Devesa 2024-10-03 01:49:47 +02:00
  • a39ab216aa
    llama : reduce compile time and binary size (#9712) Xuan Son Nguyen 2024-10-02 15:49:55 +02:00
  • f536f4c439
    [SYCL] Initial cmake support of SYCL for AMD GPUs (#9658) Alberto Cabrera Pérez 2024-10-02 13:57:18 +01:00
  • 00b7317e63
    vulkan : do not use tensor->extra (#9407) Radoslav Gerganov 2024-10-02 13:49:16 +03:00
  • 76b37d1541
    gguf-split : improve --split and --merge logic (#9619) Zhenwei Jin 2024-10-02 15:21:57 +08:00
  • 148844fe97
    examples : remove benchmark (#9704) Georgi Gerganov 2024-10-02 10:14:44 +03:00
  • 3f1ae2e32c
    Update README.md (#9591) Paweł Wodnicki 2024-10-01 12:18:46 -05:00
  • f1b8c42711
    sync : ggml Georgi Gerganov 2024-10-01 16:09:42 +03:00
  • e98c1c188e
    test: fix OPT_STEP_ADAMW for test-backend-ops (ggml/974) Johannes Gäßler 2024-09-30 09:55:23 +02:00
  • cb00020504
    vulkan : mul_mat: fix UB with small warps (ggml/952) Salvatore Mesoraca 2024-09-30 09:14:09 +02:00
  • 6c5322481a
    ggml : fix ggml_cast (ggml/973) Borislav Stanimirov 2024-09-30 10:11:41 +03:00
  • 7254cdf7e8
    ggml: fix gradient allocation logic (ggml/966) Johannes Gäßler 2024-09-29 23:18:02 +02:00
  • cad341d889
    metal : reduce command encoding overhead (#9698) Georgi Gerganov 2024-10-01 16:00:25 +03:00
  • a90484c6d9
    llama : print correct model type for Llama 3.2 1B and 3B Georgi Gerganov 2024-10-01 11:42:01 +03:00
  • 1927378bcc
    convert : refactor rope_freqs generation (#9396) compilade 2024-10-01 02:31:36 -04:00
  • 6f1d9d71f4
    Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641) serhii-nakon 2024-09-30 21:57:12 +03:00
  • 511636df0c
    ci : reduce severity of unused Pyright ignore comments (#9697) compilade 2024-09-30 14:13:16 -04:00
  • 08a43d05b6
    py : update transfomers version (#9694) vb 2024-09-30 17:03:47 +02:00
  • ace4f4be37
    flake.lock: Update (#9680) Georgi Gerganov 2024-09-30 17:48:49 +03:00
  • 8277a817f1
    console : utf-8 fix for windows stdin (#9690) Ruchira Hasaranga 2024-09-30 13:53:42 +05:30
  • c919d5db39
    ggml : define missing HWCAP flags (#9684) Georgi Gerganov 2024-09-29 21:18:23 +03:00
  • d0b1d663e4
    sync : ggml Georgi Gerganov 2024-09-29 21:16:07 +03:00
  • aaa4099925
    CUDA: remove bad assert (ggml/972) Johannes Gäßler 2024-09-29 19:56:17 +02:00
  • 641002fba8
    vulkan : multithread pipeline creation (ggml/963) Jeff Bolz 2024-09-29 11:50:17 -05:00
  • 0de8b203f1
    vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (ggml/961) Jeff Bolz 2024-09-27 02:58:01 -05:00
  • 544f409b4b
    vulkan : argsort barriers must be under uniform control flow (ggml/951) Salvatore Mesoraca 2024-09-26 08:59:42 +02:00
  • 6084bfb261
    ggml : fix GGML_MAX_N_THREADS + improve formatting (ggml/969) Georgi Gerganov 2024-09-24 13:23:59 +03:00
  • faac0bae26
    common : ensure llama_batch size does not exceed max size (#9668) matiaslin 2024-09-29 05:25:00 -07:00
  • f99d3f8367
    py : add model class for Chameleon conversion (#9683) nopperl 2024-09-29 12:02:06 +00:00
  • 589b48d41e
    contrib : add Resources section (#9675) Georgi Gerganov 2024-09-29 14:38:18 +03:00
  • f4d2b8846a
    llama : add reranking support (#9510) Georgi Gerganov 2024-09-28 17:42:03 +03:00
  • 1b2f992cd2
    test-backend-ops : use flops for some performance tests (#9657) slaren 2024-09-28 14:32:46 +02:00
  • 739842703e
    llama : add comment about thread-safety [no ci] (#9449) Georgi Gerganov 2024-09-28 15:13:21 +03:00
  • 6102037bbb
    vocab : refactor tokenizer to reduce init overhead (#9449) Zhenwei Jin 2024-09-28 20:10:58 +08:00
  • 9a913110cf
    llama : add support for Chameleon (#8543) nopperl 2024-09-28 12:08:43 +00:00
  • 43bcdd9703
    readme : add tool (#9655) Aarni Koskela 2024-09-28 15:07:14 +03:00
  • 6a0f779484
    ggml : add run-time detection of neon, i8mm and sve (#9331) Dan Johansson 2024-09-28 14:06:16 +02:00
  • 89f9944981
    Enable use to the rebar feature to upload buffers to the device. (#9251) Markus Tavenrath 2024-09-28 12:05:05 +02:00
  • b5de3b74a5
    readme : update hot topics Georgi Gerganov 2024-09-27 20:57:51 +03:00
  • 44f59b4301
    cmake : add option for common library (#9661) Borislav Stanimirov 2024-09-27 10:42:06 +03:00
  • 95bc82fbc0
    [SYCL] add missed dll file in package (#9577) Neo Zhang Jianyu 2024-09-26 17:38:31 +08:00
  • 7691654c68
    mtgpu: enable VMM (#9597) R0CKSTAR 2024-09-26 09:27:40 +08:00
  • ea9c32be71
    ci : fix docker build number and tag name (#9638) Xuan Son Nguyen 2024-09-25 17:26:01 +02:00
  • 1e43630218
    ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (#9217) Charles Xu 2024-09-25 15:12:20 +02:00
  • afbbfaa537
    server : add more env vars, improve gen-docs (#9635) Xuan Son Nguyen 2024-09-25 14:05:13 +02:00
  • 3d6bf6919f
    llama : add IBM Granite MoE architecture (#9438) Gabe Goodhart 2024-09-25 01:06:52 -06:00
  • 904837e0cb
    cann: fix crash when llama-bench is running on multiple cann devices (#9627) Dou Xinpeng 2024-09-25 11:30:38 +08:00
  • 70392f1f81
    ggml : add AVX512DQ requirement for AVX512 builds (#9622) Eric Zhang 2024-09-24 16:03:21 +08:00
  • bb5f819975
    sync : ggml Georgi Gerganov 2024-09-24 11:01:18 +03:00
  • c038931615
    examples : adapt to ggml.h changes (ggml/0) Georgi Gerganov 2024-09-20 21:50:16 +03:00
  • 31ac5834fe
    llama : keep track of all EOG tokens in the vocab (#9609) Georgi Gerganov 2024-09-24 10:16:06 +03:00
  • cea1486ecf
    log : add CONT level for continuing previous log entry (#9610) Georgi Gerganov 2024-09-24 10:15:35 +03:00
  • 0aa15011e3
    server : add newline after chat example (#9616) StrangeBytesDev 2024-09-23 23:04:39 -07:00
  • b0f27361f3
    sampling : avoid expensive softmax during greedy sampling (#9605) Georgi Gerganov 2024-09-24 09:03:17 +03:00
  • c087b6f11d
    threads: fix msvc build without openmp (#9615) Max Krasnyansky 2024-09-23 21:18:48 -07:00
  • 116efee0ee
    cuda: add q8_0->f32 cpy operation (#9571) Ivan 2024-09-24 03:14:24 +03:00
  • 0b3bf966f4
    server : add --no-context-shift option (#9607) Xuan Son Nguyen 2024-09-23 22:23:54 +02:00
  • f0c7b5edf8
    threads: improve ggml_barrier scaling with large number of threads (#9598) Max Krasnyansky 2024-09-23 11:42:43 -07:00
  • 1d48e98e4f
    readme : add programmable prompt engine language CLI (#9599) Riceball LEE 2024-09-23 23:58:17 +08:00
  • f3979df762
    flake.lock: Update (#9586) Georgi Gerganov 2024-09-23 18:43:40 +03:00
  • 1e7b9299c6
    ggml : AVX512 gemm for Q4_0_8_8 (#9532) Srihari-mcw 2024-09-23 19:36:38 +05:30
  • 37f8c7b4c9
    perplexity : remove extra new lines after chunks (#9596) Georgi Gerganov 2024-09-23 11:28:02 +03:00
  • bf9c1013ac
    metal : use F32 prec for K*Q in vec FA (#9595) Georgi Gerganov 2024-09-23 11:27:47 +03:00
  • e62e9789cd
    Revert "[SYCL] fallback mmvq (#9088)" (#9579) Akarshan Biswas 2024-09-23 08:58:06 +05:30
  • c35e586ea5
    musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (#9526) R0CKSTAR 2024-09-22 22:55:49 +08:00
  • 912c331d3d
    Fix merge error in #9454 (#9589) Molly Sophia 2024-09-22 21:26:50 +08:00
  • a5b57b08ce
    CUDA: enable Gemma FA for HIP/Pascal (#9581) Johannes Gäßler 2024-09-22 09:34:52 +02:00
  • ecd5d6b65b
    llama: remove redundant loop when constructing ubatch (#9574) Shankar 2024-09-21 19:30:34 -07:00
  • 2a63caaa69
    RWKV v6: RWKV_WKV op CUDA implementation (#9454) Molly Sophia 2024-09-22 10:29:12 +08:00
  • d09770cae7
    ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (#9573) slaren 2024-09-21 14:24:23 +02:00
  • 41f477879f
    Update CUDA graph on scale change plus clear nodes/params (#9550) agray3 2024-09-21 01:41:07 +01:00
  • e948a7da7a
    CI: Provide prebuilt windows binary for hip (#9467) Huang Qi 2024-09-21 08:39:41 +08:00
  • 63351143b2
    quantize : improve type name parsing (#9570) slaren 2024-09-20 20:55:36 +02:00
  • d13edb17ed
    ggml : fix builds (#0) Georgi Gerganov 2024-09-20 20:12:52 +03:00
  • 27609c49b9
    ggml : fix trailing whitespace (#0) Georgi Gerganov 2024-09-20 19:13:02 +03:00