Commit Graph

  • a9a8c5de3d
    readme : add link to SOTA models Georgi Gerganov 2024-01-08 20:25:17 +02:00
  • dd5ae06405
    SOTA 2-bit quants (#4773) Kawrakow 2024-01-08 16:02:32 +01:00
  • 668b31fc7d
    swift : exclude ggml-metal.metal from the package (#4822) Georgi Gerganov 2024-01-08 16:40:51 +02:00
  • 42ea63c5a3
    llama.swiftui : update readme Georgi Gerganov 2024-01-08 15:57:36 +02:00
  • 52531fdff8
    main : add self-extend support (#4815) Georgi Gerganov 2024-01-08 11:18:32 +02:00
  • b0034d93ce
    examples : add passkey test (#3856) Georgi Gerganov 2024-01-08 11:14:04 +02:00
  • b7e7982953
    readme : add lgrammel/modelfusion JS/TS client for llama.cpp (#4814) Lars Grammel 2024-01-07 21:24:11 +01:00
  • 226460cc0d
    llama-bench : add no-kv-offload parameter (#4812) slaren 2024-01-07 17:59:01 +01:00
  • d5a410e855
    CUDA: fixed redundant value dequantization (#4809) Johannes Gäßler 2024-01-07 17:24:08 +01:00
  • 9dede37d81
    llama : remove unused vars (#4796) Georgi Gerganov 2024-01-07 14:29:36 +02:00
  • 3c36213df8
    llama : remove redundant GQA check (#4796) Georgi Gerganov 2024-01-07 11:21:53 +02:00
  • 72d8407b36
    llama.swiftui : use llama.cpp as SPM package (#4804) Alex Azarov 2024-01-07 09:20:50 +01:00
  • d117d4dc5d
    llama : print tensor meta for debugging Georgi Gerganov 2024-01-07 09:50:31 +02:00
  • 3418c03ecc
    llama.swiftui : add visionOS target (#4805) Alex Azarov 2024-01-07 08:46:55 +01:00
  • 63ee677efd
    ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (#4787) Konstantin Zhuravlyov 2024-01-07 01:52:42 -05:00
  • 67984921a7
    server : fix n_predict check (#4798) Georgi Gerganov 2024-01-07 08:45:26 +02:00
  • c75ca5d96f
    llama.swiftui : use correct pointer for llama_token_eos (#4797) Daniel Illescas Romero 2024-01-06 16:12:59 +01:00
  • 96e80dabc6
    examples : improve base-translate.sh script (#4783) Georgi Gerganov 2024-01-06 11:40:24 +02:00
  • eec22a1c63
    cmake : check for openblas64 (#4134) a-n-n-a-l-e-e 2024-01-05 08:04:40 -08:00
  • be36bb946a
    flake.nix : fix typo (#4700) Ikko Eltociear Ashimine 2024-01-06 01:02:44 +09:00
  • 91d38876df
    metal : switch back to default.metallib (ggml/681) Georgi Gerganov 2024-01-05 16:30:52 +02:00
  • d061bf9405
    ggml : fix q2_k bpw in comments (ggml/680) Georgi Gerganov 2024-01-05 15:36:04 +02:00
  • 1bf681f90e
    ggml : add error handling to graph_compute (whisper/1714) Finn Voorhees 2024-01-03 08:39:43 -05:00
  • c1d7cb28d3
    ggml : do not sched_yield when calling BLAS (#4761) Georgi Gerganov 2024-01-05 15:18:21 +02:00
  • 3681f22443
    examples : add few-shot translation example (#4783) Georgi Gerganov 2024-01-05 15:11:10 +02:00
  • b3a7c20b5c
    finetune : remove unused includes (#4756) Daniel Bevenius 2024-01-04 20:45:37 +01:00
  • 012cf349ae
    server : send token probs for "stream == false" (#4714) Georgi Gerganov 2024-01-04 19:56:33 +02:00
  • a91928014f
    Print backend name on test-backend-ops failure (#4751) Johannes Gäßler 2024-01-04 09:43:23 +01:00
  • 3c0b585561
    llama.swiftui : support loading custom model from file picker (#4767) singularity 2024-01-04 16:22:38 +08:00
  • e5804313a1
    server : fix options in README.md (#4765) Michael Coppola 2024-01-04 03:17:09 -05:00
  • dc891b7f7a
    ggml : include stdlib.h before intrin.h (#4736) Georgi Gerganov 2024-01-04 10:12:26 +02:00
  • 46cea79e1f
    llama.swiftui : fix build of ggml.metallib (#4754) singularity 2024-01-04 15:58:16 +08:00
  • cb1e2818e0
    train : fix typo in overlapping-samples help msg (#4758) Daniel Bevenius 2024-01-03 18:53:40 +01:00
  • ece9a45e8f
    swift : update Package.swift to use ggml as dependency (#4691) Ashraful Islam 2024-01-03 11:30:02 -06:00
  • 7bed7eba35
    cuda : simplify expression Georgi Gerganov 2024-01-03 14:18:46 +02:00
  • d55356d3ba
    cuda : mark I16 and I32 ops as unsupported Georgi Gerganov 2024-01-03 13:01:44 +02:00
  • 75e3fd8581
    sync : ggml Georgi Gerganov 2024-01-03 11:37:44 +02:00
  • 289313716f
    metal : add kernel_get_rows_i32 Georgi Gerganov 2024-01-03 11:35:46 +02:00
  • ab62fc3e55
    scripts : fix sync order + metal sed Georgi Gerganov 2024-01-03 11:25:54 +02:00
  • 5f66ebca9c
    ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639) Guillaume Wenzek 2023-12-29 18:07:03 +01:00
  • f2eb19bd8b
    server : throw an error when slot unavailable (#4741) Justin Parker 2024-01-03 03:43:19 -05:00
  • f3f62f0d83
    metal : optimize ggml_mul_mat_id (faster Mixtral PP) (#4725) Georgi Gerganov 2024-01-02 21:07:47 +02:00
  • 0ef3ca2ac6
    server : add token counts to html footer (#4738) Phil H 2024-01-02 15:48:49 +00:00
  • 540938f890
    llama : llama_model_desc print number of experts Georgi Gerganov 2024-01-02 16:26:45 +02:00
  • 0040d42eeb
    llama : replace all API facing int's with int32_t (#4577) Marcus Dunn 2024-01-02 06:15:16 -08:00
  • 83e633c27e
    llama : differentiate the KV dims in the attention (#4657) postmasters 2024-01-02 03:51:28 -08:00
  • 32866c5edd
    editorconfig : fix whitespace and indentation #4710 Georgi Gerganov 2024-01-02 13:28:15 +02:00
  • 5d7002d437
    server : add --override-kv parameter (#4710) minarchist 2024-01-02 04:38:15 -06:00
  • 26f3071d71
    py : re-enable mmap in convert hf (#4732) Nam D. Tran 2024-01-02 16:23:38 +07:00
  • 775ac8712a
    finetune: fix typo in README.md (#4733) Daniel Bevenius 2024-01-02 10:16:55 +01:00
  • 58ba655af0
    metal : enable shader debugging (cmake option) (#4705) Georgi Gerganov 2024-01-02 10:57:44 +02:00
  • edd1ab7bc3
    flake.lock: update Someone Serge 2023-12-31 17:42:22 +00:00
  • 198ed7ebfc
    flake.nix: suggest the binary caches Someone Serge 2023-12-30 18:25:25 +00:00
  • d836174731
    workflows: nix-ci: add a qemu job for jetsons Someone Serge 2023-12-30 18:01:07 +00:00
  • 06f2a5d190
    workflows: nix-flakestry: drop tag filters Someone Serge 2023-12-30 17:36:08 +00:00
  • c5239944ba
    workflows: weekly nix flake update Someone Serge 2023-12-30 16:38:36 +00:00
  • 1e9ae54cf2
    workflows: nix-ci: add a job for eval Someone Serge 2023-12-30 17:19:11 +00:00
  • 7adedecbe3
    workflows: nix-ci: init; build flake outputs Someone Serge 2023-12-26 19:17:26 +00:00
  • 356ea17e0f
    flake.nix: expose checks Someone Serge 2023-12-29 16:21:50 +00:00
  • a5c088d8c6
    flake.nix: rocm not yet supported on aarch64, so hide the output Someone Serge 2023-12-26 23:34:40 +00:00
  • 1e3900ebac
    flake.nix: expose full scope in legacyPackages Someone Serge 2023-12-29 16:15:37 +00:00
  • e39106c055
    ggml : add ggml_vdotq_s32 alias (#4715) Georgi Gerganov 2023-12-31 11:43:31 +02:00
  • 9fbda719de
    clip : refactor + bug fixes (#4696) Georgi Gerganov 2023-12-30 23:24:42 +02:00
  • 39d8bc71ed
    CUDA: fixed tensor cores not being used on RDNA3 (#4697) Johannes Gäßler 2023-12-30 13:52:01 +01:00
  • 24a447e20a
    ggml : add ggml_cpu_has_avx_vnni() (#4589) automaticcat 2023-12-30 15:07:48 +07:00
  • a20f3c7465
    CUDA: fix tensor core logic for Pascal and HIP (#4682) Johannes Gäßler 2023-12-29 23:12:53 +01:00
  • 0235b9b571
    clip : use ggml_backend_buffer_is_host (#4205) Georgi Gerganov 2023-12-29 18:53:34 +02:00
  • ce18d727a4
    clip : enable gpu backend (#4205) Steward Garcia 2023-12-29 11:52:15 -05:00
  • 91bb39cec7
    cuda: fix vmm oom issue on NVIDIA AGX Orin (#4687) hydai 2023-12-30 00:31:19 +08:00
  • 04ac0607e9
    python : add check-requirements.sh and GitHub workflow (#4585) crasm 2023-12-29 09:50:29 -05:00
  • 68eccbdc5b
    flake.nix : rewrite (#4605) Philip Taron 2023-12-29 06:42:26 -08:00
  • 97bbca6e85
    cmake : fix ld warning duplicate libraries libllama.a (#4671) Cuong Trinh Manh 2023-12-29 21:39:15 +07:00
  • 4af4801566
    llava-cli : refactor to use sampling library (#4669) Justine Tunney 2023-12-29 06:38:38 -08:00
  • db49ff8ed7
    server : replace sleep with condition variables (#4673) Justine Tunney 2023-12-29 06:24:12 -08:00
  • 60f55e888c
    server : fix OpenAI server sampling w.r.t. penalty. (#4675) SakuraUmi 2023-12-29 22:22:44 +08:00
  • b93edd22f5
    server : allow to generate multimodal embeddings (#4681) Karthik Sethuraman 2023-12-29 06:22:10 -08:00
  • 82d6eab224
    main-cmake-pkg : fix build issue (#4665) andrijdavid 2023-12-29 15:18:20 +01:00
  • afd997ab60
    llama.swiftui : fix infinite loop, output timings, buff UI (#4674) Peter Sugihara 2023-12-29 05:58:56 -08:00
  • c8255f8a6b
    scripts : print list of sync commits Georgi Gerganov 2023-12-29 15:12:35 +02:00
  • 441f51dca0
    ci : build with CLBlast + ggml-opencl use GGML_API (whisper/1576) Tamotsu Takahashi 2023-12-29 19:23:27 +09:00
  • 38b3de4658
    sync : ggml Georgi Gerganov 2023-12-29 14:56:41 +02:00
  • afc8c19291
    ggml : fix some mul mat cases + add tests for src1 F16 (ggml/669) bssrdf 2023-12-29 03:32:31 -05:00
  • ca38b8d334
    scripts : do not sync commits from this repo Georgi Gerganov 2023-12-29 14:41:36 +02:00
  • 65e5f6dadb
    Fix OpenAI server sampling w.r.t. temp and seed (#4668) Justine Tunney 2023-12-28 11:20:00 -08:00
  • ea5497df5d
    gpt2 : Add gpt2 architecture integration (#4555) manikbhandari 2023-12-28 09:03:57 -05:00
  • f6793491b5
    llama : add AWQ for llama, llama2, mpt, and mistral models (#4593) Nam D. Tran 2023-12-27 22:39:45 +07:00
  • 879b690a9e
    finetune : fix output formatting in print_params (#4653) Daniel Bevenius 2023-12-27 15:16:55 +01:00
  • b47879b0dd
    scripts : add sync-ggml-am.sh Georgi Gerganov 2023-12-27 11:15:31 +02:00
  • 951010fa53
    ggml : fix dot product for ARM (#4630) Georgi Gerganov 2023-12-27 11:02:13 +02:00
  • f56d6077d0
    Add byte token type when tokenizer.model does not exist (#4641) wonjun Jang 2023-12-27 17:37:25 +09:00
  • dc68f0054c
    cuda : fix vmm pool with multi GPU (#4620) slaren 2023-12-26 21:23:59 +01:00
  • de8e496437
    Update comment for AdamW implementation reference. (#4604) WillCorticesAI 2023-12-26 05:42:08 -05:00
  • 77465dad48
    Fix new CUDA10 compilation errors (#4635) FantasyGmm 2023-12-26 18:38:36 +08:00
  • a206137f92
    Adding Emeltal reference to UI list (#4629) Paul Tsochantaris 2023-12-25 16:09:53 +00:00
  • b9f47952ff
    simplify bug issue template (#4623) slaren 2023-12-24 21:01:12 +01:00
  • 753be377b6
    llama : add PLaMo model (#3557) Shintarou Okada 2023-12-24 22:35:49 +09:00
  • 5bf3953d7e
    cuda : improve cuda pool efficiency using virtual memory (#4606) slaren 2023-12-24 14:34:22 +01:00
  • 708e179e85
    fallback to CPU buffer if host buffer alloc fails (#4610) slaren 2023-12-23 16:10:51 +01:00
  • 925e5584a0
    ci(docker): fix tags in "Build and push docker image (tagged)" (#4603) Samuel Maynard 2023-12-23 11:35:55 +02:00
  • 6123979952
    server : allow to specify custom prompt for penalty calculation (#3727) Alexey Parfenov 2023-12-23 09:31:49 +00:00