Commit Graph

  • 9241c3a2ac
    Apply min_p to unsorted tokens (#5115) Johannes Gäßler 2024-01-28 09:59:49 +01:00
  • b2b2bf988c
    Tests for min_p, sampling queue (#5147) Johannes Gäßler 2024-01-28 09:35:14 +01:00
  • af4980bfed
    readme : add link to rust bindings (#5148) Marcus Dunn 2024-01-28 00:30:44 -08:00
  • f2e69d28c0
    llama : add support for Orion-14B (#5118) sharpHL 2024-01-28 16:00:30 +08:00
  • 39baaf55a1
    docker : add server-first container images (#5157) Kyle Mistele 2024-01-28 01:55:31 -06:00
  • 6db2b41a76
    llava : support for Yi-VL and fix for mobileVLM (#5093) John 2024-01-27 16:09:18 +01:00
  • 753eafed0e
    sync : ggml Georgi Gerganov 2024-01-27 16:59:20 +02:00
  • e976423005
    ggml : check ggml_add src1 type (ggml/708) Judd 2024-01-26 21:04:01 +08:00
  • 35a2ee9143
    Remove unused data and add fixes (#5154) Michael Klimenko 2024-01-27 15:25:55 +01:00
  • ec903c0341
    server : add self-extend support (#5104) Maximilian Winter 2024-01-27 14:38:05 +01:00
  • a1d6df129b
    Add OpenCL add kernel (#5151) 0cc4m 2024-01-26 23:07:32 +01:00
  • bbe7c56c99
    cmake : pass CPU architecture flags to nvcc (#5146) Jared Van Bortel 2024-01-26 15:34:06 -05:00
  • 62fead3ea0
    cuda : fix tensor size calculation for non-split buffer (#5145) slaren 2024-01-26 18:59:43 +01:00
  • 15b4538ff2
    ggml-alloc : add 10% margin to the buffer sizes (#5149) slaren 2024-01-26 18:18:26 +01:00
  • 7032f4f634
    ggml : update softmax n_task calculation (#5126) snadampal 2024-01-26 11:17:59 -06:00
  • 5f1925a8ce
    scripts : move run-with-preset.py from root to scripts folder Georgi Gerganov 2024-01-26 17:09:44 +02:00
  • 3b7c914de2
    tests : gitignore test-c.o Georgi Gerganov 2024-01-26 14:48:15 +02:00
  • 48c857aa10
    server : refactored the task processing logic (#5065) Xuan Son Nguyen 2024-01-26 13:42:20 +01:00
  • 413e7b0559
    ci : add model tests + script wrapper (#4586) crasm 2024-01-26 07:18:00 -05:00
  • 6dd3c28c9c
    metal : remove unused n_buffers and buffers (#5129) Paul Tsochantaris 2024-01-26 12:16:07 +00:00
  • 38b431de23
    gguf : fix "general.alignment" type in gguf_reader.py (#5136) Riceball LEE 2024-01-26 17:10:28 +08:00
  • aad0b01d73
    readme : update hot topics Georgi Gerganov 2024-01-26 10:52:33 +02:00
  • 1182cf4d4f
    Another bucket sort (#5109) Kawrakow 2024-01-26 09:14:39 +02:00
  • fe54033b69
    readme : add MobileVLM 1.7B/3B to the supported models list (#5107) XiaotaoChen 2024-01-26 04:14:32 +08:00
  • 5eaf9964fc
    llama : dynamic temperature sampling (#4972) l3utterfly 2024-01-26 05:06:22 +09:00
  • d292f4f204
    examples : make pydantic scripts pass mypy and support py3.8 (#5099) Jared Van Bortel 2024-01-25 14:51:24 -05:00
  • 256d1bb0dd
    android : use release cmake build type by default (#5123) Valentin Konovalov 2024-01-25 12:05:51 -05:00
  • faa3526a1e
    Fix Q3_K_XS for MoE models (#5113) Kawrakow 2024-01-25 17:58:53 +02:00
  • ddc5a5033f
    metal : show compile log messages Georgi Gerganov 2024-01-25 11:26:17 +02:00
  • cd4fddb29f
    cuda : fix 2-bit quants on amd hip (#5105) Engininja2 2024-01-24 16:18:15 -06:00
  • c9b316c78f
    nix-shell: use addToSearchPath Michael Hueschen 2024-01-22 16:44:10 -07:00
  • bf63d695b8
    nix: add cc to devShell LD_LIBRARY_PATH Michael Hueschen 2024-01-22 03:17:05 -07:00
  • 1387ea2117
    llama : pre-allocate input tensors in a separate buffer (#5100) slaren 2024-01-24 12:48:14 +01:00
  • 26d607608d
    metal : disable support for MUL_MAT F32 x F16 Georgi Gerganov 2024-01-23 15:50:56 +02:00
  • 44879ee885
    Additional KL-divergence statistics (#5081) Kawrakow 2024-01-23 15:17:20 +02:00
  • 9ecdd12e95
    CUDA: more info when no device code (#5088) Johannes Gäßler 2024-01-23 13:31:56 +01:00
  • 89758723c7
    minor : clean-up some warnings and style (#5094) Georgi Gerganov 2024-01-23 14:12:57 +02:00
  • 2bed4aa3f3
    devops : add intel oneapi dockerfile (#5068) Xuan Son Nguyen 2024-01-23 08:11:39 +01:00
  • 125d03a503
    llama.vim : added api key support (#5090) Michael Coppola 2024-01-23 01:51:27 -05:00
  • 011e8ec577
    llama : fix not enough space in buffer with Qwen (#5086) slaren 2024-01-22 23:42:41 +01:00
  • 6f9939d119
    KL-divergence (#5076) Kawrakow 2024-01-22 16:10:14 +02:00
  • 780e24a22e
    ggml : parallelize FP32 conversion when using BLAS (#5045) Reinforce-II 2024-01-22 21:15:08 +08:00
  • 3ce7e8f8e7
    llava : MobileVLM support (#4954) XiaotaoChen 2024-01-22 21:09:35 +08:00
  • b2d80e105a
    flake.nix: add a comment about flakes vs nix Someone Serge 2024-01-21 03:41:37 +00:00
  • 28603cd283
    nix: add a comment on the many nixpkgs-with-cuda instances Someone Serge 2024-01-21 03:29:38 +00:00
  • 5e97ec91ae
    nix: add a comment about makeScope Someone Serge 2024-01-21 03:15:13 +00:00
  • 7251870780
    nix: refactor the cleanSource rules Someone Serge 2024-01-13 17:45:01 +00:00
  • fe8b3c0d4b
    workflows: nix-ci: drop the redundant "paths" filter Someone Serge 2024-01-13 17:38:32 +00:00
  • f4dd059259
    workflows: nix-build-aarch64: rate limit Someone Serge 2024-01-13 17:16:54 +00:00
  • f7276f7500
    workflows: nix-ci: rebuild on flake.lock updates Someone Serge 2024-01-13 17:10:19 +00:00
  • 15bceec2d7
    imatrix : keep intermediate imatrix results (#5077) Kawrakow 2024-01-22 14:18:43 +02:00
  • d6bd4d46dd
    llama : support StableLM 2 1.6B (#5052) compilade 2024-01-22 06:21:52 -05:00
  • 152d9d05e0
    finetune : print sample-start/include-sample-start (#5072) Daniel Bevenius 2024-01-22 12:11:01 +01:00
  • 66d575c45c
    llama : add Q3_K_XS (#5060) Kawrakow 2024-01-22 12:43:33 +02:00
  • 57744932c6
    ci : fix Windows CI by updating Intel SDE version (#5053) bobqianic 2024-01-22 08:55:05 +00:00
  • 3466c6ebcf
    llama : add more qwen2 models (#5071) Shijie 2024-01-22 15:33:19 +08:00
  • 504dc37be8
    Revert LLAMA_NATIVE to OFF in flake.nix (#5066) iSma 2024-01-21 22:37:13 +01:00
  • 05490fad7f
    add safetensors support to convert-lora-to-ggml.py (#5062) kuronekosaiko 2024-01-22 00:28:14 +08:00
  • 6c5629d4d2
    add #include <string> to unicode.h (#5051) bobqianic 2024-01-21 15:17:35 +00:00
  • 7dcbe39d36
    Add ability to evauate multiple choice tasks (#5047) Kawrakow 2024-01-21 14:42:44 +02:00
  • 726c0fa9a2
    Slightly faster imatrix (#5050) Kawrakow 2024-01-21 08:01:20 +02:00
  • 942c0107a7
    flake.lock: Update (#5054) Georgi Gerganov 2024-01-21 05:17:27 +02:00
  • b43ebde3b0
    convert : partially revert PR #4818 (#5041) Jared Van Bortel 2024-01-20 18:14:18 -05:00
  • 97c1549808
    perplexity : fix MSVC build after #5020 (#5043) Jared Van Bortel 2024-01-20 10:08:08 -05:00
  • 6df465a91d
    llama : run all KQV ops on the CPU with no KV offload (#5049) slaren 2024-01-20 16:05:49 +01:00
  • 77bc1bbd05
    cmake : add support for ccache (#5002) Herman Semenov 2024-01-20 08:11:31 +00:00
  • 48e2b13372
    Add a dart/flutter binding to README.md (#4882) adel boussaken 2024-01-20 09:05:43 +01:00
  • cca894f16a
    cuda : fix compile error in jetson platform (#4975) Kylin 2024-01-20 15:01:46 +08:00
  • 381ee19572
    finetune : fix ggml_allocr lifetimes (tmp workaround) (#5033) Uzo Nweke 2024-01-19 13:20:50 -05:00
  • a5cacb22b2
    imatrix : add README.md Georgi Gerganov 2024-01-19 15:24:47 +02:00
  • 9b75cb2b3c
    llama : support upcoming Qwen2 (#5037) Shijie 2024-01-19 19:53:13 +08:00
  • de9a147df1
    py : fix flake8 lint Georgi Gerganov 2024-01-19 13:52:22 +02:00
  • 7051aacfac
    winogrande: evaluate log-probs in parallel (#5036) Kawrakow 2024-01-19 11:39:11 +02:00
  • 2b3b999cac
    llama : add CodeShell support (#5016) chiranko 2024-01-19 17:07:27 +08:00
  • 993fba8180
    perplexity: avoid unnecessary alloocations and logit copies (#5035) Kawrakow 2024-01-19 11:02:39 +02:00
  • 8b20858e5e
    perplexity : faster Winogrande via batching (#5024) Georgi Gerganov 2024-01-19 10:45:06 +02:00
  • 57e2a7a52a
    llama : fix falcon arch for tied output embeddings (#4978) John 2024-01-18 23:12:15 +01:00
  • 9b6ea4263a
    cmake : add ggml public headers (#5011) Georgi Gerganov 2024-01-18 23:36:07 +02:00
  • 821f0a271e
    server : defer tasks when "slot unavailable" (#5018) Xuan Son Nguyen 2024-01-18 21:33:05 +01:00
  • 96d7f56d29
    llama : fix mlock with no-mmap with Metal (#5025) slaren 2024-01-18 21:12:15 +01:00
  • 2d5419d08a
    imatrix : fix assert for src0 non-cont check Georgi Gerganov 2024-01-18 21:45:51 +02:00
  • d391ae9b49
    perplexity : fix winogrande N tasks option Georgi Gerganov 2024-01-18 20:49:00 +02:00
  • e9240cdfa0
    scripts : add get-winogrande.sh Georgi Gerganov 2024-01-18 20:45:39 +02:00
  • b46757735d
    convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#5019) David Sommers 2024-01-18 12:20:59 -05:00
  • 3e945cc1e9
    HellaSwag: speed up by parallelizing log-prob evaluation (#5020) Kawrakow 2024-01-18 19:18:21 +02:00
  • ad19812cda
    perplexity : faster HellaSwag via batching (#5017) Georgi Gerganov 2024-01-18 15:33:01 +02:00
  • 682986a08e
    Add Winogrande evaluation (#5015) Kawrakow 2024-01-18 13:46:27 +02:00
  • dcad445d0c
    scritps : add helper script to get hellaswag data in txt format Georgi Gerganov 2024-01-18 11:44:49 +02:00
  • 1e605f4102
    metal : fix memory leak, dangling pointer and unused autorel (#5007) Paul Tsochantaris 2024-01-18 08:47:24 +00:00
  • 6b6916b215
    sync : ggml Georgi Gerganov 2024-01-17 20:54:50 +02:00
  • 38566680cd
    ggml : add IQ2 to test-backend-ops + refactoring (#4990) Georgi Gerganov 2024-01-17 18:54:56 +02:00
  • ba69bbc84c
    imatrix : offload to GPU support (#4957) Georgi Gerganov 2024-01-17 18:46:30 +02:00
  • 44a1a4a41a
    backend : add eval callback (#4935) Georgi Gerganov 2024-01-17 18:39:41 +02:00
  • c918fe8dca
    metal : create autorelease pool during library build (#4970) Georgi Gerganov 2024-01-17 18:38:39 +02:00
  • 0f83e727af
    py : fix whitespace Georgi Gerganov 2024-01-17 18:37:36 +02:00
  • 4f4bf35f46
    py : fix missing added_tokens_dict for SPM and BPE vocabs (#4971) Georgi Gerganov 2024-01-17 15:45:03 +02:00
  • 2b3a665d39
    llama : use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 (#4996) Kawrakow 2024-01-17 12:36:37 +02:00
  • 7563293665
    metal : remove unnecessary nil check (#4986) Paul Tsochantaris 2024-01-17 08:07:24 +00:00
  • f46c0c1b0e
    llama : fix copy/paste error in llama_sampling_params comment (#4994) David Renshaw 2024-01-17 02:17:50 -05:00
  • 5c99960901
    py : remove unnecessary hasattr (#4903) Georgi Gerganov 2024-01-16 20:59:31 +02:00