Commit Graph

  • dc020985b8
    Avoid unnecessarily disabling CUDA graphs (#7302) agray3 2024-05-15 14:44:49 +01:00
  • 344f9126cc
    ggml : tag ggml_tensor::backend as deprecated (#7290) slaren 2024-05-15 15:08:48 +02:00
  • 9a17ab914b
    Add missing " (#7303) AidanBeltonS 2024-05-15 13:26:30 +01:00
  • ea3b0590ee
    embedding : free the batch after execution (#7297) dm4 2024-05-15 20:01:12 +08:00
  • 29499bb593
    sync : ggml Georgi Gerganov 2024-05-15 13:23:41 +03:00
  • 48aa8fd1f2
    ggml : add ggml_upscale_ext (ggml/814) John Balis 2024-05-15 03:52:33 -05:00
  • 583fd6b000
    server bench: fix bench not waiting for model load (#7284) Johannes Gäßler 2024-05-15 08:44:16 +02:00
  • 9f773486ab
    script : sync ggml-rpc Georgi Gerganov 2024-05-14 19:14:38 +03:00
  • e8a7fd4fb0
    metal : support FA without mask + add asserts (#7278) Georgi Gerganov 2024-05-14 19:09:30 +03:00
  • a5e3fde857
    sync : ggml Georgi Gerganov 2024-05-14 15:33:16 +03:00
  • f308ea7059
    metal : tune soft_max number of threads (whisper/0) Georgi Gerganov 2024-05-13 11:01:07 +03:00
  • c3c88f296a
    ggml : try fix ppc64 (whisper/0) Georgi Gerganov 2024-05-12 20:36:31 +03:00
  • 182adefcf3
    ggml : expose SSE3 and SSSE3 for MSVC when AVX is available (whisper/2128) Przemysław Pawełczyk 2024-05-08 17:33:43 +02:00
  • 0d26d8ccd8
    ggml : optimize for ppc64le using VSX intrinsics (ggml/784) Hong Bo PENG 2024-05-12 17:17:18 +08:00
  • 4f0263633b
    server: free sampling contexts on exit (#7264) Steve Grubb 2024-05-14 10:11:24 -04:00
  • 1265c670fd
    Revert "move ndk code to a new library (#6951)" (#7282) Brian 2024-05-14 23:10:39 +10:00
  • 5e31828d3e
    ggml : add RPC backend (#6829) Radoslav Gerganov 2024-05-14 14:27:19 +03:00
  • 541600201e
    llama : disable pipeline parallelism with nkvo (#7265) slaren 2024-05-14 09:33:42 +02:00
  • efc8f767c8
    move ndk code to a new library (#6951) Elton Kola 2024-05-14 03:30:30 -04:00
  • e0f556186b
    Add left recursion check: quit early instead of going into an infinite loop (#7083) Haggai Nuchi 2024-05-13 22:25:56 -07:00
  • 27f65d6267
    docs: Fix typo and update description for --embeddings flag (#7026) Ryuei 2024-05-14 14:20:47 +09:00
  • ee52225067
    convert-hf : support direct Q8_0 conversion (#7234) compilade 2024-05-13 14:10:51 -04:00
  • 614d3b914e
    llama : less KV padding when FA is off (#7257) Georgi Gerganov 2024-05-13 17:15:15 +03:00
  • 30e70334f7
    llava-cli: fix base64 prompt (#7248) k.h.lai 2024-05-13 22:02:36 +08:00
  • 1c570d8bee
    perplexity: add BF16 vs. FP16 results (#7150) Johannes Gäßler 2024-05-13 13:03:27 +02:00
  • 948f4ec7c5
    [SYCL] rm wait() (#7233) Neo Zhang 2024-05-13 18:11:26 +08:00
  • 9aa672490c
    llama : rename jina tokenizers to v2 (#7249) Joan Fontanals 2024-05-13 10:35:14 +02:00
  • b1f8af1886
    convert.py: Outfile default name change and additional metadata support (#4858) Brian 2024-05-13 12:56:47 +10:00
  • e586ee4259
    change default temperature of OAI compat API from 0 to 1 (#7226) Benjamin Findley 2024-05-12 19:40:08 -07:00
  • cbf75894d2
    [SYCL] Add oneapi runtime dll files to win release package (#7241) Neo Zhang 2024-05-13 08:04:29 +08:00
  • 0d5cef78ae
    [SYCL] update CI with oneapi 2024.1 (#7235) Neo Zhang 2024-05-13 08:02:55 +08:00
  • dc685be466
    CUDA: add FP32 FlashAttention vector kernel (#7188) Johannes Gäßler 2024-05-12 19:40:45 +02:00
  • 6f1b63606f
    cmake : fix version cmp (#7227) Georgi Gerganov 2024-05-12 18:30:23 +03:00
  • b228aba91a
    remove convert-lora-to-ggml.py (#7204) slaren 2024-05-12 02:29:33 +02:00
  • 7bd4ffb780
    metal : fix warnings (skipme) (#0) Georgi Gerganov 2024-05-11 21:36:20 +03:00
  • 1622ac023f
    sync : ggml Georgi Gerganov 2024-05-11 21:35:05 +03:00
  • 6aeff24f8b
    metal : fix indent (ggml/0) Georgi Gerganov 2024-05-11 16:57:53 +03:00
  • 325756d28d
    ggml : resolve merge (ggml/0) Georgi Gerganov 2024-05-11 16:25:50 +03:00
  • fed0108491
    Scripting & documenting debugging one test without anything else in the loop. (#7096) Josh Ramer 2024-05-11 12:26:35 -05:00
  • 72c177c1f6
    fix system prompt handling (#7153) Xuan Son Nguyen 2024-05-11 17:28:10 +02:00
  • 5a419926b0
    convert-hf : support bfloat16 conversion (#7158) compilade 2024-05-11 11:06:26 -04:00
  • fae9d234b6
    sync : ggml Georgi Gerganov 2024-05-11 12:02:39 +03:00
  • f5ef34e428
    feat: implemented sigmoid function (ggml/806) Justina Cho 2024-05-01 14:44:26 -07:00
  • ef0d5e3ec9
    build: fix and ignore msvc warnings (ggml/805) Borislav Stanimirov 2024-04-25 17:24:07 +03:00
  • 3292733f95
    convert : skip unaccessible HF repos (#7210) CrispStrobe 2024-05-11 10:18:35 +02:00
  • 988631335a
    server : free llama_batch on exit (#7212) Steve Grubb 2024-05-11 04:13:02 -04:00
  • f99e1e456e
    llama : lookup word in vocab before doing BPE merges (#7193) Haoxiang Fei 2024-05-11 16:12:06 +08:00
  • 5ae3426b0b
    server: fix reported top tokens for temperature 0 (#7203) Johannes Gäßler 2024-05-11 10:11:28 +02:00
  • b83cc3f5b3
    llama : add Jina Embeddings architecture (#6826) Joan Fontanals 2024-05-11 09:46:09 +02:00
  • 9cb317f77e
    ggml : full ALiBi support (#7192) Georgi Gerganov 2024-05-11 10:32:41 +03:00
  • e849648888
    llama-bench : add pp+tg test type (#7199) slaren 2024-05-10 18:03:54 +02:00
  • 18e437665c
    metal : fix flash attention kernel requirements (#7169) Georgi Gerganov 2024-05-10 18:20:10 +03:00
  • 8c660242d7
    convert : print "ignore_merges" field Georgi Gerganov 2024-05-10 17:53:04 +03:00
  • 25c6e82e7a
    llama : use n_vocab to differentiate between mistral 7B and llama3 8B (#7200) slaren 2024-05-10 14:28:01 +02:00
  • 4e3880978f
    Fix memory bug in grammar parser (#7194) Justine Tunney 2024-05-10 07:01:08 -04:00
  • f89fe2732c
    Main+: optionally allow special tokens from user in interactive mode (#7097) HanishKVC 2024-05-10 15:51:58 +05:30
  • d11afd6652
    llava : fix moondream support (#7163) Andrei 2024-05-10 02:41:10 -04:00
  • 8c570c9496
    Minor arithmetic improvement to mmvq wrapper kernel (#7172) Ouadie EL FAROUKI 2024-05-10 01:32:15 +01:00
  • eaf4bd8b39
    eval-callback : fix conversion to float (#7184) slaren 2024-05-10 01:04:12 +02:00
  • befddd0f15
    Vulkan Bugfixes and Improvements (#7084) 0cc4m 2024-05-09 20:39:54 +02:00
  • d46dbc76f8
    readme : add scheduled server workflow status badge Georgi Gerganov 2024-05-09 16:40:42 +03:00
  • 0961d86604
    readme : add app (#6371) l3utterfly 2024-05-09 22:32:40 +09:00
  • 43248e5594
    llama3 custom regex split (#6965) jaime-m-p 2024-05-09 15:30:44 +02:00
  • a743d76a01
    CUDA: generalize FP16 fattn vec kernel (#7061) Johannes Gäßler 2024-05-09 14:32:02 +02:00
  • f31ec120bc
    Add warning if token is invalid (#7173) Galunid 2024-05-09 14:13:05 +02:00
  • fd9f92b154
    llama : update llama_timings.n_p_eval setting (#7160) Daniel Bevenius 2024-05-09 13:03:29 +02:00
  • 22842164bc
    gguf-py : add special token modification capability (#7166) Sigbjørn Skjæret 2024-05-09 12:56:00 +02:00
  • 4734524882
    opencl : alignment size converted from bits to bytes (#7090) Albert Jin 2024-05-09 17:34:37 +08:00
  • 07cd41d096
    TypoFix (#7162) Ahmet Zeer 2024-05-09 11:16:45 +03:00
  • 4426e2987b
    cmake : fix typo (#7151) Jared Van Bortel 2024-05-08 19:55:32 -04:00
  • f98eb31c51
    convert-hf : save memory with lazy evaluation (#7075) compilade 2024-05-08 18:16:38 -04:00
  • bc4bba364f
    Introduction of CUDA Graphs to LLama.cpp (#6766) agray3 2024-05-08 21:55:49 +01:00
  • c12452c7ae
    JSON: [key] -> .at(key), assert() -> GGML_ASSERT (#7143) Johannes Gäßler 2024-05-08 21:53:08 +02:00
  • 9da243b36a
    Revert "llava : add support for moondream vision language model (#6899)" Georgi Gerganov 2024-05-08 22:14:39 +03:00
  • bd1871fa2b
    server : add themes + favicon (#6848) JohnnyB 2024-05-08 20:12:06 +01:00
  • 26458af1d6
    metal : use vm_allocate instead of posix_memalign on macOS (#7078) Gilad S 2024-05-08 22:08:10 +03:00
  • 83330d8cd6
    main : add --conversation / -cnv flag (#7108) Dawid Potocki 2024-05-09 02:32:32 +12:00
  • 465263d0cf
    sgemm : AVX Q4_0 and Q8_0 (#6891) Eve 2024-05-08 14:29:23 +00:00
  • 911b3900dd
    server : add_special option for tokenize endpoint (#7059) Johan 2024-05-08 14:27:58 +02:00
  • ad211edef5
    convert.py : --vocab-only generates false but valid params (#7027) 20kdc 2024-05-08 13:22:32 +01:00
  • 229ffff872
    llama : add BPE pre-tokenization for Qwen2 (#7114) Ren Xuancheng 2024-05-08 20:06:43 +08:00
  • 1fd9c1741d
    clean up json_value & server_log (#7142) Xuan Son Nguyen 2024-05-08 13:24:14 +02:00
  • 4cd621c26d
    convert : add BPE pre-tokenization for DBRX (#7132) DAN™ 2024-05-08 06:43:23 -04:00
  • 7e0b6a7b3b
    py : also print the normalizers Georgi Gerganov 2024-05-08 12:47:07 +03:00
  • acdce3cdef
    compare-llama-bench.py: add missing basicConfig (#7138) Brian 2024-05-08 18:54:39 +10:00
  • 3855416027
    ggml : introduce bfloat16 support (#6412) Justine Tunney 2024-05-08 02:30:09 -04:00
  • c0e6fbf8c3
    metal : fix unused warning Georgi Gerganov 2024-05-08 09:14:50 +03:00
  • c780e75305
    Further tidy on Android instructions README.md (#7077) Jeximo 2024-05-07 21:26:43 -03:00
  • 48b2f9c1fc
    Fixed save_imatrix to match old behaviour for MoE (#7099) jukofyork 2024-05-08 01:24:16 +01:00
  • af0a5b6163
    server: fix incorrectly reported token probabilities (#7125) Johannes Gäßler 2024-05-07 23:07:58 +02:00
  • b6aa670203
    Fix OLMo HF to GGUF conversion (#6910) nopperl 2024-05-07 19:39:43 +00:00
  • 260b7c6529
    server : update readme with undocumented options (#7013) Kyle Mistele 2024-05-07 13:44:29 -05:00
  • 53d6c52e22
    readme : update hot topics Georgi Gerganov 2024-05-07 21:43:13 +03:00
  • 3af34c1d1b
    main : update log text (EOS to EOG) (#7104) RhinoDevel 2024-05-07 19:51:31 +02:00
  • 04976db7a8
    docs: fix typos (#7124) omahs 2024-05-07 17:20:33 +02:00
  • 947d3ad27d
    ci : add GG_BUILD_EXTRA_TESTS_0 env (#7098) Georgi Gerganov 2024-05-07 11:08:49 +03:00
  • 858f6b73f6
    Add an option to build without CUDA VMM (#7067) William Tambellini 2024-05-06 11:12:14 -07:00
  • b3a995b416
    flake.lock: Update (#7079) Georgi Gerganov 2024-05-06 18:36:06 +03:00
  • bcdee0daa7
    minor : fix trailing whitespace Georgi Gerganov 2024-05-06 09:31:30 +03:00
  • 628b299106
    Adding support for the --numa argument for llama-bench. (#7080) kunnis 2024-05-05 07:17:47 -05:00