llama.cpp

Mirror of https://github.com/RYDE-WORK/llama.cpp.git, synced 2026-01-20 05:33:37 +08:00.


Johannes Gäßler 864a0b67a6

CUDA: use mma PTX instructions for FlashAttention (#11583)

* CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix

* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <slarengh@gmail.com>

2025-02-02 19:31:09 +01:00

ggml-alloc.h

ggml : fix typo in example usage ggml_gallocr_new (ggml/984)

2024-10-04 18:50:05 +03:00

ggml-backend.h

rpc : early register backend devices (#11262)

2025-01-17 10:57:09 +02:00

ggml-blas.h

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-cann.h

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-cpp.h

GGUF: C++ refactor, backend support, misc fixes (#11030)

2025-01-07 18:01:58 +01:00

ggml-cpu.h

ggml : refactor online repacking (#10446)

2024-12-07 14:37:50 +02:00

ggml-cuda.h

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-kompute.h

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-metal.h

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-opencl.h

Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693)

2024-12-13 12:23:52 -08:00

ggml-opt.h

ggml: new optimization interface (ggml/988)

2024-11-17 08:30:29 +02:00

ggml-rpc.h

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-sycl.h

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-vulkan.h

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml.h

CUDA: use mma PTX instructions for FlashAttention (#11583)

2025-02-02 19:31:09 +01:00

gguf.h

GGUF: C++ refactor, backend support, misc fixes (#11030)

2025-01-07 18:01:58 +01:00