llama.cpp/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-cpb16.cu

// This file has been autogenerated by generate_cu_files.py, do not edit manually.
#include "../fattn-mma-f16.cuh"
DECL_FATTN_MMA_F16_CASE(64, 16);
DECL_FATTN_MMA_F16_CASE(80, 16);
DECL_FATTN_MMA_F16_CASE(96, 16);
DECL_FATTN_MMA_F16_CASE(112, 16);
DECL_FATTN_MMA_F16_CASE(128, 16);
DECL_FATTN_MMA_F16_CASE(256, 16);
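
For context: each DECL_FATTN_MMA_F16_CASE(D, ncols) line above emits one explicit template instantiation of the f16 MMA FlashAttention kernel declared in fattn-mma-f16.cuh, for head size D (64 through 256) with 16 columns per CUDA block (the "cpb16" in the filename). The sketch below illustrates the general explicit-instantiation pattern such a macro follows; the Context/Tensor types and the launcher name are hypothetical stand-ins so the example is self-contained, not llama.cpp's actual API.

// Illustrative sketch of the explicit-instantiation pattern, NOT the real
// macro from fattn-mma-f16.cuh. Context/Tensor and the launcher name are
// hypothetical stand-ins so the example compiles on its own.
#include <cstdio>

struct Context { };  // stand-in for the CUDA backend context type
struct Tensor  { };  // stand-in for the tensor type

// Launcher template parameterized on head size D and columns per block.
template <int D, int cols_per_block>
void flash_attn_mma_f16_case(Context * ctx, Tensor * dst) {
    (void) ctx; (void) dst;
    // A real launcher would pick tile shapes from D/cols_per_block and
    // launch the matching MMA-based FlashAttention kernel here.
    std::printf("head size %d, %d cols per block\n", D, cols_per_block);
}

// Each macro use forces the compiler to emit code for one (D, cols) pair,
// so a .cu file like the one above compiles only the cases it names.
#define DECL_FATTN_MMA_F16_CASE(D, cols_per_block) \
    template void flash_attn_mma_f16_case<D, cols_per_block>(Context *, Tensor *)

DECL_FATTN_MMA_F16_CASE(64, 16);
DECL_FATTN_MMA_F16_CASE(128, 16);

int main() {
    // The instantiations above are what other translation units link against;
    // calling one directly here just demonstrates the template arguments.
    flash_attn_mma_f16_case<64, 16>(nullptr, nullptr);
    return 0;
}

Splitting the instantiations across many small autogenerated .cu files like this lets the build compile the cases in parallel and keeps per-translation-unit compile time and memory in check, which is presumably why generate_cu_files.py emits one file per cols-per-block value.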