llama.cpp

mirror of https://github.com/RYDE-WORK/llama.cpp.git synced 2026-01-28 02:53:15 +08:00

llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197)

* Add attention and final logit softcapping.

* fix

* Add custom add_ functions

* Disable flash attention for Gemma2

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Add default value for attention and final logit softcap value

* Add custom kq scaling from Gemma2Attention

* Remove custom pre attention scaling and use computed value instead.

---------

Co-authored-by: slaren <slarengh@gmail.com>

2024-06-29 23:44:08 -04:00

__init__.py

convert-hf : support direct Q8_0 conversion (#7234)

2024-05-13 14:10:51 -04:00

constants.py

llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197)

2024-06-29 23:44:08 -04:00

gguf_reader.py

Gguf dump start data offset via --data-offset and some extra refactor (#8054)

2024-06-25 22:03:25 +10:00

gguf_writer.py

llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197)