mirror of https://github.com/RYDE-WORK/llama.cpp.git synced 2026-01-19 21:23:26 +08:00

Georgi Gerganov fcca0a7004

refact : fix convert script + zero out KV cache to avoid nans (#3523)

* refact : fix convert script + zero out KV cache to avoid nans

* ggml : silu(-inf) should never happen

* metal : assert various kernel requirements

2023-10-09 14:32:17 +03:00

CMakeLists.txt

llama : custom attention mask + parallel decoding + no context swaps (#3228)

2023-09-28 19:04:36 +03:00

parallel.cpp

refact : fix convert script + zero out KV cache to avoid nans (#3523)

2023-10-09 14:32:17 +03:00

README.md

llama : custom attention mask + parallel decoding + no context swaps (#3228)

2023-09-28 19:04:36 +03:00

README.md

llama.cpp/example/parallel

Simplified simulation for serving incoming requests in parallel