From 369f4d917dd911e15b0587cbca703178176af9f8 Mon Sep 17 00:00:00 2001
From: Atream <80757050+Atream@users.noreply.github.com>
Date: Wed, 26 Feb 2025 22:04:29 +0800
Subject: [PATCH] Update DeepseekR1_V3_tutorial.md

---
 doc/en/DeepseekR1_V3_tutorial.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/en/DeepseekR1_V3_tutorial.md b/doc/en/DeepseekR1_V3_tutorial.md
index 29bfe3b..02575c9 100644
--- a/doc/en/DeepseekR1_V3_tutorial.md
+++ b/doc/en/DeepseekR1_V3_tutorial.md
@@ -160,7 +160,7 @@ is speed up which is inspiring. So our showcase makes use of this finding*
 ### V0.2.2 longer context & FP8 kernel
 #### longer context
 To use this feature, [install flashinfer](https://github.com/flashinfer-ai/flashinfer) first.
-
+Note: The latest MLA kernel in FlashInfer still has a few minor issues that are being fixed on its main branch. If you use FlashInfer, please install it from source on the main branch.
 If you want to use long context(longer than 20K) for prefill, enable the matrix absorption MLA during the prefill phase, which will significantly reduce the size of the kv cache. Modify yaml file like this:
 ```
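
The yaml snippet referenced in the hunk above is cut off at the opening code fence. As a rough sketch only, an optimize-rule entry that enables matrix-absorption MLA during the prefill phase might look like the following; the operator class `ktransformers.operators.attention.KDeepseekV2Attention` and the `absorb_for_prefill` kwarg are assumptions based on the surrounding tutorial text, not taken from this diff.

```yaml
# Hypothetical optimize-rule entry (class and kwarg names are assumptions, not from this patch):
# route every decoder layer's self-attention to the MLA operator and turn on
# matrix absorption during prefill to shrink the kv cache for long contexts.
- match:
    name: "^model\\.layers\\..*\\.self_attn$"   # match each layer's attention module
  replace:
    class: ktransformers.operators.attention.KDeepseekV2Attention  # assumed MLA implementation
    kwargs:
      absorb_for_prefill: True  # assumed flag: enable matrix-absorbed MLA in the prefill phase
```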