From ecb090ad80ea6b5e5d958d13d7baf92964f3ddf7 Mon Sep 17 00:00:00 2001
From: SillyXu
Date: Thu, 1 Feb 2024 09:15:00 +0800
Subject: [PATCH] Update README.md

---
 README.md | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 81c246b..af120e2 100644
--- a/README.md
+++ b/README.md
@@ -82,9 +82,7 @@ XXXXXX
 
 In the actual evaluation, we take the higher of the scores from the two evaluation methods as the final result, to keep the comparison fair.
 
-#### Evaluation results
-
-* Text evaluation
+#### Text evaluation
 
 |Model|English average|Chinese average|CEval|CMMLU|MMLU|HumanEval|MBPP|GSM8K|MATH|BBH|Arc-e|ARC-c|HellaSwag|
 |-|-|-|-|-|-|-|-|-|-|-|-|-|-|
@@ -105,7 +103,7 @@ XXXXXX
 |Phi-2-2B|48.84|54.41|23.775|23.37|24.18|52.66|47.56|55.04|57.16|3.5|43.39|86.11|71.25|73.07*|
 |MiniCPM-2B|52.33|52.6|51.1|51.13|51.07|53.46|50.00|47.31|53.83|10.24|36.87|85.44|68.00|68.25|
 
-* Multimodal evaluation
+#### Multimodal evaluation
 
 |Model|MME(P)|MMB-dev(en)|MMB-dev(zh)|MMMU-val|CMMMU-val|
 |-|-|-|-|-|-|
@@ -115,7 +113,7 @@ XXXXXX
 |Qwen-VL-Chat|**1487**|60.6|56.7|**35.9**|30.7
 |**MiniCPM-V**|1446|**67.3**|**61.9**|34.7|**32.1**|
 
-* DPO evaluation
+#### DPO evaluation
 
 |Model|MT-bench|
 |---|---|