mirror of
https://github.com/RYDE-WORK/MiniCPM.git
synced 2026-01-19 12:53:36 +08:00
update dataset name
This commit is contained in:
parent
213716ff0a
commit
b0f6110f8a
@ -75,11 +75,11 @@ XXXXXX
|
||||
* PPL:将选项作为题目生成的延续,并根据各个选项的PPL来进行答案选择;
|
||||
* 第二种是直接生成答案选项。
|
||||
* 对于不同模型,这两种方式得到的结果差异较大。MiniCPM两种模式上的结果较为接近,而Mistral-7B-v0.1等模型在PPL上表现较好,直接生成上效果较差。
|
||||
* 在具体评测时,我们以两种评测方式得分的最高者为最终结果,以此保证对比的公平性。
|
||||
* 在具体评测时,我们以两种评测方式得分的最高者为最终结果,以此保证对比的公平性(以下表格中*号表示采用PPL)。
|
||||
|
||||
#### 文本评测
|
||||
|
||||
|模型|英文均分|中文均分|CEval|CMMLU|MMLU|HumanEval|MBPP|GSM8K|MATH|BBH|Arc-e|ARC-c|HellaSwag|
|
||||
|模型|英文均分|中文均分|C-Eval|CMMLU|MMLU|HumanEval|MBPP|GSM8K|MATH|BBH|ARC-E|ARC-C|HellaSwag|
|
||||
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|
||||
|Llama2-7B|35.40|36.21|31.765|32.42|31.11|44.32|12.2|27.17|13.57|1.8|33.23|75.25|42.75|75.62*|
|
||||
|Qwen-7B|53.87|52.42|59.655|58.96|60.35|57.65|17.07|42.15|41.24|37.75|83.42|64.76|75.32*|
|
||||
@ -90,7 +90,7 @@ XXXXXX
|
||||
|Falcon-40B|43.62|44.21|40.93|40.29|41.57|53.53|24.39|36.53|22.44|1.92|36.24|81.94*|57.68|83.26*|
|
||||
|MiniCPM-2B|52.33|52.6|51.1|51.13|51.07|53.46|50.00|47.31|53.83|10.24|36.87|85.44|68.00|68.25|
|
||||
|
||||
|模型|英文均分|中文均分|CEval|CMMLU|MMLU|HumanEval|MBPP|GSM8K|MATH|BBH|Arc-e|ARC-c|HellaSwag|
|
||||
|模型|英文均分|中文均分|C-Eval|CMMLU|MMLU|HumanEval|MBPP|GSM8K|MATH|BBH|ARC-E|ARC-C|HellaSwag|
|
||||
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|
||||
|TinyLlama-1.1B|25.36|25.55|24.525|25.02|24.03|24.3|6.71|19.91|2.27|0.74|28.78|60.77*|28.15*|58.33*|Qwen-1.8B|34.72|31.87|47.565|49.81|45.32|43.37|7.93|17.8|19.26|2.42|29.07|63.97*|43.69|59.28*|
|
||||
|Gemini Nano-3B|-|-|-|-|-|-|-|27.2(report)|22.8(report)|-|42.4(report)|-|-|-|
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user