diff --git a/README.md b/README.md
index 3aa5954..f103d37 100644
--- a/README.md
+++ b/README.md
@@ -144,14 +144,14 @@ print(responds)
 |Falcon-40B|43.62|44.21|40.93|40.29|41.57|53.53|24.39|36.53|22.44|1.92|36.24|81.94*|57.68|83.26*|
 |MiniCPM-2B|52.33|52.6|51.1|51.13|51.07|53.46|50.00|47.31|53.83|10.24|36.87|85.44|68.00|68.25|
 
-| Model | 平均分 | 英文均分 | 中文均分 | CEval | CMMLU | MMLU | HumanEval | MBPP | GSM8K | MATH | BBH | Arc-e | ARC-c |
-|-----------------------|--------|--------------------------|---------|-------|-------|------|-----------|------|-------|------|------|-------|-------|
-| TinyLlama-1.1B | 25.36 | 25.55 | 24.525 | 25.02 | 24.03 | 24.3 | 6.71 | 19.91| 2.27 | 0.74 | 28.78| 60.77*| 28.15*|
-| Qwen-1.8B | 34.72 | 31.87 | 47.565 | 49.81 | 45.32 | 43.37| 7.93 | 17.8 | 19.26 | 2.42 | 29.07| 63.97*| 43.69 |
-| Gemini Nano-3B | - | - | - | - | - | - | - | 27.2(report)| 22.8(report)| - | 42.4(report)| - | - |
-| StableLM-Zephyr-3B | 43.43 | 46.28 | 30.615 | 30.34 | 30.89 | 45.9 | 35.37 | 31.85| 52.54 | 12.12| 37.68| 73.78 | 55.38 |
-| Phi-2(2B) | 48.84 | 54.41 | 23.775 | 23.37 | 24.18 | 52.66| 47.56 | 55.04| 57.16 | 3.5 | 43.39| 86.11 | 71.25 |
-| MiniCPM-2B | 52.33 | 52.6 | 51.1 | 51.13 | 51.07 | 53.46| 50.00 | 47.31| 53.83 | 10.24| 36.87| 85.44 | 68.00c|
+| Model | 平均分 | 英文均分(包括代码数学推理) | 中文均分 | CEval | CMMLU | MMLU | HumanEval | MBPP | GSM8K | MATH | BBH | Arc-e | ARC-c | HellaSwag |
+|-----------------------|--------|--------------------------|---------|-------|-------|------|-----------|------|-------|------|------|-------|-------|-----------|
+| TinyLlama-1.1B | 25.36 | 25.55 | 24.53 | 25.02 | 24.03 | 24.3 | 6.71 | 19.91| 2.27 | 0.74 | 28.78| 60.77*| 28.15*| 58.33* |
+| Qwen-1.8B | 34.72 | 31.87 | 47.57 | 49.81 | 45.32 | 43.37| 7.93 | 17.8 | 19.26 | 2.42 | 29.07| 63.97*| 43.69 | 59.28* |
+| Gemini Nano-3B | - | - | - | - | - | - | - | 27.2(report)| 22.8(report)| - | 42.4(report)| - | - | - |
+| StableLM-Zephyr-3B | 43.43 | 46.28 | 30.62 | 30.34 | 30.89 | 45.9 | 35.37 | 31.85| 52.54 | 12.12| 37.68| 73.78 | 55.38| 71.87* |
+| Phi-2(2B) | 48.84 | 54.41 | 23.78 | 23.37 | 24.18 | 52.66| 47.56 | 55.04| 57.16 | 3.5 | 43.39| 86.11 | 71.25| 73.07* |
+| MiniCPM-2B | 52.33 | 52.6 | 51.10 | 51.13 | 51.07 | 53.46| 50.00 | 47.31| 53.83 | 10.24| 36.87| 85.44 | 68.00| 68.25 |
 
 #### 多模态评测