Update README.md
README.md
@@ -133,10 +133,10 @@ The benchmarks and corresponding scores listed in the table below are taken dire
 |IFEval|-|59.5|77.4|74.02|+24.40%|-4.37%|
 |GSM9K|8-shot, CoT|44.4|77.7|74.9|+68.69%|-3.60%|
 |MATH|0-shot, CoT|30.6|48|49.68|+62.35%|+3.50%|
-|ARC Challenge|0-shot|59.4|78.
+|ARC Challenge|0-shot|59.4|78.6|74.2|+24.92%|-5.6%|
 |GPQA|0-shot|27.2|32.8|25.45|-6.43%|-22.41%|
 |Hellaswag|0-shot|41.2|69.8|61.35|+48.91%|-12.11%|
-|||||**Average**|**+39.42%**|**-3.
+|||||**Average**|**+39.42%**|**-3.86%**|
 
 \*: We were unable to find an evaluation framework for this benchmark.
 
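The two percentage columns in the changed rows appear to be relative changes of the fifth value against the third and fourth values. The hunk does not show the column headers, so this reading is an assumption, although it matches every complete row shown. A minimal sketch for re-checking the arithmetic, where `relative_change` and `rows` are hypothetical names introduced for illustration:

```python
def relative_change(new: float, baseline: float) -> float:
    """Return the percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


# (benchmark, first baseline, second baseline, evaluated score),
# copied from the complete rows in the diff. The meaning of each
# column is assumed, since the table header is outside the hunk.
rows = [
    ("IFEval", 59.5, 77.4, 74.02),
    ("ARC Challenge", 59.4, 78.6, 74.2),
    ("GPQA", 27.2, 32.8, 25.45),
    ("Hellaswag", 41.2, 69.8, 61.35),
]

for name, base_a, base_b, score in rows:
    change_a = relative_change(score, base_a)
    change_b = relative_change(score, base_b)
    print(f"{name}: {change_a:+.2f}% / {change_b:+.2f}%")
```

The **Average** row is not recomputed here because the hunk shows only part of the table.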