SungminLee committed on
Commit 31762c4 · verified · 1 Parent(s): 6cd2bd0

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -133,10 +133,10 @@ The benchmarks and corresponding scores listed in the table below are taken dire
  |IFEval|-|59.5|77.4|74.02|+24.40%|-4.37%|
  |GSM8K|8-shot, CoT|44.4|77.7|74.9|+68.69%|-3.60%|
  |MATH|0-shot, CoT|30.6|48|49.68|+62.35%|+3.50%|
- |ARC Challenge|0-shot|59.4|78.5|74.2|+24.92%|-5.48%|
+ |ARC Challenge|0-shot|59.4|78.6|74.2|+24.92%|-5.6%|
  |GPQA|0-shot|27.2|32.8|25.45|-6.43%|-22.41%|
  |Hellaswag|0-shot|41.2|69.8|61.35|+48.91%|-12.11%|
- |||||**Average**|**+39.42%**|**-3.83%**|
+ |||||**Average**|**+39.42%**|**-3.86%**|
 
  \*: We were unable to find an evaluation framework for this benchmark.
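
The ARC Challenge row changed in this commit can be sanity-checked with a little arithmetic. The sketch below assumes the last two columns are relative differences between the fifth-column score and the third- and fourth-column scores respectively. The column headers are not visible in this hunk, so the names `baseline_a`, `baseline_b`, and `model` are hypothetical labels, not the README's own.

```python
def relative_change(score: float, reference: float) -> float:
    """Percentage change of `score` relative to `reference`."""
    return (score - reference) / reference * 100.0


# ARC Challenge row after this commit: |59.4|78.6|74.2|
# (hypothetical names; the real column headers are outside this hunk)
baseline_a, baseline_b, model = 59.4, 78.6, 74.2

vs_a = relative_change(model, baseline_a)
vs_b = relative_change(model, baseline_b)

# Value the row would have carried before the fix (second score was 78.5)
vs_b_old = relative_change(model, 78.5)

print(f"{vs_a:+.2f}% {vs_b:+.2f}% (previously {vs_b_old:+.2f}%)")
```

Under that assumption the recomputed values agree with the row as committed: changing the second score from 78.5 to 78.6 moves the last column from -5.48% to roughly -5.6%, while the comparison against the first score is unaffected.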