JH-Motif committed on
Commit 1fe9873 · verified
1 Parent(s): e5ec424

Update README.md

Files changed (1)
  1. README.md +22 -1
README.md CHANGED
@@ -113,6 +113,27 @@ The benchmarks and metrics used are identical to those in the [Phi-3 technical r
113
  #### Gemma 1 & 2
114
  The benchmarks and metrics used are identical to those in the [Gemma 2 technical report](https://arxiv.org/abs/2408.00118).
115
 
116
  #### Gemma 3
117
  The benchmarks and metrics used are identical to those in the [Gemma 3 technical report](https://arxiv.org/abs/2503.19786).
118
 
@@ -127,6 +148,6 @@ The benchmarks and metrics used are identical to those in the [Gemma 3 technical
127
  |MATH|4-shot|48|75.6|40.2|-16.25%|-46.83%|
128
  |HiddenMath*|-|15.8|43|-|-|-|
129
  |MMLU(val)|5-shot|-|48.8|57.93|-|+18.71%|
130
- |||||**Average**|+24.71%|-8.28%|
131
 
132
  \*: We were unable to find an evaluation framework for this benchmark.
 
113
  #### Gemma 1 & 2
114
  The benchmarks and metrics used are identical to those in the [Gemma 2 technical report](https://arxiv.org/abs/2408.00118).
115
 
116
+ |Benchmark|Metric|Gemma 1 2B|Gemma 1 7B|Gemma 2 2B|Gemma 2 9B|Motif 2.6B|Improvement(over 1 2B)|Improvement(over 1 7B)|Improvement(over 2 2B)|Improvement(over 2 9B)|
117
+ |---|---|---|---|---|---|---|---|---|---|---|
118
+ |MMLU|5-shot||||||||||
119
+ |ARC-C|25-shot||||||||||
120
+ |GSM8K|5-shot||||||||||
121
+ |AGIEval*|3-5-shot||||||||||
122
+ |DROP|3-shot, F1||||||||||
123
+ |BBH|3-shot, CoT||||||||||
124
+ |Winogrande|5-shot||||||||||
125
+ |HellaSwag|10-shot||||||||||
126
+ |MATH|4-shot||||||||||
127
+ |ARC-e|0-shot||||||||||
128
+ |PIQA|0-shot||||||||||
129
+ |SIQA|0-shot||||||||||
130
+ |Boolq|0-shot||||||||||
131
+ |TriviaQA|5-shot||||||||||
132
+ |NQ|5-shot||||||||||
133
+ |HumanEval|pass@1||||||||||
134
+ |MBPP|3-shot||||||||||
135
+ |||||||**Average**|**TBA**|**TBA**|**TBA**|**TBA**|
136
+
137
  #### Gemma 3
138
  The benchmarks and metrics used are identical to those in the [Gemma 3 technical report](https://arxiv.org/abs/2503.19786).
139
 
 
148
  |MATH|4-shot|48|75.6|40.2|-16.25%|-46.83%|
149
  |HiddenMath*|-|15.8|43|-|-|-|
150
  |MMLU(val)|5-shot|-|48.8|57.93|-|+18.71%|
151
+ |||||**Average**|**+24.71%**|**-8.28%**|
152
 
153
  \*: We were unable to find an evaluation framework for this benchmark.
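
The improvement percentages in the rows above are consistent with a relative change of the Motif score over each baseline score, `(motif - baseline) / baseline`. The sketch below reproduces the MATH and MMLU(val) entries under that assumption. The function names are hypothetical, and reading the third score in each row as the Motif 2.6B result is an assumption drawn from the values, not something the diff states.

```python
def relative_improvement(model_score: float, baseline_score: float) -> float:
    """Relative change of model_score over baseline_score, in percent.

    Assumed formula: (model - baseline) / baseline * 100.
    """
    return (model_score - baseline_score) / baseline_score * 100.0


def format_improvement(model_score: float, baseline_score: float) -> str:
    """Format the relative change the way the table does: signed, two decimals."""
    return f"{relative_improvement(model_score, baseline_score):+.2f}%"


# MATH row: baselines 48 and 75.6, assumed Motif score 40.2
print(format_improvement(40.2, 48))      # -16.25%
print(format_improvement(40.2, 75.6))    # -46.83%

# MMLU(val) row: baseline 48.8, assumed Motif score 57.93
print(format_improvement(57.93, 48.8))   # +18.71%
```

Rows where a baseline or Motif score is missing (shown as `-`) cannot be computed this way, which matches the `-` entries in the improvement columns.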