JH-Motif committed on
Commit e5ec424 · verified · 1 Parent(s): 6daedc6

Update README.md

Files changed (1)
  1. README.md +23 -1
README.md CHANGED
@@ -106,5 +106,27 @@ The benchmarks and metrics used are identical to those in the [Phi-3 technical r
106
  |MT Bench|2R. Avg.|8.38|8.7|-|6.77|-19.21%|-22.18%|-|
107
  ||||||**Average**|**-10.09%**|**-13.45%**|**+10.18%**|
108
 
109
-
110
  \*: We were unable to find an evaluation framework for this benchmark.
106
  |MT Bench|2R. Avg.|8.38|8.7|-|6.77|-19.21%|-22.18%|-|
107
  ||||||**Average**|**-10.09%**|**-13.45%**|**+10.18%**|
108
 
 
109
  \*: We were unable to find an evaluation framework for this benchmark.
110
+
111
+ ### Comparison to Gemma
112
+
113
+ #### Gemma 1 & 2
114
+ The benchmarks and metrics used are identical to those in the [Gemma 2 technical report](https://arxiv.org/abs/2408.00118).
115
+
116
+ #### Gemma 3
117
+ The benchmarks and metrics used are identical to those in the [Gemma 3 technical report](https://arxiv.org/abs/2503.19786).
118
+
119
+ |Benchmark|Metric|Gemma 3 1B|Gemma 3 4B|Motif 2.6B|Improvement (over 1B)|Improvement (over 4B)|
120
+ |---|---|---|---|---|---|---|
121
+ |MMLU-Pro|5-shot|14.7|43.6|-|-|-|
122
+ |LiveCodeBench\*|-|1.9|12.6|-|-|-|
123
+ |Bird-SQL(dev)\*|-|6.4|36.3|-|-|-|
124
+ |GPQA Diamond|5-shot|19.2|30.8|31.81|+65.68%|+3.28%|
125
+ |SimpleQA\*|-|2.2|4|-|-|-|
126
+ |FACTS Grounding\*|-|36.4|70.1|-|-|-|
127
+ |MATH|4-shot|48|75.6|40.2|-16.25%|-46.83%|
128
+ |HiddenMath\*|-|15.8|43|-|-|-|
129
+ |MMLU(val)|5-shot|-|48.8|57.93|-|+18.71%|
130
+ |||||**Average**|**+24.71%**|**-8.28%**|
131
+
132
+ \*: We were unable to find an evaluation framework for this benchmark.
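
The Improvement columns in the Gemma 3 table appear to be the relative difference between the Motif 2.6B score and each Gemma 3 baseline, and the Average row appears to be the mean over the rows that have a value. The README does not state this formula, so the sketch below is an assumption that happens to match the reported figures. The names `relative_improvement`, `scores`, `average`, and `fmt` are illustrative and do not come from the README.

```python
from statistics import mean
from typing import Dict, Optional


def relative_improvement(candidate: float, baseline: Optional[float]) -> Optional[float]:
    """Percentage change of `candidate` relative to `baseline`.

    Returns None when there is no baseline score (a "-" cell in the table).
    """
    if baseline is None:
        return None
    return (candidate - baseline) / baseline * 100.0


def average(column: Dict[str, Optional[float]]) -> float:
    """Mean of the rows that have a value, skipping "-" cells."""
    return mean(v for v in column.values() if v is not None)


def fmt(value: Optional[float]) -> str:
    """Format a percentage the way the table does, with an explicit sign."""
    return "-" if value is None else f"{value:+.2f}%"


# (Gemma 3 1B, Gemma 3 4B, Motif 2.6B) scores copied from the table above.
# None marks a "-" cell. Only rows with a Motif 2.6B score are included.
scores = {
    "GPQA Diamond": (19.2, 30.8, 31.81),
    "MATH": (48.0, 75.6, 40.2),
    "MMLU(val)": (None, 48.8, 57.93),
}

over_1b = {name: relative_improvement(m, b1) for name, (b1, _, m) in scores.items()}
over_4b = {name: relative_improvement(m, b4) for name, (_, b4, m) in scores.items()}

avg_over_1b = average(over_1b)
avg_over_4b = average(over_4b)

for name in scores:
    print(f"{name}: {fmt(over_1b[name])} {fmt(over_4b[name])}")
print(f"Average: {fmt(avg_over_1b)} {fmt(avg_over_4b)}")
```

Averaging the unrounded per-row values, rather than the rounded percentages shown in the table, avoids a small rounding drift in the last digit.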