Update README.md
README.md
CHANGED
@@ -106,5 +106,27 @@ The benchmarks and metrics used are identical to those in the [Phi-3 technical r
|MT Bench|2R. Avg.|8.38|8.7|-|6.77|-19.21%|-22.18%|-|
||||||**Average**|**-10.09%**|**-13.45%**|**+10.18%**|

\*: We were unable to find an evaluation framework for this benchmark.

### Comparison to Gemma

#### Gemma 1 & 2
The benchmarks and metrics used are identical to those in the [Gemma 2 technical report](https://arxiv.org/abs/2408.00118).

#### Gemma 3
The benchmarks and metrics used are identical to those in the [Gemma 3 technical report](https://arxiv.org/abs/2503.19786).

|Benchmark|Metric|Gemma 3 1B|Gemma 3 4B|Motif 2.6B|Improvement(over 1B)|Improvement(over 4B)|
|---|---|---|---|---|---|---|
|MMLU-Pro|5-shot|14.7|43.6|-|-|-|
|LiveCodeBench\*|-|1.9|12.6|-|-|-|
|Bird-SQL(dev)\*|-|6.4|36.3|-|-|-|
|GPQA Diamond|5-shot|19.2|30.8|31.81|+65.68%|+3.28%|
|SimpleQA\*|-|2.2|4|-|-|-|
|FACTS Grounding\*|-|36.4|70.1|-|-|-|
|MATH|4-shot|48|75.6|40.2|-16.25%|-46.83%|
|HiddenMath\*|-|15.8|43|-|-|-|
|MMLU(val)|5-shot|-|48.8|57.93|-|+18.71%|
|||||**Average**|+24.71%|-8.28%|

\*: We were unable to find an evaluation framework for this benchmark.
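
The Improvement columns appear to be the relative change of the Motif 2.6B score against each Gemma 3 baseline, `(motif - baseline) / baseline`. This formula is inferred from the table values and is not stated in this README. Under that assumption, a minimal Python sketch (the helper name `relative_improvement` is hypothetical) recomputes the per-row figures and the averages over the rows where both scores are available:

```python
from statistics import mean


def relative_improvement(motif: float, baseline: float) -> float:
    """Percentage change of the Motif score relative to a baseline score.

    Assumed formula, inferred from the table: (motif - baseline) / baseline.
    """
    return (motif - baseline) / baseline * 100.0


# (Gemma 3 1B, Gemma 3 4B, Motif 2.6B) for rows that have a Motif score.
# None stands for a "-" cell in the table.
rows = {
    "GPQA Diamond": (19.2, 30.8, 31.81),
    "MATH": (48.0, 75.6, 40.2),
    "MMLU(val)": (None, 48.8, 57.93),
}

# Skip a row when the baseline score is missing.
over_1b = {n: relative_improvement(m, b1) for n, (b1, _, m) in rows.items() if b1 is not None}
over_4b = {n: relative_improvement(m, b4) for n, (_, b4, m) in rows.items() if b4 is not None}

for name, value in over_4b.items():
    print(f"{name}: {value:+.2f}% over 4B")
print(f"Average over 1B: {mean(over_1b.values()):+.2f}%")
print(f"Average over 4B: {mean(over_4b.values()):+.2f}%")
```

The averages are taken only over rows with a computable improvement, so the 1B average covers two benchmarks and the 4B average covers three.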