<img alt="OLMo Logo" src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/olmo2/olmo.png" width="242px">

OLMo 2 32B Instruct March 2025 is a post-trained variant of the [OLMo-2 32B March 2025](https://huggingface.co/allenai/OLMo-2-0325-32B/) model, which has undergone supervised finetuning on an OLMo-specific variant of the [Tülu 3 dataset](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture) and further DPO training on [this dataset](https://huggingface.co/datasets/allenai/olmo-2-32b-pref-mix-v1).
Tülu 3 is designed for state-of-the-art performance on a diverse range of tasks beyond chat, such as MATH, GSM8K, and IFEval.
Check out the [OLMo 2 paper](https://arxiv.org/abs/2501.00656) or [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!
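
The snippet below is a minimal loading sketch using the standard `transformers` causal-LM interface. The repo id `allenai/OLMo-2-0325-32B-Instruct` is an assumption inferred from the base model's naming pattern, and a recent `transformers` release with OLMo 2 support is assumed.

```python
# Minimal usage sketch (assumptions noted below), using the standard
# Hugging Face transformers causal-LM interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-0325-32B-Instruct"  # assumed repo id, following the base model's naming
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 32B parameters; bf16 halves memory vs. fp32
    device_map="auto",
)

# Build a chat-style prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "What is OLMo 2?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```

Loading in bf16 with `device_map="auto"` spreads the weights across available GPUs; adjust to your hardware.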
## Performance

| Model | AVG | AE2 | BBH | DROP | GSM8K | IFE | MATH | MMLU | Safety | PQA | TQA |
|-------|-----|-----|-----|------|-------|-----|------|------|--------|-----|-----|
| OLMo 2 7B SFT | 51.4 | 10.2 | 49.6 | 59.6 | 74.6 | 66.9 | 25.3 | 61.1 | 94.6 | 23.6 | 48.6 |
| OLMo 2 7B DPO | 55.9 | 27.9 | 51.1 | 60.2 | 82.6 | 73.0 | 30.3 | 60.8 | 93.7 | 23.5 | 56.0 |
| OLMo 2 7B Instruct | 56.5 | 29.1 | 51.4 | 60.5 | 85.1 | 72.3 | 32.5 | 61.3 | 93.3 | 23.2 | 56.5 |
| OLMo 2 13B SFT | 56.6 | 11.5 | 59.9 | 71.3 | 76.3 | 68.6 | 29.5 | 68.0 | 94.3 | 29.4 | 57.1 |
| OLMo 2 13B DPO | 62.0 | 38.3 | 61.4 | 71.5 | 82.3 | 80.2 | 35.2 | 67.9 | 90.3 | 29.0 | 63.9 |
| OLMo 2 13B Instruct | 63.4 | 39.5 | 63.0 | 71.5 | 87.4 | 82.6 | 39.2 | 68.5 | 89.7 | 28.8 | 64.3 |
| OLMo 2 32B SFT | 58.09 | 14.49 | 67.10 | 75.68 | 79.76 | 74.49 | 36.02 | 77.80 | - | 34.25 | 63.26 |
| OLMo 2 32B DPO | 64.95 | 46.18 | 68.05 | 76.45 | 85.60 | 80.59 | 39.08 | 78.26 | - | 36.57 | 73.78 |
| OLMo 2 32B Instruct | 66.17 | 45.70 | 69.10 | 76.49 | 89.08 | 83.55 | 42.74 | 78.53 | - | 36.70 | 73.64 |
| Gemma-2-27b | 61.32 | 49.01 | 72.69 | 67.52 | 80.67 | 63.22 | 35.06 | 70.66 | 75.9 | 33.85 | 64.58 |
| GPT-3.5 Turbo 0125 | 59.56 | 38.7 | 66.6 | 70.2 | 74.3 | 66.9 | 41.2 | 70.2 | 69.1 | 45.0 | 62.9* |
| GPT-4o Mini 2024-07-18 | 65.72 | 49.7 | 65.9* | 36.3 | 83.0 | 83.5 | 67.9 | 82.2 | 84.9 | 39.0 | 64.8* |
| Qwen2.5-32B | 66.54 | 39.07 | 82.34 | 48.26 | 87.49 | 82.44 | 77.89 | 84.66 | 82.4 | 26.10 | 70.57 |
| Mistral-Small-24B | 67.6 | 43.20 | 80.11 | 78.51 | 87.19 | 77.26 | 65.86 | 83.72 | 66.5 | 24.38 | 68.14 |
| Llama-3.1-70B | 69.99 | 32.91 | 82.97 | 76.96 | 94.47 | 87.99 | 56.17 | 85.15 | 76.4 | 46.50 | 66.83 |
| Llama-3.3-70B | 72.96 | 36.48 | 85.79 | 77.99 | 93.56 | 90.76 | 71.84 | 85.85 | 70.4 | 48.24 | 66.11 |
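
As a reading aid: AVG appears to be the unweighted mean of the ten benchmark columns, with missing entries excluded (the 32B rows report no Safety score); this is inferred from the numbers rather than documented. A quick sanity check:

```python
# Sanity-check the AVG column, assuming it is the unweighted mean of the
# benchmark columns and that missing entries ("-" in the table) are skipped.
def table_avg(scores):
    present = [s for s in scores if s is not None]
    return sum(present) / len(present)

# OLMo 2 7B Instruct: AE2, BBH, DROP, GSM8K, IFE, MATH, MMLU, Safety, PQA, TQA
print(round(table_avg([29.1, 51.4, 60.5, 85.1, 72.3, 32.5, 61.3, 93.3, 23.2, 56.5]), 2))
# -> 56.52, which the table reports as 56.5

# OLMo 2 32B Instruct has no Safety score, so the mean runs over 9 columns.
print(round(table_avg([45.70, 69.10, 76.49, 89.08, 83.55, 42.74, 78.53, None, 36.70, 73.64]), 2))
# -> 66.17, matching the reported AVG
```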
## License and use