Text Generation · Transformers · Safetensors · English · olmo2 · conversational
amanrangapur committed · Commit 791c1a7 (verified) · 1 Parent(s): 106d0dc

Update README.md

Files changed (1): README.md (+20 -22)
README.md CHANGED
@@ -7,7 +7,7 @@ pipeline_tag: text-generation

<img alt="OLMo Logo" src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/olmo2/olmo.png" width="242px">

- OLMo 2 32B Instruct March 2025 is post-trained variant of the [OLMo-2 32B March 2025](https://huggingface.co/allenai/OLMo-2-0325-32B/) model, which has undergone supervised finetuning on an OLMo-specific variant of the [Tülu 3 dataset](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture) and further DPO training on [this dataset](https://huggingface.co/datasets/allenai/olmo-2-1124-7b-preference-mix), and finally RLVR training using [this data](https://huggingface.co/datasets/allenai/RLVR-GSM).
+ OLMo 2 32B Instruct March 2025 is a post-trained variant of the [OLMo-2 32B March 2025](https://huggingface.co/allenai/OLMo-2-0325-32B/) model, which has undergone supervised finetuning on an OLMo-specific variant of the [Tülu 3 dataset](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture) and further DPO training on [this dataset](https://huggingface.co/datasets/allenai/olmo-2-32b-pref-mix-v1).
Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
Check out the [OLMo 2 paper](https://arxiv.org/abs/2501.00656) or [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!

@@ -81,27 +81,25 @@ See the Falcon 180B model card for an example of this.

## Performance

- | Model | Average | AlpacaEval | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA |
- |-------|---------|------------|-----|------|--------|---------|------|-------|---------|-------|---------|
- | **Open weights models** |
- | Gemma-2-9B-it | 51.9 | 43.7 | 2.5 | 58.8 | 79.7 | 69.9 | 29.8 | 69.1 | 75.5 | 28.3 | 61.4 |
- | Ministral-8B-Instruct | 52.1 | 31.4 | 56.2 | 56.2 | 80.0 | 56.4 | 40.0 | 68.5 | 56.2 | 20.2 | 55.5 |
- | Mistral-Nemo-Instruct-2407 | 50.9 | 45.8 | 54.6 | 23.6 | 81.4 | 64.5 | 31.9 | 70.0 | 52.7 | 26.9 | 57.7 |
- | Qwen-2.5-7B-Instruct | 57.1 | 29.7 | 25.3 | 54.4 | 83.8 | 74.7 | 69.9 | 76.6 | 75.0 | 18.1 | 63.1 |
- | Llama-3.1-8B-Instruct | 58.9 | 25.8 | 69.7 | 61.7 | 83.4 | 80.6 | 42.5 | 71.3 | 70.2 | 28.4 | 55.1 |
- | Tülu 3 8B | 60.4 | 34.0 | 66.0 | 62.6 | 87.6 | 82.4 | 43.7 | 68.2 | 75.4 | 29.1 | 55.0 |
- | Qwen-2.5-14B-Instruct | 60.8 | 34.6 | 34.0 | 50.5 | 83.9 | 82.4 | 70.6 | 81.1 | 79.3 | 21.1 | 70.8 |
- | **Fully open models** |
- | OLMo-7B-Instruct | 28.2 | 5.2 | 35.3 | 30.7 | 14.3 | 32.2 | 2.1 | 46.3 | 54.0 | 17.1 | 44.5 |
- | OLMo-7B-0424-Instruct | 33.1 | 8.5 | 34.4 | 47.9 | 23.2 | 39.2 | 5.2 | 48.9 | 49.3 | 18.9 | 55.2 |
- | OLMoE-1B-7B-0924-Instruct | 35.5 | 8.5 | 37.2 | 34.3 | 47.2 | 46.2 | 8.4 | 51.6 | 51.6 | 20.6 | 49.1 |
- | MAP-Neo-7B-Instruct | 42.9 | 17.6 | 26.4 | 48.2 | 69.4 | 35.9 | 31.5 | 56.5 | 73.7 | 18.4 | 51.6 |
- | *OLMo-2-7B-SFT* | 50.2 | 10.2 | 49.7 | 59.6 | 74.6 | 66.9 | 25.3 | 61.1 | 82.1 | 23.6 | 48.6 |
- | *OLMo-2-7B-DPO* | 54.2 | 27.9 | 46.7 | 60.2 | 82.6 | 73.0 | 30.3 | 60.8 | 81.0 | 23.5 | 56.0 |
- | *OLMo-2-13B-SFT* | 55.3 | 11.5 | 59.6 | 71.3 | 76.3 | 68.6 | 29.5 | 68.0 | 82.3 | 29.4 | 57.1 |
- | *OLMo-2-13B-DPO* | 60.6 | 38.3 | 57.9 | 71.5 | 82.3 | 80.2 | 35.2 | 67.9 | 79.7 | 29.0 | 63.9 |
- | **OLMo-2-7B-1124-Instruct** | 54.8 | 29.1 | 46.6 | 60.5 | 85.1 | 72.3 | 32.5 | 61.3 | 80.6 | 23.2 | 56.5 |
- | **OLMo-2-13B-1124-Instruct** | 62.0 | 39.5 | 58.8 | 71.5 | 87.4 | 82.6 | 39.2 | 68.5 | 79.1 | 28.8 | 64.3 |
+ | Model | AVG | AE2 | BBH | DROP | GSM8K | IFE | MATH | MMLU | Safety | PQA | TQA |
+ |-------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
+ | OLMo 2 7B SFT | 51.4 | 10.2 | 49.6 | 59.6 | 74.6 | 66.9 | 25.3 | 61.1 | 94.6 | 23.6 | 48.6 |
+ | OLMo 2 7B DPO | 55.9 | 27.9 | 51.1 | 60.2 | 82.6 | 73.0 | 30.3 | 60.8 | 93.7 | 23.5 | 56.0 |
+ | OLMo 2 7B Instruct | 56.5 | 29.1 | 51.4 | 60.5 | 85.1 | 72.3 | 32.5 | 61.3 | 93.3 | 23.2 | 56.5 |
+ | OLMo 2 13B SFT | 56.6 | 11.5 | 59.9 | 71.3 | 76.3 | 68.6 | 29.5 | 68.0 | 94.3 | 29.4 | 57.1 |
+ | OLMo 2 13B DPO | 62.0 | 38.3 | 61.4 | 71.5 | 82.3 | 80.2 | 35.2 | 67.9 | 90.3 | 29.0 | 63.9 |
+ | OLMo 2 13B Instruct | 63.4 | 39.5 | 63.0 | 71.5 | 87.4 | 82.6 | 39.2 | 68.5 | 89.7 | 28.8 | 64.3 |
+ | OLMo 2 32B SFT | 58.09 | 14.49 | 67.10 | 75.68 | 79.76 | 74.49 | 36.02 | 77.80 | - | 34.25 | 63.26 |
+ | OLMo 2 32B DPO | 64.95 | 46.18 | 68.05 | 76.45 | 85.60 | 80.59 | 39.08 | 78.26 | - | 36.57 | 73.78 |
+ | OLMo 2 32B Instruct | 66.17 | 45.70 | 69.10 | 76.49 | 89.08 | 83.55 | 42.74 | 78.53 | - | 36.70 | 73.64 |
+ | Gemma-2-27b | 61.32 | 49.01 | 72.69 | 67.52 | 80.67 | 63.22 | 35.06 | 70.66 | 75.9 | 33.85 | 64.58 |
+ | GPT-3.5 Turbo 0125 | 59.56 | 38.7 | 66.6 | 70.2 | 74.3 | 66.9 | 41.2 | 70.2 | 69.1 | 45.0 | 62.9* |
+ | GPT-4o Mini 2024-07-18 | 65.72 | 49.7 | 65.9* | 36.3 | 83.0 | 83.5 | 67.9 | 82.2 | 84.9 | 39.0 | 64.8* |
+ | Qwen2.5-32B | 66.54 | 39.07 | 82.34 | 48.26 | 87.49 | 82.44 | 77.89 | 84.66 | 82.4 | 26.10 | 70.57 |
+ | Mistral-Small-24B | 67.6 | 43.20 | 80.11 | 78.51 | 87.19 | 77.26 | 65.86 | 83.72 | 66.5 | 24.38 | 68.14 |
+ | Llama-3.1-70B | 69.99 | 32.91 | 82.97 | 76.96 | 94.47 | 87.99 | 56.17 | 85.15 | 76.4 | 46.50 | 66.83 |
+ | Llama-3.3-70B | 72.96 | 36.48 | 85.79 | 77.99 | 93.56 | 90.76 | 71.84 | 85.85 | 70.4 | 48.24 | 66.11 |
+


## License and use
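For reading the new table: the abbreviated headers track the old ones (AE2 for AlpacaEval 2, IFE for IFEval, PQA for PopQA, TQA for TruthQA/TruthfulQA), and AVG appears to be the unweighted mean of the benchmark columns that have values. A quick check against the OLMo 2 32B Instruct row, which lacks a Safety score, so nine columns contribute:

```python
# Benchmark scores from the OLMo 2 32B Instruct row, excluding the missing Safety entry:
# AE2, BBH, DROP, GSM8K, IFE, MATH, MMLU, PQA, TQA
scores = [45.70, 69.10, 76.49, 89.08, 83.55, 42.74, 78.53, 36.70, 73.64]
print(round(sum(scores) / len(scores), 2))  # 66.17 -- matches the AVG column
```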
 
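For context beyond the diff, here is a minimal sketch of loading the resulting instruct model with Hugging Face Transformers. The repo id `allenai/OLMo-2-0325-32B-Instruct` is assumed from this card's naming pattern, and the chat-template flow shown is the standard Transformers one, not anything specific to this commit:

```python
# Minimal usage sketch (assumed repo id; standard Transformers chat-template flow).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "allenai/OLMo-2-0325-32B-Instruct"  # assumption: instruct variant of the linked base model
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# device_map="auto" requires the `accelerate` package and spreads the 32B weights across available devices.
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

# Build a chat prompt with the model's template and generate a short reply.
messages = [{"role": "user", "content": "State one fact about the GSM8K benchmark."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```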