amanrangapur committed · Commit 8da9886 · verified · 1 Parent(s): bcd66f9

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED

@@ -11,7 +11,7 @@ language:

 # Model Card for OLMo 2 32B

-We introduce OLMo 2 32B, an addition to the family of 7B and 13B models, featuring a 9-point increase in MMLU, among other evaluation improvements, compared to the original [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) model. These gains come from training on the [OLMo-mix-0325](https://huggingface.co/datasets/allenai/olmo-mix-1124) and [Dolmino-mix-0325](https://huggingface.co/datasets/allenai/dolmino-mix-1124) datasets and from a staged training approach.
+We introduce OLMo 2 32B, an addition to the family of 7B and 13B models, featuring a 9-point increase in MMLU, among other evaluation improvements, compared to the original [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) model. These gains come from training on the [OLMo-mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124) and Dolmino-mix-0325 (releasing soon) datasets and from a staged training approach.

 OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
 These models are trained on the Dolma dataset. We have released all code, checkpoints, logs, and associated training details on [GitHub](https://github.com/allenai/OLMo-core).
@@ -160,7 +160,7 @@ Core model results for OLMo 2 32B are found below.
 - 32B Model: ~1 epoch

 #### Stage 2: Fine-tuning
-- Dataset: [Dolmino-Mix-0325](https://huggingface.co/datasets/allenai/dolmino-mix-1124) (843B tokens)
+- Dataset: Dolmino-Mix-0325 (releasing soon)
 - Three training mixes:
   - 100B tokens
   - 100B tokens
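
Since the updated card points readers to released checkpoints, a minimal usage sketch may help; it is not part of this commit. It loads the model with the standard Hugging Face `transformers` auto classes. The repo id `allenai/OLMo-2-0325-32B` is an assumption inferred from the card title, so confirm it on the Hub before use.

```python
# Hedged sketch, not part of the commit: loading the 32B checkpoint with
# the standard transformers auto classes. The repo id below is an
# assumption inferred from the card title; verify it on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-0325-32B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # shard across available devices (requires accelerate)
)

inputs = tokenizer("Language modeling is ", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that `device_map="auto"` requires the `accelerate` package, and a 32B model needs roughly 64 GB of accelerator memory in bf16.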
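Likewise, a hedged sketch of inspecting the Stage 1 pretraining mix named in the diff, [OLMo-mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124); streaming avoids downloading the full corpus. The split name and record layout are assumptions.

```python
# Hedged sketch: peek at the OLMo-mix-1124 pretraining mix without a full
# download. Assumptions: a "train" split exists and records are dicts.
from datasets import load_dataset

mix = load_dataset("allenai/olmo-mix-1124", split="train", streaming=True)

for record in mix.take(1):  # IterableDataset.take keeps this cheap
    print(record)
```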