amanrangapur committed · Commit 8da9886 · verified · 1 Parent(s): bcd66f9

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED

@@ -11,7 +11,7 @@ language:

 # Model Card for OLMo 2 32B

-We introduce OLMo 2 32B, an addition to the family of 7B and 13B models, featuring a 9-point increase in MMLU, among other evaluation improvements, compared to the original [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) model. These gains come from training on the [OLMo-mix-0325](https://huggingface.co/datasets/allenai/olmo-mix-1124) and [Dolmino-mix-0325](https://huggingface.co/datasets/allenai/dolmino-mix-1124) datasets and from a staged training approach.
+We introduce OLMo 2 32B, an addition to the family of 7B and 13B models, featuring a 9-point increase in MMLU, among other evaluation improvements, compared to the original [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) model. These gains come from training on the [OLMo-mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124) and Dolmino-mix-0325 (releasing soon) datasets and from a staged training approach.

 OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
 These models are trained on the Dolma dataset. We have released all code, checkpoints, logs, and associated training details on [GitHub](https://github.com/allenai/OLMo-core).
@@ -160,7 +160,7 @@ Core model results for OLMo 2 32B are found below.
 - 32B Model: ~1 epoch

 #### Stage 2: Fine-tuning
-- Dataset: [Dolmino-Mix-0325](https://huggingface.co/datasets/allenai/dolmino-mix-1124) (843B tokens)
+- Dataset: Dolmino-Mix-0325 (releasing soon)
 - Three training mixes:
   - 100B tokens
   - 100B tokens
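
Since the updated card points readers to released checkpoints, a minimal usage sketch may help; it is not part of this commit. It loads the model with the standard Hugging Face `transformers` auto classes. The repo id `allenai/OLMo-2-0325-32B` is an assumption inferred from the card title, so confirm it on the Hub before use.

```python
# Hedged sketch, not part of the commit: loading the 32B checkpoint with
# the standard transformers auto classes. The repo id below is an
# assumption inferred from the card title; verify it on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-0325-32B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # shard across available devices (requires accelerate)
)

inputs = tokenizer("Language modeling is ", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that `device_map="auto"` requires the `accelerate` package, and a 32B model needs roughly 64 GB of accelerator memory in bf16.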
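Likewise, a hedged sketch of inspecting the Stage 1 pretraining mix named in the diff, [OLMo-mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124); streaming avoids downloading the full corpus. The split name and record layout are assumptions.

```python
# Hedged sketch: peek at the OLMo-mix-1124 pretraining mix without a full
# download. Assumptions: a "train" split exists and records are dicts.
from datasets import load_dataset

mix = load_dataset("allenai/olmo-mix-1124", split="train", streaming=True)

for record in mix.take(1):  # IterableDataset.take keeps this cheap
    print(record)
```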