---
license: apache-2.0
datasets:
- allenai/datadecide
language:
- en
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62bddd0b1e22ec8427a0f27e/MwddQs_8OaU4128VYrwoU.png)

More than one training run goes into making a large language model, but developers rarely release the small models and datasets they experiment with during the development process. How do they decide what dataset to use for pretraining or which benchmarks to hill climb on? To empower open exploration of these questions, we release [DataDecide](https://allenai.org/paper/datadecide), a suite of models we pretrain on 25 corpora with differing sources, deduplication, and filtering up to 100B tokens, over 14 different model sizes ranging from 4M parameters up to 1B parameters (more than 30k model checkpoints in total).

## 350 Models over Differences in Data and Scale

For each of our 25 datasets and 14 model sizes, we train a model linked below. Each has intermediate checkpoints (uploading after initial release) and runs over 3 random seeds. All models finish training at a token-to-parameter ratio of 100 (e.g., 1B parameters -> 100B tokens).

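The repositories linked in the table below follow a regular naming pattern, so the full 25 x 14 grid can also be enumerated programmatically. The following is a minimal sketch, assuming the `allenai/DataDecide-{recipe}-{size}` pattern seen in those links holds for every cell; the recipe slugs and size labels are copied from the links, while the names `RECIPES`, `SIZES`, `repo_id`, and `ALL_REPOS` are illustrative and not part of any library.

```python
from itertools import product

# Recipe slugs and size labels copied from the model links in the table.
RECIPES = [
    "dolma1_7", "dolma1_7-no-code", "dolma1_7-no-math-code",
    "dolma1_7-no-reddit", "dolma1_7-no-flan", "dolma1_6plus", "c4",
    "fineweb-pro", "fineweb-edu", "falcon", "falcon-and-cc",
    "falcon-and-cc-qc-10p", "falcon-and-cc-qc-20p",
    "falcon-and-cc-qc-orig-10p", "falcon-and-cc-qc-tulu-10p",
    "dclm-baseline", "dclm-baseline-qc-7p-fw2", "dclm-baseline-qc-7p-fw3",
    "dclm-baseline-qc-fw-3p", "dclm-baseline-qc-fw-10p",
    "dclm-baseline-qc-10p", "dclm-baseline-qc-20p",
    "dclm-baseline-25p-dolma1.7-75p", "dclm-baseline-50p-dolma1.7-50p",
    "dclm-baseline-75p-dolma1.7-25p",
]
SIZES = ["4M", "6M", "8M", "10M", "14M", "16M", "20M",
         "60M", "90M", "150M", "300M", "530M", "750M", "1B"]


def repo_id(recipe: str, size: str) -> str:
    """Build a Hub repository ID (hypothetical helper, not a library call)."""
    return f"allenai/DataDecide-{recipe}-{size}"


# One repository per (recipe, size) pair, in table order.
ALL_REPOS = [repo_id(r, s) for r, s in product(RECIPES, SIZES)]

if __name__ == "__main__":
    print(len(ALL_REPOS))  # number of recipe/size combinations
    print(ALL_REPOS[0])    # first cell of the table
```

Any of these IDs can be passed to the loading call shown in the "Load a Model" section; whether a given repository or revision is already uploaded should be checked on the Hub.
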
| | | | | | | | | | | | | | | |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|------|------|------|-----|
| Dolma1.7 | [4M](https://huggingface.co/allenai/DataDecide-dolma1_7-4M) | [6M](https://huggingface.co/allenai/DataDecide-dolma1_7-6M) | [8M](https://huggingface.co/allenai/DataDecide-dolma1_7-8M) | [10M](https://huggingface.co/allenai/DataDecide-dolma1_7-10M) | [14M](https://huggingface.co/allenai/DataDecide-dolma1_7-14M) | [16M](https://huggingface.co/allenai/DataDecide-dolma1_7-16M) | [20M](https://huggingface.co/allenai/DataDecide-dolma1_7-20M) | [60M](https://huggingface.co/allenai/DataDecide-dolma1_7-60M) | [90M](https://huggingface.co/allenai/DataDecide-dolma1_7-90M) | [150M](https://huggingface.co/allenai/DataDecide-dolma1_7-150M) | [300M](https://huggingface.co/allenai/DataDecide-dolma1_7-300M) | [530M](https://huggingface.co/allenai/DataDecide-dolma1_7-530M) | [750M](https://huggingface.co/allenai/DataDecide-dolma1_7-750M) | [1B](https://huggingface.co/allenai/DataDecide-dolma1_7-1B) |
| Dolma1.7 (no code) | [4M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-4M) | [6M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-6M) | [8M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-8M) | [10M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-10M) | [14M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-14M) | [16M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-16M) | [20M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-20M) | [60M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-60M) | [90M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-90M) | [150M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-150M) | [300M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-300M) | [530M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-530M) | [750M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-750M) | [1B](https://huggingface.co/allenai/DataDecide-dolma1_7-no-code-1B) |
| Dolma1.7 (no math, code) | [4M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-4M) | [6M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-6M) | [8M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-8M) | [10M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-10M) | [14M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-14M) | [16M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-16M) | [20M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-20M) | [60M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-60M) | [90M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-90M) | [150M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-150M) | [300M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-300M) | [530M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-530M) | [750M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-750M) | [1B](https://huggingface.co/allenai/DataDecide-dolma1_7-no-math-code-1B) |
| Dolma1.7 (no Reddit) | [4M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-4M) | [6M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-6M) | [8M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-8M) | [10M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-10M) | [14M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-14M) | [16M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-16M) | [20M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-20M) | [60M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-60M) | [90M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-90M) | [150M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-150M) | [300M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-300M) | [530M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-530M) | [750M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-750M) | [1B](https://huggingface.co/allenai/DataDecide-dolma1_7-no-reddit-1B) |
| Dolma1.7 (no Flan) | [4M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-4M) | [6M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-6M) | [8M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-8M) | [10M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-10M) | [14M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-14M) | [16M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-16M) | [20M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-20M) | [60M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-60M) | [90M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-90M) | [150M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-150M) | [300M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-300M) | [530M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-530M) | [750M](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-750M) | [1B](https://huggingface.co/allenai/DataDecide-dolma1_7-no-flan-1B) |
| Dolma1.6++ | [4M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-4M) | [6M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-6M) | [8M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-8M) | [10M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-10M) | [14M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-14M) | [16M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-16M) | [20M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-20M) | [60M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-60M) | [90M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-90M) | [150M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-150M) | [300M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-300M) | [530M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-530M) | [750M](https://huggingface.co/allenai/DataDecide-dolma1_6plus-750M) | [1B](https://huggingface.co/allenai/DataDecide-dolma1_6plus-1B) |
| C4 | [4M](https://huggingface.co/allenai/DataDecide-c4-4M) | [6M](https://huggingface.co/allenai/DataDecide-c4-6M) | [8M](https://huggingface.co/allenai/DataDecide-c4-8M) | [10M](https://huggingface.co/allenai/DataDecide-c4-10M) | [14M](https://huggingface.co/allenai/DataDecide-c4-14M) | [16M](https://huggingface.co/allenai/DataDecide-c4-16M) | [20M](https://huggingface.co/allenai/DataDecide-c4-20M) | [60M](https://huggingface.co/allenai/DataDecide-c4-60M) | [90M](https://huggingface.co/allenai/DataDecide-c4-90M) | [150M](https://huggingface.co/allenai/DataDecide-c4-150M) | [300M](https://huggingface.co/allenai/DataDecide-c4-300M) | [530M](https://huggingface.co/allenai/DataDecide-c4-530M) | [750M](https://huggingface.co/allenai/DataDecide-c4-750M) | [1B](https://huggingface.co/allenai/DataDecide-c4-1B) |
| FineWeb-Pro | [4M](https://huggingface.co/allenai/DataDecide-fineweb-pro-4M) | [6M](https://huggingface.co/allenai/DataDecide-fineweb-pro-6M) | [8M](https://huggingface.co/allenai/DataDecide-fineweb-pro-8M) | [10M](https://huggingface.co/allenai/DataDecide-fineweb-pro-10M) | [14M](https://huggingface.co/allenai/DataDecide-fineweb-pro-14M) | [16M](https://huggingface.co/allenai/DataDecide-fineweb-pro-16M) | [20M](https://huggingface.co/allenai/DataDecide-fineweb-pro-20M) | [60M](https://huggingface.co/allenai/DataDecide-fineweb-pro-60M) | [90M](https://huggingface.co/allenai/DataDecide-fineweb-pro-90M) | [150M](https://huggingface.co/allenai/DataDecide-fineweb-pro-150M) | [300M](https://huggingface.co/allenai/DataDecide-fineweb-pro-300M) | [530M](https://huggingface.co/allenai/DataDecide-fineweb-pro-530M) | [750M](https://huggingface.co/allenai/DataDecide-fineweb-pro-750M) | [1B](https://huggingface.co/allenai/DataDecide-fineweb-pro-1B) |
| FineWeb-Edu | [4M](https://huggingface.co/allenai/DataDecide-fineweb-edu-4M) | [6M](https://huggingface.co/allenai/DataDecide-fineweb-edu-6M) | [8M](https://huggingface.co/allenai/DataDecide-fineweb-edu-8M) | [10M](https://huggingface.co/allenai/DataDecide-fineweb-edu-10M) | [14M](https://huggingface.co/allenai/DataDecide-fineweb-edu-14M) | [16M](https://huggingface.co/allenai/DataDecide-fineweb-edu-16M) | [20M](https://huggingface.co/allenai/DataDecide-fineweb-edu-20M) | [60M](https://huggingface.co/allenai/DataDecide-fineweb-edu-60M) | [90M](https://huggingface.co/allenai/DataDecide-fineweb-edu-90M) | [150M](https://huggingface.co/allenai/DataDecide-fineweb-edu-150M) | [300M](https://huggingface.co/allenai/DataDecide-fineweb-edu-300M) | [530M](https://huggingface.co/allenai/DataDecide-fineweb-edu-530M) | [750M](https://huggingface.co/allenai/DataDecide-fineweb-edu-750M) | [1B](https://huggingface.co/allenai/DataDecide-fineweb-edu-1B) |
| Falcon | [4M](https://huggingface.co/allenai/DataDecide-falcon-4M) | [6M](https://huggingface.co/allenai/DataDecide-falcon-6M) | [8M](https://huggingface.co/allenai/DataDecide-falcon-8M) | [10M](https://huggingface.co/allenai/DataDecide-falcon-10M) | [14M](https://huggingface.co/allenai/DataDecide-falcon-14M) | [16M](https://huggingface.co/allenai/DataDecide-falcon-16M) | [20M](https://huggingface.co/allenai/DataDecide-falcon-20M) | [60M](https://huggingface.co/allenai/DataDecide-falcon-60M) | [90M](https://huggingface.co/allenai/DataDecide-falcon-90M) | [150M](https://huggingface.co/allenai/DataDecide-falcon-150M) | [300M](https://huggingface.co/allenai/DataDecide-falcon-300M) | [530M](https://huggingface.co/allenai/DataDecide-falcon-530M) | [750M](https://huggingface.co/allenai/DataDecide-falcon-750M) | [1B](https://huggingface.co/allenai/DataDecide-falcon-1B) |
| Falcon+CC | [4M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-4M) | [6M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-6M) | [8M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-8M) | [10M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-10M) | [14M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-14M) | [16M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-16M) | [20M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-20M) | [60M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-60M) | [90M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-90M) | [150M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-150M) | [300M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-300M) | [530M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-530M) | [750M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-750M) | [1B](https://huggingface.co/allenai/DataDecide-falcon-and-cc-1B) |
| Falcon+CC (QC 10%) | [4M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-4M) | [6M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-6M) | [8M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-8M) | [10M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-10M) | [14M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-14M) | [16M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-16M) | [20M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-20M) | [60M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-60M) | [90M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-90M) | [150M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-150M) | [300M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-300M) | [530M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-530M) | [750M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-750M) | [1B](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-10p-1B) |
| Falcon+CC (QC 20%) | [4M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-4M) | [6M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-6M) | [8M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-8M) | [10M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-10M) | [14M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-14M) | [16M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-16M) | [20M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-20M) | [60M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-60M) | [90M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-90M) | [150M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-150M) | [300M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-300M) | [530M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-530M) | [750M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-750M) | [1B](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-20p-1B) |
| Falcon+CC (QC Orig 10%) | [4M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-4M) | [6M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-6M) | [8M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-8M) | [10M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-10M) | [14M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-14M) | [16M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-16M) | [20M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-20M) | [60M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-60M) | [90M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-90M) | [150M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-150M) | [300M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-300M) | [530M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-530M) | [750M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-750M) | [1B](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-orig-10p-1B) |
| Falcon+CC (QC Tulu 10%) | [4M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-4M) | [6M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-6M) | [8M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-8M) | [10M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-10M) | [14M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-14M) | [16M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-16M) | [20M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-20M) | [60M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-60M) | [90M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-90M) | [150M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-150M) | [300M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-300M) | [530M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-530M) | [750M](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-750M) | [1B](https://huggingface.co/allenai/DataDecide-falcon-and-cc-qc-tulu-10p-1B) |
| DCLM-Baseline | [4M](https://huggingface.co/allenai/DataDecide-dclm-baseline-4M) | [6M](https://huggingface.co/allenai/DataDecide-dclm-baseline-6M) | [8M](https://huggingface.co/allenai/DataDecide-dclm-baseline-8M) | [10M](https://huggingface.co/allenai/DataDecide-dclm-baseline-10M) | [14M](https://huggingface.co/allenai/DataDecide-dclm-baseline-14M) | [16M](https://huggingface.co/allenai/DataDecide-dclm-baseline-16M) | [20M](https://huggingface.co/allenai/DataDecide-dclm-baseline-20M) | [60M](https://huggingface.co/allenai/DataDecide-dclm-baseline-60M) | [90M](https://huggingface.co/allenai/DataDecide-dclm-baseline-90M) | [150M](https://huggingface.co/allenai/DataDecide-dclm-baseline-150M) | [300M](https://huggingface.co/allenai/DataDecide-dclm-baseline-300M) | [530M](https://huggingface.co/allenai/DataDecide-dclm-baseline-530M) | [750M](https://huggingface.co/allenai/DataDecide-dclm-baseline-750M) | [1B](https://huggingface.co/allenai/DataDecide-dclm-baseline-1B) |
| DCLM-Baseline (QC 7%, FW2) | [4M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-4M) | [6M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-6M) | [8M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-8M) | [10M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-10M) | [14M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-14M) | [16M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-16M) | [20M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-20M) | [60M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-60M) | [90M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-90M) | [150M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-150M) | [300M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-300M) | [530M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-530M) | [750M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-750M) | [1B](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw2-1B) |
| DCLM-Baseline (QC 7%, FW3) | [4M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-4M) | [6M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-6M) | [8M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-8M) | [10M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-10M) | [14M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-14M) | [16M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-16M) | [20M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-20M) | [60M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-60M) | [90M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-90M) | [150M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-150M) | [300M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-300M) | [530M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-530M) | [750M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-750M) | [1B](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-7p-fw3-1B) |
| DCLM-Baseline (QC FW 3%) | [4M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-4M) | [6M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-6M) | [8M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-8M) | [10M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-10M) | [14M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-14M) | [16M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-16M) | [20M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-20M) | [60M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-60M) | [90M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-90M) | [150M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-150M) | [300M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-300M) | [530M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-530M) | [750M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-750M) | [1B](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-3p-1B) |
| DCLM-Baseline (QC FW 10%) | [4M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-4M) | [6M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-6M) | [8M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-8M) | [10M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-10M) | [14M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-14M) | [16M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-16M) | [20M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-20M) | [60M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-60M) | [90M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-90M) | [150M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-150M) | [300M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-300M) | [530M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-530M) | [750M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-750M) | [1B](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-fw-10p-1B) |
| DCLM-Baseline (QC 10%) | [4M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-4M) | [6M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-6M) | [8M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-8M) | [10M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-10M) | [14M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-14M) | [16M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-16M) | [20M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-20M) | [60M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-60M) | [90M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-90M) | [150M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-150M) | [300M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-300M) | [530M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-530M) | [750M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-750M) | [1B](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-10p-1B) |
| DCLM-Baseline (QC 20%) | [4M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-4M) | [6M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-6M) | [8M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-8M) | [10M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-10M) | [14M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-14M) | [16M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-16M) | [20M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-20M) | [60M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-60M) | [90M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-90M) | [150M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-150M) | [300M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-300M) | [530M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-530M) | [750M](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-750M) | [1B](https://huggingface.co/allenai/DataDecide-dclm-baseline-qc-20p-1B) |
| DCLM-Baseline 25% / Dolma 75% | [4M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-4M) | [6M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-6M) | [8M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-8M) | [10M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-10M) | [14M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-14M) | [16M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-16M) | [20M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-20M) | [60M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-60M) | [90M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-90M) | [150M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-150M) | [300M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-300M) | [530M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-530M) | [750M](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-750M) | [1B](https://huggingface.co/allenai/DataDecide-dclm-baseline-25p-dolma1.7-75p-1B) |
| DCLM-Baseline 50% / Dolma 50% | [4M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-4M) | [6M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-6M) | [8M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-8M) | [10M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-10M) | [14M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-14M) | [16M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-16M) | [20M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-20M) | [60M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-60M) | [90M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-90M) | [150M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-150M) | [300M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-300M) | [530M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-530M) | [750M](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-750M) | [1B](https://huggingface.co/allenai/DataDecide-dclm-baseline-50p-dolma1.7-50p-1B) |
| DCLM-Baseline 75% / Dolma 25% | [4M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-4M) | [6M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-6M) | [8M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-8M) | [10M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-10M) | [14M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-14M) | [16M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-16M) | [20M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-20M) | [60M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-60M) | [90M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-90M) | [150M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-150M) | [300M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-300M) | [530M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-530M) | [750M](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-750M) | [1B](https://huggingface.co/allenai/DataDecide-dclm-baseline-75p-dolma1.7-25p-1B) |

## Load a Model

To load a specific model with HuggingFace:

```
from hf_olmo import OLMoForCausalLM # pip install ai2-olmo
olmo = OLMoForCausalLM.from_pretrained("allenai/DataDecide-dolma1_7-1B", revision="step69369-seed-default")
```

### Model Description

- **Developed by:** Allen Institute for AI (Ai2)
- **Model type:** a Transformer style autoregressive language model.
- **Language(s) (NLP):** English
- **License:** The code and model are released under Apache 2.0.
- **Contact:** Technical inquiries: `ianmag@cs.washington.edu`.
  Press: `press@allenai.org`

### Model Sources

- **Repository:** [https://github.com/allenai/DataDecide](https://github.com/allenai/DataDecide)
- **Paper:** [https://allenai.org/paper/datadecide](https://allenai.org/paper/datadecide)
- **Data:** [https://huggingface.co/datasets/allenai/datadecide](https://huggingface.co/datasets/allenai/datadecide)

## Data

| Source / Recipe | Description |
|----------------------------------------|-------------|
| **Dolma1.7** *Original, No code, No math/code, No Reddit, No Flan* | A 2.3T-token corpus (Dolma 1.7; [Soldaini et al., 2024](https://arxiv.org/abs/2402.00159)) sampling common LM sources for open research. We ablate code, math/code, Reddit, or Flan subsets. |
| **Dolma1.6++** *Original* | Dolma 1.6 plus additional sources from Dolma 1.7: RedPajama’s arxiv subset, openwebmath, algebraic stack, flan, starcoder, falcon. |
| **C4** *Original* | The C4 dataset ([Raffel et al., 2019](https://arxiv.org/abs/1910.10683)) as prepared in Dolma 1.7, heuristically filtered from the April 2019 Common Crawl. |
| **FineWeb-Pro** *Original* | The FineWeb Pro corpus ([Zhou et al., 2024](https://arxiv.org/abs/2409.17115)), featuring model-driven data cleaning on FineWeb. |
| **FineWeb-Edu** *Original* | The deduplicated FineWeb-Edu subset of SmolLM-Corpus ([Ben Allal et al., 2024](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus)), focused on educational web pages. |
| **Falcon** *Original* | The Falcon RefinedWeb corpus ([Penedo et al., 2023](https://api.semanticscholar.org/CorpusID:259063761)) in Dolma 1.7, derived from Common Crawl through June 2023 and more aggressively filtered/deduplicated than C4. |
| **Falcon+CC** *Original, QC 10%, QC 20%, QC Orig 10%, QC Tulu 10%* | Falcon and Dolma 1.7’s Common Crawl. We quality filter to the top 10% or 20% of documents with a reproduced or the original [Li et al. (2024)](https://arxiv.org/abs/2406.11794) filter, or with a filter retrained on a pre-release version of Tulu-v3 ([Lambert et al., 2024](https://arxiv.org/abs/2411.15124)). |
| **DCLM-Baseline** *Original, QC 7% FW2, QC 7% FW3, QC FW 3%, QC FW 10%, QC 10%, QC 20%* | A SOTA Common Crawl corpus using best ablated deduplication, cleaning heuristics, and quality filter. We quality filter to the top 7% of DCLM-classified documents and further take 2+ or 3+ scores with the FineWeb-edu classifier; or filter to the top 3% or 10% with the FineWeb-edu classifier; or take the top 10% or 20% with a reproduced DCLM classifier. |
| *λ%* **DCLM-Baseline** *+ 1 – λ%* **Dolma1.7** | Fractional combinations of Dolma1.7 and DCLM-Baseline mixing different proportions of the two datasets for λ ∈ {25%, 50%, 75%}. |

## Evaluation

We evaluate all checkpoints over the OLMES suite of 10 multiple-choice question answering benchmarks ([Gu et al., 2024](https://arxiv.org/abs/2406.08446)):

- [MMLU (Hendrycks et al., 2021)](https://arxiv.org/abs/2009.03300)
- [HellaSwag (Zellers et al., 2019)](https://arxiv.org/abs/1905.07830)
- [ARC-Challenge (Clark et al., 2018)](https://arxiv.org/abs/1803.05457)
- [ARC-Easy (Clark et al., 2018)](https://arxiv.org/abs/1803.05457)
- [PIQA (Bisk et al., 2020)](https://arxiv.org/abs/1911.11641)
- [CommonsenseQA (Talmor et al., 2019)](https://arxiv.org/abs/1811.00937)
- [Social IQa (Sap et al., 2019)](https://arxiv.org/abs/1904.09728)
- [OpenBookQA (Mihaylov et al., 2018)](https://arxiv.org/abs/1809.02789)
- [BoolQ (Clark et al., 2019)](https://arxiv.org/abs/1905.10044)
- [Winogrande (Sakaguchi et al., 2020)](https://arxiv.org/abs/1907.10641)

We release all of these evaluations:

- for task-level metric results: [https://huggingface.co/datasets/allenai/DataDecide-eval-results](https://huggingface.co/datasets/allenai/DataDecide-eval-results)
- for instance-level results: [https://huggingface.co/datasets/allenai/DataDecide-eval-instances](https://huggingface.co/datasets/allenai/DataDecide-eval-instances)

## Hyperparameters

| Name | Batch Size | Hidden Dim. | LR | Model size | Heads | Layers | Training steps | Tokens trained |
|---|---|---|---|---|---|---|---|---|
| 4M | 32 | 64 | 1.4e-02 | 3.7M | 8 | 8 | 5,725 | 0.4B |
| 6M | 32 | 96 | 1.2e-02 | 6.0M | 8 | 8 | 9,182 | 0.6B |
| 8M | 32 | 128 | 1.1e-02 | 8.5M | 8 | 8 | 13,039 | 0.9B |
| 10M | 32 | 144 | 1.0e-02 | 9.9M | 8 | 8 | 15,117 | 1.0B |
| 14M | 32 | 192 | 9.2e-03 | 14.4M | 8 | 8 | 21,953 | 1.4B |
| 16M | 32 | 208 | 8.9e-03 | 16.0M | 8 | 8 | 24,432 | 1.6B |
| 20M | 64 | 192 | 8.4e-03 | 19.1M | 8 | 16 | 14,584 | 1.9B |
| 60M | 96 | 384 | 5.8e-03 | 57.1M | 12 | 16 | 29,042 | 5.7B |
| 90M | 160 | 528 | 4.9e-03 | 97.9M | 12 | 16 | 29,901 | 9.8B |
| 150M | 192 | 768 | 4.2e-03 | 151.9M | 12 | 12 | 38,157 | 15.0B |
| 300M | 320 | 1,024 | 3.3e-03 | 320.0M | 16 | 16 | 45,787 | 30.0B |
| 530M | 448 | 1,344 | 2.8e-03 | 530.1M | 16 | 16 | 57,786 | 53.0B |
| 750M | 576 | 1,536 | 2.5e-03 | 681.3M | 16 | 16 | 63,589 | 75.0B |
| 1B | 704 | 2,048 | 2.1e-03 | 1176.8M | 16 | 16 | 69,369 | 100.0B |

## Bias, Risks, and Limitations

Like any base or fine-tuned language model, these models can be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, statements from any LLM are often inaccurate, so facts should be verified.

## Citation

**BibTeX:**

```
@article{MagnussonDataDecide2025,
      title={{DataDecide: How to Predict Best Pretraining Data with Small Experiments}},
      author={Ian Magnusson and Nguyen Tai and Ben Bogin and David Heineman and Jena Hwang and Luca Soldaini and Akshita Bhagia and Jiacheng Liu and Dirk Groeneveld and Oyvind Tafjord and Noah A. Smith and Pang Wei Koh and Jesse Dodge},
      year={2025},
      journal={arXiv preprint},
}
```

## Model Card Contact

For errors in this model card, contact ianmag@cs.washington.edu