ProX Dataset
A collection of pre-training corpora refined by ProX.
- gair-prox/DCLM-pro • Updated Feb 15 • 366M • 1.68k • 11
- gair-prox/FineWeb-pro • Updated Sep 26, 2024 • 63.1M • 408 • 24
- gair-prox/open-web-math-pro • Updated Sep 26, 2024 • 2.58M • 418 • 12
- gair-prox/RedPajama-pro • Updated Sep 26, 2024 • 10.2M • 153 • 4

ProX Math Models
Base models trained on ProX-curated openwebmath-pro.
- gair-prox/Mistral-7B-ProXMath • Text Generation • 7B • Updated Sep 28, 2024 • 21 • 3
- gair-prox/TinyLlama-1.1B-ProXMath • 1B • Updated Oct 10, 2024 • 14 • 2
- gair-prox/Llama-2-7B-ProXMath • Text Generation • Updated Oct 10, 2024 • 46 • 1
- gair-prox/CodeLlama-7B-ProXMath • Updated Oct 10, 2024 • 16 • 1

ProX Refining Models
Adapted small language models used to generate data-refining programs.
- gair-prox/web-doc-refining-lm • Text Generation • 0.4B • Updated Oct 10, 2024 • 54 • 4
- gair-prox/web-chunk-refining-lm • Text Generation • 0.4B • Updated Oct 10, 2024 • 51 • 5
- gair-prox/math-doc-refining-lm • Text Generation • 0.8B • Updated Oct 10, 2024 • 19 • 2
- gair-prox/math-chunk-refining-lm • Text Generation • 0.4B • Updated Oct 10, 2024 • 18 • 1

ProX General Models
Base models trained on ProX-curated data.
- gair-prox/FW-ProX-1.7B • Text Generation • 2B • Updated Sep 28, 2024 • 40 • 4
- gair-prox/RedPJ-ProX-0.7B • 0.8B • Updated Oct 10, 2024 • 7 • 1
- gair-prox/ProX-RedPJ-1.7B-25B • 2B • Updated Sep 16, 2024 • 10
- gair-prox/RedPJ-ProX-0.3B • 0.4B • Updated Oct 10, 2024 • 11 • 2