HuggingFaceTB's Collections
SmolLM3 pretraining datasets
Updated 7 days ago
Datasets used in SmolLM3 pretraining
HuggingFaceFW/fineweb-edu • Updated 3 days ago • 3.5B • 171k • 713
mlfoundations/dclm-baseline-1.0 • Updated Jul 22, 2024 • 75.4k • 225
epfml/FineWeb2-HQ • Updated Feb 19 • 380M • 12.4k • 14
HuggingFaceTB/finemath • Updated Feb 6 • 48.3M • 22.1k • 329
bigcode/the-stack-v2 • Updated Apr 23, 2024 • 5.45B • 3.05k • 387
HuggingFaceTB/issues-kaggle-notebooks • Updated Mar 19 • 16.1M • 240 • 10
LLM360/MegaMath • Updated Apr 9 • 217M • 45.8k • 95
HuggingFaceTB/stack-edu • Updated Mar 20 • 167M • 2.17k • 43
Note: Stage 2 new datasets
HuggingFaceTB/smollm-corpus • Updated Sep 6, 2024 • 237M • 22.9k • 347
allenai/dolmino-mix-1124 • Updated Dec 17, 2024 • 165M • 15.4k • 64
nvidia/OpenMathReasoning • Updated May 27 • 5.68M • 14.8k • 305
nvidia/OpenCodeReasoning • Updated May 4 • 753k • 3.26k • 480
facebook/natural_reasoning • Updated Feb 21 • 1.15M • 1.06k • 508
Note: Stage 3 (decay) new datasets