codelion/fineweb-edu-10M
A collection of pre-training dataset samples in sizes of 10M, 100M, and 1B tokens. Ideal for quick experimentation and ablations.