guipenedo HF Staff committed on
Commit de63a1d · verified · 1 Parent(s): 8c10697

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -15,4 +15,5 @@ This is the home of the 🍷 **FineData** team, a branch of the 🤗 **Hugging F
  - **[📚 FineWeb-Edu](https://huggingface.co/collections/HuggingFaceFW/fineweb-edu-6659c3f3d399d0e1d648adfd)**: a filtered subset of the most educational content from FineWeb.
  - **[🥂 FineWeb2](https://huggingface.co/collections/HuggingFaceFW/fineweb2-6755657a481dae41e8fbba4d)**: an extension of FineWeb to over 1000 languages. See the [paper](https://arxiv.org/abs/2506.20920).
  - **[📄 FinePDFs](https://huggingface.co/collections/HuggingFaceFW/finepdfs-68bd02d20928419c1dc12296)**: 3T tokens of text data extracted from PDFs sourced from the Web.
- - **[🌐 FineWiki](https://huggingface.co/collections/HuggingFaceFW/finewiki-68f6615c6bb86563dcd5e846)**: an updated, better extracted version of Wikipedia in 300+ languages.
+ - **[🌐 FineWiki](https://huggingface.co/collections/HuggingFaceFW/finewiki-68f6615c6bb86563dcd5e846)**: an updated, better extracted version of Wikipedia in 300+ languages.
+ - **[📄 FinePDFs-Edu](https://huggingface.co/datasets/HuggingFaceFW/finepdfs-edu)**: 350B+ highly educational tokens filtered from 📄 FinePDFs