Hugging Face
Guilherme Penedo
guipenedo
864 followers
·
22 following
gui_penedo
guipenedo
AI & ML interests
None yet
Recent Activity
updated
a dataset
about 1 hour ago
HuggingFaceFW/fineweb
liked
a dataset
about 2 hours ago
open-r1/OpenThoughts-114k-math
updated
a dataset
about 4 hours ago
open-r1/OpenThoughts-114k-math
Articles
FineWeb2-C: Help Build Better Language Models in Your Language
Dec 23, 2024
•
18
Organizations
guipenedo's activity
New activity in
HuggingFaceFW/fineweb-edu
7 days ago
New update returns a 500 server error using the datasets-server API
6
#18 opened about 1 month ago by
jonna32
New activity in
HuggingFaceFW/fineweb-2
10 days ago
Synthetic Data Generator
1
#5 opened 19 days ago by
kishorekashyap
New activity in
HuggingFaceFW/fineweb-2
22 days ago
Cannot load with datasets
3
#4 opened 22 days ago by
mbanon
New activity in
HuggingFaceFW/fineweb-edu
24 days ago
A lot of load errors after new update
14
#19 opened 24 days ago by
yzhangcs
Add "date" column to "default" subset
#20 opened 24 days ago by
lhoestq
New activity in
HuggingFaceFW/fineweb
about 1 month ago
Simple exact deduplication removes 2/3 of data.
4
#49 opened 6 months ago by
egor-pakhomov
Torrent?
3
#4 opened 9 months ago by
emilss
Any plan to train models on larger subset of dataset?
1
#8 opened 9 months ago by
mrfakename
Are copyrighted works included in this dataset?
4
#9 opened 9 months ago by
umm-maybe
Reprocessing for a new language
14
#12 opened 9 months ago by
pere
Training configs for data ablation study
2
#14 opened 9 months ago by
jimmyhbx
tiny-fineweb
3
#19 opened 9 months ago by
3thn
Unsafe files
1
#25 opened 9 months ago by
alielfilali01
"Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20" using fineweb by Karpathy
#28 opened 8 months ago by
clem
Regarding the newly updated indexes (written as deduplication issues)
5
#29 opened 8 months ago by
kimcando
Dedup
1
#32 opened 8 months ago by
shawnkx
Language subset
3
#33 opened 8 months ago by
talmor
How to compute the aggregate score?
1
#35 opened 8 months ago by
mornmirror
why do you apply "All filters except the (very destructive) terminal_punct"
3
#36 opened 8 months ago by
bpwl0121
Reproducibility of the work for other languages
3
#38 opened 8 months ago by
camillop