Directly usable outputs from the pipeline

dataproc5
classroom
What is this?
A data-processing pipeline that uses Hugging Face datasets as its intermediate data store.
Metadata is designed to be updated as a DAG (directed acyclic graph), where some metadata depends on other metadata.
Workflows are being built up gradually, and there may eventually be hundreds of data repos.
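To make the DAG idea concrete, here is a minimal sketch, assuming each dataset records the datasets it is derived from. The dataset names and edges below are hypothetical and are not the actual dependency graph of the dataproc5 repos. It uses Python's standard `graphlib` module (Python 3.9+) to compute a rebuild order and to find which datasets go stale when one input is updated.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: dataset name -> datasets it is derived from.
# Names and edges are illustrative assumptions, not the real dataproc5 graph.
DEPENDS_ON = {
    "raw-metadata": set(),
    "tag-counts": {"raw-metadata"},
    "row-priorities": {"raw-metadata", "tag-counts"},
    "balanced-subset": {"row-priorities"},
}


def rebuild_order(graph):
    """Return dataset names ordered so each comes after all of its inputs."""
    return list(TopologicalSorter(graph).static_order())


def stale_after_update(graph, changed):
    """Return, in rebuild order, `changed` plus every dataset derived from it."""
    stale = {changed}
    order = rebuild_order(graph)
    # Inputs are visited before their dependents, so a single pass suffices.
    for name in order:
        if graph.get(name, set()) & stale:
            stale.add(name)
    return [name for name in order if name in stale]


if __name__ == "__main__":
    print(rebuild_order(DEPENDS_ON))
    print(stale_after_update(DEPENDS_ON, "tag-counts"))
```

Under this model, updating `tag-counts` marks `row-priorities` and `balanced-subset` as stale while leaving `raw-metadata` untouched, so only the affected downstream repos need to be regenerated.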
How do I use it?
A tool for loading files from local storage, Hugging Face, and S3 is in development.
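Until that tool is available, here is a minimal sketch of what such a loader could look like, assuming the data is stored as Parquet and that the Hugging Face `datasets` library handles all three backends. The `hub:` prefix and the `plan_load`/`load` helpers are hypothetical names for illustration, not the tool under development. `load_dataset` accepts a Hub repo id directly, and reads local or `s3://` Parquet files through the `parquet` builder with `data_files`; S3 access additionally requires `s3fs`, with credentials passed through `storage_options`.

```python
from urllib.parse import urlparse


def plan_load(uri, split="train", storage_options=None):
    """Map a source string to (args, kwargs) for datasets.load_dataset.

    Hypothetical conventions assumed here:
      hub:<org>/<name>  -> a Hugging Face Hub dataset repo
      s3://bucket/key   -> Parquet file(s) on S3
      file:// or plain  -> local Parquet file(s)
    """
    parsed = urlparse(uri)
    if parsed.scheme == "hub":
        # "hub:" is a made-up prefix; everything after it is the repo id.
        return (uri[len("hub:"):],), {"split": split}
    if parsed.scheme == "s3":
        kwargs = {"data_files": uri, "split": split}
        if storage_options is not None:
            # Forwarded to fsspec/s3fs (e.g. credentials or region).
            kwargs["storage_options"] = storage_options
        return ("parquet",), kwargs
    if parsed.scheme == "file":
        return ("parquet",), {"data_files": parsed.path, "split": split}
    if parsed.scheme == "":
        return ("parquet",), {"data_files": uri, "split": split}
    raise ValueError(f"unsupported source: {uri}")


def load(uri, **options):
    """Load one split from any supported source (requires `pip install datasets`)."""
    from datasets import load_dataset  # lazy import: planning works without it

    args, kwargs = plan_load(uri, **options)
    return load_dataset(*args, **kwargs)
```

For example, `load("hub:dataproc5/danbooru2025-tag-balanced-2k")` should fetch one of the datasets listed below from the Hub, assuming it exposes a `train` split. Keeping the source-resolution step separate from the actual download makes it easy to test and to add further backends.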
Collections: 2
Models: 0 (none public yet)
Datasets: 16
dataproc5/metrics-danbooru2025-id-url-pairs: 9.11M rows, 79 downloads
dataproc5/metircs-danbooru2025-id-url-pairs: 9.11M rows, 83 downloads
dataproc5/danbooru2025-tag-balanced-2k: 2k rows, 93 downloads
dataproc5/danbooru2025-tag-balanced-210k: 263k rows, 88 downloads
dataproc5/danbooru2025-tag-balanced-10k: 10k rows, 84 downloads
dataproc5/danbooru2025-tag-balanced-100k: 48.7k rows, 93 downloads
dataproc5/intermediate-danbooru2025-metadata-prioritized: 9.11M rows, 121 downloads
dataproc5/intermediate-danbooru2025-balancing-tags: 9.11M rows, 106 downloads
dataproc5/metrics-danbooru2025-alltime-tag-counts: 859k rows, 97 downloads, 1 like
dataproc5/intermediate-danbooru2025-row-priorities: 9.11M rows, 69 downloads