Spaces: Nymbo/Common-Crawl-Pipeline-Creator
Duplicated from lhoestq/Common-Crawl-Pipeline-Creator
Branch: main
Path: Common-Crawl-Pipeline-Creator/images
145 kB, 1 contributor
History: 1 commit
Latest commit: lhoestq (HF Staff), "view pipeline result", 7eed258, about 1 year ago
File                                  Scan   Size      Last commit            Updated
00_1st_step_url_filtering.png         Safe   13.6 kB   view pipeline result   about 1 year ago
01_2nd_step_text_extraction.png       Safe   14.3 kB   view pipeline result   about 1 year ago
02_3rd_step_language_filtering.png    Safe   19.9 kB   view pipeline result   about 1 year ago
03_4th_step_gopher_filtering.png      Safe   22.3 kB   view pipeline result   about 1 year ago
10_8th_step_pii_removal.png           Safe   20.9 kB   view pipeline result   about 1 year ago
11_7th_step_custom_filters.png        Safe   17 kB     view pipeline result   about 1 year ago
12_6th_step_c4_filters.png            Safe   14.9 kB   view pipeline result   about 1 year ago
13_5th_step_gopher_filtering.png      Safe   22 kB     view pipeline result   about 1 year ago