HuggingFaceFW's Collections
🥂 FineWeb2
🍷 FineWeb datasets
📚 FineWeb-Edu
📀 Dataset comparison models
🧪 FineWeb v1 data experiments

🍷 FineWeb datasets

updated 21 days ago

  • 🍷 FineWeb: decanting the web for the finest text data at scale

    Running • 958

    Generate high-quality web text data for LLM training


  • HuggingFaceFW/fineweb

    Viewer • Updated Jan 31 • 25B • 399k • 2.18k

  • HuggingFaceFW/fineweb-edu

    Viewer • Updated Jan 31 • 3.3B • 134k • 684

  • HuggingFaceFW/fineweb-edu-score-2

    Viewer • Updated Apr 11 • 13.1B • 12k • 76

  • The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

    Paper • 2406.17557 • Published Jun 25, 2024 • 98

  • 📀 Dataset comparison models

    Collection
    1.8B models trained on 350BT to compare different pretraining datasets • 8 items • Updated Jun 12, 2024 • 38

  • 🧪 FineWeb v1 data experiments

    Collection
    Ablation models trained for our data experiments. • 22 items • Updated Jun 12, 2024 • 5