common-pile's Collections

Common Pile v0.1

updated 23 days ago

All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text

  • Common Pile v0.1 Raw Data

    Collection
    8TB of public domain and openly licensed text • 30 items • Updated 23 days ago • 13

  • Common Pile v0.1 Filtered Data

    Collection
    An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated 23 days ago • 13

  • Comma v0.1 Artifacts

    Collection
    A collection of artifacts related to Comma v0.1—a 7B parameter LLM trained on public domain and openly licensed text • 3 items • Updated 23 days ago • 4

  • The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

    Paper • 2506.05209 • Published 23 days ago • 42