Hugging Face
common-pile's Collections

Common Pile v0.1

Updated 4 days ago

A collection of artifacts related to the Common Pile v0.1, an 8TB dataset of public domain and openly licensed text.

  • Common Pile v0.1 Raw Data

    Collection
    8TB of public domain and openly licensed text • 30 items • Updated 4 days ago

  • Common Pile v0.1 Filtered Data

    Collection
    An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated 13 days ago

  • Comma v0.1 Artifacts

    Collection
    A collection of artifacts related to Comma v0.1, a 7B-parameter LLM trained on public domain and openly licensed text • 3 items • Updated about 21 hours ago