Common Pile v0.1 Filtered Data Collection An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated 3 days ago • 10
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated 3 days ago • 21