The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published 1 day ago • 22
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Paper • 2502.19261 • Published Feb 26 • 7
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs Paper • 2502.19413 • Published Feb 26 • 19
Article 🚨 ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming By sted97 • Jun 25, 2024 • 5
Article Low Latency CPU Based Educational Value Classifier With Generic Educational Value By kenhktsui • Jun 12, 2024 • 9
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 43
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6, 2024 • 130