Orpheus Multilingual Research Release Collection Beta Release of multilingual models. • 12 items • Updated 5 days ago • 73
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 20 items • Updated 14 days ago • 126
👩‍💻 OlympicCoder Collection Reasoning datasets and models for competitive coding • 4 items • Updated Mar 11 • 16
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 4 items • Updated 27 days ago • 104
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published Feb 10 • 61
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 223
WildChat-50m Collection All model responses associated with the WildChat-50m paper. • 55 items • Updated Jan 29 • 8
Financial Sentiment Analysis 💲📈 Collection Financial Sentiment Analysis models I created • 3 items • Updated Jan 16 • 4
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1, 2024 • 63
Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP Paper • 2408.04303 • Published Aug 8, 2024 • 21