SuperBPE Collection • SuperBPE tokenizers and models trained with them • 7 items
SANA-1.5 Collection • SANA-1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer • 6 items
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper • 2502.02737 • Published Feb 4
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond • Paper • 2503.10460 • Published 12 days ago
EuroBERT Collection • Scaling Multilingual Encoders for European Languages • 4 items
Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing • Paper • 2502.14458 • Published Feb 20
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers • Paper • 2502.20545 • Published 26 days ago
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads • Paper • 2401.10774 • Published Jan 19, 2024
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding • Paper • 2402.05109 • Published Feb 7, 2024