RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale Paper • 2505.03005 • Published 14 days ago • 27
RWKV-7 "Goose" with Expressive Dynamic State Evolution Paper • 2503.14456 • Published Mar 18 • 148
view article Article LeRobot goes to driving school: World’s largest open-source self-driving dataset By sandhawalia and 1 other • Mar 11 • 80
view article Article FastRTC: The Real-Time Communication Library for Python By freddyaboulton and 1 other • Feb 25 • 161
Non-instructional Fine-tuning: Enabling Instruction-Following Capabilities in Pre-trained Language Models without Instruction-Following Data Paper • 2409.00096 • Published Aug 27, 2024 • 1
RNR: Teaching Large Language Models to Follow Roles and Rules Paper • 2409.13733 • Published Sep 10, 2024 • 1
Response Tuning: Aligning Large Language Models without Instruction Paper • 2410.02465 • Published Oct 3, 2024 • 13
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 229
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning Paper • 2502.06060 • Published Feb 9 • 38
abliterated-v3 Collection Latest gen of the abliterated models I've produced • 17 items • Updated Jun 3, 2024 • 119
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training Paper • 2405.06932 • Published May 11, 2024 • 21
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck Paper • 2404.07647 • Published Apr 11, 2024 • 4
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence Paper • 2404.05892 • Published Apr 8, 2024 • 39