Article Understanding Gemma 3n: How MatFormer Gives You Many Models in One By rishiraj • 29 days ago • 34
Article Transformers Are Getting Old: Variants and Alternatives Exist! By ProCreations • 21 days ago • 42
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Paper • 2506.21551 • Published 30 days ago • 28
Article Falcon-Edge: A series of powerful, universal, fine-tunable 1.58bit language models. By tiiuae and 9 others • May 15 • 35
Article LeRobot goes to driving school: World’s largest open-source self-driving dataset By sandhawalia and 1 other • Mar 11 • 97
Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights Paper • 2502.09619 • Published Feb 13 • 36
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Paper • 2502.07374 • Published Feb 11 • 41
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10 • 155
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 144
Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control By danaaubakirova and 3 others • Feb 4 • 167