Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published 11 days ago • 78
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 18 days ago • 578
Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders By thomwolf and 1 other • 17 days ago • 608
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning Paper • 2506.06205 • Published Jun 6 • 29
Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods Paper • 2505.17870 • Published May 23 • 5
Cache Me if You Can: Accelerating Diffusion Models through Block Caching Paper • 2312.03209 • Published Dec 6, 2023 • 21
RealHarm: A Collection of Real-World Language Model Application Failures Paper • 2504.10277 • Published Apr 14 • 11
Article You could have designed state of the art positional encoding By FL33TW00D-HF • Nov 25, 2024 • 318
Min P Sampling: Balancing Creativity and Coherence at High Temperature Paper • 2407.01082 • Published Jul 1, 2024 • 1
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM By ariG23498 and 3 others • Mar 12 • 447
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning Paper • 2502.18080 • Published Feb 25 • 2
Article From Files to Chunks: Improving Hugging Face Storage Efficiency By jsulz and 1 other • Nov 20, 2024 • 63
The Differences Between Direct Alignment Algorithms are a Blur Paper • 2502.01237 • Published Feb 3 • 115
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 190
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published Jan 29 • 59