Article Blazing-Fast Code Editing via Multi-Layer Speculation By ganler and 3 others • 7 days ago • 15
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 17 days ago • 187
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published about 1 month ago • 327
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 125
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14, 2024 • 59
Article Llama 3.1 - 405B, 70B & 8B with multilinguality and long context Jul 23, 2024 • 227
Article Docmatix - a huge dataset for Document Visual Question Answering Jul 18, 2024 • 72