DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 7 days ago • 102
Nougat: Neural Optical Understanding for Academic Documents Paper • 2308.13418 • Published Aug 25, 2023 • 38
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning Paper • 2503.07608 • Published 15 days ago • 19
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 186
view article Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 122
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published Jan 16 • 35
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 104
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 127
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 145
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 14
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing Paper • 2412.04280 • Published Dec 5, 2024 • 14
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 62
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21, 2024 • 46
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published Nov 7, 2024 • 51