Teaching Large Language Models to Reason with Reinforcement Learning Paper • 2403.04642 • Published Mar 7, 2024 • 50
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 93
TULIP: Towards Unified Language-Image Pretraining Paper • 2503.15485 • Published 28 days ago • 44
Atlas: Multi-Scale Attention Improves Long Context Image Modeling Paper • 2503.12355 • Published Mar 16 • 11
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11 • 49
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 381
TransPixar: Advancing Text-to-Video Generation with Transparency Paper • 2501.03006 • Published Jan 6 • 27
Training Software Engineering Agents and Verifiers with SWE-Gym Paper • 2412.21139 • Published Dec 30, 2024 • 23
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published Dec 23, 2024 • 33
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow Paper • 2410.07303 • Published Oct 9, 2024 • 18
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 112
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion Paper • 2410.03825 • Published Oct 4, 2024 • 19
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4, 2024 • 6
Contrastive Localized Language-Image Pre-Training Paper • 2410.02746 • Published Oct 3, 2024 • 36
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 140
Language Models Learn to Mislead Humans via RLHF Paper • 2409.12822 • Published Sep 19, 2024 • 10