One RL to See Them All: Visual Triple Unified Reinforcement Learning Paper • 2505.18129 • Published 14 days ago • 59
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published 23 days ago • 90
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning Paper • 2505.08617 • Published 24 days ago • 41
view article Article You could have designed state of the art positional encoding By FL33TW00D-HF • Nov 25, 2024 • 288
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning Paper • 2505.10557 • Published 22 days ago • 46
view article Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • 26 days ago • 417
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 94
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning Paper • 2504.17192 • Published Apr 24 • 110
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs Paper • 2504.17432 • Published Apr 24 • 39
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents Paper • 2504.15785 • Published Apr 22 • 19
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models Paper • 2504.10449 • Published Apr 14 • 12
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published Apr 10 • 43
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation Paper • 2503.22675 • Published Mar 28 • 35
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models Paper • 2503.21380 • Published Mar 27 • 37
Open Deep Search: Democratizing Search with Open-source Reasoning Agents Paper • 2503.20201 • Published Mar 26 • 47