Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math Paper • 2504.21233 • Published 6 days ago
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation Paper • 2504.17502 • Published 11 days ago
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published 13 days ago
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering Paper • 2504.05506 • Published 28 days ago
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published 18 days ago
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published 18 days ago
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Paper • 2504.13122 • Published 18 days ago
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published 28 days ago
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 28 days ago
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving Paper • 2504.02605 • Published Apr 3
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published Mar 26
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published Mar 16