AnyI2V: Animating Any Conditional Image with Motion Control • Paper • 2507.02857 • Published 20 days ago • 11
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments • Paper • 2507.10548 • Published 9 days ago • 33
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation • Paper • 2507.09862 • Published 9 days ago • 48
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers • Paper • 2506.23918 • Published 23 days ago • 78
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models • Paper • 2506.21356 • Published 27 days ago • 22