AnyI2V: Animating Any Conditional Image with Motion Control • Paper • 2507.02857 • Published 21 days ago • 11
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments • Paper • 2507.10548 • Published 10 days ago • 33
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation • Paper • 2507.09862 • Published 11 days ago • 48
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers • Paper • 2506.23918 • Published 24 days ago • 79
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models • Paper • 2506.21356 • Published 28 days ago • 22