AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published Jul 17 • 40
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Paper • 2507.13332 • Published Jul 17 • 48
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models Paper • 2507.13344 • Published Jul 17 • 55
π^3: Scalable Permutation-Equivariant Visual Geometry Learning Paper • 2507.13347 • Published Jul 17 • 64
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17 • 72
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs Paper • 2507.11097 • Published Jul 15 • 63
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning Paper • 2507.16812 • Published Jul 22 • 62
Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory Paper • 2507.16713 • Published Jul 22 • 21
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Paper • 2507.16746 • Published Jul 22 • 33
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22 • 37
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers Paper • 2507.08422 • Published Jul 11 • 35
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance Paper • 2507.18192 • Published about 1 month ago • 7
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published 29 days ago • 29
JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment Paper • 2507.20880 • Published 27 days ago • 10
Met^2Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems Paper • 2507.17189 • Published Jul 23 • 12
Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning Paper • 2507.21049 • Published 26 days ago • 40
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts Paper • 2507.20939 • Published 27 days ago • 56