SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context Paper • 2411.16213 • Published Nov 25, 2024 • 2
VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models Paper • 2504.16359 • Published Apr 23, 2025 • 3
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video Paper • 2505.02064 • Published May 4, 2025 • 4
UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models Paper • 2505.14679 • Published May 20, 2025 • 6
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions Paper • 2505.15472 • Published May 21, 2025 • 3
Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents Paper • 2508.19493 • Published Aug 27, 2025 • 11