Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos • Paper • 2605.18984 • Published 16 days ago • 22
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling • Paper • 2605.13062 • Published 21 days ago • 33
VIBE Model Results • Collection • This collection archives the raw output generations from various models evaluated on the VIBE benchmark. • 16 items • Updated Feb 1 • 3