arxiv:2506.00227

Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes

Published on May 30

· Submitted by

AnthonyGosselin on Jun 4

Upvote

Authors:

Anthony Gosselin ,

Luis Lara ,

Abstract

Ctrl-Crash, a controllable car crash video generation model using classifier-free guidance, achieves top performance in video quality and realism compared to existing diffusion-based methods.

AI-generated summary

Video diffusion techniques have advanced significantly in recent years; however, they struggle to generate realistic imagery of car crashes due to the scarcity of accident events in most driving datasets. Improving traffic safety requires realistic and controllable accident simulations. To tackle the problem, we propose Ctrl-Crash, a controllable car crash video generation model that conditions on signals such as bounding boxes, crash types, and an initial image frame. Our approach enables counterfactual scenario generation where minor variations in input can lead to dramatically different crash outcomes. To support fine-grained control at inference time, we leverage classifier-free guidance with independently tunable scales for each conditioning signal. Ctrl-Crash achieves state-of-the-art performance across quantitative video quality metrics (e.g., FVD and JEDi) and qualitative measurements based on a human-evaluation of physical realism and video quality compared to prior diffusion-based methods.