Submitted by Saeed Ranjbar Alvar
From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model
Huawei's Vancouver VBDAI Lab