VideoMAE-based Vehicle Collision Prediction Solution

Model Description

This repository contains a pretrained VideoMAEv2-giant model fine-tuned for the Nexar Safe Driving Video Analysis competition. The model is designed to predict collision and near-miss risks in driving videos.

Performance: 4th place on the Kaggle public leaderboard with a score of 0.886.

Usage

The model takes video frames as input and outputs a probability score indicating the likelihood of an imminent collision or near-miss event.

# Example usage (preprocess_video is a user-supplied helper; a sketch is given below)
from transformers import VideoMAEForVideoClassification
import torch

model = VideoMAEForVideoClassification.from_pretrained("zhiyaowang/VideoMaev2-giant-nexar-solution")
model.eval()

# Sample and preprocess 16 frames from the video
frames = preprocess_video(video_path)  # Tensor of shape [1, 16, 3, 224, 224]
with torch.no_grad():
    outputs = model(pixel_values=frames)
# Temperature scaling (T=2.0), then take the positive-class probability
probs = torch.softmax(outputs.logits / 2.0, dim=1)
probability = probs[:, 1].item()
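
For reference, a minimal sketch of the preprocess_video helper is shown below. It is not part of this repository: it assumes uniform sampling of 16 frames with OpenCV, resizing to 224x224, and ImageNet mean/std normalization, which may differ from the preprocessing used during training.

# Hypothetical preprocess_video helper (not provided by this repo)
import cv2
import numpy as np
import torch

def preprocess_video(video_path, num_frames=16, size=224):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, total - 1, num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            continue
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(cv2.resize(frame, (size, size)))
    cap.release()
    video = torch.from_numpy(np.stack(frames)).float() / 255.0  # [T, H, W, 3]
    mean = torch.tensor([0.485, 0.456, 0.406])
    std = torch.tensor([0.229, 0.224, 0.225])
    video = (video - mean) / std
    return video.permute(0, 3, 1, 2).unsqueeze(0)  # [1, T, 3, H, W]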

Model Training

Data Processing

  • Frame Extraction & Timestamps: Extracted frame sequences and per-frame timestamps from each video.
  • Sliding Window: Applied a sliding window over each video with a window size of 16 frames and a stride of 2 frames (see the sketch after this list).
  • Label Assignment: Windows whose last frame fell within 1.5 seconds before a collision/near-miss event were labeled positive; all other windows were labeled negative.
  • Data Balancing: Randomly undersampled negative windows to balance the positive and negative classes.
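
The windowing, labeling, and balancing steps can be summarized with the sketch below. It is illustrative only: the helper names (make_windows, balance_samples) and the per-video inputs (frame_timestamps, event_time) are assumptions, not the actual training code.

import random

def make_windows(frame_timestamps, event_time, window=16, stride=2, horizon=1.5):
    """Slide a window over a video's frames and label each window.

    event_time is the collision/near-miss timestamp in seconds (None for
    negative videos). A window is positive when its last frame falls within
    `horizon` seconds before the event.
    """
    samples = []
    for start in range(0, len(frame_timestamps) - window + 1, stride):
        last_t = frame_timestamps[start + window - 1]
        positive = event_time is not None and 0.0 <= event_time - last_t <= horizon
        samples.append({"frames": list(range(start, start + window)), "label": int(positive)})
    return samples

def balance_samples(samples, seed=0):
    """Randomly undersample negative windows to match the positive count."""
    pos = [s for s in samples if s["label"] == 1]
    neg = [s for s in samples if s["label"] == 0]
    random.Random(seed).shuffle(neg)
    return pos + neg[: len(pos)]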