VideoMAE-based Vehicle Collision Prediction Solution
Model Description
This repository contains a pretrained VideoMAEv2-giant model fine-tuned for the Nexar Safe Driving Video Analysis competition. The model is designed to predict collision and near-miss risks in driving videos.
Performance: 4th place on the Kaggle public leaderboard with a score of 0.886.
Usage
The model takes video frames as input and outputs a probability score indicating the likelihood of an imminent collision or near-miss event.
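```python
# Example usage (sketch): preprocess_video and video_path are placeholders for your own video-loading code
import torch
from transformers import VideoMAEForVideoClassification

model = VideoMAEForVideoClassification.from_pretrained("zhiyaowang/VideoMaev2-giant-nexar-solution")
model.eval()

# Process video frames (16 frames recommended)
frames = preprocess_video(video_path)  # Shape: [1, 16, 3, 224, 224] (batch, frames, channels, height, width)
with torch.no_grad():
    outputs = model(pixel_values=frames)
# Temperature scaling (T = 2.0) before the softmax; check model.config.id2label for the class order
probability = torch.softmax(outputs.logits / 2.0, dim=1)
```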
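The usage snippet divides the logits by a temperature T = 2.0 before the softmax. The following pure-Python sketch shows what that does. The function name and the two-class logits are hypothetical and do not come from the model. Dividing by T > 1 shrinks the gap between logits, so the probabilities move toward uniform while the ranking of the classes stays the same.

```python
import math


def softmax_with_temperature(logits, temperature=2.0):
    """Softmax over logits / temperature; T > 1 pulls probabilities toward uniform."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.0, 3.0]  # hypothetical two-class logits
print(softmax_with_temperature(logits, temperature=1.0))  # ≈ [0.018, 0.982]
print(softmax_with_temperature(logits, temperature=2.0))  # ≈ [0.119, 0.881]
```

A softer distribution can be useful when the score is consumed as a calibrated risk estimate rather than as a hard decision. The argmax is unchanged, so any fixed-threshold ranking of windows is unaffected.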
Model Training
Data Processing
- Frame Extraction & Timestamps: Extracted the frame sequence and the corresponding timestamps from each video.
- Sliding Window: Applied a sliding window of 16 frames with a stride of 2 frames.
- Label Assignment: Windows whose last frame fell within 1.5 seconds before a collision/near-miss event were labeled positive.
- Data Balancing: Randomly undersampled negative windows to balance the classes.
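The data-processing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the solution's actual training code. The `Window` class and the function names are hypothetical. It assumes each positive video has a single event timestamp in seconds, with `None` for negative videos. It also assumes a 1:1 negative-to-positive ratio for undersampling, because the original does not state the ratio.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Window:
    video_id: str
    start: int   # index of the first frame in the window
    end: int     # index of the last frame (inclusive)
    label: int   # 1 = collision/near-miss imminent, 0 = negative


def make_windows(video_id: str, timestamps: List[float], event_time: Optional[float],
                 window_size: int = 16, stride: int = 2,
                 horizon: float = 1.5) -> List[Window]:
    """Slide a fixed-size window over the frames and label each window by its last frame."""
    windows = []
    for start in range(0, len(timestamps) - window_size + 1, stride):
        end = start + window_size - 1
        t_last = timestamps[end]
        # Positive if the last frame falls within `horizon` seconds before the event.
        positive = event_time is not None and 0.0 <= event_time - t_last <= horizon
        windows.append(Window(video_id, start, end, int(positive)))
    return windows


def undersample_negatives(windows: List[Window], neg_per_pos: float = 1.0,
                          seed: int = 0) -> List[Window]:
    """Keep every positive window and a random subset of negatives (ratio is an assumption)."""
    pos = [w for w in windows if w.label == 1]
    neg = [w for w in windows if w.label == 0]
    k = min(len(neg), int(round(neg_per_pos * len(pos))))
    rng = random.Random(seed)
    return pos + rng.sample(neg, k)


# Hypothetical 10-second clip at 30 fps with an event at 8.0 s
timestamps = [i / 30 for i in range(300)]
windows = make_windows("vid_001", timestamps, event_time=8.0)
balanced = undersample_negatives(windows)
print(len(windows), sum(w.label for w in windows), len(balanced))  # 143 23 46
```

Labeling by the last frame means each positive window sees only frames that precede the event. That matches the goal of predicting a collision before it happens rather than detecting it afterwards.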
Base Model
Fine-tuned from OpenGVLab/VideoMAEv2-giant.