VideoMAE-based Vehicle Collision Prediction Solution

Model Description

This repository contains a pretrained VideoMAEv2-giant model fine-tuned for the Nexar Safe Driving Video Analysis competition. The model is designed to predict collision and near-miss risks in driving videos.

Performance: 4th place on the Kaggle public leaderboard with a score of 0.886.

Usage

The model takes video frames as input and outputs a probability score indicating the likelihood of an imminent collision or near-miss event.

# Example usage (preprocess_video is a user-supplied helper; a sketch is given below)
from transformers import VideoMAEForVideoClassification
import torch

model = VideoMAEForVideoClassification.from_pretrained("zhiyaowang/VideoMaev2-giant-nexar-solution")
model.eval()

# Sample and preprocess 16 frames from the video
frames = preprocess_video(video_path)  # Tensor of shape [1, 16, 3, 224, 224]
with torch.no_grad():
    outputs = model(pixel_values=frames)
# Temperature scaling (T=2.0), then take the positive-class probability
probs = torch.softmax(outputs.logits / 2.0, dim=1)
probability = probs[:, 1].item()
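
For reference, a minimal sketch of the preprocess_video helper is shown below. It is not part of this repository: it assumes uniform sampling of 16 frames with OpenCV, resizing to 224x224, and ImageNet mean/std normalization, which may differ from the preprocessing used during training.

# Hypothetical preprocess_video helper (not provided by this repo)
import cv2
import numpy as np
import torch

def preprocess_video(video_path, num_frames=16, size=224):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, total - 1, num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            continue
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(cv2.resize(frame, (size, size)))
    cap.release()
    video = torch.from_numpy(np.stack(frames)).float() / 255.0  # [T, H, W, 3]
    mean = torch.tensor([0.485, 0.456, 0.406])
    std = torch.tensor([0.229, 0.224, 0.225])
    video = (video - mean) / std
    return video.permute(0, 3, 1, 2).unsqueeze(0)  # [1, T, 3, H, W]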

Model Training

Data Processing

  • Frame Extraction & Timestamps: Extracted frame sequences and per-frame timestamps from each video.
  • Sliding Window: Applied a sliding window over each video with a window size of 16 frames and a stride of 2 frames (see the sketch after this list).
  • Label Assignment: Windows whose last frame fell within 1.5 seconds before a collision/near-miss event were labeled positive; all other windows were labeled negative.
  • Data Balancing: Randomly undersampled negative windows to balance the positive and negative classes.
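
The windowing, labeling, and balancing steps can be summarized with the sketch below. It is illustrative only: the helper names (make_windows, balance_samples) and the per-video inputs (frame_timestamps, event_time) are assumptions, not the actual training code.

import random

def make_windows(frame_timestamps, event_time, window=16, stride=2, horizon=1.5):
    """Slide a window over a video's frames and label each window.

    event_time is the collision/near-miss timestamp in seconds (None for
    negative videos). A window is positive when its last frame falls within
    `horizon` seconds before the event.
    """
    samples = []
    for start in range(0, len(frame_timestamps) - window + 1, stride):
        last_t = frame_timestamps[start + window - 1]
        positive = event_time is not None and 0.0 <= event_time - last_t <= horizon
        samples.append({"frames": list(range(start, start + window)), "label": int(positive)})
    return samples

def balance_samples(samples, seed=0):
    """Randomly undersample negative windows to match the positive count."""
    pos = [s for s in samples if s["label"] == 1]
    neg = [s for s in samples if s["label"] == 0]
    random.Random(seed).shuffle(neg)
    return pos + neg[: len(pos)]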