Nikeytas/Videomae Crime Detector Production V1

This model is a fine-tuned version of MCG-NJU/videomae-base on the UCF Crime dataset with event-based binary classification. It achieves the following results on the evaluation set:

  • Loss: 0.8070
  • Accuracy: 0.6250
  • Precision: 0.6351
  • Recall: 0.6250
  • F1 Score: 0.6114

🎯 Model Overview

This VideoMAE model has been fine-tuned for binary violence detection in video content. The model classifies videos into two categories:

  • Violent Crime (1): Videos containing violent criminal activities
  • Non-Violent Incident (0): Videos with non-violent or normal activities

The model is based on the VideoMAE architecture and has been specifically trained on a curated subset of the UCF Crime dataset with event-based categorization for realistic crime detection scenarios.
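
For reference, that label convention as a Python mapping (an assumption that it matches the checkpoint's metadata; config.id2label is authoritative if the stored strings differ):

# Label convention used throughout this card (assumption: matches the checkpoint config)
id2label = {0: "Non-Violent Incident", 1: "Violent Crime"}
label2id = {label: idx for idx, label in id2label.items()}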

📊 Dataset & Training

Dataset Composition

Total Videos: 300

  • Violent Crime Videos: 150
  • Non-Violent Incident Videos: 150

Class Balance: 50.0% violent crimes

Event Distribution:

  • Abuse: 34 videos
  • Arrest: 36 videos
  • Arson: 46 videos
  • Assault: 36 videos
  • Burglary: 70 videos
  • Explosion: 24 videos
  • Fighting: 30 videos
  • RoadAccidents: 86 videos
  • Robbery: 98 videos
  • Shoplifting: 36 videos
  • Stealing: 62 videos

Data Splits:

  • Training: 192 videos
  • Validation: 48 videos
  • Test: 60 videos
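
The 192/48/60 split is roughly 64/16/20 percent of the 300 videos. A hedged sketch of how such a stratified split could be produced (illustrative only; the original data pipeline is not published, and video_paths/labels are assumed to be loaded elsewhere):

from sklearn.model_selection import train_test_split

# video_paths: list of file paths; labels: list of 0/1 class labels (assumed loaded elsewhere)
trainval_paths, test_paths, trainval_labels, test_labels = train_test_split(
    video_paths, labels, test_size=60, stratify=labels, random_state=42)
train_paths, val_paths, train_labels, val_labels = train_test_split(
    trainval_paths, trainval_labels, test_size=48, stratify=trainval_labels, random_state=42)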

🎯 Performance

Validation Performance:

  • eval_loss: 0.8070
  • eval_accuracy: 0.6250
  • eval_precision: 0.6351
  • eval_recall: 0.6250
  • eval_f1: 0.6114
  • eval_runtime: 6.4319 s
  • eval_samples_per_second: 7.463
  • eval_steps_per_second: 3.731
  • epoch: 10

Test Performance:

  • eval_loss: 0.6541
  • eval_accuracy: 0.6667
  • eval_precision: 0.6667
  • eval_recall: 0.6667
  • eval_f1: 0.6667
  • eval_runtime: 8.0508 s
  • eval_samples_per_second: 7.453
  • eval_steps_per_second: 3.726
  • epoch: 10
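
The identical accuracy/precision/recall values on the test split are consistent with weighted averaging over both classes. A sketch of a compute_metrics function in that style (an assumption about the training script, which is not published):

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # eval_pred is the (logits, labels) pair the HF Trainer passes in
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}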

Training Information:

  • Training Time: 19.8 minutes
  • Best Accuracy Achieved: 0.6667
  • Model Architecture: VideoMAE Base (fine-tuned)
  • Fine-tuning Approach: Event-based binary classification

🚀 Training Procedure

Training Hyperparameters

The following hyperparameters were used during training:

  • Learning Rate: 5e-05
  • Train Batch Size: 2
  • Eval Batch Size: 2
  • Optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
  • LR Scheduler Type: Linear
  • Training Epochs: 10
  • Weight Decay: 0.01
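
Expressed as a HuggingFace TrainingArguments object, those settings would look roughly like this (output_dir and evaluation_strategy are illustrative assumptions; the exact training script is not published):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./videomae-crime-detector",  # illustrative path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=10,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",  # assumption: evaluate once per epoch
)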

Training Results

Training Loss | Epoch | Step | Validation Loss | Accuracy
:------------ | :---- | :--- | :-------------- | :-------
0.6667        | 10.0  | N/A  | 0.8070          | 0.6250

Framework Versions

  • Transformers: 4.30.2+
  • PyTorch: 2.0.1+
  • Datasets: latest release available at training time
  • Device: Apple Silicon MPS / CUDA / CPU (Auto-detected)
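
The "auto-detected" device line above corresponds to a standard PyTorch selection pattern; a minimal sketch (not the card authors' exact code):

import torch

# Prefer a CUDA GPU, then Apple Silicon MPS, then fall back to CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Move the loaded model to the selected device before inference, e.g.:
# model = model.to(device)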

🚀 Quick Start

Installation

pip install transformers torch torchvision opencv-python pillow

Basic Usage

import torch
from transformers import AutoModelForVideoClassification, AutoImageProcessor
import cv2
import numpy as np

# Load model and processor (VideoMAE uses an image processor applied per frame)
model = AutoModelForVideoClassification.from_pretrained("Nikeytas/videomae-crime-detector-production-v1")
processor = AutoImageProcessor.from_pretrained("Nikeytas/videomae-crime-detector-production-v1")
model.eval()

# Process video
def classify_video(video_path, num_frames=16):
    # Sample num_frames frames evenly across the video
    cap = cv2.VideoCapture(video_path)
    frames = []
    
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total_frames <= 0:
        cap.release()
        raise ValueError(f"Could not read frames from {video_path}")
    indices = np.linspace(0, total_frames - 1, num_frames, dtype=int)
    
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ret, frame = cap.read()
        if ret:
            # OpenCV decodes to BGR; the processor expects RGB
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frames.append(frame_rgb)
    
    cap.release()
    
    if not frames:
        raise ValueError(f"No frames decoded from {video_path}")
    # VideoMAE expects exactly num_frames frames; pad short reads by repeating the last frame
    while len(frames) < num_frames:
        frames.append(frames[-1])
    
    # Process with model
    inputs = processor(frames, return_tensors="pt")
    
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(predictions, dim=-1).item()
        confidence = predictions[0][predicted_class].item()
    
    label = "Violent Crime" if predicted_class == 1 else "Non-Violent"
    return label, confidence

# Example usage
video_path = "path/to/your/video.mp4"
prediction, confidence = classify_video(video_path)
print(f"Prediction: {prediction} (Confidence: {confidence:.3f})")

Batch Processing

import os
from pathlib import Path

def process_video_directory(video_dir, output_file="results.txt"):
    results = []
    
    for video_file in Path(video_dir).glob("*.mp4"):
        try:
            prediction, confidence = classify_video(str(video_file))
            results.append({
                "file": video_file.name,
                "prediction": prediction,
                "confidence": confidence
            })
            print(f"✅ {video_file.name}: {prediction} ({confidence:.3f})")
        except Exception as e:
            print(f"❌ Error processing {video_file.name}: {e}")
    
    # Save results
    with open(output_file, "w") as f:
        for result in results:
            f.write(f"{result['file']}: {result['prediction']} ({result['confidence']:.3f})\n")
    
    return results

# Process all videos in a directory
results = process_video_directory("./videos/")

📈 Technical Specifications

  • Base Model: MCG-NJU/videomae-base
  • Architecture: Vision Transformer (ViT) adapted for video
  • Input Resolution: 224x224 pixels per frame
  • Temporal Resolution: 16 frames per video clip
  • Output Classes: 2 (Binary classification)
  • Training Framework: HuggingFace Transformers
  • Optimization: AdamW optimizer with learning rate 5e-5
  • Parameters: 86.2M (float32, stored as safetensors)
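
These values can be checked directly against the checkpoint's configuration; a quick sketch (the commented values are expectations from this card, not guarantees):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Nikeytas/videomae-crime-detector-production-v1")
print(config.num_frames)  # temporal resolution, expected 16
print(config.image_size)  # per-frame resolution, expected 224
print(config.num_labels)  # expected 2 for binary classification
print(config.id2label)    # label names as stored in the checkpoint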

⚠️ Limitations

  1. Dataset Scope: Trained on a subset of the UCF Crime dataset, so it may not generalize to all types of violence
  2. Temporal Context: Uses 16-frame clips, which can miss context in longer sequences (see the sliding-window sketch after this list)
  3. Environmental Bias: Performance may vary with lighting, camera angles, and video quality
  4. False Positives: May misclassify intense but non-violent activities (sports, action movies)
  5. Real-time Performance: Processing time depends on hardware capabilities
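
For videos much longer than a single clip, a common workaround is to score overlapping 16-frame windows and aggregate the per-window scores. A minimal sketch reusing the model and processor loaded in the quick start (the window and stride values are illustrative assumptions, and decoding every frame up front is only practical for moderately sized files):

import cv2
import torch

def classify_long_video(video_path, window_frames=16, stride_frames=8):
    # Decode all frames as RGB (fine for short files; stream frames for long ones)
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()

    scores = []
    for start in range(0, max(len(frames) - window_frames, 0) + 1, stride_frames):
        window = frames[start:start + window_frames]
        if len(window) < window_frames:
            break
        inputs = processor(window, return_tensors="pt")
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1)
        scores.append(probs[0][1].item())  # probability of class 1 (Violent Crime)

    # Max over windows: flag the video if any single clip looks violent
    return max(scores) if scores else 0.0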

🔒 Ethical Considerations

Intended Use

  • Primary: Research and development in video analysis
  • Secondary: Security system enhancement with human oversight
  • Educational: Computer vision and AI safety research

Prohibited Uses

  • Surveillance without consent: Do not use for unauthorized monitoring
  • Discriminatory profiling: Avoid bias against specific groups or communities
  • Automated punishment: Never use for automated legal or disciplinary actions
  • Privacy violation: Respect privacy laws and individual rights

Bias and Fairness

  • Model trained on specific dataset that may not represent all populations
  • Regular evaluation needed for bias detection and mitigation
  • Human oversight required for critical applications (see the confidence-gating sketch after this list)
  • Consider demographic representation in deployment scenarios
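
One lightweight way to operationalize that oversight is to auto-accept only high-confidence non-violent predictions and route everything else to a human reviewer. A minimal sketch reusing classify_video from the quick start; the 0.9 threshold is an illustrative assumption and should be calibrated on your own validation data:

REVIEW_THRESHOLD = 0.9  # illustrative value; calibrate on held-out data

def triage(video_path):
    label, confidence = classify_video(video_path)
    # Route every violence flag and every low-confidence call to a human reviewer
    if label == "Violent Crime" or confidence < REVIEW_THRESHOLD:
        action = "human_review"
    else:
        action = "auto_accept"
    return {"video": video_path, "label": label, "confidence": confidence, "action": action}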

📝 Model Card Information

  • Developed by: Research Team
  • Model Type: Video Classification (Binary)
  • Training Data: UCF Crime Dataset (Subset)
  • Training Date: 2025-06-01 23:46:55 UTC
  • Evaluation Metrics: Accuracy, Precision, Recall, F1-Score
  • Intended Users: Researchers, Security Professionals, Developers

📚 Citation

If you use this model in your research, please cite:

@misc{Nikeytas_videomae_crime_detector_production_v1,
    title={VideoMAE Fine-tuned for Crime Detection},
    author={Research Team},
    year={2025},
    publisher={Hugging Face},
    url={https://huggingface.co/Nikeytas/videomae-crime-detector-production-v1}
}

🤝 Contributing

We welcome contributions to improve the model! Please:

  1. Report issues with specific examples
  2. Suggest improvements for bias reduction
  3. Share evaluation results on new datasets
  4. Contribute to documentation and examples

📞 Contact

For questions, issues, or collaboration opportunities, please open an issue in the model repository or contact the development team.


Last updated: 2025-06-01 23:46:55 UTC · Model version: 1.0 · Framework: HuggingFace Transformers
