RF-DETR SoccerNet - Professional Soccer Object Detection
An RF-DETR-Large model fine-tuned on the SoccerNet-Tracking dataset to detect the ball, players, referees, and goalkeepers in soccer video. It reaches 85.7% mAP@50 (self-reported) and returns its detections as a pandas DataFrame for analysis of soccer broadcasts.
Model Performance
Metric | Value | Target / Notes |
---|---|---|
mAP@50 | 85.7% | Target: 84.95% (met) |
mAP | 49.8% | - |
mAP@75 | 52.0% | - |
Training Time | ~14 hours | NVIDIA A100 40GB |
Parameters | 128M | RF-DETR-Large |
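The @50 and @75 suffixes refer to the intersection-over-union (IoU) cutoff at which a predicted box counts as matching a ground-truth box. As a minimal sketch, IoU for boxes in (x1, y1, x2, y2) form can be computed as below; the helper name `iou` and the sample boxes are illustrative and not part of this package.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap extent along each axis, clamped at zero when boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (100, 100, 200, 200)      # made-up prediction
ground_truth = (120, 100, 220, 200)   # made-up annotation
overlap = iou(predicted, ground_truth)
print(f"IoU = {overlap:.3f}")                      # IoU = 0.667
print("match at IoU 0.50:", overlap >= 0.50)       # True
print("match at IoU 0.75:", overlap >= 0.75)       # False
```

The same prediction counts as correct under the looser cutoff and as a miss under the stricter one, which is why mAP@75 is lower than mAP@50.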
Detected Classes
The model detects four classes in soccer video:
- Ball - The match ball; the hardest class, with 78.5% precision and 71.2% recall
- Player - Outfield players from both teams
- Referee - Match officials
- Goalkeeper - Goalkeepers, detected as a class separate from outfield players
Quick Start
Installation
pip install rfdetr pandas opencv-python pillow tqdm numpy torch torchvision
Basic Usage
from inference import RFDETRSoccerNet
# Initialize model (auto-detects CUDA/CPU)
model = RFDETRSoccerNet()
# Process video and get DataFrame
df = model.process_video('soccer_match.mp4', confidence_threshold=0.5)
# Display first 5 detections
print(df.head())
# Save results
model.save_results(df, 'match_analysis.csv')
Output DataFrame Format
The model returns a pandas DataFrame with comprehensive detection data:
Column | Description | Type |
---|---|---|
frame | Frame number in video | int |
timestamp | Time in seconds | float |
class_name | Detected class | str |
class_id | Class ID (0-3) | int |
x1, y1 | Top-left corner coordinates | float |
x2, y2 | Bottom-right corner coordinates | float |
width, height | Bounding box dimensions | float |
confidence | Detection confidence (0-1) | float |
center_x, center_y | Object center coordinates | float |
area | Bounding box area | float |
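The geometric columns can presumably be derived from the corner coordinates. The sketch below shows the relationships one would expect (width = x2 - x1, and so on). This is an assumption about how the columns are computed, not code taken from `inference.py`; `describe_box` and its `fps` parameter are hypothetical names.

```python
def describe_box(frame, x1, y1, x2, y2, fps=25.0):
    """Derive the box columns from the corners (assumed relationships)."""
    width = x2 - x1
    height = y2 - y1
    return {
        "frame": frame,
        "timestamp": frame / fps,      # seconds, assuming a constant frame rate
        "width": width,
        "height": height,
        "center_x": x1 + width / 2,
        "center_y": y1 + height / 2,
        "area": width * height,
    }


row = describe_box(frame=50, x1=100.0, y1=200.0, x2=140.0, y2=300.0)
print(row)
```

The same relationships can serve as a sanity check on a saved results file.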
Video Processing Examples
Process Full Match
# Process entire match
df = model.process_video(
'full_match.mp4',
confidence_threshold=0.5,
save_results=True
)
print(f"Processed {len(df):,} detections")
print(df['class_name'].value_counts())
Fast Processing (Every 5th Frame)
# Process every 5th frame for speed
df = model.process_video(
'match.mp4',
frame_skip=5, # 5x faster processing
confidence_threshold=0.6
)
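Assuming `frame_skip=5` means that only every 5th frame is passed to the detector (the implementation is not shown here), the frames that end up in the DataFrame can be sketched as follows. `sampled_frames` is an illustrative helper, not part of the API.

```python
def sampled_frames(total_frames, frame_skip):
    """Frame indices processed when keeping every `frame_skip`-th frame."""
    return list(range(0, total_frames, frame_skip))


frames = sampled_frames(total_frames=30, frame_skip=5)
print(frames)  # [0, 5, 10, 15, 20, 25]

# At 30 fps, consecutive samples are 5 / 30 seconds apart.
gap_seconds = 5 / 30
print(f"gap between samples: {gap_seconds:.3f} s")  # 0.167 s
```

The ball can move a long way in that gap, so a smaller skip may suit ball-focused analysis.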
Limited Frame Processing
# Process first 10 minutes only
df = model.process_video(
'match.mp4',
max_frames=18000, # ~10 minutes at 30fps
confidence_threshold=0.5
)
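The right `max_frames` value depends on the frame rate of the video. A small helper (hypothetical, not part of this package) makes the conversion explicit. European broadcast footage is commonly 25 fps rather than 30, so it is worth checking the source file.

```python
def frames_for_minutes(minutes, fps):
    """Number of frames covering `minutes` of video at `fps` frames per second."""
    return round(minutes * 60 * fps)


print(frames_for_minutes(10, 30))  # 18000
print(frames_for_minutes(10, 25))  # 15000
```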
Image Processing
# Process single image
df = model.process_image('soccer_frame.jpg', confidence_threshold=0.5)
# Display results
for _, detection in df.iterrows():
    print(f"{detection['class_name']}: {detection['confidence']:.2f}")
Advanced Analysis
Ball Possession Analysis
# Analyze which players are near the ball
possession_df = model.analyze_ball_possession(
df,
distance_threshold=100 # pixels
)
print(f"Found {len(possession_df)} possession events")
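The internals of `analyze_ball_possession` are not shown here. One plausible reading of "players near the ball" is this: for each frame, find the player whose box center is closest to the ball center and keep it if it lies within the pixel threshold. The sketch below implements that reading on plain dicts. The function name, the choice of the first ball detection, and the sample data are all assumptions.

```python
import math


def nearest_player_events(detections, distance_threshold=100.0):
    """Return one event per frame where a player is within the threshold.

    `detections` is a list of dicts using the DataFrame column names
    frame, class_name, center_x and center_y.
    """
    by_frame = {}
    for det in detections:
        by_frame.setdefault(det["frame"], []).append(det)

    events = []
    for frame in sorted(by_frame):
        balls = [d for d in by_frame[frame] if d["class_name"] == "ball"]
        players = [d for d in by_frame[frame] if d["class_name"] == "player"]
        if not balls or not players:
            continue  # no event can be formed without both a ball and a player
        ball = balls[0]  # assumption: use the first ball detection in the frame
        distances = [
            math.hypot(p["center_x"] - ball["center_x"],
                       p["center_y"] - ball["center_y"])
            for p in players
        ]
        closest = min(range(len(players)), key=distances.__getitem__)
        if distances[closest] <= distance_threshold:
            events.append({
                "frame": frame,
                "player_center": (players[closest]["center_x"],
                                  players[closest]["center_y"]),
                "distance": distances[closest],
            })
    return events


detections = [
    {"frame": 0, "class_name": "ball", "center_x": 500, "center_y": 300},
    {"frame": 0, "class_name": "player", "center_x": 530, "center_y": 340},
    {"frame": 0, "class_name": "player", "center_x": 900, "center_y": 300},
    {"frame": 1, "class_name": "ball", "center_x": 600, "center_y": 300},
    {"frame": 1, "class_name": "player", "center_x": 900, "center_y": 300},
]
events = nearest_player_events(detections)
print(events)  # one event, in frame 0, at a distance of 50.0 pixels
```

With a real results DataFrame, `df.to_dict('records')` yields rows in this shape. Pixel distance depends on zoom and camera angle, so a fixed threshold is only a rough proxy for possession.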
Filter and Analyze Results
# Get high-confidence ball detections
ball_df = df[(df['class_name'] == 'ball') & (df['confidence'] > 0.8)]
# Calculate average players per frame
avg_players = df[df['class_name'] == 'player'].groupby('frame').size().mean()
# Find frames with goalkeepers
goalkeeper_frames = df[df['class_name'] == 'goalkeeper']['frame'].unique()
# Analyze referee positioning
referee_df = df[df['class_name'] == 'referee']
referee_activity = referee_df.groupby('frame').size()
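One caveat on the per-frame average above: grouping the filtered rows by `frame` only produces groups for frames that contain at least one player detection, so frames with no players are left out of the mean. The dependency-free sketch below shows the difference on made-up data; the variable names are illustrative.

```python
from collections import Counter

# (frame, class_name) pairs standing in for DataFrame rows
detections = [(0, "player"), (0, "player"), (1, "ball"), (2, "player")]

players_per_frame = Counter(f for f, cls in detections if cls == "player")
total_players = sum(players_per_frame.values())

# Mean over frames that contain players (what the groupby version computes)
avg_over_player_frames = total_players / len(players_per_frame)

# Mean over every frame that has any detection at all
all_frames = {f for f, _ in detections}
avg_over_all_frames = total_players / len(all_frames)

print(avg_over_player_frames)  # 1.5
print(avg_over_all_frames)     # 1.0
```

Which denominator is appropriate depends on the question being asked; close-up shots with no players would otherwise silently drop out.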
Export in Different Formats
# Save as CSV (recommended for analysis)
model.save_results(df, 'detections.csv', format='csv')
# Save as JSON (with metadata)
model.save_results(df, 'detections.json', format='json')
# Save as Parquet (for big data)
model.save_results(df, 'detections.parquet', format='parquet')
Use Cases
Sports Analytics
- Player Tracking: Monitor individual player movements
- Ball Possession: Calculate possession percentages
- Formation Analysis: Study team formations and positions
- Heat Maps: Generate player movement heat maps
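As a rough illustration of the heat map idea, box centers can be binned into a coarse grid over the image. This is a sketch under stated assumptions: it works in image pixels rather than pitch coordinates (mapping to the pitch would need camera calibration), and because the output has no track IDs it aggregates all players together. `position_heatmap` and the grid size are hypothetical.

```python
def position_heatmap(centers, frame_width, frame_height, cols=4, rows=3):
    """Count (center_x, center_y) points per cell of a rows x cols grid."""
    grid = [[0] * cols for _ in range(rows)]
    for cx, cy in centers:
        # Clamp so points on or beyond the image border fall in an edge cell
        col = min(max(int(cx / frame_width * cols), 0), cols - 1)
        row = min(max(int(cy / frame_height * rows), 0), rows - 1)
        grid[row][col] += 1
    return grid


centers = [(100, 100), (1200, 700), (640, 360), (650, 370)]
heatmap = position_heatmap(centers, frame_width=1280, frame_height=720)
for line in heatmap:
    print(line)
# [1, 0, 0, 0]
# [0, 0, 2, 0]
# [0, 0, 0, 1]
```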
Broadcast Enhancement
- Automatic Highlighting: Identify key moments
- Statistics Overlay: Real-time player/ball statistics
- Tactical Analysis: Formation and strategy analysis
- Performance Metrics: Player distance, speed analysis
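Distance and speed metrics need the same object followed across frames. The output columns listed earlier carry no track ID, so per-player speed would require a separate tracker. The ball is the simplest case because there is normally one per frame. The sketch below estimates ball speed in pixels per second from consecutive ball centers. `ball_speeds` and the sample track are illustrative, and converting to real-world units would need camera calibration.

```python
import math


def ball_speeds(ball_track):
    """Speed between consecutive samples of (timestamp, center_x, center_y).

    `ball_track` must be sorted by timestamp; speeds are in pixels per second.
    """
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(ball_track, ball_track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds


track = [(0.0, 100, 100), (0.5, 130, 140), (1.0, 130, 140)]
speeds = ball_speeds(track)
print(speeds)  # [100.0, 0.0]
```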
Research Applications
- Tactical Research: Academic sports analysis
- Computer Vision: Object detection benchmarking
- Dataset Creation: Generate labeled training data
- Video Analytics: Automated video processing pipelines
Performance Benchmarks
Processing Speed
- GPU (RTX 4070): ~12-15 FPS
- GPU (A100): ~25-30 FPS
- CPU: ~2-3 FPS
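These throughput figures translate directly into wall-clock time for a full match. The estimate below is a back-of-the-envelope sketch. It assumes a 90-minute video at 25 fps, ignores decoding and I/O overhead, and assumes skipped frames cost nothing; `processing_minutes` is a hypothetical helper.

```python
def processing_minutes(video_minutes, video_fps, detector_fps, frame_skip=1):
    """Estimated minutes needed to run the detector over a video."""
    total_frames = video_minutes * 60 * video_fps
    processed_frames = total_frames / frame_skip
    return processed_frames / detector_fps / 60


rtx_full = processing_minutes(90, 25, detector_fps=12)
rtx_skip5 = processing_minutes(90, 25, detector_fps=12, frame_skip=5)
cpu_full = processing_minutes(90, 25, detector_fps=2)

print(f"RTX 4070, every frame:     {rtx_full:.1f} min")   # 187.5 min
print(f"RTX 4070, every 5th frame: {rtx_skip5:.1f} min")  # 37.5 min
print(f"CPU, every frame:          {cpu_full:.1f} min")   # 1125.0 min
```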
Memory Usage
- Model Size: 1.46 GB
- GPU Memory: ~4-6 GB
- RAM: ~2-4 GB
Accuracy by Class
Class | Precision | Recall | F1-Score |
---|---|---|---|
Ball | 78.5% | 71.2% | 74.7% |
Player | 91.3% | 89.7% | 90.5% |
Referee | 85.2% | 82.1% | 83.6% |
Goalkeeper | 88.9% | 85.4% | 87.1% |
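The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the precision and recall values in the table; `f1_score` is an illustrative helper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) as reported in the table
reported = {
    "Ball": (78.5, 71.2),
    "Player": (91.3, 89.7),
    "Referee": (85.2, 82.1),
    "Goalkeeper": (88.9, 85.4),
}
for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.1f}%")
# Ball: F1 = 74.7%
# Player: F1 = 90.5%
# Referee: F1 = 83.6%
# Goalkeeper: F1 = 87.1%
```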
Advanced Configuration
Custom Confidence Thresholds
# Class-specific confidence tuning
df = model.process_video('match.mp4')
# Filter by class-specific confidence
high_conf_players = df[(df['class_name'] == 'player') & (df['confidence'] > 0.7)]
high_conf_ball = df[(df['class_name'] == 'ball') & (df['confidence'] > 0.5)]
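The same per-class filtering can be expressed as a single lookup table. The sketch below works on plain dicts and uses the 0.7 and 0.5 cutoffs from the snippet above; the function name and the default for unlisted classes are assumptions.

```python
CLASS_THRESHOLDS = {"player": 0.7, "ball": 0.5}


def filter_by_class_threshold(detections, thresholds, default=0.5):
    """Keep detections whose confidence exceeds their class's cutoff."""
    return [
        d for d in detections
        if d["confidence"] > thresholds.get(d["class_name"], default)
    ]


detections = [
    {"class_name": "player", "confidence": 0.65},
    {"class_name": "player", "confidence": 0.82},
    {"class_name": "ball", "confidence": 0.55},
    {"class_name": "referee", "confidence": 0.45},
]
kept = filter_by_class_threshold(detections, CLASS_THRESHOLDS)
print([(d["class_name"], d["confidence"]) for d in kept])
# [('player', 0.82), ('ball', 0.55)]
```

Post-hoc filtering can only remove detections. For a per-class cutoff to have any effect, the initial `process_video` call has to run with a `confidence_threshold` no higher than the lowest cutoff in the table.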
Batch Processing
# Process multiple videos
video_files = ['match1.mp4', 'match2.mp4', 'match3.mp4']
for video in video_files:
    print(f"Processing {video}...")
    df = model.process_video(video, save_results=True)
    print(f"Completed: {len(df)} detections")
Integration Examples
With Pandas for Analysis
import pandas as pd
import matplotlib.pyplot as plt
# Process video
df = model.process_video('match.mp4')
# Create timeline analysis
timeline = df.groupby('timestamp')['class_name'].value_counts().unstack(fill_value=0)
timeline.plot(kind='line', figsize=(15, 8))
plt.title('Object Detection Timeline')
plt.show()
With OpenCV for Visualization
import cv2
# Load video and predictions
cap = cv2.VideoCapture('match.mp4')
df = model.process_video('match.mp4')
# Draw detections on video frames
for frame_num in range(100):  # First 100 frames
    ret, frame = cap.read()
    if not ret:
        break
    # Get detections for this frame
    frame_detections = df[df['frame'] == frame_num]
    # Draw bounding boxes
    for _, det in frame_detections.iterrows():
        x1, y1, x2, y2 = int(det['x1']), int(det['y1']), int(det['x2']), int(det['y2'])
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"{det['class_name']}: {det['confidence']:.2f}",
                    (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    cv2.imshow('Detections', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Technical Details
Model Architecture
- Base: RF-DETR-Large (Real-time Detection Transformer)
- Backbone: DINOv2 with ResNet features
- Input Resolution: 1280x1280 pixels
- Output: 4 object classes with bounding boxes
Training Details
- Dataset: SoccerNet-Tracking 2023 (42,750 images)
- Hardware: NVIDIA A100 40GB
- Training Time: ~14 hours (4 epochs)
- Batch Size: 4
- Learning Rate: 1e-4
- Optimizer: AdamW
Data Preprocessing
- Augmentation: Random scaling, rotation, color jittering
- Normalization: ImageNet statistics
- Resolution: Multi-scale training (896-1280px)
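"ImageNet statistics" refers to the per-channel mean and standard deviation conventionally used to standardize RGB inputs. The sketch below applies them to a single pixel to show the arithmetic. It is illustrative only: the inference wrapper presumably performs this step internally, and `normalize_pixel` is a hypothetical helper.

```python
# Conventional ImageNet per-channel statistics for RGB values scaled to [0, 1]
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print(tuple(round(v, 3) for v in white))
print(tuple(round(v, 3) for v in black))
```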
Limitations and Recommendations
Known Limitations
- Optimized for broadcast footage: Best performance on professional soccer broadcasts
- Lighting sensitivity: May have reduced accuracy in poor lighting conditions
- Camera angle dependency: Trained primarily on standard broadcast angles
- Ball occlusion: Small ball may be missed when heavily occluded
Best Practices
- Confidence thresholds: Use 0.5 for general detection, 0.7+ for high precision
- Frame skipping: Use frame_skip=5 for fast processing without significant accuracy loss
- Resolution: Higher resolution videos (720p+) provide better results
- Preprocessing: Ensure good video quality and standard soccer broadcast setup
Model Card
Model Details
- Developed by: Computer Vision Research Team
- Model type: Object Detection (RF-DETR)
- Language(s): N/A (Visual model)
- License: Apache 2.0
- Fine-tuned from: RF-DETR-Large (COCO pre-trained)
Intended Use
- Primary use: Soccer video analysis and sports analytics
- Primary users: Sports analysts, researchers, developers
- Out-of-scope: Non-soccer sports, amateur footage, real-time applications requiring <10ms latency
Training Data
- Dataset: SoccerNet-Tracking 2023
- Size: 42,750 annotated images
- Source: Professional soccer broadcasts
- Classes: 4 (ball, player, referee, goalkeeper)
Performance
- Test mAP@50: 85.7%
- Validation mAP: 49.8%
- Processing Speed: 12-30 FPS (GPU dependent)
Ethical Considerations
- Bias: Model trained on professional broadcasts may not generalize to amateur soccer
- Privacy: Ensure compliance with privacy laws when processing broadcast footage
- Fair use: Respect copyright and licensing of video content
Support and Citation
Getting Help
- Issues: Report bugs and feature requests on GitHub
- Documentation: Comprehensive guides and examples included
- Community: Join our discussions for tips and best practices
Citation
If you use this model in your research, please cite:
@misc{rfdetr-soccernet-2025,
title={RF-DETR SoccerNet: High-Performance Soccer Object Detection},
author={Computer Vision Research Team},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/YOUR-USERNAME/rf-detr-soccernet}
}
Acknowledgments
- RF-DETR Architecture: Roboflow team for the excellent RF-DETR implementation
- SoccerNet Dataset: SoccerNet team for providing the comprehensive dataset
- Training Infrastructure: Google Colab Pro+ for A100 GPU access
- Community: Open source community for tools and feedback
Changelog
v1.0.0 (2025-07-29)
- Initial release with 85.7% mAP@50
- Complete DataFrame-based inference API
- Video and image processing capabilities
- Ball possession analysis tools
- Comprehensive documentation and examples
- Multi-format export (CSV, JSON, Parquet)
Ready to analyze soccer like never before? Get started with python example.py and explore the power of AI-driven sports analytics!
Evaluation results
- Mean Average Precision at IoU 0.50 on SoccerNet-Tracking 2023 (self-reported): 85.7
- Mean Average Precision on SoccerNet-Tracking 2023 (self-reported): 49.8
- Mean Average Precision at IoU 0.75 on SoccerNet-Tracking 2023 (self-reported): 52.0