Matchcommentary: Automatic Soccer Game Commentary Generation Model
Model Overview
Matchcommentary is a multimodal model that generates fluent soccer commentary text from pre-extracted video features. It combines visual feature extraction, a Q-Former architecture, and a large language model.
Model Architecture
- Base Model: LLaMA-3-8B-Instruct
- Vision Encoder: Q-Former architecture
- Feature Dimension: 512-dimensional video features
- Window Size: 15-second video clips
- Query Tokens: 32 video query tokens
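To make the shapes in the list above concrete, here is a minimal NumPy sketch of the core idea behind a Q-Former-style aggregator: a fixed set of 32 query vectors cross-attends over a variable number of 512-dimensional frame features and always returns 32 tokens. It is a single-head illustration with random weights, not the model's actual implementation. The function name `query_cross_attention`, the random initialisation, and the 30-frame clip length are assumptions made for the example.

```python
import numpy as np

NUM_QUERY_TOKENS = 32   # from the architecture list
FEATURE_DIM = 512       # from the architecture list


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def query_cross_attention(queries, frames):
    """Single-head cross-attention: queries attend over frame features.

    queries: [num_queries, dim], frames: [num_frames, dim]
    returns: [num_queries, dim]
    """
    scores = queries @ frames.T / np.sqrt(queries.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1 over the frames
    return weights @ frames


rng = np.random.default_rng(0)
# In a trained model the queries are learned parameters; here they are random.
queries = rng.normal(size=(NUM_QUERY_TOKENS, FEATURE_DIM))
# Hypothetical clip of 30 feature frames (the clip length is an assumption).
frames = rng.normal(size=(30, FEATURE_DIM))

video_tokens = query_cross_attention(queries, frames)
print(video_tokens.shape)  # (32, 512)
```

The output length depends only on the number of queries, so clips of different lengths all yield 32 tokens. In architectures of this kind, those tokens are typically projected to the language model's hidden size before being combined with the text prompt.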
Usage
Install Dependencies
pip install torch transformers einops pycocoevalcap opencv-python numpy
Quick Start
from models.matchvoice_model import matchvoice_model
from matchvoice_dataset import MatchVoice_Dataset
import torch
# Load model
model = matchvoice_model(
    llm_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    num_video_query_token=32,
    num_features=512,
    device="cuda:0",
    inference=True
)
# Load checkpoint
checkpoint = torch.load("model_save_best_val_CIDEr.pth", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()
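# Perform inference (requires prepared video features)
# `samples` is not defined in this snippet: it must be a prepared batch,
# for example one batch from a DataLoader over MatchVoice_Dataset.
with torch.no_grad():
    predictions = model(samples)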
Complete Inference Pipeline
Run the provided inference1.py script:
python inference1.py \
    --feature_root ./features \
    --ann_root ./dataset/MatchTime/train \
    --model_ckpt model_save_best_val_CIDEr.pth \
    --window 15 \
    --batch_size 4 \
    --num_video_query_token 32 \
    --num_features 512 \
    --csv_output_path ./inference_result/predictions.csv
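As a rough illustration of what `--window 15` implies for pre-extracted features, the sketch below maps an event timestamp to a range of feature-frame indices. The 2 frames-per-second sampling rate, the choice to centre the clip on the event, and the helper name `window_to_frame_range` are all assumptions for the example. Check inference1.py and the dataset class for the actual slicing logic.

```python
def window_to_frame_range(timestamp_s, window_s=15, fps=2.0, num_frames=None):
    """Return (start, end) feature indices for a clip of `window_s` seconds
    centred on `timestamp_s`. `end` is exclusive.

    Assumptions (not taken from the model card): features are sampled at
    `fps` frames per second and the clip is centred on the event.
    """
    half = window_s / 2
    start = max(0, int((timestamp_s - half) * fps))  # clamp at the start of the half
    end = int((timestamp_s + half) * fps)
    if num_frames is not None:
        end = min(end, num_frames)  # clamp at the end of the feature file
    return start, end


# Event at 12:34 of a half whose feature file has 5400 frames (45 min at 2 fps).
start, end = window_to_frame_range(12 * 60 + 34, window_s=15, fps=2.0, num_frames=5400)
print(start, end, end - start)  # 1493 1523 30
```

Events close to the start or end of a half produce shorter clips after clamping, which is one reason batches can contain sequences of different lengths.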
Input Data Format
The model expects the following input format:
- Video Features: ResNet_PCA512 features with shape [batch_size, time_length, feature_dim]
- Timestamp Information: Metadata including game time, event type, etc.
- Attention Mask: For handling variable-length sequences
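The attention mask in the list above is needed because clips in one batch can contain different numbers of frames. Below is a minimal, dependency-free sketch of padding clips to a common length and building the matching mask. The helper name `pad_batch` and the convention that 1 marks a real frame and 0 marks padding are assumptions. With tensors, `torch.nn.utils.rnn.pad_sequence` would usually do the padding instead.

```python
def pad_batch(clips, feature_dim):
    """Pad variable-length clips to [batch_size, time_length, feature_dim].

    clips: list of clips, each a list of frames, each frame a list of floats.
    Returns (padded, mask) where mask[b][t] is 1 for real frames, 0 for padding.
    """
    time_length = max(len(clip) for clip in clips)
    padded, mask = [], []
    for clip in clips:
        for frame in clip:
            if len(frame) != feature_dim:
                raise ValueError("frame has wrong feature dimension")
        n_pad = time_length - len(clip)
        # Copy the real frames, then append all-zero frames up to time_length.
        padded.append([list(f) for f in clip] + [[0.0] * feature_dim for _ in range(n_pad)])
        mask.append([1] * len(clip) + [0] * n_pad)
    return padded, mask


# Two toy clips with 3 and 1 frames, using feature_dim=4 instead of 512 for brevity.
clips = [
    [[0.1] * 4, [0.2] * 4, [0.3] * 4],
    [[0.9] * 4],
]
padded, mask = pad_batch(clips, feature_dim=4)
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```

The mask lets attention layers ignore the zero-filled positions, so padding does not influence the generated commentary.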
Output Format
The model outputs a CSV file with the following columns:
- league: League and season information
- game: Game name
- half: First or second half
- timestamp: Event timestamp
- type: Soccer event type
- anonymized: Ground truth annotation
- predicted_res_{i}: Model prediction results
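To inspect the results, the CSV can be read with Python's standard `csv` module. The sketch below assumes the file has a header row with the column names listed above and that prediction columns are named `predicted_res_0`, `predicted_res_1`, and so on. The in-memory sample row is made up so the example runs without a real predictions file, and `load_predictions` is a hypothetical helper name.

```python
import csv
import io


def load_predictions(fileobj):
    """Return one dict per row with the ground truth and all predicted_res_* values."""
    rows = []
    for row in csv.DictReader(fileobj):
        # Keep prediction columns in the order they appear in the file.
        preds = [row[k] for k in row if k.startswith("predicted_res_")]
        rows.append({
            "game": row["game"],
            "timestamp": row["timestamp"],
            "reference": row["anonymized"],
            "predictions": preds,
        })
    return rows


# Made-up sample standing in for ./inference_result/predictions.csv
sample = io.StringIO(
    "league,game,half,timestamp,type,anonymized,predicted_res_0\n"
    "example-league,example-game,1,754,corner,"
    "[PLAYER] takes the corner.,[PLAYER] swings in a corner.\n"
)
rows = load_predictions(sample)
print(rows[0]["predictions"])
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `load_predictions`.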
Model Features
- Supports multiple video feature formats (ResNet, C3D, CLIP, etc.)
- Generation constrained to a soccer-specific vocabulary
- Supports both batch inference and single video inference
- Q-Former-based multimodal fusion architecture
Performance Metrics
Evaluation results on the MatchTime dataset:
- The released checkpoint (model_save_best_val_CIDEr.pth) is the one with the best validation CIDEr score
- Supports real-time soccer commentary generation