Matchcommentary: Automatic Soccer Game Commentary Generation Model

Model Overview

Matchcommentary is a multimodal model for automatic soccer game commentary: given video features from a match clip, it generates fluent commentary text. It combines visual feature extraction, a Q-Former module that condenses the video features into a fixed number of query tokens, and a large language model (LLM) that writes the commentary.

Model Architecture

  • Base Model: LLaMA-3-8B-Instruct
  • Video Encoder: Q-Former architecture (applied to precomputed video features)
  • Feature Dimension: 512-dimensional video features
  • Window Size: 15-second video clips
  • Query Tokens: 32 video query tokens
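In architectures of this kind, the Q-Former turns a variable number of per-frame features into a fixed set of query tokens that the LLM can consume. The following is a deliberately simplified NumPy sketch of that idea, assuming a single cross-attention layer with random weights: 32 query vectors attend over [batch_size, time_length, 512] features, and a projection maps the result to 4096 dimensions, the hidden size of LLaMA-3-8B. The real Q-Former stacks several transformer layers and its exact projection is not described here, so names such as query_cross_attention and w_proj are hypothetical, and the clip length of 30 frames is only illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_cross_attention(features, queries, w_k, w_v, mask=None):
    """Simplified single-head cross-attention (hypothetical, not the real Q-Former).

    features: [batch, time_length, feature_dim] video features
    queries:  [num_queries, feature_dim] learnable query embeddings
    mask:     optional [batch, time_length], 1 = valid frame, 0 = padding
    Returns:  [batch, num_queries, feature_dim]
    """
    d = queries.shape[-1]
    keys = features @ w_k              # [B, T, d]
    values = features @ w_v            # [B, T, d]
    scores = np.einsum("qd,btd->bqt", queries, keys) / np.sqrt(d)  # [B, Q, T]
    if mask is not None:
        # Padded frames get a large negative score so softmax gives them ~0 weight.
        scores = np.where(mask[:, None, :] > 0, scores, -1e9)
    weights = softmax(scores, axis=-1)  # each query's weights over time sum to 1
    return np.einsum("bqt,btd->bqd", weights, values)

rng = np.random.default_rng(0)
batch, time_length, feature_dim, num_queries, llm_dim = 2, 30, 512, 32, 4096
features = rng.standard_normal((batch, time_length, feature_dim))
queries = rng.standard_normal((num_queries, feature_dim)) * 0.02
w_k = rng.standard_normal((feature_dim, feature_dim)) * 0.02
w_v = rng.standard_normal((feature_dim, feature_dim)) * 0.02
# Hypothetical linear projection into the LLM embedding space.
w_proj = rng.standard_normal((feature_dim, llm_dim)) * 0.02

video_tokens = query_cross_attention(features, queries, w_k, w_v) @ w_proj
print(video_tokens.shape)  # (2, 32, 4096)
```

Whatever the clip length, the LLM always receives 32 video tokens per clip, which keeps the prompt length fixed.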

Usage

Install Dependencies

pip install torch transformers einops pycocoevalcap opencv-python numpy
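Before running the code in this card, one quick way to confirm the environment is to check that each package can be imported. Some pip package names differ from their import names (opencv-python is imported as cv2). The helper missing_packages is a hypothetical convenience, not part of the repository.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
REQUIRED_MODULES = {
    "torch": "torch",
    "transformers": "transformers",
    "einops": "einops",
    "pycocoevalcap": "pycocoevalcap",
    "opencv-python": "cv2",
    "numpy": "numpy",
}

def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found." if not missing else "Missing: " + " ".join(missing))
```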

Quick Start

from models.matchvoice_model import matchvoice_model
from matchvoice_dataset import MatchVoice_Dataset
import torch

# Load model
model = matchvoice_model(
    llm_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    num_video_query_token=32,
    num_features=512,
    device="cuda:0",
    inference=True
)

# Load checkpoint
checkpoint = torch.load("model_save_best_val_CIDEr.pth", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()

# Run inference. `samples` must be a batch of prepared video features and
# metadata, e.g. built with MatchVoice_Dataset as inference1.py does.
with torch.no_grad():
    predictions = model(samples)

Complete Inference Pipeline

To run the full pipeline end to end, use the provided inference1.py script:

python inference1.py \
    --feature_root ./features \
    --ann_root ./dataset/MatchTime/train \
    --model_ckpt model_save_best_val_CIDEr.pth \
    --window 15 \
    --batch_size 4 \
    --num_video_query_token 32 \
    --num_features 512 \
    --csv_output_path ./inference_result/predictions.csv
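When driving the script from Python, for instance to compare several checkpoints, the same command can be assembled as an argument list and passed to subprocess. This sketch uses only the flags shown above with the same default values; the helper names build_inference_cmd and run_inference, and the per-checkpoint output file naming, are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

def build_inference_cmd(model_ckpt, csv_output_path, feature_root="./features",
                        ann_root="./dataset/MatchTime/train", window=15,
                        batch_size=4, num_video_query_token=32, num_features=512):
    """Assemble the inference1.py command as a list (no shell quoting needed)."""
    return [
        sys.executable, "inference1.py",
        "--feature_root", str(feature_root),
        "--ann_root", str(ann_root),
        "--model_ckpt", str(model_ckpt),
        "--window", str(window),
        "--batch_size", str(batch_size),
        "--num_video_query_token", str(num_video_query_token),
        "--num_features", str(num_features),
        "--csv_output_path", str(csv_output_path),
    ]

def run_inference(model_ckpt, out_dir="./inference_result"):
    # Hypothetical naming: one CSV per checkpoint, named after the checkpoint file.
    out = Path(out_dir) / f"{Path(model_ckpt).stem}_predictions.csv"
    out.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_inference_cmd(model_ckpt, out), check=True)
    return out
```

Passing a list rather than a shell string avoids quoting problems with paths that contain spaces, and check=True raises CalledProcessError if the script exits with an error.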

Input Data Format

The model expects the following inputs:

  1. Video Features: ResNet_PCA512 features with shape [batch_size, time_length, feature_dim], where feature_dim = 512
  2. Timestamp Information: metadata such as game time and event type
  3. Attention Mask: marks which time steps hold real features and which are padding, so that clips of different lengths can share a batch
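A minimal sketch of how variable-length clips can be batched into the [batch_size, time_length, feature_dim] layout with a matching attention mask. It assumes zero-padding at the end of each clip and a 1/0 mask convention; the model's own data loader may pad differently, and the function name pad_and_mask and the clip lengths are illustrative.

```python
import numpy as np

def pad_and_mask(sequences, feature_dim=512):
    """Pad [time_length_i, feature_dim] arrays to a common length.

    Returns features of shape [batch_size, max_time, feature_dim] and an
    attention mask of shape [batch_size, max_time] (1 = real frame, 0 = padding).
    """
    max_time = max(seq.shape[0] for seq in sequences)
    features = np.zeros((len(sequences), max_time, feature_dim), dtype=np.float32)
    mask = np.zeros((len(sequences), max_time), dtype=np.int64)
    for i, seq in enumerate(sequences):
        if seq.shape[1] != feature_dim:
            raise ValueError(f"expected feature_dim={feature_dim}, got {seq.shape[1]}")
        features[i, : seq.shape[0]] = seq   # copy real frames, rest stays zero
        mask[i, : seq.shape[0]] = 1
    return features, mask

clips = [np.ones((30, 512)), np.ones((24, 512))]
features, mask = pad_and_mask(clips)
print(features.shape, mask.sum(axis=1))  # (2, 30, 512) [30 24]
```

Single-video inference is the same call with a one-element list, which yields a batch of size 1.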

Output Format

The inference script writes a CSV file (to the path given by --csv_output_path) with the following columns:

  • league: League and season information
  • game: Game name
  • half: Match half (first or second)
  • timestamp: Event timestamp
  • type: Soccer event type
  • anonymized: Ground-truth commentary, with player and team names anonymized
  • predicted_res_{i}: Generated commentary (one column per prediction index i)
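The CSV can be loaded with Python's standard csv module. This sketch pairs each row's ground truth with all of its predictions; it assumes the {i} in predicted_res_{i} is an integer index, and the function name parse_predictions is hypothetical.

```python
import csv

PRED_PREFIX = "predicted_res_"

def parse_predictions(lines):
    """Parse the predictions CSV from any iterable of text lines (e.g. an open file)."""
    reader = csv.DictReader(lines)
    # Assumes integer suffixes, so predicted_res_10 sorts after predicted_res_2.
    pred_cols = sorted(
        (c for c in (reader.fieldnames or []) if c.startswith(PRED_PREFIX)),
        key=lambda c: int(c[len(PRED_PREFIX):]),
    )
    records = []
    for row in reader:
        records.append({
            "game": row["game"],
            "half": row["half"],
            "timestamp": row["timestamp"],
            "type": row["type"],
            "reference": row["anonymized"],
            "predictions": [row[c] for c in pred_cols],
        })
    return records
```

To read the file written by the inference command, pass an open handle: `with open("./inference_result/predictions.csv", newline="", encoding="utf-8") as f: records = parse_predictions(f)`.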

Model Features

  • Supports multiple video feature formats (ResNet, C3D, CLIP, etc.)
  • Generation constrained to a soccer-specific vocabulary
  • Supports both batch inference and single-video inference
  • Q-Former-based multimodal fusion architecture

Performance Metrics

Evaluation on the MatchTime dataset:

  • The released checkpoint (model_save_best_val_CIDEr.pth) is the one that achieved the best CIDEr score on the validation set
  • Supports real-time soccer commentary generation
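Because the checkpoint is selected by CIDEr and pycocoevalcap is among the dependencies, predictions can be scored with its Cider scorer, whose compute_score takes a dict mapping each sample id to a list of reference strings and a dict mapping each id to a single-element list holding the prediction. This is a sketch, not the project's evaluation code: it only lowercases the text, whereas the standard COCO caption pipeline first tokenizes with PTBTokenizer (which requires Java), so its scores may not match the reported validation CIDEr. The helper names are hypothetical.

```python
def build_cider_inputs(references, predictions):
    """Convert parallel lists of strings into the dicts pycocoevalcap expects:
    gts = {id: [reference]} and res = {id: [prediction]}."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    gts = {i: [ref.lower()] for i, ref in enumerate(references)}
    res = {i: [pred.lower()] for i, pred in enumerate(predictions)}
    return gts, res

def cider_score(references, predictions):
    # Imported lazily so build_cider_inputs works without pycocoevalcap installed.
    from pycocoevalcap.cider.cider import Cider
    gts, res = build_cider_inputs(references, predictions)
    score, _per_sample_scores = Cider().compute_score(gts, res)
    return score
```

For example, pass the anonymized column as references and one predicted_res_{i} column as predictions.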