SwiftViT-150

MViT-v2 fine-tuned on 150 videos for common swift feeding behavior classification.

Model

Fine-tuned mvit_v2_s (Kinetics-400 pretrained) on single-camera nestbox footage. Achieves ~87% validation accuracy (in controlled settings) and shows unexpected cross-camera generalization despite training on a single viewpoint and a minuscule dataset (150 samples).

Usage

import torch
import torchvision

model = torchvision.models.video.mvit_v2_s(weights=None)
model.head = torch.nn.Sequential(
    torch.nn.Dropout(0.5),
    torch.nn.Linear(768, 512),
    torch.nn.GELU(),
    torch.nn.Dropout(0.3),
    torch.nn.Linear(512, 3),
)

# map_location="cpu" lets the checkpoint load on machines without a GPU.
checkpoint = torch.load("swiftvit-150.pth", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Inference
with torch.no_grad():
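    # load_video() is a placeholder, not defined here: supply your own loader
    # that returns a float tensor of shape [C, T, H, W] = [3, 16, 224, 224].
    video = load_video()
    output = model(video.unsqueeze(0))  # add batch dimension; logits have shape [1, 3]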
    prediction = torch.argmax(output, dim=1)
    # 0: feeding, 1: possible_feeding, 2: not_feeding
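
To turn the raw model output into a readable label with a confidence score, the logits can be passed through a softmax and looked up in the class list. The following is a minimal sketch in plain Python; the helper names `softmax` and `decode` and the example logits are illustrative assumptions, not part of the released code.

```python
import math

# Class order as listed in the usage snippet above.
CLASS_NAMES = ["feeding", "possible_feeding", "not_feeding"]


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


# Hypothetical logits for one clip, e.g. output[0].tolist() in the snippet above.
example_logits = [2.1, 0.3, -1.2]
label, confidence = decode(example_logits)
print(f"{label} ({confidence:.2f})")
```

Because possible_feeding sits between the other two classes, it may be worth inspecting the full probability list rather than only the top label when predictions are close.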

Architecture

Base: MViT-v2 Small (~34.5M params)
Head: Custom 768→512→3 with dropout
Input: 16 frames @ 224x224
Classes: 3 (feeding, possible_feeding, not_feeding)
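
The model expects exactly 16 frames per clip, so longer recordings have to be subsampled. This card does not state how frames were chosen during training; the sketch below assumes uniform sampling across the clip, with the last frame repeated for clips shorter than 16 frames. The function name `sample_frame_indices` is hypothetical.

```python
def sample_frame_indices(total_frames, num_samples=16):
    """Pick num_samples frame indices spread evenly across a clip."""
    if total_frames <= 0:
        raise ValueError("clip has no frames")
    if total_frames <= num_samples:
        # Short clip: repeat the last frame to pad up to num_samples.
        return [min(i, total_frames - 1) for i in range(num_samples)]
    # Take the centre of each of num_samples equal-width segments.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


# Example: a hypothetical 300-frame clip reduced to 16 frame indices.
indices = sample_frame_indices(300)
print(indices)
```

The selected frames would then be resized to 224x224 and stacked into the [C, T, H, W] tensor used in the usage snippet.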

Training

120 train / 30 val samples
Batch size: 4
Optimizer: AdamW (lr=1e-4, wd=0.05)
Scheduler: CosineAnnealingWarmRestarts
Mixed precision training on H100
Early stopping: 40 epoch patience
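
The schedule and stopping rule listed above can be illustrated in plain Python. The learning-rate function follows the cosine-with-restarts formula used by `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` with `T_mult=1`; the restart period `t_0=10` and `eta_min=0.0` are assumptions, since the card does not list them. The `EarlyStopping` class is a hypothetical helper that tracks validation accuracy with the stated patience of 40 epochs.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr=1e-4, t_0=10, eta_min=0.0):
    """LR at a given epoch for cosine annealing that restarts every t_0 epochs."""
    t_cur = epoch % t_0  # epochs elapsed since the last restart
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_0))


class EarlyStopping:
    """Stop once validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=40):
        self.patience = patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def step(self, val_accuracy):
        """Record one epoch; return True when training should stop."""
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


# Print the learning rate over the first (assumed) restart cycle.
for epoch in range(0, 11, 5):
    print(epoch, cosine_warm_restart_lr(epoch))
```

With only 30 validation samples, accuracy moves in steps of roughly 3.3 percentage points, which is one reason a long patience can be reasonable here.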

Performance

Train accuracy: 100%
Val accuracy: 87%
Unexpected cross-camera generalization observed
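
Since the validation set holds only 30 clips, the 87% figure carries substantial uncertainty. The sketch below computes a Wilson score interval, assuming 87% corresponds to 26 of 30 clips classified correctly; that count is inferred from the rounded figure, not reported in this card.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Assumed count: 26 correct out of 30 validation clips (about 87%).
low, high = wilson_interval(26, 30)
print(f"{low:.2f} to {high:.2f}")
```

Under that assumption the interval spans roughly 70% to 95%, so the validation accuracy is best read as indicative rather than precise.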

Dataset

Trained on swift-150: 150 videos from GABLE nestbox camera (Ireland, 2020-2025).

Context

Part of climate research correlating swift feeding patterns with weather data at terabyte scale. Ballinrobe Community School entry for REDACTED.

Citation

If you reference this work, cite:

@misc{swift150bcs,
  title={Swift-150: A Dataset for Common Swift Feeding Behavior Analysis},
  author={Odin Glynn-Martin and Culan O'Meara and Anas Rashid and Shayden D'Souza and Pádraig Foley and Mark Lally},
  year={2025},
  institution={Ballinrobe Community School},
  url={https://ballinrobecommunityschool.ie},
  note={REDACTED - Entry 2025}
}

License

Proprietary. See LICENSE for restrictions.
