Model Card for Fine-Tuned YOLOv11m Animal Detection Model

This model is a fine-tuned version of YOLOv11m optimized for the detection and classification of wildlife in low-altitude drone imagery. It has been trained to identify zebras (Plains and Grevy's), giraffes (reticulated and Masai), Persian onagers, and African wild dogs (painted dogs) with high accuracy across diverse environmental conditions.

Model Details

Model Description

  • Developed by: Jenna Kline
  • Model type: Object Detection and Classification
  • Language(s) (NLP): Not applicable (Computer Vision model)
  • Fine-tuned from model: YOLOv11m (ultralytics/yolo11m.pt)

Model Sources

Uses

Direct Use

This model is designed for direct use in wildlife monitoring applications, ecological research, and biodiversity studies. It can:

  • Detect and classify zebras, giraffes, onagers, and African wild dogs in low-altitude drone images
  • Monitor wildlife populations in their natural habitats
  • Automate animal ecology data collection using drones and computer vision
  • Support biodiversity assessments by identifying species present in field surveys

The model can be used by researchers, conservationists, wildlife managers, and citizen scientists to automate and scale up wildlife monitoring efforts, particularly in African ecosystems.

Downstream Use

This model can be integrated into larger ecological monitoring systems including:

  • Wildlife conservation monitoring platforms
  • Ecological research workflows
  • Environmental impact assessment tools

Out-of-Scope Use

This model is not suitable for:

  • Security or surveillance applications targeting humans
  • Applications where errors in detection could lead to harmful conservation decisions without human verification
  • Real-time detection systems requiring extremely low latency (model prioritizes accuracy over speed)
  • Detection of species not included in the training set (the model is trained only on zebras, giraffes, onagers, and African wild dogs)

Bias, Risks, and Limitations

  • Species representation bias: The model may perform better on species that were well-represented in the training data.
  • Environmental bias: Performance may degrade in environmental conditions not represented in the training data (e.g., extreme weather, unusual lighting).
  • Morphological bias: Similar-looking species may be confused with one another (particularly among equids like zebras and onagers).
  • Geospatial bias: The model may perform better in biomes similar to those present in the training data, particularly African savanna environments.
  • Seasonal bias: Detection accuracy may vary based on seasonal appearance changes in animals or environments.
  • Technical limitations: Performance depends on image quality, with reduced accuracy in low-resolution, blurry, or poorly exposed images.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model:

  • Always verify critical detections with human review, especially for rare species or conservation decision-making
  • Consider confidence scores when evaluating detections (a minimal filtering sketch follows this list)
  • Be cautious when applying the model to new geographic regions or habitats not represented in training data
  • Periodically validate model performance on new data to ensure continued reliability
  • Consider fine-tuning the model on domain-specific data when applying to new regions or species
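
A minimal sketch of the confidence-based review workflow described above; the threshold value and file paths are illustrative placeholders, not settings shipped with the model:

from ultralytics import YOLO

REVIEW_THRESHOLD = 0.5  # illustrative cutoff; tune per deployment and species

model = YOLO('path/to/your/model.pt')
results = model('path/to/image.jpg')

# Separate confident detections from those that should be checked by a person
for box in results[0].boxes:
    conf = float(box.conf[0])
    class_name = model.names[int(box.cls[0])]
    if conf < REVIEW_THRESHOLD:
        print(f"Flag for human review: {class_name} ({conf:.2f})")
    else:
        print(f"Accepted: {class_name} ({conf:.2f})")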

How to Get Started with the Model

Use the code below to get started with the model:

from ultralytics import YOLO

# Load the model
model = YOLO('path/to/your/model.pt')

# Run inference on an image
results = model('path/to/image.jpg')

# Process results
for result in results:
    boxes = result.boxes  # Boxes object for bounding boxes outputs
    for box in boxes:
        x1, y1, x2, y2 = box.xyxy[0]  # get box coordinates
        conf = box.conf[0]  # confidence score
        cls = int(box.cls[0])  # class id
        class_name = model.names[cls]  # class name (Zebra, Giraffe, Onager, or Dog)
        print(f"Detected {class_name} with confidence {conf:.2f} at position {x1:.1f}, {y1:.1f}, {x2:.1f}, {y2:.1f}")
        
# Visualize results (plot() returns the annotated image as a NumPy array in BGR order)
annotated_image = results[0].plot()
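
For batch processing, the Ultralytics predict API can run over an entire directory of drone frames and write annotated copies to disk. The directory path and confidence threshold below are illustrative placeholders:

from ultralytics import YOLO

model = YOLO('path/to/your/model.pt')

# Run inference on every image in a folder, saving annotated copies
# under runs/detect/predict/ by default
results = model.predict(
    source='path/to/drone_frames/',  # placeholder directory of images
    conf=0.25,                       # example confidence threshold
    save=True,                       # write annotated images to disk
)
print(f"Processed {len(results)} images")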

Training Details

Training Data

The dataset is available on Hugging Face. See prepare_yolo_dataset.py for details on the train/test splits.

Dataset splitting strategy

We applied a stratified 60/40 train-test split across species and locations to evaluate model generalizability. Data was collected from three distinct environments: Mpala Research Centre (location_1), Ol Pejeta Conservancy (location_2), and The Wilds Conservation Center (location_3). The dataset includes four target classes: Zebra, Giraffe, Onager, and African Wild Dog.

To prevent overlap in individual animals or environmental conditions between training and testing, we split video sessions at the file level—ensuring that no frames from a given session appear in both train and test sets. This also allows consistent per-frame sampling at a fixed interval (every 10th frame).

Training set includes:

  • Mpala (location_1): Multiple full sessions for Giraffes, Plains Zebras, and Grevy's Zebras, including mixed-species scenes.
  • Ol Pejeta (location_2): Full sessions of Plains Zebras.
  • The Wilds (location_3): 70% of sessions for Painted Dogs, Giraffes, and Persian Onagers.

Test set includes:

  • The Wilds (location_3): The remaining 30% of sessions, including additional Grevy’s Zebra sessions used exclusively for testing.
  • Mpala (location_1) and Ol Pejeta (location_2): Separate zebra and mixed-species sessions not used during training.

This careful division by session and location ensures that the model is evaluated on unseen environments, individuals, and contexts, making it a robust benchmark for testing generalization across ecological and geographic domains.
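
A minimal sketch of this session-level split and frame sampling; the file-naming convention and helper below are illustrative, and prepare_yolo_dataset.py remains the authoritative implementation:

import random
from collections import defaultdict
from pathlib import Path

def split_by_session(frame_dir, train_fraction=0.6, sample_every=10, seed=0):
    """Assign whole video sessions to train or test, then keep every Nth frame.
    Assumes filenames like <session>_<frame_index>.jpg (illustrative convention)."""
    sessions = defaultdict(list)
    for frame in sorted(Path(frame_dir).glob("*.jpg")):
        session_id = frame.stem.rsplit("_", 1)[0]
        sessions[session_id].append(frame)

    session_ids = sorted(sessions)
    random.Random(seed).shuffle(session_ids)
    n_train = int(len(session_ids) * train_fraction)
    train_sessions = set(session_ids[:n_train])

    train, test = [], []
    for session_id, frames in sessions.items():
        kept = frames[::sample_every]  # e.g. every 10th frame
        (train if session_id in train_sessions else test).extend(kept)
    return train, test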

Training Procedure

Preprocessing

  • Images were resized to 640x640 pixels (as specified in the training script)
  • Standard YOLOv11 augmentation pipeline was applied

Training Hyperparameters

The model was trained with the following hyperparameters as specified in the training script:

  • Base model: YOLOv11m (yolo11m.pt)
  • Epochs: 50
  • Image size: 640
  • Dataset configuration: Custom YAML file defining 4 classes (Zebra, Giraffe, Onager, Dog); an illustrative layout is sketched after the training script below
  • Training regime: Default YOLOv11 training parameters
# Training script
from ultralytics import YOLO

model = YOLO("yolo11m.pt")
results = model.train(
    data="/data/dataset.yaml",
    epochs=50,
    imgsz=640,
)
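
The dataset YAML itself is not reproduced in this card. The snippet below sketches the standard Ultralytics layout with this model's four classes; the paths and class-index order are assumptions, not the released configuration:

from pathlib import Path

# Illustrative dataset.yaml in the standard Ultralytics format;
# paths and class-index order are assumptions, not the released config
dataset_yaml = """\
path: /data            # dataset root
train: images/train    # training images, relative to path
val: images/test       # validation images, relative to path
names:
  0: Zebra
  1: Giraffe
  2: Onager
  3: Dog
"""
Path("dataset.yaml").write_text(dataset_yaml)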

Speeds, Sizes, Times

  • Training hardware: 2× Tesla V100-PCIE-16GB GPUs (16,144 MiB each)
  • Training time: 2 hours, 11 minutes
  • Model size: YOLO11m summary - 231 layers, 20,056,092 parameters, 20,056,076 gradients, 68.2 GFLOPs
  • Inference speed: 0.1ms preprocess, 4.6ms inference, 0.0ms loss, 0.9ms postprocess per image on Tesla V100-PCIE-16GB, 16144MiB

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on a held-out test set located at /fs/ess/PAS2136/Kenya-2023/yolo_benchmark/HerdYOLO/data/images/test containing:

  • 7658 test images with instances of Zebra, Giraffe, Onager, and Dog

Factors

The evaluation disaggregated performance by:

  • Species category (Zebra, Giraffe, Onager, African wild dog)

Metrics

The model was evaluated using standard object detection metrics:

  • Precision: Ratio of true positives to all predicted positives
  • Recall: Ratio of true positives to all actual positives (ground truth)
  • mAP50: Mean Average Precision at IoU threshold of 0.5
  • mAP50-95: Mean Average Precision averaged over IoU thresholds from 0.5 to 0.95 (a minimal IoU sketch follows this list)
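
As a reference for the metrics above, here is a minimal IoU computation for axis-aligned boxes in (x1, y1, x2, y2) format; it mirrors the definition but is not the evaluation code used for this model:

def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Example: two partially overlapping boxes
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ≈ 0.14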

Results

Summary

  • Overall mAP50: 80.1%
  • Overall mAP50-95: 48.8%
  • Per-class performance:
    • Zebra: mAP50 = 67.5%, Precision = 76.5%, Recall = 64.7%
    • Giraffe: mAP50 = 67.8%, Precision = 78.8%, Recall = 63.4%
    • Onager: mAP50 = 85.7%, Precision = 93.9%, Recall = 77.6%
    • Dog: mAP50 = 99.5%, Precision = 97.3%, Recall = 99.8%

Technical Specifications

Model Architecture and Objective

  • Base architecture: YOLOv11m
  • Detection heads: Standard YOLOv11 architecture
  • Classes: 4 (Zebra, Giraffe, Onager, Dog)

Compute Infrastructure

Software

  • Python 3.8+
  • PyTorch 2.0+
  • Ultralytics YOLOv11 framework
  • CUDA 11.7+ (for GPU acceleration)

Citation

BibTeX:

@software{mmla_finetuned_yolo11m,
  author = {Jenna Kline},
  title = {Fine-Tuned YOLOv11m Animal Detection Model},
  version = {1.0.0},
  year = {2025},
  url = {https://huggingface.co/imageomics/mmla}
}

Acknowledgements

This work was supported by both the Imageomics Institute and the AI and Biodiversity Change (ABC) Global Center. The Imageomics Institute is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under Award #2118240 (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). The ABC Global Center is funded by the US National Science Foundation under Award No. 2330423 and Natural Sciences and Engineering Research Council of Canada under Award No. 585136. This model draws on research supported by the Social Sciences and Humanities Research Council.

Additional support was provided by the National Ecological Observatory Network (NEON), a program sponsored by the National Science Foundation and operated under cooperative agreement by Battelle Memorial Institute. This material is based in part upon work supported by the National Science Foundation through the NEON Program.

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, Natural Sciences and Engineering Research Council of Canada, or Social Sciences and Humanities Research Council.

Glossary

  • YOLO: You Only Look Once, a family of real-time object detection models
  • mAP: mean Average Precision, a standard metric for evaluating object detection models
  • IoU: Intersection over Union, a measure of overlap between predicted and ground truth bounding boxes
  • Onager: Also known as the Asian wild ass, a species of equid native to Asia
  • YOLOv11m: The medium-sized variant of the YOLOv11 architecture

Model Card Authors

Jenna Kline, The Ohio State University

Model Card Contact

kline.377 at osu dot edu
