YOLOv12‑x Object Detector
Ultralytics’s attention‑centric, real‑time object detection model YOLOv12‑x is now available on Hugging Face.
🧠 Model Description
YOLOv12‑x builds on the YOLO12 family by combining Area Attention and R‑ELAN modules to deliver state‑of‑the‑art detection accuracy with fewer parameters and FLOPs. Optional FlashAttention integration further reduces memory access overhead and boosts inference speed on modern NVIDIA GPUs.
⚙️ Requirements
Python ≥ 3.8
PyTorch ≥ 1.10 (CUDA‑enabled)
CUDA ≥ 11.2 compatible GPU
Optional: FlashAttention (install via `pip install flash-attn`)
Recommended GPU architectures for FlashAttention support:
- Turing (e.g. T4, Quadro RTX)
- Ampere (RTX 30 series, A30/A40/A100)
- Ada Lovelace (RTX 40 series)
- Hopper (H100/H200)
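One way to check whether the local GPU falls into one of the architectures listed above is to look at its CUDA compute capability. The sketch below is only illustrative: the `ARCH_BY_CAPABILITY` table and the `flash_attention_arch` helper are hypothetical names, and the capability numbers are NVIDIA's published values for these architectures, not something stated in this README.

```python
# Compute capability (major, minor) for each architecture named in the list above.
# The mapping is an assumption based on NVIDIA's published capability numbers.
ARCH_BY_CAPABILITY = {
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (8, 7): "Ampere",
    (8, 9): "Ada Lovelace",
    (9, 0): "Hopper",
}


def flash_attention_arch(capability):
    """Return the architecture name if it is on the list above, else None."""
    return ARCH_BY_CAPABILITY.get(tuple(capability))


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; cannot query the GPU")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print(cap, flash_attention_arch(cap) or "not on the recommended list")
        else:
            print("No CUDA device visible")
```

A `None` result only means the GPU is not on the recommended list; the base model should still run without FlashAttention.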
System specs: ≥ 8 GB RAM, ≥ 50 GB free disk
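The version minimums in this section can be checked before installing anything else. The following is a minimal sketch, assuming that the CUDA version reported by the PyTorch build (`torch.version.cuda`) is a reasonable proxy for the CUDA requirement; `parse_version` and `check_environment` are hypothetical helper names.

```python
import sys

# Minimums copied from the requirements list above.
MIN_PYTHON = (3, 8)
MIN_TORCH = (1, 10)
MIN_CUDA = (11, 2)


def parse_version(text):
    """Turn a string such as '2.1.0+cu118' into (2, 1); only major.minor is compared."""
    parts = text.split("+")[0].split(".")
    return tuple(int(p) for p in parts[:2])


def check_environment(python, torch_version, cuda_version):
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if tuple(python[:2]) < MIN_PYTHON:
        problems.append("Python >= 3.8 required")
    if parse_version(torch_version) < MIN_TORCH:
        problems.append("PyTorch >= 1.10 required")
    if cuda_version is None:
        problems.append("CUDA-enabled PyTorch build required")
    elif parse_version(cuda_version) < MIN_CUDA:
        problems.append("CUDA >= 11.2 required")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        issues = check_environment(sys.version_info, torch.__version__, torch.version.cuda)
        print(issues or "OK")
```

Versions are compared as integer tuples rather than strings so that 1.10 correctly sorts after 1.9.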
🚀 Installation & Usage
pip install ultralytics
# (Optional for FlashAttention)
pip install flash-attn
Python example:
from ultralytics import YOLO
# Load a COCO-pretrained YOLO12x model
model = YOLO("yolo12x.pt")
# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
# Run inference with the YOLO12x model on the 'bus.jpg' image
results = model("path/to/bus.jpg")
CLI example:
yolo detect predict model=yolov12x.pt source=test.jpg imgsz=640 conf=0.25
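The CLI call above maps onto the Python API roughly as follows. This is a sketch rather than a reference implementation: `count_labels` and `detect_and_count` are illustrative helper names that are not part of Ultralytics, and the weights and image paths are the placeholders from the CLI example.

```python
from collections import Counter


def count_labels(names, class_ids):
    """Map numeric class ids to class names and count detections per class."""
    return Counter(names[int(c)] for c in class_ids)


def detect_and_count(weights="yolov12x.pt", source="test.jpg"):
    """Run one prediction with the CLI's settings and summarize the detections."""
    # Imported lazily so count_labels can be used without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    # Same options as the CLI call: 640-pixel input, 0.25 confidence threshold.
    result = model.predict(source=source, imgsz=640, conf=0.25)[0]
    # result.names maps class id -> label; result.boxes.cls holds one id per box.
    return count_labels(result.names, result.boxes.cls.tolist())
```

Calling `detect_and_count()` requires the weights file and an input image to be present; `count_labels` can be tried on its own with a hand-written mapping such as `{0: "person", 5: "bus"}`.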
📊 Performance & Use Cases
Benchmarked on COCO val2017 at 640 × 640 resolution on an NVIDIA T4 GPU:
Model | mAP@0.5:0.95 | Latency (ms) | Params (M) | FLOPs (B) |
---|---|---|---|---|
YOLOv12‑x | 55.2 % | 11.79 | 59.1 | 199.0 |
YOLOv12‑x excels in scenarios demanding both high accuracy and near‑real‑time throughput:
- Autonomous vehicles
- Industrial inspection
- Surveillance & security systems
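To put the T4 latency from the benchmark table in throughput terms, per-image latency can be converted to images per second. The sketch below assumes one image per forward pass and ignores pre- and post-processing time, neither of which is stated here; `latency_ms_to_fps` is a hypothetical helper name.

```python
def latency_ms_to_fps(latency_ms):
    """Convert per-image latency in milliseconds to images per second."""
    return 1000.0 / latency_ms


fps = latency_ms_to_fps(11.79)  # T4 latency from the benchmark table
print(f"{fps:.1f} images/s")  # prints "84.8 images/s"
```

Actual throughput in a deployed pipeline depends on batch size, input decoding, and the GPU in use, so this figure is best read as an upper bound for the benchmarked setup.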
📚 References
@article{tian2025yolov12,
title={YOLOv12: Attention-Centric Real-Time Object Detectors},
author={Tian, Yunjie and Ye, Qixiang and Doermann, David},
journal={arXiv preprint arXiv:2502.12524},
year={2025}
}
📝 Summary
Feature | Details |
---|---|
Model | YOLOv12‑x |
Architecture | Area Attention + R‑ELAN |
FlashAttention | Optional (GPU‑accelerated) |
Requirements | Python ≥ 3.8, PyTorch ≥ 1.10, CUDA ≥ 11.2 |
Use Cases | Real‑time object detection with high accuracy |
Files:
├── yolov12x.pt # Trained model weights
├── README.md # This file