Model Card for SpinePose Family
SpinePose is a family of 2D human pose estimation models trained to estimate a 37-keypoint skeleton, extending standard human body models to include the spine, pelvis, and feet regions in detail.
Four SpinePose variants (small, medium, large, and x-large) are available, requiring 0.72, 1.98, 4.22, and 17.37 GFLOPs respectively at inference time.
Model Details
Description
- Developed by: Muhammad Saif Ullah Khan
- Affiliation: Technical University of Kaiserslautern & DFKI
- Funding: DFKI GmbH
- Model Type: Top-down 2D keypoint estimator
- License: CC-BY-NC-4.0
- Frameworks: PyTorch, ONNX Runtime
- Input Resolution: 256×192 or 384×288 (depending on variant)
Sources
- Repository: github.com/dfki-av/spinepose
- Paper: CVPR Workshops 2025 (CVSPORTS)
- Demo: saifkhichi.com/research/spinepose
Intended Uses
Direct Use
- Human body and spine joint localization from RGB images or videos
- Real-time motion analysis for research, animation, or sports applications
- Augmentation of general-purpose pose estimators for anatomically rich tasks
Downstream Use
- Integration with clinical posture tracking systems
- 3D pose lifting or musculoskeletal modeling (via SpineTrack synthetic subset)
- Fine-tuning on domain-specific datasets (industrial, rehabilitation, yoga)
Out-of-Scope Use
- Any medical diagnosis or treatment application without human oversight
- Full-body 3D reconstruction (requires separate lifting model)
- Unverified use in safety-critical systems
Bias, Risks, and Limitations
- The model was trained primarily on controlled and synthetic data; it may underperform under heavy occlusion or in extreme poses.
- Limited diversity in body types and cultural attire representation.
- Bias inherited from COCO/Body8 datasets used for pretraining the teachers.
Recommendations
Evaluate the model on your specific domain and retrain or augment using domain-specific samples to mitigate dataset bias.
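As a quick first check, a PCK-style score over a few annotated in-domain images can indicate whether retraining is warranted. Below is a minimal sketch using only the `predict` call from the Getting Started section; the single-person assumption, the ground-truth format, and the diagonal-relative threshold are illustrative choices, not part of the library:

```python
import cv2
import numpy as np

from spinepose import SpinePoseEstimator

estimator = SpinePoseEstimator(device='cuda')

def pck(image_path, gt_keypoints, rel_thresh=0.05):
    """Fraction of keypoints predicted within `rel_thresh` of the image
    diagonal. Assumes one annotated person per image and `gt_keypoints`
    of shape (37, 2) in pixel coordinates (illustrative metric)."""
    image = cv2.imread(image_path)
    keypoints, _ = estimator.predict(image)
    # Assumption: predict returns per-person arrays; take the first person
    pred = np.asarray(keypoints)[0]
    diag = np.hypot(image.shape[0], image.shape[1])
    dists = np.linalg.norm(pred - gt_keypoints, axis=-1)
    return float((dists < rel_thresh * diag).mean())
```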
Getting Started
Installation
```bash
pip install spinepose
```
On Linux/Windows with CUDA available, install the GPU version:
```bash
pip install spinepose[gpu]
```
CLI Usage
```bash
spinepose -i /path/to/image_or_video -o /path/to/output
```

This automatically downloads the correct ONNX checkpoint. Run `spinepose -h` for detailed usage options.
Python API
```python
import cv2

from spinepose import SpinePoseEstimator

# Initialize estimator (downloads ONNX model if not found locally)
estimator = SpinePoseEstimator(device='cuda')

# Perform inference on a single image
image = cv2.imread('path/to/image.jpg')
keypoints, scores = estimator.predict(image)

# Draw the predicted skeleton and save the result
visualized = estimator.visualize(image, keypoints, scores)
cv2.imwrite('output.jpg', visualized)
```
For higher-level use:
```python
from spinepose.inference import infer_image, infer_video

# Single image inference
infer_image('path/to/image.jpg', 'output.jpg')

# Video inference with optional temporal smoothing
infer_video('path/to/video.mp4', 'output_video.mp4', use_smoothing=True)
```
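For live sources where `infer_video` does not apply (e.g. a webcam stream), the estimator can be applied frame by frame. A minimal sketch reusing only the `predict` and `visualize` calls shown above:

```python
import cv2

from spinepose import SpinePoseEstimator

estimator = SpinePoseEstimator(device='cuda')

# Read frames from the default camera; any cv2.VideoCapture source works
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    keypoints, scores = estimator.predict(frame)
    cv2.imshow('SpinePose', estimator.visualize(frame, keypoints, scores))
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```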
Evaluation
To reproduce results, prepare the following directory layout:
```
<PROJECT_DIR>/
├── data/
│   ├── spinetrack/
│   ├── coco/
│   └── halpe/
└── checkpoints/
    ├── spinepose-s_32xb256-10e_spinetrack-256x192.pth
    ├── spinepose-m_32xb256-10e_spinetrack-256x192.pth
    ├── spinepose-l_32xb256-10e_spinetrack-256x192.pth
    └── spinepose-x_32xb128-10e_spinetrack-384x288.pth
```
Each PyTorch checkpoint contains both teacher and student weights; only the student weights are used during inference. The exported ONNX checkpoints contain only the student.
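The config and checkpoint naming follows MMPose conventions. Assuming an MMPose-based checkout of the repository, evaluation can then be launched with MMPose's standard test script; the config path below is illustrative and should be replaced with the actual config shipped in the repository:

```bash
# Evaluate the small model on the SpineTrack validation split
python tools/test.py \
    configs/spinepose-s_32xb256-10e_spinetrack-256x192.py \
    checkpoints/spinepose-s_32xb256-10e_spinetrack-256x192.pth
```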
Metrics
We report Average Precision (AP) and Average Recall (AR) under varying Object Keypoint Similarity (OKS) thresholds, consistent with COCO conventions but extended to the 37-keypoint SpineTrack format.
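For reference, OKS between a prediction and a ground-truth annotation follows the standard COCO definition, with per-keypoint falloff constants defined for all 37 keypoints (the constants for the additional spine joints are part of the SpineTrack format and are not reproduced here):

$$
\mathrm{OKS} = \frac{\sum_i \exp\left(-\frac{d_i^2}{2 s^2 k_i^2}\right) \delta(v_i > 0)}{\sum_i \delta(v_i > 0)}
$$

where $d_i$ is the Euclidean distance between the predicted and ground-truth locations of keypoint $i$, $s$ is the object scale, $k_i$ is the per-keypoint falloff constant, and $v_i$ is the ground-truth visibility flag.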
Results
| Method | Train Data | Kpts | COCO AP | COCO AR | Halpe26 AP | Halpe26 AR | Body AP | Body AR | Feet AP | Feet AR | Spine AP | Spine AR | Overall AP | Overall AR | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SimCC-MBV2 | COCO | 17 | 62.0 | 67.8 | 33.2 | 43.9 | 72.1 | 75.6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 2.29 | 0.31 |
| RTMPose-t | Body8 | 26 | 65.9 | 71.3 | 68.0 | 73.2 | 76.9 | 80.0 | 74.1 | 79.7 | 0.0 | 0.0 | 15.8 | 17.9 | 3.51 | 0.37 |
| RTMPose-s | Body8 | 26 | 69.7 | 74.7 | 72.0 | 76.7 | 80.9 | 83.6 | 78.9 | 83.5 | 0.0 | 0.0 | 17.2 | 19.4 | 5.70 | 0.70 |
| SpinePose-s | SpineTrack | 37 | 68.2 | 73.1 | 70.6 | 75.2 | 79.1 | 82.1 | 77.5 | 82.9 | 89.6 | 90.7 | 84.2 | 86.2 | 5.98 | 0.72 |
| SimCC-ViPNAS | COCO | 17 | 69.5 | 75.5 | 36.9 | 49.7 | 79.6 | 83.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 8.65 | 0.80 |
| RTMPose-m | Body8 | 26 | 75.1 | 80.0 | 76.7 | 81.3 | 85.5 | 87.9 | 84.1 | 88.2 | 0.0 | 0.0 | 19.4 | 21.4 | 13.93 | 1.95 |
| SpinePose-m | SpineTrack | 37 | 73.0 | 77.5 | 75.0 | 79.2 | 84.0 | 86.4 | 83.5 | 87.4 | 91.4 | 92.5 | 88.0 | 89.5 | 14.34 | 1.98 |
| RTMPose-l | Body8 | 26 | 76.9 | 81.5 | 78.4 | 82.9 | 86.8 | 89.2 | 86.9 | 90.0 | 0.0 | 0.0 | 20.0 | 22.0 | 28.11 | 4.19 |
| RTMW-m | Cocktail14 | 133 | 73.8 | 78.7 | 63.8 | 68.5 | 84.3 | 86.7 | 83.0 | 87.2 | 0.0 | 0.0 | 6.2 | 7.6 | 32.26 | 4.31 |
| SimCC-ResNet50 | COCO | 17 | 72.1 | 78.2 | 38.7 | 51.6 | 81.8 | 85.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 36.75 | 5.50 |
| SpinePose-l | SpineTrack | 37 | 75.2 | 79.5 | 77.0 | 81.1 | 85.4 | 87.7 | 85.5 | 89.2 | 91.0 | 92.2 | 88.4 | 90.0 | 28.66 | 4.22 |
| SimCC-ResNet50* | COCO | 17 | 73.4 | 79.0 | 39.8 | 52.4 | 83.2 | 86.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.3 | 43.29 | 12.42 |
| RTMPose-x* | Body8 | 26 | 78.8 | 83.4 | 80.0 | 84.4 | 88.6 | 90.6 | 88.4 | 91.4 | 0.0 | 0.0 | 21.0 | 22.9 | 50.00 | 17.29 |
| RTMW-l* | Cocktail14 | 133 | 75.6 | 80.4 | 65.4 | 70.1 | 86.0 | 88.3 | 85.6 | 89.2 | 0.0 | 0.0 | 8.1 | 8.1 | 57.20 | 7.91 |
| RTMW-l* | Cocktail14 | 133 | 77.2 | 82.3 | 66.6 | 71.8 | 87.3 | 89.9 | 88.3 | 91.3 | 0.0 | 0.0 | 8.6 | 8.6 | 57.35 | 17.69 |
| SpinePose-x* | SpineTrack | 37 | 75.9 | 80.1 | 77.6 | 81.8 | 86.3 | 88.5 | 86.3 | 89.7 | 89.3 | 91.0 | 88.9 | 89.9 | 50.69 | 17.37 |
SpineTrack Dataset
The SpineTrack dataset comprises both real and synthetic data:
- SpineTrack-Real: Annotated natural images with nine detailed spinal landmarks in addition to COCO joints.
- SpineTrack-Unreal: Synthetic subset rendered in Unreal Engine with biomechanically aligned OpenSim annotations.
To download:
```bash
git lfs install
git clone https://huggingface.co/datasets/saifkhichi96/spinetrack
```
Alternatively, use `wget` to download the dataset directly:

```bash
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/annotations.zip
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/images.zip
```
In both cases, the download consists of two zipped folders, `annotations` (24.8 MB) and `images` (19.4 GB), which can be unzipped to obtain the following structure:
```
spinetrack
├── annotations/
│   ├── person_keypoints_train-real-coco.json
│   ├── person_keypoints_train-real-yoga.json
│   ├── person_keypoints_train-unreal.json
│   └── person_keypoints_val2017.json
└── images/
    ├── train-real-coco/
    ├── train-real-yoga/
    ├── train-unreal/
    └── val2017/
```
All annotations follow the COCO format and are directly compatible with MMPose, Detectron2, and similar frameworks.
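Because the files are plain COCO keypoint JSON, they can be inspected with `pycocotools` for a quick sanity check. A minimal sketch, using a file path from the layout above (the printed keypoint count of 37 is an expectation based on this card, not verified here):

```python
from pycocotools.coco import COCO

# Load the real-image training annotations (standard COCO keypoint JSON)
coco = COCO('spinetrack/annotations/person_keypoints_train-real-coco.json')

# The person category lists the keypoint names and skeleton edges
person = coco.loadCats(coco.getCatIds())[0]
print(len(person['keypoints']))  # expected to print 37

# Inspect the annotations of the first image
img_id = coco.getImgIds()[0]
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
    print(ann['num_keypoints'], len(ann['keypoints']) // 3)
```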
The synthetic subset was primarily employed within the active learning pipeline used to bootstrap and refine annotations for real-world images.
All released SpinePose models were trained exclusively on the real portion of the dataset.
A small number of annotations in the synthetic subset are corrupted.
We recommend avoiding their use until the updated labels are released in the next dataset version.
Citation
If you use SpinePose or SpineTrack in your research, please cite:
BibTeX:
```bibtex
@InProceedings{Khan_2025_CVPR,
    author    = {Khan, Muhammad Saif Ullah and Krau{\ss}, Stephan and Stricker, Didier},
    title     = {Towards Unconstrained 2D Pose Estimation of the Human Spine},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {6171-6180}
}
```
APA:
Khan, M. S. U., Krauß, S., & Stricker, D. (2025). Towards unconstrained 2D pose estimation of the human spine. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (pp. 6171–6180).