Model Card for SpinePose Family
SpinePose is a family of 2D human pose estimation models trained to estimate a 37-keypoint skeleton, extending standard human body models to include the spine, pelvis, and feet regions in detail.
Four SpinePose variants (small, medium, large, and x-large) are available, requiring 0.72, 1.98, 4.22, and 17.37 GFLOPs respectively at inference time.
Model Details
Description
- Developed by: Muhammad Saif Ullah Khan
- Affiliation: Technical University of Kaiserslautern & DFKI
- Funding: DFKI GmbH
- Model Type: Top-down 2D keypoint estimator
- License: CC-BY-NC-4.0
- Frameworks: PyTorch, ONNX Runtime
- Input Resolution: 256×192 or 384×288 (depending on variant)
Sources
- Repository: github.com/dfki-av/spinepose
- Paper: CVPR Workshops 2025 (CVSPORTS)
- Demo: saifkhichi.com/research/spinepose
Intended Uses
Direct Use
- Human body and spine joint localization from RGB images or videos
- Real-time motion analysis for research, animation, or sports applications
- Augmentation of general-purpose pose estimators for anatomically rich tasks
Downstream Use
- Integration with clinical posture tracking systems
- 3D pose lifting or musculoskeletal modeling (via SpineTrack synthetic subset)
- Fine-tuning on domain-specific datasets (industrial, rehabilitation, yoga)
Out-of-Scope Use
- Any medical diagnosis or treatment application without human oversight
- Full-body 3D reconstruction (requires separate lifting model)
- Unverified use in safety-critical systems
Bias, Risks, and Limitations
- The model was trained primarily on controlled and synthetic data; it may underperform under heavy occlusion or in extreme poses.
- Limited diversity in body types and cultural attire representation.
- Bias inherited from COCO/Body8 datasets used for pretraining the teachers.
Recommendations
Evaluate the model on your specific domain and retrain or augment using domain-specific samples to mitigate dataset bias.
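As a quick first check, a PCK-style score over a few annotated in-domain images can indicate whether retraining is warranted. Below is a minimal sketch using only the `predict` call from the Getting Started section; the single-person assumption, the ground-truth format, and the diagonal-relative threshold are illustrative choices, not part of the library:

```python
import cv2
import numpy as np

from spinepose import SpinePoseEstimator

estimator = SpinePoseEstimator(device='cuda')

def pck(image_path, gt_keypoints, rel_thresh=0.05):
    """Fraction of keypoints predicted within `rel_thresh` of the image
    diagonal. Assumes one annotated person per image and `gt_keypoints`
    of shape (37, 2) in pixel coordinates (illustrative metric)."""
    image = cv2.imread(image_path)
    keypoints, _ = estimator.predict(image)
    # Assumption: predict returns per-person arrays; take the first person
    pred = np.asarray(keypoints)[0]
    diag = np.hypot(image.shape[0], image.shape[1])
    dists = np.linalg.norm(pred - gt_keypoints, axis=-1)
    return float((dists < rel_thresh * diag).mean())
```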
Getting Started
Installation
```bash
pip install spinepose
```
On Linux/Windows with CUDA available, install the GPU version:
```bash
pip install spinepose[gpu]
```
CLI Usage
```bash
spinepose -i /path/to/image_or_video -o /path/to/output
```

This automatically downloads the correct ONNX checkpoint. Run `spinepose -h` for detailed usage options.
Python API
```python
import cv2

from spinepose import SpinePoseEstimator

# Initialize estimator (downloads ONNX model if not found locally)
estimator = SpinePoseEstimator(device='cuda')

# Perform inference on a single image
image = cv2.imread('path/to/image.jpg')
keypoints, scores = estimator.predict(image)

# Draw the predicted skeleton and save the result
visualized = estimator.visualize(image, keypoints, scores)
cv2.imwrite('output.jpg', visualized)
```
For higher-level use:
```python
from spinepose.inference import infer_image, infer_video

# Single image inference
infer_image('path/to/image.jpg', 'output.jpg')

# Video inference with optional temporal smoothing
infer_video('path/to/video.mp4', 'output_video.mp4', use_smoothing=True)
```
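For live sources where `infer_video` does not apply (e.g. a webcam stream), the estimator can be applied frame by frame. A minimal sketch reusing only the `predict` and `visualize` calls shown above:

```python
import cv2

from spinepose import SpinePoseEstimator

estimator = SpinePoseEstimator(device='cuda')

# Read frames from the default camera; any cv2.VideoCapture source works
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    keypoints, scores = estimator.predict(frame)
    cv2.imshow('SpinePose', estimator.visualize(frame, keypoints, scores))
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```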
Evaluation
To reproduce results, prepare the following directory layout:
```
<PROJECT_DIR>/
├── data/
│   ├── spinetrack/
│   ├── coco/
│   └── halpe/
└── checkpoints/
    ├── spinepose-s_32xb256-10e_spinetrack-256x192.pth
    ├── spinepose-m_32xb256-10e_spinetrack-256x192.pth
    ├── spinepose-l_32xb256-10e_spinetrack-256x192.pth
    └── spinepose-x_32xb128-10e_spinetrack-384x288.pth
```
Each PyTorch checkpoint contains both teacher and student weights; only the student weights are used during inference. The exported ONNX checkpoints contain only the student.
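The config and checkpoint naming follows MMPose conventions. Assuming an MMPose-based checkout of the repository, evaluation can then be launched with MMPose's standard test script; the config path below is illustrative and should be replaced with the actual config shipped in the repository:

```bash
# Evaluate the small model on the SpineTrack validation split
python tools/test.py \
    configs/spinepose-s_32xb256-10e_spinetrack-256x192.py \
    checkpoints/spinepose-s_32xb256-10e_spinetrack-256x192.pth
```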
Metrics
We report Average Precision (AP) and Average Recall (AR) under varying Object Keypoint Similarity (OKS) thresholds, consistent with COCO conventions but extended to the 37-keypoint SpineTrack format.
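For reference, OKS between a prediction and a ground-truth annotation follows the standard COCO definition, with per-keypoint falloff constants defined for all 37 keypoints (the constants for the additional spine joints are part of the SpineTrack format and are not reproduced here):

$$
\mathrm{OKS} = \frac{\sum_i \exp\left(-\frac{d_i^2}{2 s^2 k_i^2}\right) \delta(v_i > 0)}{\sum_i \delta(v_i > 0)}
$$

where $d_i$ is the Euclidean distance between the predicted and ground-truth locations of keypoint $i$, $s$ is the object scale, $k_i$ is the per-keypoint falloff constant, and $v_i$ is the ground-truth visibility flag.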
Results
| Method | Train Data | Kpts | COCO AP | COCO AR | Halpe26 AP | Halpe26 AR | Body AP | Body AR | Feet AP | Feet AR | Spine AP | Spine AR | Overall AP | Overall AR | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SimCC-MBV2 | COCO | 17 | 62.0 | 67.8 | 33.2 | 43.9 | 72.1 | 75.6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 2.29 | 0.31 |
| RTMPose-t | Body8 | 26 | 65.9 | 71.3 | 68.0 | 73.2 | 76.9 | 80.0 | 74.1 | 79.7 | 0.0 | 0.0 | 15.8 | 17.9 | 3.51 | 0.37 |
| RTMPose-s | Body8 | 26 | 69.7 | 74.7 | 72.0 | 76.7 | 80.9 | 83.6 | 78.9 | 83.5 | 0.0 | 0.0 | 17.2 | 19.4 | 5.70 | 0.70 |
| SpinePose-s | SpineTrack | 37 | 68.2 | 73.1 | 70.6 | 75.2 | 79.1 | 82.1 | 77.5 | 82.9 | 89.6 | 90.7 | 84.2 | 86.2 | 5.98 | 0.72 |
| SimCC-ViPNAS | COCO | 17 | 69.5 | 75.5 | 36.9 | 49.7 | 79.6 | 83.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 8.65 | 0.80 |
| RTMPose-m | Body8 | 26 | 75.1 | 80.0 | 76.7 | 81.3 | 85.5 | 87.9 | 84.1 | 88.2 | 0.0 | 0.0 | 19.4 | 21.4 | 13.93 | 1.95 |
| SpinePose-m | SpineTrack | 37 | 73.0 | 77.5 | 75.0 | 79.2 | 84.0 | 86.4 | 83.5 | 87.4 | 91.4 | 92.5 | 88.0 | 89.5 | 14.34 | 1.98 |
| RTMPose-l | Body8 | 26 | 76.9 | 81.5 | 78.4 | 82.9 | 86.8 | 89.2 | 86.9 | 90.0 | 0.0 | 0.0 | 20.0 | 22.0 | 28.11 | 4.19 |
| RTMW-m | Cocktail14 | 133 | 73.8 | 78.7 | 63.8 | 68.5 | 84.3 | 86.7 | 83.0 | 87.2 | 0.0 | 0.0 | 6.2 | 7.6 | 32.26 | 4.31 |
| SimCC-ResNet50 | COCO | 17 | 72.1 | 78.2 | 38.7 | 51.6 | 81.8 | 85.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 36.75 | 5.50 |
| SpinePose-l | SpineTrack | 37 | 75.2 | 79.5 | 77.0 | 81.1 | 85.4 | 87.7 | 85.5 | 89.2 | 91.0 | 92.2 | 88.4 | 90.0 | 28.66 | 4.22 |
| SimCC-ResNet50* | COCO | 17 | 73.4 | 79.0 | 39.8 | 52.4 | 83.2 | 86.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.3 | 43.29 | 12.42 |
| RTMPose-x* | Body8 | 26 | 78.8 | 83.4 | 80.0 | 84.4 | 88.6 | 90.6 | 88.4 | 91.4 | 0.0 | 0.0 | 21.0 | 22.9 | 50.00 | 17.29 |
| RTMW-l* | Cocktail14 | 133 | 75.6 | 80.4 | 65.4 | 70.1 | 86.0 | 88.3 | 85.6 | 89.2 | 0.0 | 0.0 | 8.1 | 8.1 | 57.20 | 7.91 |
| RTMW-l* | Cocktail14 | 133 | 77.2 | 82.3 | 66.6 | 71.8 | 87.3 | 89.9 | 88.3 | 91.3 | 0.0 | 0.0 | 8.6 | 8.6 | 57.35 | 17.69 |
| SpinePose-x* | SpineTrack | 37 | 75.9 | 80.1 | 77.6 | 81.8 | 86.3 | 88.5 | 86.3 | 89.7 | 89.3 | 91.0 | 88.9 | 89.9 | 50.69 | 17.37 |
SpineTrack Dataset
The SpineTrack dataset comprises both real and synthetic data:
- SpineTrack-Real: Annotated natural images with nine detailed spinal landmarks in addition to COCO joints.
- SpineTrack-Unreal: Synthetic subset rendered in Unreal Engine with biomechanically aligned OpenSim annotations.
To download:
```bash
git lfs install
git clone https://huggingface.co/datasets/saifkhichi96/spinetrack
```
Alternatively, use `wget` to download the dataset directly:

```bash
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/annotations.zip
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/images.zip
```
In both cases, the download consists of two zipped folders, `annotations` (24.8 MB) and `images` (19.4 GB), which can be unzipped to obtain the following structure:
```
spinetrack
├── annotations/
│   ├── person_keypoints_train-real-coco.json
│   ├── person_keypoints_train-real-yoga.json
│   ├── person_keypoints_train-unreal.json
│   └── person_keypoints_val2017.json
└── images/
    ├── train-real-coco/
    ├── train-real-yoga/
    ├── train-unreal/
    └── val2017/
```
All annotations follow the COCO format and are directly compatible with MMPose, Detectron2, and similar frameworks.
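Because the files are plain COCO keypoint JSON, they can be inspected with `pycocotools` for a quick sanity check. A minimal sketch, using a file path from the layout above (the printed keypoint count of 37 is an expectation based on this card, not verified here):

```python
from pycocotools.coco import COCO

# Load the real-image training annotations (standard COCO keypoint JSON)
coco = COCO('spinetrack/annotations/person_keypoints_train-real-coco.json')

# The person category lists the keypoint names and skeleton edges
person = coco.loadCats(coco.getCatIds())[0]
print(len(person['keypoints']))  # expected to print 37

# Inspect the annotations of the first image
img_id = coco.getImgIds()[0]
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
    print(ann['num_keypoints'], len(ann['keypoints']) // 3)
```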
The synthetic subset was primarily employed within the active learning pipeline used to bootstrap and refine annotations for real-world images.
All released SpinePose models were trained exclusively on the real portion of the dataset.
A small number of annotations in the synthetic subset are corrupted.
We recommend avoiding their use until the updated labels are released in the next dataset version.
Citation
If you use SpinePose or SpineTrack in your research, please cite:
BibTeX:
```bibtex
@InProceedings{Khan_2025_CVPR,
    author    = {Khan, Muhammad Saif Ullah and Krau{\ss}, Stephan and Stricker, Didier},
    title     = {Towards Unconstrained 2D Pose Estimation of the Human Spine},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {6171-6180}
}
```
APA:
Khan, M. S. U., Krauß, S., & Stricker, D. (2025). Towards unconstrained 2D pose estimation of the human spine. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (pp. 6171–6180).