Model Card for SpinePose Family

SpinePose is a family of top-down 2D human pose estimation models that predict a 37-keypoint skeleton, extending standard human body models with detailed spine, pelvis, and foot keypoints.

Four SpinePose variants (small, medium, large, and x-large) are available, requiring 0.72, 1.98, 4.22, and 17.37 GFLOPs respectively at inference time.


Model Details

Description

  • Developed by: Muhammad Saif Ullah Khan
  • Affiliation: Technical University of Kaiserslautern & DFKI
  • Funding: DFKI GmbH
  • Model Type: Top-down 2D keypoint estimator
  • License: CC-BY-NC-4.0
  • Frameworks: PyTorch, ONNX Runtime
  • Input Resolution: 256×192 (small/medium/large) or 384×288 (x-large)

Sources

  • Paper: Towards Unconstrained 2D Pose Estimation of the Human Spine (CVPR 2025 Workshops)
  • Dataset: https://huggingface.co/datasets/saifkhichi96/spinetrack

Intended Uses

Direct Use

  • Human body and spine joint localization from RGB images or videos
  • Real-time motion analysis for research, animation, or sports applications
  • Augmentation of general-purpose pose estimators for anatomically rich tasks

Downstream Use

  • Integration with clinical posture tracking systems
  • 3D pose lifting or musculoskeletal modeling (via SpineTrack synthetic subset)
  • Fine-tuning on domain-specific datasets (industrial, rehabilitation, yoga)

Out-of-Scope Use

  • Any medical diagnosis or treatment application without human oversight
  • Full-body 3D reconstruction (requires separate lifting model)
  • Unverified use in safety-critical systems

Bias, Risks, and Limitations

  • Trained primarily on controlled and synthetic datasets; may underperform under heavy occlusion or in extreme poses.
  • Limited diversity in represented body types and cultural attire.
  • Biases inherited from the COCO/Body8 datasets used to pretrain the teacher models.

Recommendations

Evaluate the model on your specific domain and retrain or augment using domain-specific samples to mitigate dataset bias.


Getting Started

Installation

pip install spinepose

On Linux/Windows with CUDA available, install the GPU version:

pip install spinepose[gpu]
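
To verify that the GPU build is active, ONNX Runtime exposes its available execution providers (a quick check, assuming the onnxruntime-gpu dependency installed correctly):

import onnxruntime as ort

# 'CUDAExecutionProvider' appears in this list when the GPU build is usable;
# otherwise inference silently falls back to the CPU provider.
print(ort.get_available_providers())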

CLI Usage

spinepose -i /path/to/image_or_video -o /path/to/output

This automatically downloads the correct ONNX checkpoint. Run spinepose -h for detailed usage options.

Python API

import cv2
from spinepose import SpinePoseEstimator

# Initialize estimator (downloads ONNX model if not found locally)
estimator = SpinePoseEstimator(device='cuda')

# Perform inference on a single image
image = cv2.imread('path/to/image.jpg')
keypoints, scores = estimator.predict(image)
visualized = estimator.visualize(image, keypoints, scores)
cv2.imwrite('output.jpg', visualized)
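
The same predict/visualize calls extend to video. Below is a minimal frame-by-frame sketch; the codec, FPS fallback, and output path are illustrative assumptions, and the built-in helpers shown next are the supported route:

import cv2
from spinepose import SpinePoseEstimator

estimator = SpinePoseEstimator(device='cuda')

cap = cv2.VideoCapture('path/to/video.mp4')
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back to 30 FPS if unknown
writer = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    keypoints, scores = estimator.predict(frame)
    frame = estimator.visualize(frame, keypoints, scores)
    if writer is None:  # create the writer once the frame size is known
        h, w = frame.shape[:2]
        writer = cv2.VideoWriter('output_video.mp4',
                                 cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    writer.write(frame)

cap.release()
writer.release()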

For higher-level use, one-call helpers are also provided:

from spinepose.inference import infer_image, infer_video

# Single image inference
infer_image('path/to/image.jpg', 'output.jpg')

# Video inference with optional temporal smoothing
infer_video('path/to/video.mp4', 'output_video.mp4', use_smoothing=True)

Evaluation

To reproduce results, prepare the following directory layout:

<PROJECT_DIR>/
├─ data/
│  ├─ spinetrack/
│  ├─ coco/
│  └─ halpe/
└─ checkpoints/
   ├─ spinepose-s_32xb256-10e_spinetrack-256x192.pth
   ├─ spinepose-m_32xb256-10e_spinetrack-256x192.pth
   ├─ spinepose-l_32xb256-10e_spinetrack-256x192.pth
   └─ spinepose-x_32xb128-10e_spinetrack-384x288.pth

Each PyTorch checkpoint contains both teacher and student weights; only the student is used during inference. The exported ONNX checkpoints contain the student alone.
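
If only the student is needed (e.g., for deployment), the teacher weights can be stripped from a checkpoint. A minimal sketch, assuming student parameters are stored under a 'student.' key prefix (a hypothetical convention; inspect the actual key names first):

import torch

# Load a released checkpoint containing teacher + student weights.
ckpt = torch.load('checkpoints/spinepose-s_32xb256-10e_spinetrack-256x192.pth',
                  map_location='cpu')
state = ckpt.get('state_dict', ckpt)

# NOTE: the 'student.' prefix is an assumption, not a confirmed naming scheme;
# print(list(state)[:10]) to see how the release actually names its keys.
student = {k.removeprefix('student.'): v
           for k, v in state.items() if k.startswith('student.')}
torch.save({'state_dict': student}, 'checkpoints/spinepose-s_student.pth')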

Metrics

We report Average Precision (AP) and Average Recall (AR) under varying Object Keypoint Similarity (OKS) thresholds, consistent with COCO conventions but extended to the 37-keypoint SpineTrack format.
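
For reference, a minimal sketch of the OKS computation between one predicted and one ground-truth pose; the per-keypoint falloff constants for the extended spine keypoints are an assumption here, since COCO defines them only for its 17 joints:

import numpy as np

def oks(pred, gt, vis, area, k):
    """pred, gt: (K, 2) keypoint coordinates; vis: (K,) visibility flags;
    area: ground-truth object area; k: (K,) per-keypoint constants."""
    d2 = np.sum((pred - gt) ** 2, axis=-1)          # squared pixel distances
    e = d2 / (2.0 * area * k ** 2 + np.spacing(1))  # scale-normalized error
    labeled = vis > 0                               # only labeled keypoints count
    return float(np.exp(-e)[labeled].mean()) if labeled.any() else 0.0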

Results

| Method          | Train Data | Kpts | COCO AP/AR  | Halpe26 AP/AR | Body AP/AR  | Feet AP/AR  | Spine AP/AR | Overall AP/AR | Params (M) | FLOPs (G) |
|-----------------|------------|------|-------------|---------------|-------------|-------------|-------------|---------------|------------|-----------|
| SimCC-MBV2      | COCO       | 17   | 62.0 / 67.8 | 33.2 / 43.9   | 72.1 / 75.6 | 0.0 / 0.0   | 0.0 / 0.0   | 0.1 / 0.1     | 2.29       | 0.31      |
| RTMPose-t       | Body8      | 26   | 65.9 / 71.3 | 68.0 / 73.2   | 76.9 / 80.0 | 74.1 / 79.7 | 0.0 / 0.0   | 15.8 / 17.9   | 3.51       | 0.37      |
| RTMPose-s       | Body8      | 26   | 69.7 / 74.7 | 72.0 / 76.7   | 80.9 / 83.6 | 78.9 / 83.5 | 0.0 / 0.0   | 17.2 / 19.4   | 5.70       | 0.70      |
| SpinePose-s     | SpineTrack | 37   | 68.2 / 73.1 | 70.6 / 75.2   | 79.1 / 82.1 | 77.5 / 82.9 | 89.6 / 90.7 | 84.2 / 86.2   | 5.98       | 0.72      |
| SimCC-ViPNAS    | COCO       | 17   | 69.5 / 75.5 | 36.9 / 49.7   | 79.6 / 83.0 | 0.0 / 0.0   | 0.0 / 0.0   | 0.2 / 0.2     | 8.65       | 0.80      |
| RTMPose-m       | Body8      | 26   | 75.1 / 80.0 | 76.7 / 81.3   | 85.5 / 87.9 | 84.1 / 88.2 | 0.0 / 0.0   | 19.4 / 21.4   | 13.93      | 1.95      |
| SpinePose-m     | SpineTrack | 37   | 73.0 / 77.5 | 75.0 / 79.2   | 84.0 / 86.4 | 83.5 / 87.4 | 91.4 / 92.5 | 88.0 / 89.5   | 14.34      | 1.98      |
| RTMPose-l       | Body8      | 26   | 76.9 / 81.5 | 78.4 / 82.9   | 86.8 / 89.2 | 86.9 / 90.0 | 0.0 / 0.0   | 20.0 / 22.0   | 28.11      | 4.19      |
| RTMW-m          | Cocktail14 | 133  | 73.8 / 78.7 | 63.8 / 68.5   | 84.3 / 86.7 | 83.0 / 87.2 | 0.0 / 0.0   | 6.2 / 7.6     | 32.26      | 4.31      |
| SimCC-ResNet50  | COCO       | 17   | 72.1 / 78.2 | 38.7 / 51.6   | 81.8 / 85.2 | 0.0 / 0.0   | 0.0 / 0.0   | 0.2 / 0.2     | 36.75      | 5.50      |
| SpinePose-l     | SpineTrack | 37   | 75.2 / 79.5 | 77.0 / 81.1   | 85.4 / 87.7 | 85.5 / 89.2 | 91.0 / 92.2 | 88.4 / 90.0   | 28.66      | 4.22      |
| SimCC-ResNet50* | COCO       | 17   | 73.4 / 79.0 | 39.8 / 52.4   | 83.2 / 86.2 | 0.0 / 0.0   | 0.0 / 0.0   | 0.3 / 0.3     | 43.29      | 12.42     |
| RTMPose-x*      | Body8      | 26   | 78.8 / 83.4 | 80.0 / 84.4   | 88.6 / 90.6 | 88.4 / 91.4 | 0.0 / 0.0   | 21.0 / 22.9   | 50.00      | 17.29     |
| RTMW-l          | Cocktail14 | 133  | 75.6 / 80.4 | 65.4 / 70.1   | 86.0 / 88.3 | 85.6 / 89.2 | 0.0 / 0.0   | 8.1 / 8.1     | 57.20      | 7.91      |
| RTMW-l*         | Cocktail14 | 133  | 77.2 / 82.3 | 66.6 / 71.8   | 87.3 / 89.9 | 88.3 / 91.3 | 0.0 / 0.0   | 8.6 / 8.6     | 57.35      | 17.69     |
| SpinePose-x*    | SpineTrack | 37   | 75.9 / 80.1 | 77.6 / 81.8   | 86.3 / 88.5 | 86.3 / 89.7 | 89.3 / 91.0 | 88.9 / 89.9   | 50.69      | 17.37     |

Entries marked * are evaluated at 384×288 input resolution; all other models use 256×192. Each cell reports AP / AR on the corresponding evaluation subset.

SpineTrack Dataset

The SpineTrack dataset comprises both real and synthetic data:

  • SpineTrack-Real: Annotated natural images with nine detailed spinal landmarks in addition to COCO joints.
  • SpineTrack-Unreal: Synthetic subset rendered in Unreal Engine with biomechanically aligned OpenSim annotations.

To download:

git lfs install
git clone https://huggingface.co/datasets/saifkhichi96/spinetrack

Alternatively, use wget to download the dataset directly:

wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/annotations.zip
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/images.zip

In both cases, you obtain two zip archives, annotations.zip (24.8 MB) and images.zip (19.4 GB), which unpack to the following structure:

spinetrack
├── annotations/
│   ├── person_keypoints_train-real-coco.json
│   ├── person_keypoints_train-real-yoga.json
│   ├── person_keypoints_train-unreal.json
│   └── person_keypoints_val2017.json
└── images/
    ├── train-real-coco/
    ├── train-real-yoga/
    ├── train-unreal/
    └── val2017/

All annotations follow the COCO format and are directly compatible with MMPose, Detectron2, and similar frameworks.
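
Because the files are plain COCO JSON, they can be inspected without any framework. A small sketch reading one annotation file and reshaping its flat keypoint list into the 37×3 SpineTrack layout:

import json
import numpy as np

with open('spinetrack/annotations/person_keypoints_val2017.json') as f:
    coco = json.load(f)

ann = coco['annotations'][0]
# COCO stores keypoints as a flat [x1, y1, v1, x2, y2, v2, ...] list;
# reshaping gives one (x, y, visibility) row per keypoint.
kpts = np.asarray(ann['keypoints'], dtype=np.float32).reshape(-1, 3)
print(kpts.shape)  # expected: (37, 3)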

The synthetic subset was primarily employed within the active learning pipeline used to bootstrap and refine annotations for real-world images.
All released SpinePose models were trained exclusively on the real portion of the dataset.

A small number of annotations in the synthetic subset are corrupted.
We recommend avoiding their use until the updated labels are released in the next dataset version.

Citation

If you use SpinePose or SpineTrack in your research, please cite:

BibTeX:

@InProceedings{Khan_2025_CVPR,
    author    = {Khan, Muhammad Saif Ullah and Krau{\ss}, Stephan and Stricker, Didier},
    title     = {Towards Unconstrained 2D Pose Estimation of the Human Spine},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {6171-6180}
}

APA:

Khan, M. S. U., Krauß, S., & Stricker, D. (2025). Towards unconstrained 2D pose estimation of the human spine. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (pp. 6171–6180).

Model Card Contact

Muhammad Saif Ullah Khan
