SkeletonDiffusion Model Card

This model card describes SkeletonDiffusion, the model introduced in Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction. The codebase is available here.

SkeletonDiffusion is a probabilistic human motion prediction model that takes 0.5s of human motion as input and generates 2s of future motion, with an inference time of 0.4s. The generated motions are both realistic and diverse. It is a latent diffusion model with a custom graph attention architecture, trained with nonisotropic Gaussian diffusion.
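To give a rough intuition for what "nonisotropic" means here, the following is a minimal NumPy sketch of a forward noising step whose noise is correlated across joints rather than independent. It is an illustration under stated assumptions, not the formulation used by SkeletonDiffusion: the 3-joint skeleton, the covariance built from its adjacency matrix, the function name `nonisotropic_forward_step`, and the `alpha_bar` parameter are all hypothetical, and the model card does not specify how the actual covariance or noise schedule is defined.

```python
import numpy as np


def nonisotropic_forward_step(x0, sigma, alpha_bar, rng):
    """Noise x0 with covariance `sigma` across joints instead of the identity.

    x0:        (J, D) array, J joints with D features each.
    sigma:     (J, J) symmetric positive-definite covariance (assumed given).
    alpha_bar: signal-retention coefficient in [0, 1].
    """
    chol = np.linalg.cholesky(sigma)      # sigma = chol @ chol.T
    eps = rng.standard_normal(x0.shape)   # isotropic noise, shape (J, D)
    correlated = chol @ eps               # noise with covariance sigma across joints
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * correlated


# Hypothetical 3-joint chain (0 - 1 - 2); NOT the skeleton used by the model.
adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])
# Shift by 2*I so the matrix is positive definite (an illustrative choice).
sigma = adjacency + 2.0 * np.eye(3)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 3))          # 3 joints, 3D coordinates
x_t = nonisotropic_forward_step(x0, sigma, alpha_bar=0.5, rng=rng)
print(x_t.shape)
```

With `sigma` set to the identity matrix, the same function reduces to the usual isotropic Gaussian noising step, which is the only difference this sketch is meant to highlight.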

We provide a model for each dataset mentioned in the paper (AMASS, FreeMan, Human3.6M), and a further model trained on AMASS with hand joints (AMASS-MANO).
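As a minimal sketch of how the checkpoints could be fetched, assuming they are hosted in the Hugging Face Hub repository `SkeletonDiffusion/ModelCheckpoints` and that the `huggingface_hub` package is installed, the whole repository can be downloaded with `snapshot_download`. The helper name `download_checkpoints` is hypothetical, and the layout of files inside the repository is not described in this card, so the sketch does not assume any file names.

```python
REPO_ID = "SkeletonDiffusion/ModelCheckpoints"  # assumed repository id


def download_checkpoints(local_dir=None):
    """Download every file in the checkpoint repository and return the local path.

    `local_dir` is optional; when omitted, files go to the default Hub cache.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_checkpoints()` requires network access and returns the directory containing the downloaded files; which subfolder corresponds to which dataset has to be checked in the repository itself.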

Online demo

The model trained on AMASS is available in a demo workflow that predicts future motions from videos. The demo extracts 3D human poses from video with Neural Localizer Fields (NLF) by Sarandi et al., and SkeletonDiffusion then generates future motions conditioned on the extracted poses. Although SkeletonDiffusion has not been trained on noisy, real-world data, it handles most cases reasonably well.
