SkeletonDiffusion Model Card
This model card covers SkeletonDiffusion, the model introduced in "Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction". The codebase is available on GitHub.
SkeletonDiffusion is a probabilistic human motion prediction model. Given 0.5s of observed human motion, it generates 2s of future motion with an inference time of 0.4s, and the generated motions are both realistic and diverse. SkeletonDiffusion is a latent diffusion model with a custom graph attention architecture, trained with nonisotropic Gaussian diffusion.
We provide a model for each dataset used in the paper (AMASS, FreeMan, Human3.6M), plus an additional model trained on AMASS with hand joints (AMASS-MANO).
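To illustrate what "nonisotropic" means in a forward diffusion step, here is a minimal NumPy sketch under our own assumptions: the noise added across joints is drawn from N(0, Σ) with a skeleton-derived covariance Σ instead of the identity. The covariance construction (identity plus a scaled, symmetrically normalized adjacency), the linear beta schedule, the toy 5-joint chain, and all function names are hypothetical. This is not the SkeletonDiffusion implementation, which diffuses in a learned latent space; refer to the paper and codebase for the actual formulation.

```python
import numpy as np


def skeleton_covariance(adjacency: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """Hypothetical joint covariance: Sigma = I + gamma * D^-1/2 A D^-1/2.

    The symmetrically normalized adjacency has eigenvalues in [-1, 1], so any
    0 <= gamma < 1 keeps Sigma positive definite. Assumes a symmetric adjacency
    in which every joint has at least one neighbour.
    """
    a = np.asarray(adjacency, dtype=float)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    return np.eye(a.shape[0]) + gamma * (d_inv_sqrt @ a @ d_inv_sqrt)


def linear_alpha_bar(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed DDPM-style linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def correlated_noise(sigma: np.ndarray, shape: tuple, rng: np.random.Generator) -> np.ndarray:
    """Noise of shape (frames, joints, channels) with covariance sigma across the joint axis."""
    chol = np.linalg.cholesky(sigma)  # sigma == chol @ chol.T
    eps = rng.standard_normal(shape)  # isotropic N(0, I) noise
    # out[f, j, c] = sum_k chol[j, k] * eps[f, k, c], so the covariance over joints is chol @ chol.T
    return np.einsum("jk,fkc->fjc", chol, eps)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray, sigma: np.ndarray,
             rng: np.random.Generator) -> np.ndarray:
    """Forward noising x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise, noise ~ N(0, Sigma)."""
    noise = correlated_noise(sigma, x0.shape, rng)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


if __name__ == "__main__":
    # Toy 5-joint kinematic chain (hypothetical skeleton, not any dataset's joint layout).
    adj = np.zeros((5, 5))
    for i in range(4):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    sigma = skeleton_covariance(adj)
    alpha_bar = linear_alpha_bar(1000)
    rng = np.random.default_rng(0)
    x0 = np.zeros((8, 5, 3))  # (frames, joints, xyz)
    x_t = q_sample(x0, t=500, alpha_bar=alpha_bar, sigma=sigma, rng=rng)
    print(x_t.shape)  # (8, 5, 3)
```

With Σ = I this sketch reduces to standard isotropic DDPM noising; the off-diagonal terms make the noise on neighbouring joints correlated, which is the kind of structure a skeleton-aware covariance is meant to capture.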

Online demo
The model trained on AMASS is available in an online demo that predicts future motions from videos. The demo extracts 3D human poses from the video with Neural Localizer Fields (NLF) by Sarandi et al., and SkeletonDiffusion then generates future motions conditioned on the extracted poses. SkeletonDiffusion was not trained on real-world, noisy data, but it nonetheless handles most cases reasonably well.
Usage
Direct use
You may use the model for any purpose permitted under the BSD 2-Clause License.
Training and Inference
Please refer to our GitHub codebase for both use cases.