WorldVLA: Towards Autoregressive Action World Model
If our project helps you, please give us a star on GitHub to support us.
Introduction
WorldVLA is an autoregressive action world model that unifies action and image understanding and generation. It integrates a Vision-Language-Action (VLA) model (the action model) and a world model in a single framework.

Action Model Results (Text + Image -> Action)
The action model generates actions given a text instruction and image observations.
World Model Results (Action + Image -> Image)
The world model generates the next frame given the current frame and an action as the control signal.
Input: Action sequence of "Open the top drawer and put the bowl inside".
Input: Action sequence of "Push the plate to the front of the stove".
Input: Action sequence of "Put the bowl on the stove".
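
The two directions described above (text + image -> action, and action + image -> image) can be pictured as two methods on one object. The following is only a toy sketch of those input/output shapes, not the WorldVLA code: the class `ToyActionWorldModel`, its method names, the `Frame` and `Action` type aliases, and the action dimension of 7 are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence

Frame = List[List[int]]  # stand-in for an image: rows of pixel values
Action = List[float]     # stand-in for a control vector

ACTION_DIM = 7  # assumption for illustration only, not taken from the model card


class ToyActionWorldModel:
    """Hypothetical stand-in that exposes both directions from one object."""

    def predict_action(self, instruction: str, frames: Sequence[Frame]) -> Action:
        # Action model direction: text instruction + image observations -> action.
        # A real model would decode action tokens; this toy returns zeros.
        if not frames:
            raise ValueError("at least one observation frame is required")
        return [0.0] * ACTION_DIM

    def predict_next_frame(self, frame: Frame, action: Action) -> Frame:
        # World model direction: current frame + action -> next frame.
        # A real model would decode image tokens; this toy copies the input.
        if len(action) != ACTION_DIM:
            raise ValueError(f"expected an action of length {ACTION_DIM}")
        return [row[:] for row in frame]


if __name__ == "__main__":
    model = ToyActionWorldModel()
    frame = [[0, 1], [2, 3]]
    action = model.predict_action("Put the bowl on the stove", [frame])
    next_frame = model.predict_next_frame(frame, action)
    print(len(action), next_frame)
```

The point of the sketch is only that a single model serves both calls; how the real model tokenizes text, images, and actions is not shown here.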
Model Zoo
Model (256 × 256) | HF Link | Success Rate (%) |
---|---|---|
LIBERO-Spatial | Alibaba-DAMO-Academy/WorldVLA/model_256/libero_spatial | 85.6 |
LIBERO-Object | Alibaba-DAMO-Academy/WorldVLA/model_256/libero_object | 89.0 |
LIBERO-Goal | Alibaba-DAMO-Academy/WorldVLA/model_256/libero_goal | 82.6 |
LIBERO-Long | Alibaba-DAMO-Academy/WorldVLA/model_256/libero_10 | 59.0 |
Model (512 × 512) | HF Link | Success Rate (%) |
---|---|---|
LIBERO-Spatial | Alibaba-DAMO-Academy/WorldVLA/model_512/libero_spatial | 87.6 |
LIBERO-Object | Alibaba-DAMO-Academy/WorldVLA/model_512/libero_object | 96.2 |
LIBERO-Goal | Alibaba-DAMO-Academy/WorldVLA/model_512/libero_goal | 83.4 |
LIBERO-Long | Alibaba-DAMO-Academy/WorldVLA/model_512/libero_10 | 60.0 |
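
A minimal sketch for fetching one checkpoint from the tables above, assuming each HF link denotes a subfolder (for example `model_256/libero_spatial`) inside the single repository `Alibaba-DAMO-Academy/WorldVLA`. The helper names `checkpoint_subfolder` and `download_checkpoint` are hypothetical; `snapshot_download` with `allow_patterns` is the standard `huggingface_hub` call, imported lazily so the path helper works without the package installed.

```python
from typing import Optional

REPO_ID = "Alibaba-DAMO-Academy/WorldVLA"
RESOLUTIONS = (256, 512)
SUITES = ("libero_spatial", "libero_object", "libero_goal", "libero_10")


def checkpoint_subfolder(resolution: int, suite: str) -> str:
    """Build the subfolder path listed in the Model Zoo tables."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    if suite not in SUITES:
        raise ValueError(f"suite must be one of {SUITES}")
    return f"model_{resolution}/{suite}"


def download_checkpoint(resolution: int, suite: str,
                        local_dir: Optional[str] = None) -> str:
    """Download only the files under one checkpoint subfolder.

    Assumes the repository layout matches the table entries; returns the
    local path of the downloaded snapshot.
    """
    from huggingface_hub import snapshot_download  # requires `pip install huggingface_hub`

    subfolder = checkpoint_subfolder(resolution, suite)
    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[f"{subfolder}/*"],  # skip the other checkpoints
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(checkpoint_subfolder(256, "libero_spatial"))
    # download_checkpoint(256, "libero_spatial")  # uncomment to fetch the weights
```

Restricting the download with `allow_patterns` avoids pulling all eight checkpoints when only one benchmark suite is needed.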
Citation
If you find the project helpful for your research, please consider citing our paper:
@article{cen2025worldvla,
title={WorldVLA: Towards Autoregressive Action World Model},
author={Cen, Jun and Yu, Chaohui and Yuan, Hangjie and Jiang, Yuming and Huang, Siteng and Guo, Jiayan and Li, Xin and Song, Yibing and Luo, Hao and Wang, Fan and others},
journal={arXiv preprint arXiv:2506.21539},
year={2025}
}
Acknowledgment
This project builds upon Lumina-mGPT, Chameleon, and OpenVLA. We thank these teams for their open-source contributions.
Model tree for Alibaba-DAMO-Academy/WorldVLA
Base model
facebook/chameleon-7b