arxiv:2509.13347

OpenHA: A Series of Open-Source Hierarchical Agentic Models in Minecraft

Published on Sep 13

Abstract

A novel Chain of Action framework unifies high-level planning and low-level control in Vision-Language-Action models, improving task success rates in a diverse set of Minecraft tasks.

AI-generated summary

The choice of action space is a critical yet unresolved challenge in developing capable, end-to-end trainable agents. This paper first presents a large-scale, systematic comparison of prominent abstracted action spaces and tokenizers for Vision-Language-Action (VLA) or hierarchical agent models in the open-ended environment of Minecraft. Our analysis reveals that no single action space is universally optimal; instead, the most effective abstraction is highly task-dependent, creating a dilemma for building generalist agents. To resolve this, we introduce Chain of Action (CoA), a novel framework that unifies high-level planning and low-level control within a single, monolithic VLA model. CoA treats an abstracted action not as a command for a separate policy, but as an intermediate reasoning step, akin to a chain of thought, that guides the generation of the final, executable action. Furthermore, we demonstrate that an All-in-One agent trained on a diverse mixture of action spaces using the CoA paradigm learns a more robust and generalizable policy. This unified agent achieves a new state of the art, improving the overall task success rate over strong, specialized baselines. To foster reproducible research, we release the OpenHA (Open Hierarchical Agents) suite, which includes our comprehensive benchmark of over 800 distinct tasks, curated datasets, source code, and all pretrained model checkpoints at https://github.com/CraftJarvis/OpenHA.

Community

🚀 Excited to share OpenHA here!
OpenHA is our new open-source framework for building hierarchical agentic models in Minecraft. It combines two key ideas:
🔥 Chain of Action (CoA): using abstracted actions as thoughts to connect high-level reasoning with low-level control.
🔥 All-in-One training: unifying various action spaces into a single model, so one agent can generalize across many task domains instead of being a specialist.
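The All-in-One training idea can be sketched as interleaving trajectories labeled in different action spaces into one training stream. The action-space names and data layout below are purely illustrative assumptions, not the actual OpenHA dataset schema.

```python
import random

# Illustrative sketch of All-in-One data mixing: samples from several
# abstracted action spaces (names here are hypothetical) are shuffled
# together so a single agent is trained on all of them at once.
ACTION_SPACES = ["motor", "grounding", "text_action"]

def mixed_batches(datasets: dict, batch_size: int, seed: int = 0):
    # Flatten per-action-space datasets into (space, sample) pairs,
    # shuffle deterministically, and yield mixed mini-batches.
    rng = random.Random(seed)
    pool = [(space, sample)
            for space, samples in datasets.items()
            for sample in samples]
    rng.shuffle(pool)
    for i in range(0, len(pool), batch_size):
        yield pool[i:i + batch_size]

# Toy datasets: 4 trajectories per action space.
datasets = {s: [f"{s}_traj_{i}" for i in range(4)] for s in ACTION_SPACES}
batches = list(mixed_batches(datasets, batch_size=3))
```

Each batch then mixes supervision signals from multiple action abstractions, which is what lets the single model generalize across task domains instead of specializing.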

The showcased results come from minecraft-openha-qwen2vl-7b-2509, which demonstrates precise control and outperforms models trained on a single action type.

We're releasing everything openly: checkpoints, datasets, and code. For details, check out our repository.
Looking forward to your feedback and discussion! 💡


Models citing this paper 4

Datasets citing this paper 5

Spaces citing this paper 0

Collections including this paper 1