OpenHA: A Series of Open-Source Hierarchical Agentic Models in Minecraft
Abstract
A novel Chain of Action framework unifies high-level planning and low-level control in Vision-Language-Action models, improving task success rates in a diverse set of Minecraft tasks.
The choice of action spaces is a critical yet unresolved challenge in developing capable, end-to-end trainable agents. This paper first presents a large-scale, systematic comparison of prominent abstracted action spaces and tokenizers for Vision-Language-Action (VLA) or hierarchical agent models in the open-ended Minecraft environment. Our analysis reveals that no single action space is universally optimal; instead, the most effective abstraction is highly task-dependent, creating a dilemma for building generalist agents. To resolve this, we introduce Chain of Action (CoA), a novel framework that unifies high-level planning and low-level control within a single, monolithic VLA model. CoA treats an abstracted action not as a command for a separate policy, but as an intermediate reasoning step, akin to a chain of thought, that guides the generation of the final, executable action. Furthermore, we demonstrate that an All-in-One agent trained on a diverse mixture of action spaces using the CoA paradigm learns a more robust and generalizable policy. This unified agent achieves a new state of the art, improving the overall task success rate over strong, specialized baselines. To foster reproducible research, we release the OpenHA (Open Hierarchical Agents) suite, which includes our comprehensive benchmark of over 800 distinct tasks, curated datasets, source code, and all pretrained model checkpoints at https://github.com/CraftJarvis/OpenHA.
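The CoA idea described in the abstract can be illustrated with a minimal Python sketch. This is not the OpenHA implementation: the prompt layout, the `chain_of_action_step` helper, the `toy_generate` stand-in model, and the action strings such as `approach(tree)` are all hypothetical and chosen only for illustration. The point it shows is that one model first emits an abstracted action and then, conditioned on that output, emits the executable action, rather than handing the abstracted action to a separate low-level policy.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any function mapping a text prompt to generated text.
# A real VLA model would also consume image observations.
GenerateFn = Callable[[str], str]


@dataclass
class CoAStep:
    abstract_action: str  # intermediate "thought" (hypothetical format)
    env_action: str       # final executable action (hypothetical format)


def chain_of_action_step(generate: GenerateFn, instruction: str, observation: str) -> CoAStep:
    """One decision step with a single model used for both stages."""
    context = f"Task: {instruction}\nObservation: {observation}\n"
    # Stage 1: the model proposes an abstracted action.
    abstract_action = generate(context + "Abstract action:").strip()
    # Stage 2: the same model continues from a prefix that now contains the
    # abstracted action, so the executable action is conditioned on it.
    env_action = generate(
        context + f"Abstract action: {abstract_action}\nEnv action:"
    ).strip()
    return CoAStep(abstract_action, env_action)


def toy_generate(prompt: str) -> str:
    """Rule-based stand-in for a trained model, for demonstration only."""
    if prompt.endswith("Env action:"):
        # The low-level choice depends on the abstracted action in the prefix.
        return "attack" if "approach(tree)" in prompt else "noop"
    return "approach(tree)" if "log" in prompt else "explore()"


step = chain_of_action_step(toy_generate, "collect a log", "tree ahead")
print(step)
```

In an actual autoregressive model both stages would typically be a single decoding pass over one growing token sequence; the two calls here only make the conditioning explicit.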
Community
Excited to share OpenHA here!
OpenHA is our new open-source framework for building hierarchical agentic models in Minecraft. It combines two key ideas:
- Chain of Action (CoA): using abstracted actions as thoughts to connect high-level reasoning with low-level control.
- All-in-One training: unifying various action spaces into a single model, so one agent can generalize across many task domains instead of being a specialist.
The showcased results come from minecraft-openha-qwen2vl-7b-2509, which demonstrates precise control, outperforming models trained on a single action type.
We're releasing everything openly: checkpoints, datasets, and code. For details, check out our repository.
Looking forward to your feedback and discussions from the community!