---
pipeline_tag: text-generation
tags:
- minecraft
- action-prediction
- grounded-instruction-following
- task-oriented-dialog
- blocks-world
- embodied-ai
- synthetic-data
- spatial-reasoning
language:
- en
license: llama3
base_model:
- meta-llama/Meta-Llama-3-8B
metrics:
- f1
---

# Llama-CRAFTS: A Minecraft Builder Action Prediction Model

**Llama-CRAFTS** (**C**ontext **R**ich **A**nd **F**ine-**T**uned On **S**ynthetic Data) is a Llama-3-8B model fine-tuned for the **Builder Action Prediction (BAP)** task in Minecraft. The model predicts a sequence of block placements or removals based on the current game context.
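
Concretely, inference is plain causal text generation over a serialized game context. The sketch below shows one way to load and prompt such a model with `transformers`; the checkpoint path and the prompt are placeholders, since the exact serialization of dialogue history and grid state is defined in the BAP v2 paper and repository.

```python
# A minimal sketch, assuming the fine-tuned weights are available as a standard
# Hugging Face checkpoint. The checkpoint path and the prompt are placeholders:
# the real input serializes the dialogue history and a textual rendering of the
# current grid state, as specified in the BAP v2 paper and repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/llama-crafts"  # placeholder: substitute the released checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative prompt only; not the actual BAP v2 input format.
prompt = (
    "Architect: place a red block on top of the blue one\n"
    "Builder actions:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated action text.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```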

This model establishes a new **state-of-the-art** on the task, achieving an F1 score of **53.0**, a 6-point improvement over the previous SOTA ([Nebula](https://arxiv.org/abs/2406.18164)). Its development is part of a holistic re-examination of the BAP task itself, which introduces an improved evaluation framework, new synthetic datasets, and enhanced modeling techniques, together forming **BAP v2**, an enhanced task framework.

### Key Features
* **State-of-the-Art Performance**: Achieves the highest score on the BAP v2 benchmark.
* **Trained on Rich Synthetic Data**: In addition to the original Minecraft BAP data, Llama-CRAFTS was fine-tuned on three novel synthetic datasets specifically designed to teach complex spatial reasoning and instruction following.
* **Context-Rich Inputs**: The model leverages richer textual input representations of the game context, which proved crucial for improving spatial awareness.

## Model Details

### Model Description
* **Model type**: A Llama-3-8B model fine-tuned using QLoRA (see the configuration sketch after this list).
* **Language(s)**: English
* **Finetuned from model**: `meta-llama/Meta-Llama-3-8B`
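
As a rough illustration of the QLoRA setup named above, here is a configuration sketch using the standard `peft` and `bitsandbytes` stack; the rank, alpha, dropout, and target modules are common illustrative defaults, not the hyperparameters reported in the paper.

```python
# A minimal QLoRA sketch: 4-bit quantized frozen base model + LoRA adapters.
# All hyperparameters below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize the frozen base model
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                    # illustrative adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()           # only the LoRA adapters train
```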

### Training Data

Llama-CRAFTS was trained on the **BAP v2 training set**, which is a combination of:

- **The original BAP Dataset:** The human-human dialogues and game logs from the Minecraft Dialogue Corpus.
- **Three Synthetic Datasets:** Novel datasets generated to provide rich, targeted examples of spatial language for instruction following. These were crucial for overcoming data scarcity and teaching the model spatial skills.

### Evaluation

The model was evaluated on the **BAP v2 benchmark**, which features a cleaner test set and fairer, more insightful metrics to better assess model capabilities, including spatial reasoning.
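
For intuition about the headline metric, the toy sketch below computes F1 over predicted versus gold builder actions under an assumed exact-match criterion; it is not the benchmark's official scorer, whose matching rules are defined in the BAP v2 paper and code.

```python
# A toy sketch of action-level F1 for intuition only. It assumes each builder
# action is an exact-match tuple (operation, x, y, z, color).
def action_f1(predicted, gold):
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# One of two predicted actions matches the gold sequence exactly.
pred = [("place", 1, 2, 3, "red"), ("place", 0, 1, 0, "blue")]
gold = [("place", 1, 2, 3, "red"), ("remove", 0, 1, 0, None)]
print(action_f1(pred, gold))  # 0.5
```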

## Model Sources

- **Paper:** [*BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues*](https://arxiv.org/abs/2501.10836)
- **Code and Data:** [https://github.com/prashant-jayan21/bap-v2](https://github.com/prashant-jayan21/bap-v2)
- **Blog:** [https://www.alphaxiv.org/overview/2501.10836v3](https://www.alphaxiv.org/overview/2501.10836v3)

## Citation

If you use this model, please cite our work:

```bibtex
@misc{jayannavar2025bapv2enhancedtask,
      title={BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues}, 
      author={Prashant Jayannavar and Liliang Ren and Marisa Hudspeth and Risham Sidhu and Charlotte Lambert and Ariel Cordes and Elizabeth Kaplan and Anjali Narayan-Chen and Julia Hockenmaier},
      year={2025},
      eprint={2501.10836},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.10836}, 
}
```