---
pipeline_tag: text-generation
tags:
- minecraft
- action-prediction
- grounded-instruction-following
- task-oriented-dialog
- blocks-world
- embodied-ai
- synthetic-data
- spatial-reasoning
language:
- en
license: llama3
base_model:
- meta-llama/Meta-Llama-3-8B
metrics:
- f1
---

# Llama-CRAFTS: A Minecraft Builder Action Prediction Model

**Llama-CRAFTS** (**C**ontext **R**ich **A**nd **F**ine-**T**uned On **S**ynthetic Data) is a Llama-3-8B model fine-tuned for the **Builder Action Prediction (BAP)** task in Minecraft. The model predicts a sequence of block placements or removals based on the current game context.

This model establishes a new **state-of-the-art** on the task, achieving an F1 score of **53.0**, a 6-point improvement over the previous SOTA ([Nebula](https://arxiv.org/abs/2406.18164)). Its development is part of a holistic re-examination of the BAP task itself, introducing an improved evaluation framework, new synthetic datasets, and enhanced modeling techniques that together form **BAP v2**, an enhanced task framework.

### Key Features:

* **State-of-the-Art Performance**: Achieves the highest score on the BAP v2 benchmark.
* **Trained on Rich Synthetic Data**: In addition to the original Minecraft BAP data, Llama-CRAFTS was fine-tuned on three novel synthetic datasets specifically designed to teach complex spatial reasoning and instruction following.
* **Context-Rich Inputs**: The model leverages richer textual input representations of the game context, which proved crucial for improving spatial awareness (see the illustrative sketch after this list).
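
The exact input serialization and action syntax are defined in the BAP v2 code and data release; the snippet below is only an illustrative sketch of what a context-rich prompt and a predicted action sequence might look like. The tags, coordinate convention, and action strings here are assumptions made for illustration, not the format actually used in training.

```python
# Purely illustrative sketch of a BAP-style input/output pair.
# The real prompt template and action syntax are defined in the BAP v2 repository
# (https://github.com/prashant-jayan21/bap-v2) and may differ from this.

example_context = """\
<dialogue>
Architect: build a red column three blocks tall in the middle of the grid
Builder: ok
</dialogue>
<world_state>
red block at (0, 1, 0)
</world_state>"""

# The model is expected to emit the Builder's next actions as text, for example:
expected_actions = "place red (0, 2, 0)\nplace red (0, 3, 0)"
```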

## Model Details

### Model Description

* **Model type**: A Llama-3-8B model fine-tuned using QLoRA.
* **Language(s)**: English
* **Finetuned from model**: `meta-llama/Meta-Llama-3-8B`
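
A minimal inference sketch with the `transformers` library is shown below. It assumes the fine-tuned weights are released as a full causal language model under this repository's id (the `your-org/llama-crafts` identifier is a placeholder) and that you have accepted the Llama 3 license for the base model; if only a QLoRA adapter is published, attach it to the base model with `peft` instead.

```python
# Minimal inference sketch using Hugging Face transformers.
# "your-org/llama-crafts" is a placeholder: substitute this repository's actual id.
# If only a QLoRA adapter is released, load meta-llama/Meta-Llama-3-8B first and
# attach the adapter with peft.PeftModel.from_pretrained instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/llama-crafts"  # placeholder repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The prompt should follow the serialization used at training time (see the BAP v2 repo).
prompt = "..."  # game context: dialogue history plus world state
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```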

### Training Data

Llama-CRAFTS was trained on the **BAP v2 training set**, which is a combination of:

- **The original BAP Dataset:** Human-human dialogue and game logs from the Minecraft Dialogue Corpus.
- **Three Synthetic Datasets:** Novel datasets generated to provide rich, targeted examples of spatial language for instruction following. These were crucial for overcoming data scarcity and teaching the model spatial skills.

### Evaluation

The model was evaluated on the **BAP v2 benchmark**, which features a cleaner test set and fairer, more insightful metrics to better assess model capabilities, including spatial reasoning.
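
For intuition, the core idea behind an action-level F1 is sketched below: predicted actions are compared against gold actions as exact tuples, and precision and recall are combined in the usual way. This is a simplified approximation written for illustration; the official BAP v2 metrics are more nuanced, so use the evaluation code in the repository for reported numbers.

```python
# Illustrative F1 over predicted vs. gold Builder actions.
# Each action is a hashable tuple, e.g. ("place", "red", (0, 2, 0)).
# The official BAP v2 evaluation is more involved; see the paper and repo.

def action_f1(predicted, gold):
    """F1 between two collections of actions, compared as exact tuples."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(action_f1(
    [("place", "red", (0, 2, 0)), ("place", "red", (0, 3, 0))],
    [("place", "red", (0, 2, 0)), ("remove", "blue", (1, 1, 0))],
))  # 0.5
```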

## Model Sources

- **Paper:** [*BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues*](https://arxiv.org/abs/2501.10836)
- **Code and Data:** [https://github.com/prashant-jayan21/bap-v2](https://github.com/prashant-jayan21/bap-v2)
- **Blog:** [https://www.alphaxiv.org/overview/2501.10836v3](https://www.alphaxiv.org/overview/2501.10836v3)

## Citation

If you use this model, please cite our work:

```bibtex
@misc{jayannavar2025bapv2enhancedtask,
  title={BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues},
  author={Prashant Jayannavar and Liliang Ren and Marisa Hudspeth and Risham Sidhu and Charlotte Lambert and Ariel Cordes and Elizabeth Kaplan and Anjali Narayan-Chen and Julia Hockenmaier},
  year={2025},
  eprint={2501.10836},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.10836},
}
```