TESS 2 RM

This model is a 7.11B-parameter reward model (BF16 safetensors) used for reward-guided decoding. It was finetuned from Mistral 7B v0.1: first instruction-tuned on the Tulu 2 SFT mixture, then RM-trained on the preference dataset mixture found here. For more details, please check out our paper TESS 2: A Large-Scale Generalist Diffusion Language Model.
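
If you just want to score text with the RM directly, outside the guidance pipeline, a minimal sketch like the one below may work. It assumes the checkpoint loads as a standard Hugging Face single-logit sequence-classification reward model, and that inputs follow the Tulu 2 chat format (since the base model was instruction-tuned on the Tulu 2 SFT mixture); neither interface detail is confirmed by this card, so the guidance repository below remains the supported path.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: the RM loads as a single-logit sequence-classification model.
model_name = "hamishivi/tess_mistral_rm"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, torch_dtype=torch.bfloat16
)
model.eval()

# Assumption: Tulu 2 chat format for prompt/response pairs.
text = "<|user|>\nWhat is the capital of France?\n<|assistant|>\nParis."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits[0].item()  # higher score = preferred
print(f"reward: {reward:.3f}")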

Using this model

This model is intended to be used with the repository https://github.com/hamishivi/tess-2 for guiding diffusion LM generations.

To run this, first clone https://github.com/hamishivi/tess-2.

Then, to run guidance with TESS 2 and this RM:

export OPENAI_API_KEY=<your openai key>
export IS_ALPACA_EVAL_2=False
shell_scripts/run_guidance.sh hamishivi/tess2-v0.1 hamishivi/tess_mistral_rm 0.5 alpaca_eval
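
Here hamishivi/tess2-v0.1 is the diffusion LM being guided and hamishivi/tess_mistral_rm is this reward model. The 0.5 is presumably the reward-guidance strength, alpaca_eval selects the evaluation task, and the OpenAI key is presumably used by AlpacaEval's judge; check shell_scripts/run_guidance.sh for the exact argument semantics.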

Note that this requires an 80GB GPU to fit everything into memory.

Citation

If you find this work useful, please cite it as follows:

@misc{taeivison2025tess2,
  title={{TESS 2: A Large-Scale Generalist Diffusion Language Model}},
  author={Jaesung Tae and Hamish Ivison and Sachin Kumar and Arman Cohan},
  year={2025},
  eprint={2502.13917},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.13917},
}