TESS 2 v0.3 Symbolic - A Math-specific Tuned Diffusion LM
This is the TESS 2 model trained on GSM8k symbolic data found here, adapted from here. TESS 2 is a simplex-based diffusion model adapted from Mistral 7B and further trained on Dolma 1.7 and Tulu 2 SFT data. This checkpoint is based on Mistral v0.3 and trained on GSM8k data. For more details, please check out our paper, TESS-2: A Large-Scale, Generalist Diffusion Language Model.
This model only works with our custom codebase found here; please go there for details on how to run training and inference.
Using this model
To run this model, first clone https://github.com/hamishivi/tess-2.
Then, after creating a Python environment with the correct packages, you can run inference via a UI with:
./shell_scripts/run_interactive_demo.sh hamishivi/tess2-v0.3
This allows you to directly interact with the model, and shows the diffusion generation process. For training or other evaluations, please see our main repository.
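The steps above can be strung together in one script. The following is a minimal sketch, not part of the repository: the `run` helper and the `DRY_RUN` switch are hypothetical names introduced here so that the script only prints the commands by default. It assumes `git clone` creates a `tess-2` directory (Git's default for this URL) and that you have already set up the Python environment as the repository describes.

```shell
#!/usr/bin/env bash
# Sketch: clone the TESS 2 repository and launch the interactive demo.
# DRY_RUN and run() are illustrative additions, not part of the tess-2 codebase.
set -euo pipefail

REPO_URL="https://github.com/hamishivi/tess-2"
MODEL="hamishivi/tess2-v0.3"   # model name as given in the command above
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it in the current shell.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run git clone "$REPO_URL"
run cd tess-2
# Activate a Python environment with the packages the repository requires
# before this step; the exact setup is documented in the repository itself.
run ./shell_scripts/run_interactive_demo.sh "$MODEL"
```

Running the script as-is prints the three commands; setting `DRY_RUN=0` executes them instead.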
Citation
If you find this work useful, please cite it as follows:
@misc{taeivison2025tess2,
title={{TESS 2: A Large-Scale Generalist Diffusion Language Model}},
author={Jaesung Tae and Hamish Ivison and Sachin Kumar and Arman Cohan},
year={2025},
eprint={2502.13917},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.13917},
}