---
license: mit
datasets:
- yztian/PRIM
language:
- en
- de
- fr
- cs
- ro
- ru
metrics:
- bleu
- comet
pipeline_tag: translation
---
# PRIM: Towards Practical In-Image Multilingual Machine Translation (EMNLP 2025 Main)
> [!NOTE]
> 📄 Paper [arXiv](https://arxiv.org/abs/2509.05146) | 💻 Code [GitHub](https://github.com/BITHLP/PRIM)
## Introduction
This repository provides the **VisTrans model**, trained as part of our work *PRIM: Towards Practical In-Image Multilingual Machine Translation*.
VisTrans is an end-to-end model for In-Image Machine Translation that handles the visual text and the background information in the image separately, and is trained with a two-stage training and multi-task learning strategy.
The model is trained on [MTedIIMT](https://huggingface.co/datasets/yztian/MTedIIMT), and tested on [PRIM](https://huggingface.co/datasets/yztian/PRIM) (see `./PRIM` directory). It is also trained and tested on [IIMT30k](https://huggingface.co/datasets/yztian/IIMT30k) (see `./IIMT30k` directory).
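
As a minimal sketch, the evaluation data can be pulled directly from the Hub with the `datasets` library. This assumes the PRIM repository is loadable via `load_dataset`; the split and column layout shown in the output may differ from what is illustrated here.

```python
# Minimal sketch: load the PRIM benchmark from the Hugging Face Hub.
# Assumes the repository can be parsed by the `datasets` library;
# inspect the printed object for the actual splits and columns.
from datasets import load_dataset

prim = load_dataset("yztian/PRIM")
print(prim)
```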
## Inference
For inference and detailed usage instructions, please refer to our [GitHub repository](https://github.com/BITHLP/PRIM).
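
If you only need the model weights locally (for example, to plug into the inference scripts from the GitHub repository), a minimal sketch using `huggingface_hub` is shown below. The `repo_id` is assumed to be this model repository; the actual inference entry point lives in the GitHub code, not in `transformers`.

```python
# Minimal sketch: download the VisTrans weights from the Hub.
# The repo_id below is an assumption based on this model card's location;
# inference itself is driven by the scripts in the PRIM GitHub repository.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="yztian/VisTrans")
print(f"Model files downloaded to: {local_dir}")
```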
## Citation
If you find our work helpful, we would greatly appreciate it if you could cite our paper:
```bibtex
@misc{tian2025primpracticalinimagemultilingual,
  title={PRIM: Towards Practical In-Image Multilingual Machine Translation},
  author={Yanzhi Tian and Zeming Liu and Zhengyang Liu and Chong Feng and Xin Li and Heyan Huang and Yuhang Guo},
  year={2025},
  eprint={2509.05146},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.05146},
}
```