---
license: mit
datasets:
- yztian/PRIM
language:
- en
- de
- fr
- cs
- ro
- ru
metrics:
- bleu
- comet
pipeline_tag: translation
---
# PRIM: Towards Practical In-Image Multilingual Machine Translation (EMNLP 2025 Main)
> [!NOTE]
> 📄 Paper [arXiv](https://arxiv.org/abs/2509.05146) | 💻 Code [GitHub](https://github.com/BITHLP/PRIM)
## Introduction
This repository provides the **VisTrans model**, trained as part of our work *PRIM: Towards Practical In-Image Multilingual Machine Translation*.
VisTrans is an end-to-end model for In-Image Machine Translation that handles the visual text and the background information in the image separately, and is trained with a two-stage training and multi-task learning strategy.
The model is trained on [MTedIIMT](https://huggingface.co/datasets/yztian/MTedIIMT), and tested on [PRIM](https://huggingface.co/datasets/yztian/PRIM) (see `./PRIM` directory). It is also trained and tested on [IIMT30k](https://huggingface.co/datasets/yztian/IIMT30k) (see `./IIMT30k` directory).
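
As a minimal sketch, the evaluation data can be pulled directly from the Hub with the `datasets` library. This assumes the PRIM repository is loadable via `load_dataset`; the split and column layout shown in the output may differ from what is illustrated here.

```python
# Minimal sketch: load the PRIM benchmark from the Hugging Face Hub.
# Assumes the repository can be parsed by the `datasets` library;
# inspect the printed object for the actual splits and columns.
from datasets import load_dataset

prim = load_dataset("yztian/PRIM")
print(prim)
```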
## Inference
For inference and detailed usage instructions, please refer to our [GitHub repository](https://github.com/BITHLP/PRIM).
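
If you only need the model weights locally (for example, to plug into the inference scripts from the GitHub repository), a minimal sketch using `huggingface_hub` is shown below. The `repo_id` is assumed to be this model repository; the actual inference entry point lives in the GitHub code, not in `transformers`.

```python
# Minimal sketch: download the VisTrans weights from the Hub.
# The repo_id below is an assumption based on this model card's location;
# inference itself is driven by the scripts in the PRIM GitHub repository.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="yztian/VisTrans")
print(f"Model files downloaded to: {local_dir}")
```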
## Citation
If you find our work helpful, we would greatly appreciate it if you could cite our paper:
```bibtex
@misc{tian2025primpracticalinimagemultilingual,
  title={PRIM: Towards Practical In-Image Multilingual Machine Translation},
  author={Yanzhi Tian and Zeming Liu and Zhengyang Liu and Chong Feng and Xin Li and Heyan Huang and Yuhang Guo},
  year={2025},
  eprint={2509.05146},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.05146},
}
```