YAML Metadata
Error:
"datasets[1]" with value "ParaCrawl 99k" is not valid. If possible, use a dataset id from https://hf.co/datasets.
YAML Metadata
Error:
"datasets[5]" with value "Biomedical Domain Corpora" is not valid. If possible, use a dataset id from https://hf.co/datasets.
Introduction
This repository brings an implementation of T5 for translation in PT-EN tasks using a modest hardware setup. We propose some changes in tokenizator and post-processing that improves the result and used a Portuguese pretrained model for the translation. You can collect more informations in our repository. Also, check our paper!
Usage
Just follow "Use in Transformers" instructions. It is necessary to add a few words before to define the task to T5.
You can also create a pipeline for it. An example with the phrase " Eu gosto de comer arroz" is:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
tokenizer = AutoTokenizer.from_pretrained("unicamp-dl/translation-pt-en-t5")
model = AutoModelForSeq2SeqLM.from_pretrained("unicamp-dl/translation-pt-en-t5")
pten_pipeline = pipeline('text2text-generation', model=model, tokenizer=tokenizer)
pten_pipeline("translate Portuguese to English: Eu gosto de comer arroz.")
Citation
@inproceedings{lopes-etal-2020-lite,
title = "Lite Training Strategies for {P}ortuguese-{E}nglish and {E}nglish-{P}ortuguese Translation",
author = "Lopes, Alexandre and
Nogueira, Rodrigo and
Lotufo, Roberto and
Pedrini, Helio",
booktitle = "Proceedings of the Fifth Conference on Machine Translation",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.wmt-1.90",
pages = "833--840",
}
- Downloads last month
- 336
Inference Providers
NEW
This model is not currently available via any of the supported third-party Inference Providers, and
the model is not deployed on the HF Inference API.