---
license: mit
---

# PaLe-MADLAD

This is the MADLAD-400 model fine-tuned to translate from Proper Karelian, Livvi, Ludian, and Veps into Russian and vice versa.

We call this model **Pa**ragraph-**Le**vel because we trained it on paragraphs comprising multiple sentences. The model demonstrates the capacity to handle gender-neutral pronouns (a major obstacle in translating from Finno-Ugric languages) and other discourse-level phenomena.

## Example Usage for Inference

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = AutoModelForSeq2SeqLM.from_pretrained('tartuNLP/pale-madlad-mt')
tokenizer = AutoTokenizer.from_pretrained('tartuNLP/pale-madlad-mt')

# You need to explicitly prepend a target language tag to the input string
# in the format <2xx>, where xx stands for the language code.
# Language codes: 'krl' for Proper Karelian, 'lud' for Ludian, 'olo' for Livvi,
# 'vep' for Veps, 'ru' for Russian, 'en' for English.
text = '<2krl>' + 'Здравствуйте!'

# Tokenize the tagged input, generate the translation, and decode it to text.
inputs = tokenizer(text, return_tensors='pt').input_ids
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: Terveh!
```

Please cite the following paper if you use this model in your work:

```
@inproceedings{pashchenko2024paragraphlevel,
  title={Paragraph-Level Machine Translation for Low-Resource Finno-Ugric Languages},
  author={Dmytro Pashchenko and Lisa Yankovskaya and Mark Fishel},
  booktitle={The Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies},
  year={2024},
  url={https://openreview.net/forum?id=uTFJsQpNZk}
}
```
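
## Helper Sketch for Target Language Tags

The `<2xx>` tag convention from the usage example can be wrapped in a small helper so that an unsupported language code fails early instead of producing an unexpected translation. This is a minimal sketch, not part of the model's API: the names `LANG_CODES`, `add_target_tag`, and `translate` are hypothetical, and the `max_new_tokens` default is an assumed value chosen to leave room for paragraph-length output, not a setting taken from the paper.

```python
# Language codes as listed in the usage example.
LANG_CODES = {
    "krl": "Proper Karelian",
    "lud": "Ludian",
    "olo": "Livvi",
    "vep": "Veps",
    "ru": "Russian",
    "en": "English",
}


def add_target_tag(text: str, target_lang: str) -> str:
    """Prepend the <2xx> target-language tag (hypothetical helper)."""
    if target_lang not in LANG_CODES:
        raise ValueError(f"Unsupported target language code: {target_lang!r}")
    return f"<2{target_lang}>{text}"


def translate(text, target_lang, model, tokenizer, max_new_tokens=256):
    """Translate one string with a model and tokenizer loaded as in the usage example.

    max_new_tokens=256 is an assumption; adjust it to the length of your paragraphs.
    """
    inputs = tokenizer(add_target_tag(text, target_lang), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

With the model and tokenizer loaded as in the usage example, `translate('Здравствуйте!', 'krl', model, tokenizer)` sends the same tagged input string as the original snippet.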