mbart-large-50-finetuned-es-en-2

This model is a fine-tuned version of facebook/mbart-large-50 on the Helsinki-NLP/opus-100.

Model description

mBART-50 is a multilingual Sequence-to-Sequence model pre-trained using the "Multilingual Denoising Pretraining" objective. It was introduced in Multilingual Translation with Extensible Multilingual Pretraining and Finetuning paper.

mBART-50 is a multilingual Sequence-to-Sequence model. It was created to show that multilingual translation models can be created through multilingual fine-tuning. Instead of fine-tuning on one direction, a pre-trained model is fine-tuned many directions simultaneously. mBART-50 is created using the original mBART model and extended to add extra 25 languages to support multilingual machine translation models of 50 languages. The pre-training objective is explained below. Multilingual Denoising Pretraining: The model incorporates N languages by concatenating data: D = {D1, ..., DN } where each Di is a collection of monolingual documents in language i. The source documents are noised using two schemes, first randomly shuffling the original sentences' order, and second a novel in-filling scheme, where spans of text are replaced with a single mask token. The model is then tasked to reconstruct the original text. 35% of each instance's words are masked by random sampling a span length according to a Poisson distribution (λ = 3.5). The decoder input is the original text with one position offset. A language id symbol LID is used as the initial token to predict the sentence.

Intended uses & limitations

More information needed

Training and evaluation data

10,000 random samples from the dataset

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0

Training results

BLEU SCORE: 25.6431

Framework versions

  • Transformers 4.40.0.dev0
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
Downloads last month
58
Safetensors
Model size
611M params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for Gilito21/mbart-large-50-finetuned-es-en-2

Finetuned
(137)
this model

Dataset used to train Gilito21/mbart-large-50-finetuned-es-en-2