javi8979 committed
Commit de751fb · verified · Parent: 9cd8273

Update README.md

Files changed (1): README.md +4 -4
README.md CHANGED
@@ -39,7 +39,7 @@ This is the model card of Plume (**P**arallel **L**ang**u**age **M**od**e**l) wi
 
 ## Summary
 
-Plume is the first LLM trained from scratch for Neural Machine Translation with only parallel Catalan-centric data. It is a language model with the same architecture as Gemma 2B, trained for general translation tasks at the sentence level. For more information about the training, architecture and interpretability of the model, check out the paper "Investigating the translation capabilities of Large Language Models trained on parallel data only". The preprint is available on [arXiv]().
+Plume is the first LLM trained from scratch for Neural Machine Translation with only parallel Catalan-centric data. It is a language model with the same architecture as Gemma 2B, trained for general translation tasks at the sentence level. For more information about the training, architecture and interpretability of the model, check out the paper "Investigating the translation capabilities of Large Language Models trained on parallel data only". The preprint is available on [arXiv](https://arxiv.org/abs/2406.09140).
 
 - **Developed by:** The Language Technologies Unit from Barcelona Supercomputing Center (BSC).
 - **Languages:** Spanish, French, Italian, Portuguese, Galician, German, English, and Basque.
@@ -49,7 +49,7 @@ Plume is the first LLM trained for Neural Machine Translation with only parallel
 
 In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methodologies predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (**P**arallel **L**ang**u**age **M**od**e**l), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones.
 
-For more details regarding the model architecture, the dataset and model interpretability, take a look at the paper, which is available on [arXiv](https://arxiv.org/abs/2406.09140).
+For more details regarding the model architecture, the dataset and model interpretability, take a look at the paper.
 
 ## Intended Uses and Limitations
 
@@ -98,11 +98,11 @@ For training, the learning rate is warmed up from 1e-7 to a maximum of 3e-4 over
 | Warmup Steps | 2000 |
 
 
-More training details are specified in the [paper](). Code for training the model and running other experiments can be found in our [GitHub repository](https://github.com/projecte-aina/Plume).
+More training details are specified in the [paper](https://arxiv.org/abs/2406.09140). Code for training the model and running other experiments can be found in our [GitHub repository](https://github.com/projecte-aina/Plume).
 
 ## Evaluation
 
-Below are the evaluation results on Flores-200 and NTREX for supervised MT directions. For more details about model evaluation, check out the [paper]().
+Below are the evaluation results on Flores-200 and NTREX for supervised MT directions. For more details about model evaluation, check out the [paper](https://arxiv.org/abs/2406.09140).
 
 | Model | FLORES BLEU | FLORES COMET | NTREX BLEU | NTREX COMET |
 |----------------------|-------------|--------------|------------|-------------|
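The training hunk above quotes a learning-rate warmup from 1e-7 to a peak of 3e-4, with 2000 warmup steps listed in the table. Below is a minimal sketch of what such a schedule computes; the linear ramp and the function name are assumptions, since the excerpt states only the endpoints and the step count, not the shape of the curve.

```python
# Illustrative sketch only: warmup from 1e-7 to 3e-4 over 2000 steps.
# The README excerpt gives the endpoints and step count but not the curve
# shape, so the linear ramp (and this function name) is an assumption.

def warmup_lr(step: int,
              init_lr: float = 1e-7,
              peak_lr: float = 3e-4,
              warmup_steps: int = 2000) -> float:
    """Learning rate at a given optimizer step during the warmup phase."""
    if step >= warmup_steps:
        # Past warmup the excerpt does not describe the schedule,
        # so simply hold the peak value here.
        return peak_lr
    return init_lr + (peak_lr - init_lr) * (step / warmup_steps)

print(warmup_lr(0), warmup_lr(1000), warmup_lr(2000))  # 1e-07, ~1.5e-04, 3e-04
```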
 
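The evaluation hunk reports BLEU and COMET on Flores-200 and NTREX. The following is a minimal sketch of how such corpus-level scores are typically computed with the sacrebleu and unbabel-comet packages; the COMET checkpoint, the toy sentences, and any preprocessing are assumptions, since the excerpt does not state the exact configuration behind the numbers in the README table.

```python
# Illustrative corpus-level BLEU and COMET scoring with the sacrebleu and
# unbabel-comet packages. The checkpoint "Unbabel/wmt22-comet-da" and the
# example sentences are assumptions, not the README's stated setup.
import sacrebleu
from comet import download_model, load_from_checkpoint

sources = ["El gat dorm al sofà."]                 # source sentences
hypotheses = ["The cat sleeps on the sofa."]       # system translations
references = ["The cat is sleeping on the sofa."]  # reference translations

# Corpus-level BLEU (sacrebleu expects a list of reference streams).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")

# Reference-based COMET; returns segment scores and a system-level score.
comet_model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r}
        for s, h, r in zip(sources, hypotheses, references)]
comet = comet_model.predict(data, batch_size=8, gpus=0)
print(f"COMET: {comet.system_score:.4f}")
```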