---
license: cc-by-4.0
---

# opus-mt-tc-base-gmw-gmw

Neural machine translation model for translating from West Germanic languages (gmw) to West Germanic languages (gmw).

This model is part of the [OPUS-MT project](https://github.com/Helsinki-NLP/Opus-MT), an effort to make neural machine translation models widely available and accessible for many languages in the world. All models are originally trained with [Marian NMT](https://marian-nmt.github.io/), an efficient NMT implementation written in pure C++, and have been converted to PyTorch using the transformers library by Hugging Face. Training data is taken from [OPUS](https://opus.nlpl.eu/) and training pipelines follow the procedures of [OPUS-MT-train](https://github.com/Helsinki-NLP/Opus-MT-train).

* Publications: [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) and [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/) (please cite these papers if you use this model)

## Model info

* source language(s): afr deu eng fry gos hrx ltz nds nld pdc yid
* target language(s): afr deu eng fry nds nld
* valid target language labels: >>afr<< >>ang_Latn<< >>deu<< >>eng<< >>fry<< >>ltz<< >>nds<< >>nld<< >>sco<< >>yid<<
* model: transformer (base)
* data: opus ([source](https://github.com/Helsinki-NLP/Tatoeba-Challenge))
* tokenization: SentencePiece (spm32k,spm32k)
* original model: [opus-2021-02-23.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/gmw-gmw/opus-2021-02-23.zip)
* more information: [OPUS-MT gmw-gmw README](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/gmw-gmw/README.md)

This is a multilingual translation model with multiple target languages. A sentence-initial language token is required in the form of `>>id<<` (id = a valid target language ID), e.g. `>>afr<<`.
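
The target-language labels listed above are ordinary items in the shared SentencePiece vocabulary, so you can sanity-check a label before prepending it. A minimal sketch, assuming the labels are stored as single vocabulary pieces (which is how Marian multilingual models usually encode them):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-tc-base-gmw-gmw")

# A valid label such as >>afr<< is expected to survive tokenization as a
# single piece; an unsupported label would split into several pieces.
for label in (">>afr<<", ">>deu<<"):
    print(label, tokenizer.tokenize(label))
```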
You can use OPUS-MT models with the transformers pipelines, for example:

```python
from transformers import pipeline

pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-base-gmw-gmw")
print(pipe(">>afr<< Replace this with text in an accepted source language."))
```
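
The pipeline wrapper is optional; the same call can be made with the Marian model and tokenizer classes directly, which exposes generation parameters such as beam size. A minimal sketch (the target label and beam setting are illustrative choices, not settings prescribed by this model card):

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-tc-base-gmw-gmw"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# >>nld<< selects Dutch as the target language
batch = tokenizer([">>nld<< Replace this with text in an accepted source language."],
                  return_tensors="pt", padding=True)
generated = model.generate(**batch, num_beams=4)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```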

## Benchmarks

* test set translations: [opus-2021-02-23.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/gmw-gmw/opus-2021-02-23.test.txt)
* test set scores: [opus-2021-02-23.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/gmw-gmw/opus-2021-02-23.eval.txt)

| langpair | testset | BLEU | chr-F | #sent | #words | BP |
|----------|---------|-------|-------|-------|--------|----|
| afr-deu | Tatoeba-test | 48.5 | 0.677 | 1583 | 9105 | 1.000 |
| ... | ... | ... | ... | ... | ... | ... |
| pdc-eng | Tatoeba-test | 24.3 | 0.402 | 53 | 399 | 1.000 |
| yid-nld | Tatoeba-test | 21.3 | 0.402 | 55 | 323 | 1.000 |
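
Scores in the style of this table (BLEU with its brevity penalty BP, and chr-F on a 0 to 1 scale) can be recomputed with sacrebleu from aligned plain-text hypothesis and reference files; a minimal sketch, with hypothetical file names:

```python
import sacrebleu

# hypothetical files with one sentence per line, aligned by line number
with open("afr-deu.hyp.txt", encoding="utf-8") as f:
    hyps = [line.rstrip("\n") for line in f]
with open("afr-deu.ref.txt", encoding="utf-8") as f:
    refs = [line.rstrip("\n") for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs])
chrf = sacrebleu.corpus_chrf(hyps, [refs])  # sacrebleu reports chrF on a 0-100 scale
print(f"BLEU = {bleu.score:.1f}, BP = {bleu.bp:.3f}, chr-F = {chrf.score / 100:.3f}")
```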

## Acknowledgements

The work is supported by the [European Language Grid](https://www.european-language-grid.eu/) as [pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866), by the [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113), and by the [MeMAD project](https://memad.eu/), funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by [CSC -- IT Center for Science](https://www.csc.fi/), Finland.

## Model conversion info

* transformers version: 4.12.3
* OPUS-MT git hash: b250e2e
* port time: Thu Jan 27 23:08:33 EET 2022
* port machine: LM0-400-22516.local
|