tiedeman committed
Commit 6e11a5f
Parent: 08affb0

small change in model card

Files changed (1)
  1. README.md +14 -9
README.md CHANGED
@@ -10,11 +10,11 @@ license: cc-by-4.0
---
# opus-mt-tc-base-gmw-gmw

- Neural machine translation model for translating from West Germanic languages to West Germanic languages.
+ Neural machine translation model for translating from West Germanic languages (gmw) to West Germanic languages (gmw).

- This model is part of the [OPUS-MT project](https://github.com/Helsinki-NLP/Opus-MT), an effort to make neural machine translation models widely available and accessible for many languages in the world. All models are originally trained using the amazing framework of [Marian NMT](https://marian-nmt.github.io/), an efficient NMT implementation writtin in pure C++. The models have been converted to pyTorch using the transformers library by huggingface. Training data is taken from [OPUS](https://opus.nlpl.eu/) and training pipelines use the procedures of [OPUS-MT-train](https://github.com/Helsinki-NLP/Opus-MT-train).
+ This model is part of the [OPUS-MT project](https://github.com/Helsinki-NLP/Opus-MT), an effort to make neural machine translation models widely available and accessible for many languages in the world. All models are originally trained using the amazing framework of [Marian NMT](https://marian-nmt.github.io/), an efficient NMT implementation written in pure C++. The models have been converted to PyTorch using the transformers library by Hugging Face. Training data is taken from [OPUS](https://opus.nlpl.eu/) and training pipelines use the procedures of [OPUS-MT-train](https://github.com/Helsinki-NLP/Opus-MT-train).

- * Publications: [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) , [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/)
+ * Publications: [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) and [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/) (please cite them if you use this model)

## Model info

@@ -22,10 +22,11 @@ This model is part of the [OPUS-MT project](https://github.com/Helsinki-NLP/Opus
* source language(s): afr deu eng fry gos hrx ltz nds nld pdc yid
* target language(s): afr deu eng fry nds nld
* valid target language labels: >>afr<< >>ang_Latn<< >>deu<< >>eng<< >>fry<< >>ltz<< >>nds<< >>nld<< >>sco<< >>yid<<
- * model: transformer
- * data: opus
+ * model: transformer (base)
+ * data: opus ([source](https://github.com/Helsinki-NLP/Tatoeba-Challenge))
* tokenization: SentencePiece (spm32k,spm32k)
* original model: [opus-2021-02-23.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/gmw-gmw/opus-2021-02-23.zip)
+ * more information: [OPUS-MT gmw-gmw README](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/gmw-gmw/README.md)

This is a multilingual translation model with multiple target languages. A sentence-initial language token is required in the form of `>>id<<` (id = valid target language ID), e.g. `>>afr<<`.

@@ -36,11 +37,14 @@ You can use OPUS-MT models with the transformers pipelines, for example:
```python
from transformers import pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-base-gmw-gmw")
- print(pipe(">>afr<< Replace this with text in an accepted source language.")
+ print(pipe(">>afr<< Replace this with text in an accepted source language."))
```

## Benchmarks

+ * test set translations: [opus-2021-02-23.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/gmw-gmw/opus-2021-02-23.test.txt)
+ * test set scores: [opus-2021-02-23.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/gmw-gmw/opus-2021-02-23.eval.txt)
+
| langpair | testset | BLEU | chr-F | #sent | #words | BP |
|----------|---------|-------|-------|-------|--------|----|
| afr-deu | Tatoeba-test | 48.5 | 0.677 | 1583 | 9105 | 1.000 |
@@ -99,12 +103,13 @@ print(pipe(">>afr<< Replace this with text in an accepted source language.")
| pdc-eng | Tatoeba-test | 24.3 | 0.402 | 53 | 399 | 1.000 |
| yid-nld | Tatoeba-test | 21.3 | 0.402 | 55 | 323 | 1.000 |

- * test set translations: [opus-2021-02-23.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/gmw-gmw/opus-2021-02-23.test.txt)
- * test set scores: [opus-2021-02-23.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/gmw-gmw/opus-2021-02-23.eval.txt)
+ ## Acknowledgements
+
+ The work is supported by the [European Language Grid](https://www.european-language-grid.eu/) as [pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866), by the [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113), and the [MeMAD project](https://memad.eu/), funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by [CSC -- IT Center for Science](https://www.csc.fi/), Finland.

## Model conversion info

* transformers version: 4.12.3
* OPUS-MT git hash: b250e2e
- * port time: Thu Jan 27 22:41:37 EET 2022
+ * port time: Thu Jan 27 23:08:33 EET 2022
* port machine: LM0-400-22516.local
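
The card's example above uses the `translation` pipeline wrapper. For more control over batching and generation, the same checkpoint can also be loaded directly; the following is a minimal sketch using the generic transformers auto classes, with generation settings left at library defaults (neither is prescribed by the card):

```python
# Minimal sketch: load the converted checkpoint directly instead of via
# pipeline(); the auto classes and default generation settings are assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "Helsinki-NLP/opus-mt-tc-base-gmw-gmw"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Every source sentence must start with a valid target language token.
src_text = [">>nld<< Replace this with text in an accepted source language."]

batch = tokenizer(src_text, return_tensors="pt", padding=True)
generated = model.generate(**batch)
for ids in generated:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```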
 
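On the benchmark table: BLEU and chr-F are corpus-level scores computed over the linked test-set translations, and BP is BLEU's brevity penalty. A toy sketch of such scoring with sacrebleu, assuming it as the scorer (the sentences below are placeholders, not test-set data):

```python
# Toy sketch: corpus-level BLEU and chrF with sacrebleu; placeholder data.
import sacrebleu

hypotheses = ["Dit is 'n toets."]    # system output, one entry per segment
references = [["Dit is 'n toets."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)

# Recent sacrebleu reports chrF on a 0-100 scale, while the chr-F column
# above is on 0-1, hence the division by 100.
print(f"BLEU = {bleu.score:.1f}  chr-F = {chrf.score / 100:.3f}")
```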