---
license: cc-by-4.0
---

# opus-mt-tc-base-gmw-gmw

Neural machine translation model for translating from West Germanic languages (gmw) to West Germanic languages (gmw).

This model is part of the [OPUS-MT project](https://github.com/Helsinki-NLP/Opus-MT), an effort to make neural machine translation models widely available and accessible for many languages in the world. All models are originally trained with [Marian NMT](https://marian-nmt.github.io/), an efficient NMT implementation written in pure C++, and have been converted to PyTorch using the transformers library by Hugging Face. Training data is taken from [OPUS](https://opus.nlpl.eu/) and training pipelines follow the procedures of [OPUS-MT-train](https://github.com/Helsinki-NLP/Opus-MT-train).

* Publications: [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) and [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/) (please cite these papers if you use this model)

## Model info

* source language(s): afr deu eng fry gos hrx ltz nds nld pdc yid
* target language(s): afr deu eng fry nds nld
* valid target language labels: >>afr<< >>ang_Latn<< >>deu<< >>eng<< >>fry<< >>ltz<< >>nds<< >>nld<< >>sco<< >>yid<<
* model: transformer (base)
* data: opus ([source](https://github.com/Helsinki-NLP/Tatoeba-Challenge))
* tokenization: SentencePiece (spm32k,spm32k)
* original model: [opus-2021-02-23.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/gmw-gmw/opus-2021-02-23.zip)
* more information: [OPUS-MT gmw-gmw README](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/gmw-gmw/README.md)

This is a multilingual translation model with multiple target languages. A sentence-initial language token is required in the form of `>>id<<` (id = a valid target language ID), e.g. `>>afr<<`.
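
The target-language labels listed above are ordinary items in the shared SentencePiece vocabulary, so you can sanity-check a label before prepending it. A minimal sketch, assuming the labels are stored as single vocabulary pieces (which is how Marian multilingual models usually encode them):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-tc-base-gmw-gmw")

# A valid label such as >>afr<< is expected to survive tokenization as a
# single piece; an unsupported label would split into several pieces.
for label in (">>afr<<", ">>deu<<"):
    print(label, tokenizer.tokenize(label))
```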
You can use OPUS-MT models with the transformers pipelines, for example:

```python
from transformers import pipeline

pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-base-gmw-gmw")
print(pipe(">>afr<< Replace this with text in an accepted source language."))
```
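
The pipeline wrapper is optional; the same call can be made with the Marian model and tokenizer classes directly, which exposes generation parameters such as beam size. A minimal sketch (the target label and beam setting are illustrative choices, not settings prescribed by this model card):

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-tc-base-gmw-gmw"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# >>nld<< selects Dutch as the target language
batch = tokenizer([">>nld<< Replace this with text in an accepted source language."],
                  return_tensors="pt", padding=True)
generated = model.generate(**batch, num_beams=4)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```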

## Benchmarks

* test set translations: [opus-2021-02-23.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/gmw-gmw/opus-2021-02-23.test.txt)
* test set scores: [opus-2021-02-23.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/gmw-gmw/opus-2021-02-23.eval.txt)

| langpair | testset | BLEU | chr-F | #sent | #words | BP |
|----------|---------|-------|-------|-------|--------|----|
| afr-deu | Tatoeba-test | 48.5 | 0.677 | 1583 | 9105 | 1.000 |
| ... | ... | ... | ... | ... | ... | ... |
| pdc-eng | Tatoeba-test | 24.3 | 0.402 | 53 | 399 | 1.000 |
| yid-nld | Tatoeba-test | 21.3 | 0.402 | 55 | 323 | 1.000 |
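
Scores in the style of this table (BLEU with its brevity penalty BP, and chr-F on a 0 to 1 scale) can be recomputed with sacrebleu from aligned plain-text hypothesis and reference files; a minimal sketch, with hypothetical file names:

```python
import sacrebleu

# hypothetical files with one sentence per line, aligned by line number
with open("afr-deu.hyp.txt", encoding="utf-8") as f:
    hyps = [line.rstrip("\n") for line in f]
with open("afr-deu.ref.txt", encoding="utf-8") as f:
    refs = [line.rstrip("\n") for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs])
chrf = sacrebleu.corpus_chrf(hyps, [refs])  # sacrebleu reports chrF on a 0-100 scale
print(f"BLEU = {bleu.score:.1f}, BP = {bleu.bp:.3f}, chr-F = {chrf.score / 100:.3f}")
```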

## Acknowledgements

The work is supported by the [European Language Grid](https://www.european-language-grid.eu/) as [pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866), by the [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113), and by the [MeMAD project](https://memad.eu/), funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by [CSC -- IT Center for Science](https://www.csc.fi/), Finland.

## Model conversion info

* transformers version: 4.12.3
* OPUS-MT git hash: b250e2e
* port time: Thu Jan 27 23:08:33 EET 2022
* port machine: LM0-400-22516.local
|