---
license: mit
language:
- en
- nl
tags:
- machine-translation
- low-resource
- creativity
library_name: transformers
pipeline_tag: translation
model-index:
- name: EN-FR → EN-NL • Non-Creative
  results:
  - task:
      type: machine-translation
      name: Translation
    dataset:
      name: Dutch Parallel Corpus Journalistic texts
      type: Helsinki-NLP/open_subtitles
      split: test
    metrics:
    - type: sacrebleu
      name: SacreBLEU
      value: 9.950
      greater_is_better: true
---
# EN-FR parent ➜ EN-NL, fine-tuned on a non-creative corpus
**Author:** Niek Holter
**Thesis:** “Transferring Creativity”
## Summary
This model starts from Helsinki-NLP’s MarianMT `opus-mt-en-fr` and is fine-tuned on a 10k-sentence **non-creative** English–Dutch corpus (Journalistic texts).
It is one of four systems trained for my bachelor’s thesis to study how transfer-learning settings affect MT creativity.
| Parent model | Fine-tune data | BLEU | COMET | Transformer Creativity Score |
|-------------|----------------|------|-------|------------------|
| en-fr | Non-Creative | 9.950 | 0.574 | 0.34 |
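A hedged usage sketch is shown below; the repository ID, example sentence, and Dutch reference are placeholders, and the SacreBLEU call only mirrors how the corpus-level score in the table is typically computed.

```python
# Hedged usage sketch: `model_id` is a placeholder for wherever this checkpoint is hosted,
# and the sentences are illustrative only.
import sacrebleu
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "your-namespace/en-fr-to-en-nl-noncreative"  # placeholder repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

src = ["The negotiations resumed on Monday."]
ref = ["De onderhandelingen werden maandag hervat."]  # illustrative reference

batch = tokenizer(src, return_tensors="pt", padding=True)
outputs = model.generate(**batch, num_beams=4, max_new_tokens=128)
hyp = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(hyp)

# Corpus-level SacreBLEU, as reported in the table above (here on a single sentence).
print(sacrebleu.corpus_bleu(hyp, [ref]).score)
```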
## Intended use
* Research on creative MT and low-resource transfer learning
## Training details
* Hardware: NVIDIA GTX 1070 (CUDA 12.1)
* Epochs: up to 200, with early stopping (patience 5)
* LR / batch size: 2e-5 / 16 (a configuration sketch follows this list)
* Script: [`finetuning.py`](./finetuning.py)
* Env: [`environment.yml`](./environment.yml)
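A minimal sketch of this configuration with the 🤗 Trainer API is given below. The hyperparameters (2e-5, batch 16, up to 200 epochs, patience 5) come from this card; the data files, column names, and max length are placeholders, and this is not the actual `finetuning.py` from this repo.

```python
# Hedged fine-tuning sketch under the hyperparameters listed above.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

parent = "Helsinki-NLP/opus-mt-en-fr"  # parent model named in the Summary
tokenizer = AutoTokenizer.from_pretrained(parent)
model = AutoModelForSeq2SeqLM.from_pretrained(parent)

# Placeholder corpus: JSON-lines files with {"en": ..., "nl": ...} sentence pairs.
raw = load_dataset("json", data_files={"train": "train.jsonl", "validation": "dev.jsonl"})

def preprocess(batch):
    # Tokenize source (EN) and target (NL) sides in one call.
    return tokenizer(batch["en"], text_target=batch["nl"], truncation=True, max_length=128)

tokenized = raw.map(preprocess, batched=True, remove_columns=["en", "nl"])

args = Seq2SeqTrainingArguments(
    output_dir="en-nl-noncreative",
    learning_rate=2e-5,               # LR from this card
    per_device_train_batch_size=16,   # batch size from this card
    per_device_eval_batch_size=16,
    num_train_epochs=200,             # upper bound; early stopping usually ends sooner
    evaluation_strategy="epoch",      # newer transformers versions spell this eval_strategy
    save_strategy="epoch",
    load_best_model_at_end=True,      # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],  # patience from this card
)
trainer.train()
```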
## Data
* **Non-Creative corpus:** 10k sentences from the Dutch Parallel Corpus (DPC), journalistic texts.
* Sentence-level 1:1 alignments; deduplicated to avoid leakage (a minimal illustration follows below).
See https://github.com/muniekstache/Transfer-Creativity.git for the full pipeline.
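As an illustration only (the actual preprocessing lives in the repository linked above), exact-match deduplication of sentence pairs could look like this:

```python
# Minimal sketch (assumption, not the thesis pipeline): drop exact duplicate (en, nl) pairs
# so the same sentence pair cannot leak across splits.
def deduplicate(pairs):
    """pairs: iterable of (en, nl) tuples; returns unique pairs, first occurrence kept."""
    seen = set()
    unique = []
    for en, nl in pairs:
        key = (en.strip(), nl.strip())
        if key not in seen:
            seen.add(key)
            unique.append((en, nl))
    return unique

corpus = [
    ("Hello.", "Hallo."),
    ("Hello.", "Hallo."),          # exact duplicate, removed
    ("Good morning.", "Goedemorgen."),
]
print(deduplicate(corpus))  # [('Hello.', 'Hallo.'), ('Good morning.', 'Goedemorgen.')]
```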