---
license: mit
language:
  - en
  - nl
tags:
  - machine-translation
  - low-resource
  - creativity
library_name: transformers
pipeline_tag: translation
model-index:
  - name: EN-DE ➜ EN-NL Creative
    results:
      - task:
          type: translation
          name: Translation
        dataset:
          name: Dutch Parallel Corpus + OpenSubtitles (creative subset)
          type: Helsinki-NLP/open_subtitles
          split: test
        metrics:
          - type: sacrebleu
            name: SacreBLEU
            value: 18.35
            greater_is_better: true
---

# EN-DE parent ➜ EN-NL fine-tuned on creative corpus

**Author:** Niek Holter  
**Thesis:** “Transferring Creativity”

## Summary
This model starts from Helsinki-NLP’s MarianMT `opus-mt-en-de` checkpoint and is fine-tuned on a 10k-sentence **creative** English–Dutch corpus (fiction + subtitles).  
It is one of four systems trained for my bachelor’s thesis, which studies how transfer-learning settings affect the creativity of machine translation (MT) output.

| Parent model | Fine-tune data | BLEU | COMET | Transformed Creativity Score |
|-------------|----------------|------|-------|------------------|
| en-de       | Creative       | 18.4 | 0.662 | 0.42 |

## Intended use
* Research on creative MT and low-resource transfer learning
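
For research use, the checkpoint can presumably be loaded like any other MarianMT model. The snippet below is a minimal sketch, not the thesis code: `translate`, `batched`, and the `model_id` argument are illustrative names, and `model_id` is a placeholder for this repository's Hub id or a local path to the downloaded files.

```python
from typing import Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(texts: List[str], model_id: str, batch_size: int = 16) -> List[str]:
    """Translate English sentences to Dutch with a MarianMT checkpoint.

    `model_id` is a placeholder (assumption): pass this repository's Hub id
    or a local directory containing the model files.
    """
    # Imported inside the function so the batching helper stays usable
    # even where transformers is not installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    outputs: List[str] = []
    for batch in batched(texts, batch_size):
        encoded = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**encoded)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Calling `translate(["The night was quiet."], model_id="path/to/this/model")` should return a list with one Dutch sentence; the path shown is hypothetical.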

## Training details
* Hardware  : NVIDIA GTX 1070 (CUDA 12.1)  
* Epochs     : up to 200, with early stopping (patience 5)  
* LR / batch : 2e-5 / 16  
* Script     : [`finetuning.py`](./finetuning.py)  
* Env        : [`environment.yml`](./environment.yml)
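
The early-stopping rule above (at most 200 epochs, patience 5) can be sketched as follows. This is an illustration of the stopping logic only, not an excerpt from `finetuning.py`. The card does not say which metric was monitored, so validation loss (lower is better) is an assumption, and `run_with_early_stopping` is a hypothetical name.

```python
from typing import Iterable, Tuple

MAX_EPOCHS = 200  # upper bound listed under "Training details"
PATIENCE = 5      # epochs without improvement before stopping


def run_with_early_stopping(
    val_losses: Iterable[float],
    max_epochs: int = MAX_EPOCHS,
    patience: int = PATIENCE,
) -> Tuple[int, float]:
    """Replay per-epoch validation losses and return (epochs_run, best_loss).

    Assumption: the monitored metric is a loss, so lower values are better.
    """
    best_loss = float("inf")
    stale_epochs = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if epoch > max_epochs:
            break  # hard cap reached
        epochs_run = epoch
        if loss < best_loss:
            best_loss = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # no improvement for `patience` consecutive epochs
    return epochs_run, best_loss


# The best loss occurs at epoch 3; five non-improving epochs follow.
epochs_run, best_loss = run_with_early_stopping(
    [3.0, 2.5, 2.4, 2.6, 2.7, 2.5, 2.45, 2.41, 2.3]
)
print(epochs_run, best_loss)  # 8 2.4
```

With a patience of 5, a run only reaches the 200-epoch cap if the monitored metric keeps improving at least once every five epochs.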

## Data
* **Creative corpus:** 7.6k fiction sentences from the Dutch Parallel Corpus (DPC) + 2.4k OpenSubtitles sentences.  
* Sentence-level 1:1 alignments; deduplicated to avoid leakage.

See https://github.com/muniekstache/Transfer-Creativity.git for the full pipeline.
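
As a rough illustration of the deduplication step, the sketch below drops repeated sentence pairs and removes training pairs whose source sentence also occurs in the test split. It is not taken from the linked repository: the function names are hypothetical, and the normalization (lowercasing and whitespace collapsing) is an assumption, so the actual pipeline may match duplicates differently.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (English source, Dutch target)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (assumed matching rule)."""
    return " ".join(text.lower().split())


def deduplicate(pairs: List[Pair]) -> List[Pair]:
    """Keep the first occurrence of each normalized sentence pair."""
    seen: Set[Pair] = set()
    unique: List[Pair] = []
    for src, tgt in pairs:
        key = (normalize(src), normalize(tgt))
        if key not in seen:
            seen.add(key)
            unique.append((src, tgt))
    return unique


def remove_leakage(train: List[Pair], test: List[Pair]) -> List[Pair]:
    """Drop training pairs whose source sentence also appears in the test set."""
    test_sources = {normalize(src) for src, _ in test}
    return [(src, tgt) for src, tgt in train if normalize(src) not in test_sources]


train = deduplicate([
    ("Hello there.", "Hallo daar."),
    ("hello  there.", "Hallo daar."),   # duplicate after normalization
    ("Good night.", "Goedenacht."),
])
test = [("Good night.", "Welterusten.")]
clean_train = remove_leakage(train, test)
print(clean_train)  # [('Hello there.', 'Hallo daar.')]
```

Filtering on the source side alone is the stricter choice here: a test sentence is removed from training even when its Dutch translation differs.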