Muniekstache committed on
Commit 9e8ba14 · verified · 1 Parent(s): d3cc3b3

create README.md

Files changed (1)
  1. README.md +55 -0
README.md ADDED
@@ -0,0 +1,55 @@
+ ---
+ license: mit
+ language:
+   - en
+   - nl
+ tags:
+   - machine-translation
+   - low-resource
+   - creativity
+ library_name: transformers
+ pipeline_tag: translation
+ model-index:
+   - name: EN-DE → EN-NL • Creative
+     results:
+       - task:
+           type: machine-translation
+           name: Translation
+         dataset:
+           name: Dutch Parallel Corpus + OpenSubtitles (creative subset)
+           type: Helsinki-NLP/open_subtitles
+           split: test
+         metrics:
+           - type: sacrebleu
+             name: SacreBLEU
+             value: 18.35
+             greater_is_better: true
+ ---
+
+ # EN-DE parent ➜ EN-NL fine-tuned on creative corpus
+
+ **Author:** Niek Holter
+ **Thesis:** “Transferring Creativity”
+
+ ## Summary
+ This model starts from Helsinki-NLP’s MarianMT `opus-mt-en-de` and is fine-tuned on a 10k-sentence **creative** English–Dutch corpus (fiction and subtitles).
+ It is one of four systems trained for my bachelor’s thesis to study how transfer-learning settings affect MT creativity.
+
+ | Parent model | Fine-tune data | BLEU | COMET | Transformer Creativity Score |
+ |--------------|----------------|------|-------|------------------------------|
+ | en-de | Creative | 18.4 | 0.662 | 0.42 |
+
+ ## Intended use
+ * Research on creative MT and low-resource transfer learning
+
+ ## Training details
+ * Hardware: NVIDIA GTX 1070 (CUDA 12.1)
+ * Epochs: at most 200, with early stopping (patience 5)
+ * Learning rate / batch size: 2e-5 / 16
+ * Script: [`finetuning.py`](./finetuning.py)
+ * Env: [`environment.yml`](./environment.yml)
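The early-stopping setting listed under Training details (a cap of 200 epochs, patience 5) can be illustrated with a small sketch. This is not taken from `finetuning.py`: the function name `train_with_early_stopping`, the `evaluate_epoch` callback, and the choice of validation loss as the monitored quantity are all assumptions made for illustration.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    evaluate_epoch: Callable[[int], float],
    max_epochs: int = 200,
    patience: int = 5,
) -> Tuple[int, float, int]:
    """Run up to `max_epochs`, stopping once the monitored loss has not
    improved for `patience` consecutive epochs.

    `evaluate_epoch` is a hypothetical callback that trains for one epoch
    and returns the validation loss (assumed metric; lower is better).
    Returns (best_epoch, best_loss, last_epoch_run), epochs counted from 1.
    """
    best_loss = float("inf")
    best_epoch = 0
    last_epoch = 0
    epochs_without_improvement = 0

    for epoch in range(1, max_epochs + 1):
        last_epoch = epoch
        loss = evaluate_epoch(epoch)
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs in a row

    return best_epoch, best_loss, last_epoch
```

With this rule, a run whose loss last improves at epoch 3 stops after epoch 8, and the checkpoint from epoch 3 would be the one to keep.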
+
+ ## Data
+ * **Creative corpus** (7.6k fiction sentences from the Dutch Parallel Corpus (DPC) + 2.4k OpenSubtitles sentences).
+ * Sentence-level 1:1 alignments; deduplicated to avoid leakage.
+   See https://github.com/muniekstache/Transfer-Creativity.git for the full pipeline.
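The deduplication step mentioned under Data could look roughly like the sketch below. The actual pipeline lives in the linked repository and may differ; the helper names (`normalize`, `deduplicate_pairs`, `remove_overlap`) and the normalization rule (NFKC, case-folding, whitespace collapsing) are assumptions chosen for illustration.

```python
import unicodedata
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (English source, Dutch target)


def normalize(text: str) -> str:
    """Build a comparison key: NFKC-normalize, case-fold, collapse whitespace.

    Assumed rule; used only for matching, the stored text stays unchanged.
    """
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def deduplicate_pairs(pairs: Iterable[Pair]) -> List[Pair]:
    """Keep the first occurrence of each (source, target) pair."""
    seen = set()
    unique = []
    for src, tgt in pairs:
        key = (normalize(src), normalize(tgt))
        if key not in seen:
            seen.add(key)
            unique.append((src, tgt))
    return unique


def remove_overlap(train: Iterable[Pair], test: Iterable[Pair]) -> List[Pair]:
    """Drop training pairs whose source sentence also occurs in the test set."""
    test_sources = {normalize(src) for src, _ in test}
    return [(s, t) for s, t in train if normalize(s) not in test_sources]


if __name__ == "__main__":
    pairs = [
        ("Hello there.", "Hallo daar."),
        ("hello  there.", "Hallo daar."),  # duplicate up to case and spacing
        ("Good night.", "Goedenacht."),
    ]
    unique = deduplicate_pairs(pairs)
    print(unique)
    print(remove_overlap(unique, [("GOOD NIGHT.", "Welterusten.")]))
```

Matching on the source side only when filtering against the test set is the stricter choice: a test sentence seen in training with a different translation would still inflate scores.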