chunwoolee0 committed

Commit 304fb94 · 1 Parent(s): 8f4a423

End of training

Files changed (3)
  1. README.md +19 -40
  2. generation_config.json +1 -1
  3. pytorch_model.bin +1 -1
README.md CHANGED
@@ -11,48 +11,30 @@ model-index:
  results: []
  ---
 
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
  # mt5_small_bongsoo_en_ko
 
- This model is a fine-tuned version of [chunwoolee0/mt5_small_bongsoo_en_ko](https://huggingface.co/chunwoolee/mt5_small_bongsoo_en_ko)
- on the [bongsoo/news_talk_en_ko](https://huggingface.co/datasets/bongsoo/news_talk_en_ko) dataset.
+ This model is a fine-tuned version of [chunwoolee0/mt5_small_bongsoo_en_ko](https://huggingface.co/chunwoolee0/mt5_small_bongsoo_en_ko) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 2.8778
- - Rouge1: 0.1662
- - Rouge2: 0.0237
- - Rougel: 0.1647
- - Sacrebleu: 0.4694
-
- See [translation_en_ko_mt5_small_bongsoo_news_talk.ipynb
- ](https://github.com/chunwoolee0/ko-nlp/blob/main/translation_en_ko_mt5_small_bongsoo_news_talk.ipynb)
+ - Loss: 2.7805
+ - Rouge1: 0.1932
+ - Rouge2: 0.0394
+ - Rougel: 0.1895
+ - Sacrebleu: 0.4518
 
  ## Model description
 
- mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages
+ More information needed
 
  ## Intended uses & limitations
 
- Translation from English to Korean
-
- ## Usage
-
- You can use this model directly with a pipeline for translation language modeling:
-
- ```python
- >>> from transformers import pipeline
- >>> translator = pipeline('translation', model='chunwoolee0/ke_t5_base_bongsoo_en_ko')
-
- >>> translator("Let us go for a walk after lunch.")
- [{'translation_text': '식당에 앉아서 밤에 갔다.'}]
-
- >>> translator("Skinner's reward is mostly eye-watering.")
- [{'translation_text': '벤더의 선물은 너무 마음이 저린다.'}]
+ More information needed
 
  ## Training and evaluation data
 
- The value of max_length is critical to the training. The usual value of 128 used for Indo-European languages causes a
- greate trouble in gpu usage. Therefore it should be reduced to 64 in order to succeed.
- Another problem comes from the usual split of data into 80% for train and 20% for validation. By this, the evaluation
- step takes too much time. Here 99% and 1% split is used without change in the evaluation.
+ More information needed
 
  ## Training procedure
 
@@ -73,20 +55,17 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Sacrebleu |
  |:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
- | 3.8338 | 0.16 | 500 | 2.9626 | 0.1475 | 0.0184 | 0.1455 | 0.4243 |
- | 3.7865 | 0.32 | 1000 | 2.9305 | 0.1529 | 0.0181 | 0.1508 | 0.4435 |
- | 3.7436 | 0.48 | 1500 | 2.9067 | 0.1572 | 0.019 | 0.155 | 0.4464 |
- | 3.7207 | 0.65 | 2000 | 2.8924 | 0.165 | 0.0233 | 0.1629 | 0.4532 |
- | 3.7022 | 0.81 | 2500 | 2.8825 | 0.1647 | 0.0231 | 0.1627 | 0.4504 |
- | 3.69 | 0.97 | 3000 | 2.8778 | 0.1662 | 0.0237 | 0.1647 | 0.4694 |
+ | 3.7067 | 0.16 | 500 | 2.8501 | 0.1852 | 0.0373 | 0.1814 | 0.4147 |
+ | 3.6609 | 0.32 | 1000 | 2.8230 | 0.1887 | 0.0383 | 0.1852 | 0.4362 |
+ | 3.6269 | 0.48 | 1500 | 2.8030 | 0.1911 | 0.0367 | 0.1874 | 0.4482 |
+ | 3.6052 | 0.65 | 2000 | 2.7882 | 0.1931 | 0.0383 | 0.1893 | 0.4458 |
+ | 3.5882 | 0.81 | 2500 | 2.7805 | 0.1932 | 0.0394 | 0.1895 | 0.4518 |
+ | 3.585 | 0.97 | 3000 | 2.7771 | 0.1925 | 0.0401 | 0.1886 | 0.4499 |
 
- The mT5 model of google cannot be used for Korean although it is trained over 101 languages. Finetuning
- using very large data set by bongsoo/news_talk_en_ko still yield results of garbage. One should use other
- models like the ke-t5 by KETI(한국전자연구원).
 
  ### Framework versions
 
- - Transformers 4.32.0
+ - Transformers 4.32.1
  - Pytorch 2.0.1+cu118
  - Datasets 2.14.4
  - Tokenizers 0.13.3
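
Note: this commit replaces the hand-written usage example with the auto-generated "More information needed" stubs. For reference, a minimal usage sketch in the spirit of the removed example follows; it assumes this repository's checkpoint id (`chunwoolee0/mt5_small_bongsoo_en_ko`) rather than the `chunwoolee0/ke_t5_base_bongsoo_en_ko` id the removed snippet pointed at, and makes no claim about output quality (the removed notes describe the translations as poor).

```python
from transformers import pipeline

# Sketch only: load this checkpoint with the generic translation pipeline,
# as the removed README example did. The model id below is an assumption
# (the removed example pointed at a ke_t5_base checkpoint instead).
translator = pipeline("translation", model="chunwoolee0/mt5_small_bongsoo_en_ko")

result = translator("Let us go for a walk after lunch.")
print(result[0]["translation_text"])  # expect rough output per the old card's caveats
```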
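The removed "Training and evaluation data" paragraph also explained that max_length had to be cut from the usual 128 down to 64 to avoid GPU memory trouble, and that a 99%/1% train/validation split was used so evaluation stays fast. A preprocessing sketch under those settings is shown below; the dataset column names ("en", "ko"), the tokenizer id, and the seed are illustrative assumptions, not taken from the commit.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Sketch of the preprocessing described in the removed notes:
# max_length reduced to 64 and a 99%/1% train/validation split.
raw = load_dataset("bongsoo/news_talk_en_ko", split="train")
splits = raw.train_test_split(test_size=0.01, seed=42)  # 99% train, 1% validation

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")  # assumed base tokenizer
MAX_LENGTH = 64  # 128 reportedly caused GPU memory trouble

def preprocess(batch):
    # Column names "en" and "ko" are assumptions; check the dataset's actual schema.
    model_inputs = tokenizer(batch["en"], max_length=MAX_LENGTH, truncation=True)
    labels = tokenizer(text_target=batch["ko"], max_length=MAX_LENGTH, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = splits.map(preprocess, batched=True, remove_columns=raw.column_names)
```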
generation_config.json CHANGED
@@ -2,5 +2,5 @@
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
- "transformers_version": "4.32.0"
+ "transformers_version": "4.32.1"
  }
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b4c5efa232223e9fb93618d2ca8c482880de765dbd7ebcde9406de7e58c641e3
+ oid sha256:94260ecd072528727b62c781b2ddd568f52fe66f79a3d0772f9d0e063da18bf3
  size 1200772613