End of training


Files changed (3)

README.md +19 -40
generation_config.json +1 -1
pytorch_model.bin +1 -1

README.md CHANGED

@@ -11,48 +11,30 @@ model-index:
   results: []
 ---
 # mt5_small_bongsoo_en_ko
-This model is a fine-tuned version of [chunwoolee0/mt5_small_bongsoo_en_ko](https://huggingface.co/chunwoolee/mt5_small_bongsoo_en_ko)
-on the [bongsoo/news_talk_en_ko](https://huggingface.co/datasets/bongsoo/news_talk_en_ko) dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.8778
-- Rouge1: 0.1662
-- Rouge2: 0.0237
-- Rougel: 0.1647
-- Sacrebleu: 0.4694
-See [translation_en_ko_mt5_small_bongsoo_news_talk.ipynb
-](https://github.com/chunwoolee0/ko-nlp/blob/main/translation_en_ko_mt5_small_bongsoo_news_talk.ipynb)
 ## Model description
-mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages
 ## Intended uses & limitations
-Translation from English to Korean
-## Usage
-You can use this model directly with a pipeline for translation language modeling:
-```python
->>> from transformers import pipeline
->>> translator = pipeline('translation', model='chunwoolee0/ke_t5_base_bongsoo_en_ko')
->>> translator("Let us go for a walk after lunch.")
-[{'translation_text': '식당에 앉아서 밤에 갔다.'}]
->>> translator("Skinner's reward is mostly eye-watering.")
-[{'translation_text': '벤더의 선물은 너무 마음이 쏠린다.'}]
 ## Training and evaluation data
-The value of max_length is critical to the training. The usual value of 128 used for Indo-European languages causes a
-greate trouble in gpu usage. Therefore it should be reduced to 64 in order to succeed.
-Another problem comes from the usual split of data into 80% for train and 20% for validation. By this, the evaluation
-step takes too much time. Here 99% and 1% split is used without change in the evaluation.
 ## Training procedure
@@ -73,20 +55,17 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Sacrebleu |
 |:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
-| 3.8338        | 0.16  | 500  | 2.9626          | 0.1475 | 0.0184 | 0.1455 | 0.4243    |
-| 3.7865        | 0.32  | 1000 | 2.9305          | 0.1529 | 0.0181 | 0.1508 | 0.4435    |
-| 3.7436        | 0.48  | 1500 | 2.9067          | 0.1572 | 0.019  | 0.155  | 0.4464    |
-| 3.7207        | 0.65  | 2000 | 2.8924          | 0.165  | 0.0233 | 0.1629 | 0.4532    |
-| 3.7022        | 0.81  | 2500 | 2.8825          | 0.1647 | 0.0231 | 0.1627 | 0.4504    |
-| 3.69          | 0.97  | 3000 | 2.8778          | 0.1662 | 0.0237 | 0.1647 | 0.4694    |
-The mT5 model of google cannot be used for Korean although it is trained over 101 languages. Finetuning
-using very large data set by bongsoo/news_talk_en_ko still yield results of garbage. One should use other
-models like the ke-t5 by KETI(한국전자연구원).
 ### Framework versions
-- Transformers 4.32.0
 - Pytorch 2.0.1+cu118
 - Datasets 2.14.4
 - Tokenizers 0.13.3

   results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
 # mt5_small_bongsoo_en_ko
+This model is a fine-tuned version of [chunwoolee0/mt5_small_bongsoo_en_ko](https://huggingface.co/chunwoolee0/mt5_small_bongsoo_en_ko) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 2.7805
+- Rouge1: 0.1932
+- Rouge2: 0.0394
+- Rougel: 0.1895
+- Sacrebleu: 0.4518
 ## Model description
+More information needed
 ## Intended uses & limitations
+More information needed
 ## Training and evaluation data
+More information needed
 ## Training procedure
 | Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Sacrebleu |
 |:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
+| 3.7067        | 0.16  | 500  | 2.8501          | 0.1852 | 0.0373 | 0.1814 | 0.4147    |
+| 3.6609        | 0.32  | 1000 | 2.8230          | 0.1887 | 0.0383 | 0.1852 | 0.4362    |
+| 3.6269        | 0.48  | 1500 | 2.8030          | 0.1911 | 0.0367 | 0.1874 | 0.4482    |
+| 3.6052        | 0.65  | 2000 | 2.7882          | 0.1931 | 0.0383 | 0.1893 | 0.4458    |
+| 3.5882        | 0.81  | 2500 | 2.7805          | 0.1932 | 0.0394 | 0.1895 | 0.4518    |
+| 3.585         | 0.97  | 3000 | 2.7771          | 0.1925 | 0.0401 | 0.1886 | 0.4499    |
 ### Framework versions
+- Transformers 4.32.1
 - Pytorch 2.0.1+cu118
 - Datasets 2.14.4
 - Tokenizers 0.13.3
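
The headline evaluation numbers on the two sides of the README.md diff can be compared directly. The following is a minimal sketch in Python, with the values copied from the removed and added lines above; the names `old`, `new`, and `delta` are illustrative and not part of the repository.

```python
# Evaluation results copied from the README.md diff above.
old = {"Loss": 2.8778, "Rouge1": 0.1662, "Rouge2": 0.0237, "Rougel": 0.1647, "Sacrebleu": 0.4694}
new = {"Loss": 2.7805, "Rouge1": 0.1932, "Rouge2": 0.0394, "Rougel": 0.1895, "Sacrebleu": 0.4518}

# Change per metric (new minus old), rounded to the 4 decimals used in the card.
delta = {name: round(new[name] - old[name], 4) for name in old}

for name, change in delta.items():
    print(f"{name:<10} {old[name]:.4f} -> {new[name]:.4f} ({change:+.4f})")
```

With these numbers the loss drops and the three ROUGE scores rise, while Sacrebleu is slightly lower than before.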

generation_config.json CHANGED

@@ -2,5 +2,5 @@
   "decoder_start_token_id": 0,
   "eos_token_id": 1,
   "pad_token_id": 0,
-  "transformers_version": "4.32.0"
 }

   "decoder_start_token_id": 0,
   "eos_token_id": 1,
   "pad_token_id": 0,
+  "transformers_version": "4.32.1"
 }

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b4c5efa232223e9fb93618d2ca8c482880de765dbd7ebcde9406de7e58c641e3
 size 1200772613

 version https://git-lfs.github.com/spec/v1
+oid sha256:94260ecd072528727b62c781b2ddd568f52fe66f79a3d0772f9d0e063da18bf3
 size 1200772613
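
The pytorch_model.bin entry is a Git LFS pointer, so the diff only shows the new `oid` and the unchanged `size`. The following is a rough sketch of how a downloaded weight file could be checked against such a pointer using only the Python standard library. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 named in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    path = Path(path)
    # Cheap check first: a size mismatch means the hash cannot match either.
    if path.stat().st_size != int(fields["size"]):
        return False
    # Hash the file in chunks so a ~1.2 GB checkpoint is never fully in memory.
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest


# Pointer text for the new pytorch_model.bin, copied from the diff above.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:94260ecd072528727b62c781b2ddd568f52fe66f79a3d0772f9d0e063da18bf3
size 1200772613
"""
```

Assuming the weights were downloaded to a local file named `pytorch_model.bin`, calling `matches_pointer("pytorch_model.bin", NEW_POINTER)` would report whether that file corresponds to the version introduced by this commit.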