	---
	base_model: csebuetnlp/mT5_multilingual_XLSum
	tags:
	- generated_from_trainer
	model-index:
	- name: GeneralNews_1_loadbest
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# GeneralNews_1_loadbest

	This model is a fine-tuned version of [csebuetnlp/mT5_multilingual_XLSum](https://huggingface.co/csebuetnlp/mT5_multilingual_XLSum) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 2.9834 (the validation loss at the final logged step, 7800)
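As a rough aid to reading this number, the loss can be converted to a perplexity. This is a minimal sketch that assumes the reported value is the mean per-token cross-entropy in nats; the card does not state how the loss is computed, so treat the result as indicative only.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 2.9834

# Assumption: the loss is a mean token-level cross-entropy in nats,
# in which case perplexity is simply its exponential.
perplexity = math.exp(eval_loss)

print(f"approximate perplexity: {perplexity:.2f}")
```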

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 1000
	- num_epochs: 10
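The combination of `lr_scheduler_type: linear` and `lr_scheduler_warmup_steps: 1000` typically means the learning rate ramps up linearly from 0 to `learning_rate` over the warmup steps, then decays linearly to 0 by the end of training. The sketch below illustrates that shape in plain Python. The function name `linear_schedule_lr` is hypothetical, and `total_steps=7920` is an assumption estimated from the results table (step 7800 at epoch 9.85 suggests roughly 792 steps per epoch over 10 epochs); the card does not state the exact total.

```python
def linear_schedule_lr(step, base_lr=5e-05, warmup_steps=1000, total_steps=7920):
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to base_lr, then linear decay back to 0.
    total_steps is an estimate, not a value stated in the card.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly with the step count.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly with the steps remaining.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the learning rate at a few points along the run.
for s in (0, 500, 1000, 4000, 7800):
    print(s, linear_schedule_lr(s))
```

Under these assumptions the peak learning rate is reached at step 1000, which falls early in the second epoch according to the results table.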

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 4.1541 | 0.25 | 200 | 3.4209 |
	| 3.5494 | 0.51 | 400 | 3.1702 |
	| 3.2618 | 0.76 | 600 | 3.0273 |
	| 3.5983 | 1.01 | 800 | 2.9550 |
	| 3.3355 | 1.26 | 1000 | 2.8883 |
	| 3.4976 | 1.52 | 1200 | 2.8653 |
	| 3.1001 | 1.77 | 1400 | 2.8543 |
	| 2.282 | 2.02 | 1600 | 2.7953 |
	| 2.5724 | 2.27 | 1800 | 2.7866 |
	| 2.7474 | 2.53 | 2000 | 2.7778 |
	| 3.0323 | 2.78 | 2200 | 2.7901 |
	| 2.3032 | 3.03 | 2400 | 2.7641 |
	| 2.5042 | 3.28 | 2600 | 2.8059 |
	| 1.9857 | 3.54 | 2800 | 2.7847 |
	| 2.5909 | 3.79 | 3000 | 2.8045 |
	| 2.2105 | 4.04 | 3200 | 2.8051 |
	| 2.1151 | 4.29 | 3400 | 2.8331 |
	| 1.9858 | 4.55 | 3600 | 2.8292 |
	| 1.9633 | 4.8 | 3800 | 2.8133 |
	| 2.0282 | 5.05 | 4000 | 2.8317 |
	| 2.0988 | 5.3 | 4200 | 2.8781 |
	| 2.0699 | 5.56 | 4400 | 2.8627 |
	| 2.1769 | 5.81 | 4600 | 2.8388 |
	| 1.7436 | 6.06 | 4800 | 2.8899 |
	| 1.8312 | 6.31 | 5000 | 2.9223 |
	| 1.841 | 6.57 | 5200 | 2.8970 |
	| 2.0157 | 6.82 | 5400 | 2.8754 |
	| 2.1223 | 7.07 | 5600 | 2.8958 |
	| 1.6103 | 7.32 | 5800 | 2.9247 |
	| 1.7702 | 7.58 | 6000 | 2.9562 |
	| 1.537 | 7.83 | 6200 | 2.9597 |
	| 1.933 | 8.08 | 6400 | 2.9585 |
	| 1.3947 | 8.33 | 6600 | 2.9841 |
	| 1.639 | 8.59 | 6800 | 2.9723 |
	| 1.6441 | 8.84 | 7000 | 2.9770 |
	| 1.4509 | 9.09 | 7200 | 2.9865 |
	| 1.6212 | 9.34 | 7400 | 2.9890 |
	| 1.8013 | 9.6 | 7600 | 2.9877 |
	| 1.3722 | 9.85 | 7800 | 2.9834 |
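In the table above, validation loss reaches its minimum early and then drifts upward while training loss keeps falling, a pattern that is consistent with overfitting. The sketch below picks out the lowest-validation-loss row from a handful of rows copied from the table and compares it with the final row. It is only an illustration of how one might select a checkpoint from these logs; the card does not say which checkpoint was actually saved.

```python
# A few (step, validation_loss) rows copied from the results table.
rows = [
    (2000, 2.7778),
    (2200, 2.7901),
    (2400, 2.7641),
    (2600, 2.8059),
    (7800, 2.9834),  # final logged step
]

# The row with the lowest validation loss is the natural "best" checkpoint.
best_step, best_loss = min(rows, key=lambda row: row[1])
final_step, final_loss = rows[-1]

print(f"best step:  {best_step} (validation loss {best_loss})")
print(f"final step: {final_step} (validation loss {final_loss})")
print(f"gap: {final_loss - best_loss:.4f}")
```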


	### Framework versions

	- Transformers 4.35.2
	- PyTorch 2.1.0+cu121
	- Datasets 2.16.1
	- Tokenizers 0.15.1
