# sft-base_loss-t5-v1_1-base-mle0-ul0-tox0-e4
This model is a fine-tuned version of google/t5-v1_1-base on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 1.0387
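As a minimal usage sketch (assuming the `transformers` and `torch` packages are installed; the input text below is purely illustrative, since the training data and task are not documented):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "TarhanE/sft-base_loss-t5-v1_1-base-mle0-ul0-tox0-e4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Encode an example prompt and generate with the fine-tuned model.
inputs = tokenizer("Hello, how are you today?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output_text)
```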
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: constant
- lr_scheduler_warmup_steps: 5
- num_epochs: 10
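The listed total_train_batch_size is the per-device train_batch_size multiplied by gradient_accumulation_steps; a quick sanity check in plain Python:

```python
# Hyperparameters copied from the list above.
train_batch_size = 4
gradient_accumulation_steps = 2

# Effective (total) train batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # → 8
```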
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
6.89 | 0.2899 | 200 | 3.1273 |
6.8047 | 0.5797 | 400 | 3.4536 |
4.0718 | 0.8696 | 600 | 1.7809 |
2.9926 | 1.1594 | 800 | 1.5160 |
2.4307 | 1.4493 | 1000 | 1.3290 |
1.9824 | 1.7391 | 1200 | 1.2237 |
1.8589 | 2.0290 | 1400 | 1.1363 |
1.7644 | 2.3188 | 1600 | 1.1028 |
1.5996 | 2.6087 | 1800 | 1.0860 |
1.4636 | 2.8986 | 2000 | 1.0699 |
1.3986 | 3.1884 | 2200 | 1.0776 |
1.3767 | 3.4783 | 2400 | 1.0204 |
1.3042 | 3.7681 | 2600 | 1.0475 |
1.3342 | 4.0580 | 2800 | 1.0547 |
1.2306 | 4.3478 | 3000 | 1.0423 |
1.2201 | 4.6377 | 3200 | 1.0424 |
1.2224 | 4.9275 | 3400 | 1.0388 |
1.205 | 5.2174 | 3600 | 1.0178 |
1.0739 | 5.5072 | 3800 | 1.0303 |
1.0681 | 5.7971 | 4000 | 1.0307 |
1.0863 | 6.0870 | 4200 | 1.0071 |
1.0393 | 6.3768 | 4400 | 1.0509 |
1.0076 | 6.6667 | 4600 | 1.0143 |
1.0255 | 6.9565 | 4800 | 1.0196 |
0.9258 | 7.2464 | 5000 | 1.0367 |
0.9698 | 7.5362 | 5200 | 1.0203 |
0.978 | 7.8261 | 5400 | 1.0055 |
0.9228 | 8.1159 | 5600 | 1.0372 |
0.9173 | 8.4058 | 5800 | 1.0240 |
0.8497 | 8.6957 | 6000 | 1.0433 |
0.8383 | 8.9855 | 6200 | 1.0269 |
0.8392 | 9.2754 | 6400 | 1.0480 |
0.8204 | 9.5652 | 6600 | 1.0442 |
0.8157 | 9.8551 | 6800 | 1.0387 |
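Note that the final validation loss (1.0387) is not the minimum reached during training; a short script over a few (step, validation loss) pairs from the table above finds the best checkpoint:

```python
# A subset of (step, validation_loss) rows from the results table.
rows = [
    (4200, 1.0071),
    (5400, 1.0055),
    (6200, 1.0269),
    (6800, 1.0387),
]

# The checkpoint with the lowest validation loss.
best_step, best_loss = min(rows, key=lambda r: r[1])
print(best_step, best_loss)  # → 5400 1.0055
```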
### Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1