t5-l-gn-rtrvl-ruby-29-5-padding-few-shot-2

This model is a fine-tuned version of google-t5/t5-large. The training dataset was not recorded in the card metadata. On the evaluation set, the last logged checkpoint (step 13000, epoch 5.6181) reaches a validation loss of 0.3244.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 64
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 128
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
num_epochs: 6
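
The relationship between the batch-size entries and the shape of the linear schedule can be sketched in a few lines of plain Python. This is an illustrative sketch, not the training code: the helper names `effective_batch_size` and `linear_lr` are hypothetical, a single training device and zero warmup steps are assumed (the card states neither), and the total step count of 13,884 is only an estimate derived from the logged step/epoch ratio (about 2,314 steps per epoch times 6 epochs).

```python
def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Examples consumed per optimizer update.

    num_devices=1 is an assumption; the card does not state the device count.
    """
    return per_device * accumulation_steps * num_devices


def linear_lr(step, total_steps, base_lr=3e-5, warmup_steps=0):
    """Learning rate under a linear schedule: optional linear warmup,
    then linear decay to zero at total_steps.

    warmup_steps=0 is an assumption; the card lists no warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# train_batch_size 64 with gradient_accumulation_steps 2 gives the listed total of 128.
print(effective_batch_size(64, 2))

# Estimated total optimizer steps (assumption, see the paragraph above).
ESTIMATED_TOTAL_STEPS = 13_884
for step in (0, 5_000, 13_000):
    print(step, linear_lr(step, ESTIMATED_TOTAL_STEPS))
```

Under these assumptions the learning rate at the last logged step (13000) is already close to zero, which is consistent with the small loss changes in the final rows of the results table.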

Training Loss	Epoch	Step	Validation Loss
0.3477	0.4322	1000	0.4067
0.3287	0.8645	2000	0.3843
0.3102	1.2965	3000	0.3735
0.2955	1.7288	4000	0.3642
0.2799	2.1608	5000	0.3524
0.2647	2.5930	6000	0.3475
0.2549	3.0251	7000	0.3414
0.2501	3.4573	8000	0.3374
0.2394	3.8896	9000	0.3314
0.2334	4.3216	10000	0.3296
0.2269	4.7538	11000	0.3274
0.2267	5.1859	12000	0.3252
0.2207	5.6181	13000	0.3244
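
The table above can be summarized programmatically. The following is a minimal sketch that copies the logged rows into a list and derives a few quantities from them; the function names are hypothetical, and the steps-per-epoch figure assumes that each logged step is one optimizer update.

```python
# (training_loss, epoch, step, validation_loss), copied from the table above
LOG = [
    (0.3477, 0.4322, 1000, 0.4067),
    (0.3287, 0.8645, 2000, 0.3843),
    (0.3102, 1.2965, 3000, 0.3735),
    (0.2955, 1.7288, 4000, 0.3642),
    (0.2799, 2.1608, 5000, 0.3524),
    (0.2647, 2.5930, 6000, 0.3475),
    (0.2549, 3.0251, 7000, 0.3414),
    (0.2501, 3.4573, 8000, 0.3374),
    (0.2394, 3.8896, 9000, 0.3314),
    (0.2334, 4.3216, 10000, 0.3296),
    (0.2269, 4.7538, 11000, 0.3274),
    (0.2267, 5.1859, 12000, 0.3252),
    (0.2207, 5.6181, 13000, 0.3244),
]


def best_checkpoint(log):
    """Return (step, validation_loss) for the row with the lowest validation loss."""
    _, _, step, val = min(log, key=lambda row: row[3])
    return step, val


def generalization_gap(log):
    """Validation loss minus training loss at each logged step."""
    return [(step, round(val - train, 4)) for train, _, step, val in log]


def estimated_steps_per_epoch(log):
    """Estimate steps per epoch from the last logged row (step / epoch)."""
    _, epoch, step, _ = log[-1]
    return round(step / epoch)


def strictly_decreasing(values):
    """True if every value is lower than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


print("best checkpoint:", best_checkpoint(LOG))
print("steps per epoch (estimate):", estimated_steps_per_epoch(LOG))
print("validation loss strictly decreasing:", strictly_decreasing([r[3] for r in LOG]))
print("gap at first and last log:", generalization_gap(LOG)[0], generalization_gap(LOG)[-1])
```

Read this way, the validation loss falls at every logged step, while the gap between validation and training loss widens over the run, so the later epochs improve the training fit faster than the held-out fit.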