Se124M10KInfPrompt_endtoken_ls

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 4
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 200
num_epochs: 50
mixed_precision_training: Native AMP
label_smoothing_factor: 0.1
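
As a rough illustration of how these settings combine, the sketch below recomputes the effective batch size and traces a warmup-then-cosine learning-rate curve. It rests on assumptions that are not stated in this card: a single training device, 267 optimizer steps per epoch (read off the Step column of the training results), a schedule planned over all 50 epochs, and a decay that ends at zero. The helper names `effective_batch_size` and `cosine_lr` are hypothetical, and the trainer's own scheduler may differ in detail.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 0.0005
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_STEPS = 200
NUM_EPOCHS = 50

# Assumption: 267 optimizer steps per epoch (step 267 is logged at epoch 1.0)
# and a schedule planned over all NUM_EPOCHS epochs.
STEPS_PER_EPOCH = 267
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def effective_batch_size(per_step: int, accumulation: int, devices: int = 1) -> int:
    """Number of sequences contributing to one optimizer update.

    Assumption: one device unless stated otherwise.
    """
    return per_step * accumulation * devices


def cosine_lr(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    batch = effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)
    print("effective batch size:", batch)
    for step in (0, 100, 200, 6775, 12015, TOTAL_STEPS):
        print(f"step {step:>5}: lr = {cosine_lr(step):.6f}")
```

Under the single-device assumption, the product 4 x 8 reproduces the listed total_train_batch_size of 32. The learning rate peaks at 0.0005 once the 200 warmup steps finish and then decays along the cosine curve.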

Training Loss	Epoch	Step	Validation Loss
19.0863	1.0	267	2.1942
17.6413	2.0	534	2.1318
17.3454	3.0	801	2.1143
17.2455	4.0	1068	2.0979
17.1120	5.0	1335	2.0918
17.0311	6.0	1602	2.0852
16.9714	7.0	1869	2.0805
16.8883	8.0	2136	2.0760
16.8675	9.0	2403	2.0727
16.8491	10.0	2670	2.0699
16.8653	11.0	2937	2.0698
16.7795	12.0	3204	2.0718
16.8033	13.0	3471	2.0635
16.7715	14.0	3738	2.0644
16.7677	15.0	4005	2.0632
16.7682	16.0	4272	2.0615
16.7473	17.0	4539	2.0598
16.7306	18.0	4806	2.0615
16.6896	19.0	5073	2.0586
16.7027	20.0	5340	2.0589
16.6991	21.0	5607	2.0581
16.6864	22.0	5874	2.0573
16.6749	23.0	6141	2.0562
16.6714	24.0	6408	2.0551
16.6603	25.0	6675	2.0546
16.6801	26.0	6942	2.0542
16.6263	27.0	7209	2.0541
16.6436	28.0	7476	2.0531
16.6471	29.0	7743	2.0523
16.6412	30.0	8010	2.0549
16.6017	31.0	8277	2.0529
16.6352	32.0	8544	2.0510
16.5937	33.0	8811	2.0522
16.6165	34.0	9078	2.0511
16.5961	35.0	9345	2.0518
16.5675	36.0	9612	2.0514
16.5565	37.0	9879	2.0499
16.6215	38.0	10146	2.0504
16.6133	39.0	10413	2.0505
16.5901	40.0	10680	2.0492
16.5841	41.0	10947	2.0500
16.5856	42.0	11214	2.0493
16.5775	43.0	11481	2.0494
16.5873	44.0	11748	2.0497
16.5285	45.0	12015	2.0494