# Se124M100KInfPrompt_endtoken2
This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.6709
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- num_epochs: 50
- mixed_precision_training: Native AMP
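
For reference, the following sketch shows roughly how these settings map onto `transformers.TrainingArguments`. It is a reconstruction from the list above, not the actual training script; the output directory is a placeholder, since the script and dataset are not documented here.

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="out",                # placeholder; actual path not documented
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,   # effective train batch size: 4 * 8 = 32
    lr_scheduler_type="cosine",
    warmup_steps=200,
    num_train_epochs=50,
    fp16=True,                       # "Native AMP" mixed precision
    optim="adamw_torch",             # AdamW, betas=(0.9, 0.999), eps=1e-8 (defaults)
)
```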
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
0.8989 | 1.0 | 267 | 0.7585 |
0.7706 | 2.0 | 534 | 0.7264 |
0.7441 | 3.0 | 801 | 0.7160 |
0.7327 | 4.0 | 1068 | 0.7091 |
0.7175 | 5.0 | 1335 | 0.7024 |
0.7118 | 6.0 | 1602 | 0.6990 |
0.7079 | 7.0 | 1869 | 0.6931 |
0.6982 | 8.0 | 2136 | 0.6904 |
0.6977 | 9.0 | 2403 | 0.6891 |
0.6971 | 10.0 | 2670 | 0.6869 |
0.6992 | 11.0 | 2937 | 0.6850 |
0.6889 | 12.0 | 3204 | 0.6849 |
0.6924 | 13.0 | 3471 | 0.6845 |
0.6894 | 14.0 | 3738 | 0.6834 |
0.6886 | 15.0 | 4005 | 0.6791 |
0.6906 | 16.0 | 4272 | 0.6812 |
0.6868 | 17.0 | 4539 | 0.6796 |
0.6852 | 18.0 | 4806 | 0.6789 |
0.6797 | 19.0 | 5073 | 0.6784 |
0.6813 | 20.0 | 5340 | 0.6775 |
0.6823 | 21.0 | 5607 | 0.6776 |
0.6803 | 22.0 | 5874 | 0.6758 |
0.6782 | 23.0 | 6141 | 0.6768 |
0.6786 | 24.0 | 6408 | 0.6747 |
0.677 | 25.0 | 6675 | 0.6740 |
0.68 | 26.0 | 6942 | 0.6742 |
0.6733 | 27.0 | 7209 | 0.6735 |
0.6744 | 28.0 | 7476 | 0.6734 |
0.6746 | 29.0 | 7743 | 0.6737 |
0.674 | 30.0 | 8010 | 0.6753 |
0.6694 | 31.0 | 8277 | 0.6731 |
0.6731 | 32.0 | 8544 | 0.6734 |
0.6683 | 33.0 | 8811 | 0.6723 |
0.6712 | 34.0 | 9078 | 0.6723 |
0.668 | 35.0 | 9345 | 0.6720 |
0.6647 | 36.0 | 9612 | 0.6723 |
0.664 | 37.0 | 9879 | 0.6713 |
0.6707 | 38.0 | 10146 | 0.6724 |
0.6704 | 39.0 | 10413 | 0.6715 |
0.6675 | 40.0 | 10680 | 0.6715 |
0.6673 | 41.0 | 10947 | 0.6718 |
0.6656 | 42.0 | 11214 | 0.6713 |
0.6659 | 43.0 | 11481 | 0.6715 |
0.667 | 44.0 | 11748 | 0.6714 |
0.6596 | 45.0 | 12015 | 0.6709 |
0.6673 | 46.0 | 12282 | 0.6710 |
0.6666 | 47.0 | 12549 | 0.6710 |
0.6661 | 48.0 | 12816 | 0.6709 |
0.6637 | 49.0 | 13083 | 0.6709 |
0.665 | 49.8143 | 13300 | 0.6709 |
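
Assuming the reported losses are mean per-token cross-entropy in nats (the usual causal language modeling objective), the final validation loss of 0.6709 corresponds to a perplexity of about 1.96:

```python
import math

# Perplexity = exp(mean cross-entropy loss), assuming loss is in nats per token.
print(math.exp(0.6709))  # ≈ 1.956
```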
### Framework versions

- PEFT 0.15.1
- Transformers 4.51.3
- PyTorch 2.6.0+cu118
- Datasets 3.5.0
- Tokenizers 0.21.1
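
Since PEFT appears in the framework list, the published checkpoint is presumably a PEFT adapter on top of the gpt2 base model. A minimal loading sketch under that assumption (the prompt and generation settings are illustrative, and the adapter repo may not ship its own tokenizer, so the base gpt2 tokenizer is used):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "augustocsc/Se124M100KInfPrompt_endtoken2"

# Loads the gpt2 base model and applies the adapter weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

inputs = tokenizer("Example prompt", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```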
## Base model

- [openai-community/gpt2](https://huggingface.co/openai-community/gpt2)