song_a_day_gpt2_short

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5513
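
Assuming this is the standard causal-LM cross-entropy in nats (as reported by the Transformers Trainer), the final loss implies a validation perplexity of roughly exp(3.5513) ≈ 34.9:

```python
import math

# Perplexity implied by the final validation loss reported above
val_loss = 3.5513
print(math.exp(val_loss))  # ≈ 34.86
```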

Model description

More information needed

Intended uses & limitations

More information needed
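
Pending details from the author, a minimal generation sketch (the prompt and sampling settings below are illustrative assumptions):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub
generator = pipeline("text-generation", model="Jonathanmann/song_a_day_gpt2_short")

# Sample a short continuation; the prompt and generation settings are illustrative
out = generator("Woke up this morning", max_new_tokens=50, do_sample=True, top_p=0.95)
print(out[0]["generated_text"])
```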

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 5
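
As a reproducibility aid, these settings map onto transformers.TrainingArguments roughly as sketched below; the output_dir is a placeholder and anything not listed above is an assumption:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the configuration listed above
training_args = TrainingArguments(
    output_dir="song_a_day_gpt2_short",  # placeholder, not the author's actual path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,       # effective train batch size: 8 * 2 = 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    num_train_epochs=5,
)
```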

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 4.4216        | 0.2632 | 50   | 4.1325          |
| 4.3933        | 0.5263 | 100  | 4.0637          |
| 4.2387        | 0.7895 | 150  | 3.9803          |
| 4.1548        | 1.0526 | 200  | 3.9093          |
| 4.0862        | 1.3158 | 250  | 3.8413          |
| 4.0405        | 1.5789 | 300  | 3.7875          |
| 3.9918        | 1.8421 | 350  | 3.7422          |
| 3.9639        | 2.1053 | 400  | 3.7023          |
| 3.9044        | 2.3684 | 450  | 3.6685          |
| 3.8639        | 2.6316 | 500  | 3.6412          |
| 3.8709        | 2.8947 | 550  | 3.6170          |
| 3.7859        | 3.1579 | 600  | 3.5920          |
| 3.8243        | 3.4211 | 650  | 3.5791          |
| 3.7859        | 3.6842 | 700  | 3.5668          |
| 3.7982        | 3.9474 | 750  | 3.5601          |
| 3.7572        | 4.2105 | 800  | 3.5552          |
| 3.7610        | 4.4737 | 850  | 3.5521          |
| 3.7662        | 4.7368 | 900  | 3.5516          |
| 3.7633        | 5.0    | 950  | 3.5513          |

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
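
A quick way to verify that a local environment matches these versions:

```python
import transformers, torch, datasets, tokenizers

# Versions used during training, per the list above
print(transformers.__version__)  # 4.46.2
print(torch.__version__)         # 2.5.1+cu121
print(datasets.__version__)      # 3.1.0
print(tokenizers.__version__)    # 0.20.3
```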