song_a_day_gpt2_short

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5513
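
Assuming this is the standard causal-LM cross-entropy in nats (as reported by the Transformers Trainer), the final loss implies a validation perplexity of roughly exp(3.5513) ≈ 34.9:

```python
import math

# Perplexity implied by the final validation loss reported above
val_loss = 3.5513
print(math.exp(val_loss))  # ≈ 34.86
```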

Model description

More information needed

Intended uses & limitations

More information needed
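
Pending details from the author, a minimal generation sketch (the prompt and sampling settings below are illustrative assumptions):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub
generator = pipeline("text-generation", model="Jonathanmann/song_a_day_gpt2_short")

# Sample a short continuation; the prompt and generation settings are illustrative
out = generator("Woke up this morning", max_new_tokens=50, do_sample=True, top_p=0.95)
print(out[0]["generated_text"])
```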

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 5
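
As a reproducibility aid, these settings map onto transformers.TrainingArguments roughly as sketched below; the output_dir is a placeholder and anything not listed above is an assumption:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the configuration listed above
training_args = TrainingArguments(
    output_dir="song_a_day_gpt2_short",  # placeholder, not the author's actual path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,       # effective train batch size: 8 * 2 = 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    num_train_epochs=5,
)
```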

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 4.4216        | 0.2632 | 50   | 4.1325          |
| 4.3933        | 0.5263 | 100  | 4.0637          |
| 4.2387        | 0.7895 | 150  | 3.9803          |
| 4.1548        | 1.0526 | 200  | 3.9093          |
| 4.0862        | 1.3158 | 250  | 3.8413          |
| 4.0405        | 1.5789 | 300  | 3.7875          |
| 3.9918        | 1.8421 | 350  | 3.7422          |
| 3.9639        | 2.1053 | 400  | 3.7023          |
| 3.9044        | 2.3684 | 450  | 3.6685          |
| 3.8639        | 2.6316 | 500  | 3.6412          |
| 3.8709        | 2.8947 | 550  | 3.6170          |
| 3.7859        | 3.1579 | 600  | 3.5920          |
| 3.8243        | 3.4211 | 650  | 3.5791          |
| 3.7859        | 3.6842 | 700  | 3.5668          |
| 3.7982        | 3.9474 | 750  | 3.5601          |
| 3.7572        | 4.2105 | 800  | 3.5552          |
| 3.7610        | 4.4737 | 850  | 3.5521          |
| 3.7662        | 4.7368 | 900  | 3.5516          |
| 3.7633        | 5.0    | 950  | 3.5513          |

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
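
A quick way to verify that a local environment matches these versions:

```python
import transformers, torch, datasets, tokenizers

# Versions used during training, per the list above
print(transformers.__version__)  # 4.46.2
print(torch.__version__)         # 2.5.1+cu121
print(datasets.__version__)      # 3.1.0
print(tokenizers.__version__)    # 0.20.3
```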