Se124M100KInfPrompt_endtoken2

This model is a PEFT adapter fine-tuned from gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6709
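Assuming this is the standard causal-LM cross-entropy loss in nats, it corresponds to an evaluation perplexity of roughly exp(0.6709) ≈ 1.96.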

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent configuration is sketched in code after this list):

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 50
  • mixed_precision_training: Native AMP
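
For readability, the list above maps onto a Hugging Face `TrainingArguments` object roughly as follows. This is an illustrative sketch, not the authors' actual training script: `output_dir` is a placeholder, and `fp16=True` is an assumption standing in for "Native AMP".

```python
# Illustrative sketch of the hyperparameters above as TrainingArguments.
# output_dir is a placeholder; fp16=True is an assumption ("Native AMP").
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Se124M100KInfPrompt_endtoken2",  # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 4 x 8 = total train batch size of 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=200,
    num_train_epochs=50,
    fp16=True,  # assumption: "Native AMP" mixed precision
)
```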

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 0.8989        | 1.0     | 267   | 0.7585          |
| 0.7706        | 2.0     | 534   | 0.7264          |
| 0.7441        | 3.0     | 801   | 0.7160          |
| 0.7327        | 4.0     | 1068  | 0.7091          |
| 0.7175        | 5.0     | 1335  | 0.7024          |
| 0.7118        | 6.0     | 1602  | 0.6990          |
| 0.7079        | 7.0     | 1869  | 0.6931          |
| 0.6982        | 8.0     | 2136  | 0.6904          |
| 0.6977        | 9.0     | 2403  | 0.6891          |
| 0.6971        | 10.0    | 2670  | 0.6869          |
| 0.6992        | 11.0    | 2937  | 0.6850          |
| 0.6889        | 12.0    | 3204  | 0.6849          |
| 0.6924        | 13.0    | 3471  | 0.6845          |
| 0.6894        | 14.0    | 3738  | 0.6834          |
| 0.6886        | 15.0    | 4005  | 0.6791          |
| 0.6906        | 16.0    | 4272  | 0.6812          |
| 0.6868        | 17.0    | 4539  | 0.6796          |
| 0.6852        | 18.0    | 4806  | 0.6789          |
| 0.6797        | 19.0    | 5073  | 0.6784          |
| 0.6813        | 20.0    | 5340  | 0.6775          |
| 0.6823        | 21.0    | 5607  | 0.6776          |
| 0.6803        | 22.0    | 5874  | 0.6758          |
| 0.6782        | 23.0    | 6141  | 0.6768          |
| 0.6786        | 24.0    | 6408  | 0.6747          |
| 0.677         | 25.0    | 6675  | 0.6740          |
| 0.68          | 26.0    | 6942  | 0.6742          |
| 0.6733        | 27.0    | 7209  | 0.6735          |
| 0.6744        | 28.0    | 7476  | 0.6734          |
| 0.6746        | 29.0    | 7743  | 0.6737          |
| 0.674         | 30.0    | 8010  | 0.6753          |
| 0.6694        | 31.0    | 8277  | 0.6731          |
| 0.6731        | 32.0    | 8544  | 0.6734          |
| 0.6683        | 33.0    | 8811  | 0.6723          |
| 0.6712        | 34.0    | 9078  | 0.6723          |
| 0.668         | 35.0    | 9345  | 0.6720          |
| 0.6647        | 36.0    | 9612  | 0.6723          |
| 0.664         | 37.0    | 9879  | 0.6713          |
| 0.6707        | 38.0    | 10146 | 0.6724          |
| 0.6704        | 39.0    | 10413 | 0.6715          |
| 0.6675        | 40.0    | 10680 | 0.6715          |
| 0.6673        | 41.0    | 10947 | 0.6718          |
| 0.6656        | 42.0    | 11214 | 0.6713          |
| 0.6659        | 43.0    | 11481 | 0.6715          |
| 0.667         | 44.0    | 11748 | 0.6714          |
| 0.6596        | 45.0    | 12015 | 0.6709          |
| 0.6673        | 46.0    | 12282 | 0.6710          |
| 0.6666        | 47.0    | 12549 | 0.6710          |
| 0.6661        | 48.0    | 12816 | 0.6709          |
| 0.6637        | 49.0    | 13083 | 0.6709          |
| 0.665         | 49.8143 | 13300 | 0.6709          |

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu118
  • Datasets 3.5.0
  • Tokenizers 0.21.1
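
Since the framework list includes PEFT, this checkpoint is presumably an adapter on top of the gpt2 base model rather than a full set of weights. A minimal loading and generation sketch, assuming the hub id `augustocsc/Se124M100KInfPrompt_endtoken2` from this card and a standard gpt2 architecture:

```python
# Hedged sketch: load the PEFT adapter onto the gpt2 base, then generate.
# The hub id and the prompt below are assumptions for illustration.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
base = AutoModelForCausalLM.from_pretrained("gpt2")
model = PeftModel.from_pretrained(base, "augustocsc/Se124M100KInfPrompt_endtoken2")

inputs = tokenizer("Example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```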