Se124M10KInfPrompt_endtoken_ls

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0494

Model description

More information needed

Intended uses & limitations

More information needed
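
The card does not document intended uses, but the framework versions below indicate this repository is a PEFT adapter trained on top of gpt2. The following is a minimal loading-and-generation sketch under that assumption; the prompt string is a placeholder, not taken from the training data:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the gpt2 base model and attach the adapter from this repository.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = PeftModel.from_pretrained(base_model, "augustocsc/Se124M10KInfPrompt_endtoken_ls")
model.eval()

# Placeholder prompt: the card does not specify the expected prompt format.
inputs = tokenizer("Example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```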

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 50
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
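
These settings map onto transformers TrainingArguments roughly as sketched below; the output_dir and the surrounding Trainer setup are assumptions for illustration, not part of the original card:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="Se124M10KInfPrompt_endtoken_ls",  # assumed output directory
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,   # 4 x 8 = effective train batch size 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=200,
    num_train_epochs=50,
    fp16=True,                       # "Native AMP" mixed precision
    label_smoothing_factor=0.1,
)
```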

Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 19.0863       | 1.0   | 267   | 2.1942          |
| 17.6413       | 2.0   | 534   | 2.1318          |
| 17.3454       | 3.0   | 801   | 2.1143          |
| 17.2455       | 4.0   | 1068  | 2.0979          |
| 17.112        | 5.0   | 1335  | 2.0918          |
| 17.0311       | 6.0   | 1602  | 2.0852          |
| 16.9714       | 7.0   | 1869  | 2.0805          |
| 16.8883       | 8.0   | 2136  | 2.0760          |
| 16.8675       | 9.0   | 2403  | 2.0727          |
| 16.8491       | 10.0  | 2670  | 2.0699          |
| 16.8653       | 11.0  | 2937  | 2.0698          |
| 16.7795       | 12.0  | 3204  | 2.0718          |
| 16.8033       | 13.0  | 3471  | 2.0635          |
| 16.7715       | 14.0  | 3738  | 2.0644          |
| 16.7677       | 15.0  | 4005  | 2.0632          |
| 16.7682       | 16.0  | 4272  | 2.0615          |
| 16.7473       | 17.0  | 4539  | 2.0598          |
| 16.7306       | 18.0  | 4806  | 2.0615          |
| 16.6896       | 19.0  | 5073  | 2.0586          |
| 16.7027       | 20.0  | 5340  | 2.0589          |
| 16.6991       | 21.0  | 5607  | 2.0581          |
| 16.6864       | 22.0  | 5874  | 2.0573          |
| 16.6749       | 23.0  | 6141  | 2.0562          |
| 16.6714       | 24.0  | 6408  | 2.0551          |
| 16.6603       | 25.0  | 6675  | 2.0546          |
| 16.6801       | 26.0  | 6942  | 2.0542          |
| 16.6263       | 27.0  | 7209  | 2.0541          |
| 16.6436       | 28.0  | 7476  | 2.0531          |
| 16.6471       | 29.0  | 7743  | 2.0523          |
| 16.6412       | 30.0  | 8010  | 2.0549          |
| 16.6017       | 31.0  | 8277  | 2.0529          |
| 16.6352       | 32.0  | 8544  | 2.0510          |
| 16.5937       | 33.0  | 8811  | 2.0522          |
| 16.6165       | 34.0  | 9078  | 2.0511          |
| 16.5961       | 35.0  | 9345  | 2.0518          |
| 16.5675       | 36.0  | 9612  | 2.0514          |
| 16.5565       | 37.0  | 9879  | 2.0499          |
| 16.6215       | 38.0  | 10146 | 2.0504          |
| 16.6133       | 39.0  | 10413 | 2.0505          |
| 16.5901       | 40.0  | 10680 | 2.0492          |
| 16.5841       | 41.0  | 10947 | 2.0500          |
| 16.5856       | 42.0  | 11214 | 2.0493          |
| 16.5775       | 43.0  | 11481 | 2.0494          |
| 16.5873       | 44.0  | 11748 | 2.0497          |
| 16.5285       | 45.0  | 12015 | 2.0494          |

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu118
  • Datasets 3.5.0
  • Tokenizers 0.21.1
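
The pins above reflect the training environment. A quick, illustrative check that a local environment matches (not part of the original card):

```python
import datasets
import peft
import tokenizers
import torch
import transformers

# Compare installed versions against the pins listed above.
for mod in (peft, transformers, torch, datasets, tokenizers):
    print(f"{mod.__name__}: {mod.__version__}")
```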