long_first_seed-42_1e-3

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.9978
  • Accuracy: 0.3029

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 32000
  • num_epochs: 20.0
  • mixed_precision_training: Native AMP
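The hyperparameters above can be collected into a `transformers.TrainingArguments` sketch. This is a hedged reconstruction, not the exact training script: `output_dir` is hypothetical, and `fp16=True` is an assumption standing in for "Native AMP" (bf16 is also possible on recent hardware).

```python
from transformers import TrainingArguments

# Sketch of the configuration listed above. "output_dir" is hypothetical,
# and fp16=True is an assumed mapping of "Native AMP".
args = TrainingArguments(
    output_dir="long_first_seed-42_1e-3",
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    gradient_accumulation_steps=8,   # 32 * 8 = 256 total train batch size
    optim="adamw_torch",             # AdamW; betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="linear",
    warmup_steps=32000,
    num_train_epochs=20.0,
    fp16=True,                       # "Native AMP"
)
```

Note that with roughly 35,780 optimizer steps in total (last table row below), a warmup of 32,000 steps means the learning rate is still warming up for most of training.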

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|--------------:|--------:|------:|----------------:|---------:|
| 6.3012        | 0.9997  | 1789  | 4.8455          | 0.2361   |
| 4.3382        | 1.9995  | 3578  | 4.4897          | 0.2574   |
| 3.9213        | 2.9999  | 5368  | 4.2905          | 0.2726   |
| 3.6894        | 3.9997  | 7157  | 4.1901          | 0.2807   |
| 3.5972        | 4.9994  | 8946  | 4.1275          | 0.2875   |
| 3.4782        | 5.9998  | 10736 | 4.1143          | 0.2896   |
| 3.408         | 6.9996  | 12525 | 4.0568          | 0.2945   |
| 3.3615        | 7.9999  | 14315 | 4.0610          | 0.2923   |
| 3.3271        | 8.9997  | 16104 | 4.0475          | 0.2953   |
| 3.2786        | 9.9995  | 17893 | 4.0470          | 0.2945   |
| 3.2554        | 10.9999 | 19683 | 4.0410          | 0.2956   |
| 3.2438        | 11.9997 | 21472 | 4.0579          | 0.2945   |
| 3.2387        | 12.9994 | 23261 | 4.0322          | 0.2972   |
| 3.2326        | 13.9998 | 25051 | 4.0446          | 0.2958   |
| 3.1863        | 14.9996 | 26840 | 4.0195          | 0.2968   |
| 3.1876        | 15.9999 | 28630 | 4.0171          | 0.2974   |
| 3.1947        | 16.9997 | 30419 | 4.0327          | 0.2962   |
| 3.1998        | 17.9995 | 32208 | 4.0206          | 0.2981   |
| 3.1444        | 18.9999 | 33998 | 3.9929          | 0.3008   |
| 2.9822        | 19.9957 | 35780 | 3.9978          | 0.3029   |
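As a quick consistency check on the numbers above (a minimal sketch; the per-epoch example count is approximate, since the last batch of an epoch may be smaller than the full effective batch):

```python
# Values copied from the hyperparameter list and the first table row above.
per_device_batch = 32
grad_accum = 8
effective_batch = per_device_batch * grad_accum
print(effective_batch)  # matches total_train_batch_size: 256

steps_first_epoch = 1789
approx_examples_per_epoch = steps_first_epoch * effective_batch
print(approx_examples_per_epoch)  # roughly 457,984 training examples per epoch
```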

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.20.0
Model size

  • 110M parameters
  • Tensor type: F32 (Safetensors)