short_first_seed-42_1e-3

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.3005
  • Accuracy: 0.3879
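Assuming the reported loss is a per-token cross-entropy in nats (the usual convention for causal language-model training in Transformers), the eval loss maps directly to a perplexity — a quick sanity check sketched below:

```python
import math

# Cross-entropy loss on the evaluation set, copied from the results above.
eval_loss = 3.3005

# For a per-token cross-entropy in nats, perplexity = exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 27.1
```

This is just unit conversion, not a new measurement: a perplexity around 27 means the model is, on average, about as uncertain as a uniform choice over ~27 tokens at each position.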

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 32000
  • num_epochs: 20.0
  • mixed_precision_training: Native AMP
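The batch-size and warmup settings above are related by simple arithmetic, sketched below. The step counts are taken from the training-results table; everything else comes straight from the hyperparameter list.

```python
# Effective batch size: per-device batch size times gradient-accumulation steps.
train_batch_size = 32
gradient_accumulation_steps = 8
effective_batch = train_batch_size * gradient_accumulation_steps
print(effective_batch)  # 256, matching total_train_batch_size

# Warmup fraction: 32000 warmup steps against the ~35780 total optimizer
# steps reported at epoch 20 in the results table.
warmup_steps = 32000
total_steps = 35780
print(round(warmup_steps / total_steps, 3))  # ≈ 0.894
```

Note that with a linear schedule, a warmup covering ~89% of training means the learning rate is still ramping up for most of the run, which is consistent with the late drop in loss between epochs 18 and 20.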

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 6.1631        | 0.9997  | 1789  | 4.4155          | 0.2878   |
| 4.1584        | 1.9995  | 3578  | 3.9416          | 0.3233   |
| 3.7403        | 2.9999  | 5368  | 3.6939          | 0.3450   |
| 3.5113        | 3.9997  | 7157  | 3.5700          | 0.3573   |
| 3.4222        | 4.9994  | 8946  | 3.4928          | 0.3641   |
| 3.3102        | 5.9998  | 10736 | 3.4535          | 0.3679   |
| 3.2452        | 6.9996  | 12525 | 3.4313          | 0.3698   |
| 3.2019        | 7.9999  | 14315 | 3.4132          | 0.3718   |
| 3.171         | 8.9997  | 16104 | 3.3968          | 0.3739   |
| 3.1243        | 9.9995  | 17893 | 3.3861          | 0.3748   |
| 3.1023        | 10.9999 | 19683 | 3.3813          | 0.3758   |
| 3.0926        | 11.9997 | 21472 | 3.3752          | 0.3767   |
| 3.0887        | 12.9994 | 23261 | 3.3672          | 0.3776   |
| 3.0837        | 13.9998 | 25051 | 3.3726          | 0.3764   |
| 3.0384        | 14.9996 | 26840 | 3.3657          | 0.3774   |
| 3.0408        | 15.9999 | 28630 | 3.3631          | 0.3776   |
| 3.0488        | 16.9997 | 30419 | 3.3656          | 0.3779   |
| 3.0537        | 17.9995 | 32208 | 3.3497          | 0.3797   |
| 3.0           | 18.9999 | 33998 | 3.3133          | 0.3841   |
| 2.8427        | 19.9957 | 35780 | 3.3005          | 0.3879   |

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.20.0
Model size: 110M parameters (F32, Safetensors)