# short_first_seed-42_1e-3
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (see the sketch after this list):
- Loss: 3.3005
- Accuracy: 0.3879
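
The card does not state the training objective, but the loss/accuracy pair is consistent with language modeling, where the reported loss is typically the mean per-token cross-entropy. Under that assumption (not confirmed by the card), perplexity is simply its exponential:

```python
# Hypothetical conversion: perplexity from the reported eval loss,
# assuming it is the mean per-token cross-entropy.
import math

eval_loss = 3.3005
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")  # ~ 27.1
```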
## Model description

More information needed
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch follows this list):
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (torch implementation, `adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
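
The total train batch size of 256 follows from 32 per device × 8 gradient-accumulation steps, assuming a single device. Below is a minimal sketch of how these settings map onto `transformers.TrainingArguments`; the `output_dir` value and the choice of `fp16` (rather than `bf16`) for "Native AMP" are assumptions not stated in the card:

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="short_first_seed-42_1e-3",  # assumed output path
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    gradient_accumulation_steps=8,   # 32 * 8 = 256 effective train batch size (single device assumed)
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=32_000,
    num_train_epochs=20.0,
    fp16=True,                       # "Native AMP"; fp16 vs. bf16 is an assumption
)
```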
### Training results
| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 6.1631        | 0.9997  | 1789  | 4.4155          | 0.2878   |
| 4.1584        | 1.9995  | 3578  | 3.9416          | 0.3233   |
| 3.7403        | 2.9999  | 5368  | 3.6939          | 0.3450   |
| 3.5113        | 3.9997  | 7157  | 3.5700          | 0.3573   |
| 3.4222        | 4.9994  | 8946  | 3.4928          | 0.3641   |
| 3.3102        | 5.9998  | 10736 | 3.4535          | 0.3679   |
| 3.2452        | 6.9996  | 12525 | 3.4313          | 0.3698   |
| 3.2019        | 7.9999  | 14315 | 3.4132          | 0.3718   |
| 3.171         | 8.9997  | 16104 | 3.3968          | 0.3739   |
| 3.1243        | 9.9995  | 17893 | 3.3861          | 0.3748   |
| 3.1023        | 10.9999 | 19683 | 3.3813          | 0.3758   |
| 3.0926        | 11.9997 | 21472 | 3.3752          | 0.3767   |
| 3.0887        | 12.9994 | 23261 | 3.3672          | 0.3776   |
| 3.0837        | 13.9998 | 25051 | 3.3726          | 0.3764   |
| 3.0384        | 14.9996 | 26840 | 3.3657          | 0.3774   |
| 3.0408        | 15.9999 | 28630 | 3.3631          | 0.3776   |
| 3.0488        | 16.9997 | 30419 | 3.3656          | 0.3779   |
| 3.0537        | 17.9995 | 32208 | 3.3497          | 0.3797   |
| 3.0           | 18.9999 | 33998 | 3.3133          | 0.3841   |
| 2.8427        | 19.9957 | 35780 | 3.3005          | 0.3879   |
### Framework versions
- Transformers 4.46.2
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.20.0
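
A hypothetical loading sketch under the framework versions above. The hub repo id and the use of a causal-LM head are assumptions based only on the card name and the loss/accuracy metrics; the card does not name the task or the upload location:

```python
# Assumed usage: load the checkpoint with Transformers 4.46.2 / PyTorch 2.5.1.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "short_first_seed-42_1e-3"  # placeholder; replace with the actual hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # causal-LM head is an assumption
```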