# long_first_seed-42_1e-3
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.9978
- Accuracy: 0.3029
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
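The hyperparameters above imply an effective batch size of 32 × 8 = 256, matching the reported `total_train_batch_size`. A minimal sketch of that relationship, using a plain dict (the training script and dataset are not part of this card, so only the listed values appear):

```python
# Hyperparameters copied from the list above.
config = {
    "learning_rate": 1e-3,
    "train_batch_size": 32,       # per-device batch size
    "eval_batch_size": 64,
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 32000,
    "num_epochs": 20.0,
}

# The effective (total) train batch size is the per-device batch size
# multiplied by the number of gradient accumulation steps.
total_train_batch_size = (
    config["train_batch_size"] * config["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # 256
```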
### Training results
| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 6.3012        | 0.9997  | 1789  | 4.8455          | 0.2361   |
| 4.3382        | 1.9995  | 3578  | 4.4897          | 0.2574   |
| 3.9213        | 2.9999  | 5368  | 4.2905          | 0.2726   |
| 3.6894        | 3.9997  | 7157  | 4.1901          | 0.2807   |
| 3.5972        | 4.9994  | 8946  | 4.1275          | 0.2875   |
| 3.4782        | 5.9998  | 10736 | 4.1143          | 0.2896   |
| 3.408         | 6.9996  | 12525 | 4.0568          | 0.2945   |
| 3.3615        | 7.9999  | 14315 | 4.0610          | 0.2923   |
| 3.3271        | 8.9997  | 16104 | 4.0475          | 0.2953   |
| 3.2786        | 9.9995  | 17893 | 4.0470          | 0.2945   |
| 3.2554        | 10.9999 | 19683 | 4.0410          | 0.2956   |
| 3.2438        | 11.9997 | 21472 | 4.0579          | 0.2945   |
| 3.2387        | 12.9994 | 23261 | 4.0322          | 0.2972   |
| 3.2326        | 13.9998 | 25051 | 4.0446          | 0.2958   |
| 3.1863        | 14.9996 | 26840 | 4.0195          | 0.2968   |
| 3.1876        | 15.9999 | 28630 | 4.0171          | 0.2974   |
| 3.1947        | 16.9997 | 30419 | 4.0327          | 0.2962   |
| 3.1998        | 17.9995 | 32208 | 4.0206          | 0.2981   |
| 3.1444        | 18.9999 | 33998 | 3.9929          | 0.3008   |
| 2.9822        | 19.9957 | 35780 | 3.9978          | 0.3029   |
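If the reported loss is a mean token-level cross-entropy (as is typical for causal language models trained with `Trainer`; the card does not state the objective, so this is an assumption), the final validation loss converts to perplexity via `exp(loss)`:

```python
import math

# Final validation loss from the last row of the table above.
final_eval_loss = 3.9978

# Perplexity is exp(cross-entropy loss), assuming the loss is a
# mean per-token cross-entropy in nats.
perplexity = math.exp(final_eval_loss)
print(round(perplexity, 2))  # ~54.5
```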
### Framework versions
- Transformers 4.46.2
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.20.0