# train_cola_1752763927
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the cola dataset. It achieves the following results on the evaluation set:
- Loss: 0.2606
- Num Input Tokens Seen: 3865624
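Since the framework versions below list PEFT, the checkpoint is presumably a PEFT adapter rather than full model weights. A minimal loading sketch under that assumption (the repo id is taken from this card's title; hardware placement via `device_map` is illustrative):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_1752763927"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```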
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
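For reference, a minimal sketch of loading the data; the card does not name the exact source, so this assumes "cola" refers to the CoLA task in the GLUE benchmark on the Hugging Face Hub:

```python
from datasets import load_dataset

# Assumption: the GLUE "cola" configuration; the card does not confirm this.
cola = load_dataset("glue", "cola")
print(cola["train"][0])  # e.g. {'sentence': '...', 'label': 1, 'idx': 0}
```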
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
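The hyperparameters above map directly onto `transformers.TrainingArguments`. A hypothetical reconstruction follows; the `output_dir` name is a placeholder, and any arguments not listed in the card (gradient accumulation, precision, logging, etc.) are left at their defaults here, which may not match the original run:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_1752763927",  # placeholder, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```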
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---------------|-------|------|-----------------|-------------------|
| 7.4373 | 0.5  | 481  | 7.5052 | 194688  |
| 0.7072 | 1.0  | 962  | 0.7585 | 386904  |
| 0.3897 | 1.5  | 1443 | 0.3602 | 580120  |
| 0.3012 | 2.0  | 1924 | 0.3055 | 773864  |
| 0.3103 | 2.5  | 2405 | 0.2883 | 967976  |
| 0.2726 | 3.0  | 2886 | 0.2818 | 1160888 |
| 0.2773 | 3.5  | 3367 | 0.2715 | 1353912 |
| 0.2896 | 4.0  | 3848 | 0.2687 | 1548544 |
| 0.2146 | 4.5  | 4329 | 0.2640 | 1741248 |
| 0.2677 | 5.0  | 4810 | 0.2682 | 1934608 |
| 0.2588 | 5.5  | 5291 | 0.2631 | 2127376 |
| 0.2811 | 6.0  | 5772 | 0.2645 | 2320672 |
| 0.2592 | 6.5  | 6253 | 0.2663 | 2514528 |
| 0.29   | 7.0  | 6734 | 0.2631 | 2707248 |
| 0.2671 | 7.5  | 7215 | 0.2617 | 2900784 |
| 0.2536 | 8.0  | 7696 | 0.2627 | 3093824 |
| 0.2995 | 8.5  | 8177 | 0.2606 | 3285824 |
| 0.2462 | 9.0  | 8658 | 0.2606 | 3479432 |
| 0.2886 | 9.5  | 9139 | 0.2608 | 3672712 |
| 0.2682 | 10.0 | 9620 | 0.2617 | 3865624 |
### Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.7.1+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1
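To reproduce this environment, the pinned versions above can be installed as shown below. Note the card lists a CUDA 12.6 build of PyTorch (`2.7.1+cu126`); obtaining that exact build may require the matching PyTorch wheel index for your platform.

```bash
pip install "peft==0.15.2" "transformers==4.51.3" "torch==2.7.1" "datasets==3.6.0" "tokenizers==0.21.1"
```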