child_30

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following result on the evaluation set (a perplexity conversion is sketched below):

  • Loss: 3.3875
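
If the reported loss is the mean token-level cross-entropy in nats, as is typical for Transformers language-model training (an assumption, since the card does not state the objective), it corresponds to a perplexity of roughly 29.6:

```python
import math

# Assumption: the eval loss is mean cross-entropy in nats (the Trainer's
# convention for language modeling); perplexity is then its exponential.
perplexity = math.exp(3.3875)
print(round(perplexity, 1))  # ~29.6
```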

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 30
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
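
A minimal sketch of a matching configuration, assuming the Hugging Face Trainer API was used (as is typical for auto-generated cards like this one); the output_dir value, the fp16 choice for native AMP, and the eval/logging cadence are assumptions, and the model, tokenizer, and dataset setup are not shown:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="child_30",           # assumed output/repository name
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=30,
    gradient_accumulation_steps=2,   # effective train batch size: 32
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    fp16=True,                       # native AMP (assuming fp16, not bf16)
    eval_strategy="steps",
    eval_steps=2_000,                # matches the evaluation cadence below
    logging_steps=4_000,             # assumed; logged loss repeats in 4000-step pairs
)
```

The Adam betas (0.9, 0.999) and epsilon (1e-08) listed above match the Trainer defaults, so they are not set explicitly here.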

Training results

| Training Loss | Epoch    | Step   | Validation Loss |
|:-------------:|:--------:|:------:|:---------------:|
| No log        | 2.1459   | 2000   | 6.2006          |
| 6.0988        | 4.2918   | 4000   | 4.4864          |
| 6.0988        | 6.4378   | 6000   | 4.0450          |
| 3.809         | 8.5837   | 8000   | 3.7977          |
| 3.809         | 10.7296  | 10000  | 3.6229          |
| 3.368         | 12.8755  | 12000  | 3.4840          |
| 3.368         | 15.0215  | 14000  | 3.3730          |
| 3.1113        | 17.1674  | 16000  | 3.2718          |
| 3.1113        | 19.3133  | 18000  | 3.1845          |
| 2.916         | 21.4592  | 20000  | 3.1091          |
| 2.916         | 23.6052  | 22000  | 3.0571          |
| 2.7615        | 25.7511  | 24000  | 3.0031          |
| 2.7615        | 27.8970  | 26000  | 2.9622          |
| 2.6375        | 30.0429  | 28000  | 2.9277          |
| 2.6375        | 32.1888  | 30000  | 2.9047          |
| 2.5336        | 34.3348  | 32000  | 2.8888          |
| 2.5336        | 36.4807  | 34000  | 2.8873          |
| 2.4456        | 38.6266  | 36000  | 2.8729          |
| 2.4456        | 40.7725  | 38000  | 2.8654          |
| 2.3643        | 42.9185  | 40000  | 2.8761          |
| 2.3643        | 45.0644  | 42000  | 2.8874          |
| 2.2761        | 47.2103  | 44000  | 2.9046          |
| 2.2761        | 49.3562  | 46000  | 2.9111          |
| 2.1878        | 51.5021  | 48000  | 2.9330          |
| 2.1878        | 53.6481  | 50000  | 2.9497          |
| 2.1083        | 55.7940  | 52000  | 2.9701          |
| 2.1083        | 57.9399  | 54000  | 2.9880          |
| 2.0358        | 60.0858  | 56000  | 3.0221          |
| 2.0358        | 62.2318  | 58000  | 3.0519          |
| 1.9684        | 64.3777  | 60000  | 3.0712          |
| 1.9684        | 66.5236  | 62000  | 3.0901          |
| 1.9112        | 68.6695  | 64000  | 3.1114          |
| 1.9112        | 70.8155  | 66000  | 3.1317          |
| 1.8591        | 72.9614  | 68000  | 3.1540          |
| 1.8591        | 75.1073  | 70000  | 3.1873          |
| 1.8075        | 77.2532  | 72000  | 3.2064          |
| 1.8075        | 79.3991  | 74000  | 3.2267          |
| 1.7658        | 81.5451  | 76000  | 3.2442          |
| 1.7658        | 83.6910  | 78000  | 3.2605          |
| 1.7261        | 85.8369  | 80000  | 3.2768          |
| 1.7261        | 87.9828  | 82000  | 3.2917          |
| 1.6897        | 90.1288  | 84000  | 3.3144          |
| 1.6897        | 92.2747  | 86000  | 3.3288          |
| 1.6562        | 94.4206  | 88000  | 3.3447          |
| 1.6562        | 96.5665  | 90000  | 3.3496          |
| 1.6284        | 98.7124  | 92000  | 3.3609          |
| 1.6284        | 100.8584 | 94000  | 3.3714          |
| 1.6032        | 103.0043 | 96000  | 3.3779          |
| 1.6032        | 105.1502 | 98000  | 3.3861          |
| 1.5821        | 107.2961 | 100000 | 3.3875          |

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
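
A minimal usage sketch, assuming the model is published under the repository id child_30 and carries a causal language-modeling head (neither is stated in this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Both the repo id and the LM head type are assumptions; adjust as needed.
tokenizer = AutoTokenizer.from_pretrained("child_30")
model = AutoModelForCausalLM.from_pretrained("child_30")
```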