en_mlm_child_30_new

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following result on the evaluation set:

  • Loss: 2.1528
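
Since this is a masked-language model (mask token `<mask>`), it can be exercised with the fill-mask pipeline. A minimal sketch, assuming the checkpoint is published under the hypothetical Hub ID "en_mlm_child_30_new" taken from the model name; substitute the real repository path or a local directory:

```python
from transformers import pipeline

# "en_mlm_child_30_new" is a placeholder repository ID taken from the model
# name above; replace it with the actual Hub path or a local checkpoint dir.
fill_mask = pipeline("fill-mask", model="en_mlm_child_30_new")

# The tokenizer's mask token is <mask>.
for prediction in fill_mask("The child picked up the <mask>."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```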

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch in code follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 30
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
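
As a reproduction aid, here is a minimal sketch of how the settings above map onto the Hugging Face TrainingArguments API. The output directory is a placeholder, and the base model, dataset, and data collator are not specified by this card, so they would have to be supplied to the Trainer separately:

```python
from transformers import TrainingArguments

# Sketch only: output_dir is a placeholder, and the base model, dataset,
# and MLM data collator are unspecified in this card.
training_args = TrainingArguments(
    output_dir="en_mlm_child_30_new",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=30,
    gradient_accumulation_steps=2,  # effective total train batch size: 32
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,              # training_steps above
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```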

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.3952  | 2000   | 7.2962          |
| 7.2342        | 2.7904  | 4000   | 6.0346          |
| 7.2342        | 4.1856  | 6000   | 5.9052          |
| 5.7167        | 5.5807  | 8000   | 5.7987          |
| 5.7167        | 6.9759  | 10000  | 5.7122          |
| 5.5262        | 8.3711  | 12000  | 5.6412          |
| 5.5262        | 9.7663  | 14000  | 5.5936          |
| 5.3872        | 11.1615 | 16000  | 5.5304          |
| 5.3872        | 12.5567 | 18000  | 5.4717          |
| 5.2756        | 13.9519 | 20000  | 5.4238          |
| 5.2756        | 15.3471 | 22000  | 5.3846          |
| 5.154         | 16.7422 | 24000  | 5.2159          |
| 5.154         | 18.1374 | 26000  | 4.9780          |
| 4.7609        | 19.5326 | 28000  | 4.3088          |
| 4.7609        | 20.9278 | 30000  | 3.7411          |
| 3.6928        | 22.3230 | 32000  | 3.4477          |
| 3.6928        | 23.7182 | 34000  | 3.1948          |
| 3.0474        | 25.1134 | 36000  | 3.0640          |
| 3.0474        | 26.5085 | 38000  | 2.9524          |
| 2.7755        | 27.9037 | 40000  | 2.8535          |
| 2.7755        | 29.2989 | 42000  | 2.7650          |
| 2.5986        | 30.6941 | 44000  | 2.6769          |
| 2.5986        | 32.0893 | 46000  | 2.6227          |
| 2.4606        | 33.4845 | 48000  | 2.5682          |
| 2.4606        | 34.8797 | 50000  | 2.5149          |
| 2.3506        | 36.2749 | 52000  | 2.4691          |
| 2.3506        | 37.6700 | 54000  | 2.4395          |
| 2.2717        | 39.0652 | 56000  | 2.4192          |
| 2.2717        | 40.4604 | 58000  | 2.3797          |
| 2.2036        | 41.8556 | 60000  | 2.3556          |
| 2.2036        | 43.2508 | 62000  | 2.3366          |
| 2.1481        | 44.6460 | 64000  | 2.3093          |
| 2.1481        | 46.0412 | 66000  | 2.2859          |
| 2.1029        | 47.4363 | 68000  | 2.2736          |
| 2.1029        | 48.8315 | 70000  | 2.2488          |
| 2.0646        | 50.2267 | 72000  | 2.2464          |
| 2.0646        | 51.6219 | 74000  | 2.2329          |
| 2.0274        | 53.0171 | 76000  | 2.2133          |
| 2.0274        | 54.4123 | 78000  | 2.1916          |
| 1.999         | 55.8075 | 80000  | 2.1929          |
| 1.999         | 57.2027 | 82000  | 2.2003          |
| 1.9726        | 58.5978 | 84000  | 2.1797          |
| 1.9726        | 59.9930 | 86000  | 2.1607          |
| 1.9504        | 61.3882 | 88000  | 2.1396          |
| 1.9504        | 62.7834 | 90000  | 2.1378          |
| 1.9321        | 64.1786 | 92000  | 2.1362          |
| 1.9321        | 65.5738 | 94000  | 2.1394          |
| 1.9159        | 66.9690 | 96000  | 2.1435          |
| 1.9159        | 68.3641 | 98000  | 2.1298          |
| 1.9071        | 69.7593 | 100000 | 2.1528          |
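
Because the validation loss is a mean cross-entropy over masked tokens, its exponential gives a (pseudo-)perplexity. The value below is plain arithmetic on the final loss above, not a number reported by the card:

```python
import math

final_validation_loss = 2.1528          # step 100000, from the table above
perplexity = math.exp(final_validation_loss)
print(f"Pseudo-perplexity: {perplexity:.2f}")  # ≈ 8.61
```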

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model size

  • 14.9M parameters (F32, Safetensors)
  • Mask token: <mask>