de_mlm_child_13_new

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9684
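
Assuming this loss is the mean per-token cross-entropy over the masked positions, it corresponds to a perplexity of exp(1.9684) ≈ 7.16.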

Model description

A 14.9M-parameter masked language model with F32 weights stored in Safetensors format; its mask token is <mask>. More information needed on the base architecture and training corpus.
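
As a minimal usage sketch: the checkpoint should load with the standard Transformers fill-mask pipeline. The repo id below is a placeholder for this model's actual Hugging Face path, and the German example sentence (suggested by the "de_" prefix in the model name) is illustrative only.

```python
from transformers import pipeline

# Placeholder repo id; substitute the actual Hugging Face path of this checkpoint.
unmasker = pipeline("fill-mask", model="de_mlm_child_13_new")

# The model's mask token is <mask>; the example sentence is illustrative only.
for pred in unmasker("Das Kind spielt mit dem <mask>."):
    print(f"{pred['token_str']!r}  (score: {pred['score']:.3f})")
```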

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch of the equivalent configuration follows the list:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 13
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
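
A minimal sketch of how these settings map onto Transformers TrainingArguments (argument names follow Transformers 4.45; the output directory is a placeholder, and anything not listed above is an assumption). Note that total_train_batch_size is derived, not set directly: 16 (per-device) × 2 (gradient accumulation) = 32.

```python
from transformers import TrainingArguments

# Sketch reconstructing the reported hyperparameters; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="de_mlm_child_13_new",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=13,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,              # training_steps above
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08, as reported:
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```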

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.2096  | 2000   | 7.0534          |
| 7.005         | 2.4191  | 4000   | 5.9491          |
| 7.005         | 3.6287  | 6000   | 5.8401          |
| 5.5533        | 4.8382  | 8000   | 5.7121          |
| 5.5533        | 6.0478  | 10000  | 5.6198          |
| 5.3687        | 7.2573  | 12000  | 5.5387          |
| 5.3687        | 8.4669  | 14000  | 5.4706          |
| 5.2269        | 9.6764  | 16000  | 5.4217          |
| 5.2269        | 10.8860 | 18000  | 5.3631          |
| 5.1229        | 12.0956 | 20000  | 5.3213          |
| 5.1229        | 13.3051 | 22000  | 5.2743          |
| 5.0061        | 14.5147 | 24000  | 4.9732          |
| 5.0061        | 15.7242 | 26000  | 4.2903          |
| 4.0671        | 16.9338 | 28000  | 3.7955          |
| 4.0671        | 18.1433 | 30000  | 3.5115          |
| 3.3253        | 19.3529 | 32000  | 3.3390          |
| 3.3253        | 20.5624 | 34000  | 3.1444          |
| 2.969         | 21.7720 | 36000  | 2.9896          |
| 2.969         | 22.9816 | 38000  | 2.8436          |
| 2.6828        | 24.1911 | 40000  | 2.7259          |
| 2.6828        | 25.4007 | 42000  | 2.6287          |
| 2.4634        | 26.6102 | 44000  | 2.5300          |
| 2.4634        | 27.8198 | 46000  | 2.4772          |
| 2.317         | 29.0293 | 48000  | 2.4356          |
| 2.317         | 30.2389 | 50000  | 2.3599          |
| 2.2136        | 31.4484 | 52000  | 2.3320          |
| 2.2136        | 32.6580 | 54000  | 2.3017          |
| 2.1362        | 33.8676 | 56000  | 2.2597          |
| 2.1362        | 35.0771 | 58000  | 2.2461          |
| 2.0753        | 36.2867 | 60000  | 2.2101          |
| 2.0753        | 37.4962 | 62000  | 2.1714          |
| 2.0214        | 38.7058 | 64000  | 2.1669          |
| 2.0214        | 39.9153 | 66000  | 2.1505          |
| 1.9757        | 41.1249 | 68000  | 2.1361          |
| 1.9757        | 42.3344 | 70000  | 2.0909          |
| 1.9433        | 43.5440 | 72000  | 2.0859          |
| 1.9433        | 44.7536 | 74000  | 2.0732          |
| 1.9082        | 45.9631 | 76000  | 2.0745          |
| 1.9082        | 47.1727 | 78000  | 2.0481          |
| 1.8821        | 48.3822 | 80000  | 2.0327          |
| 1.8821        | 49.5918 | 82000  | 2.0211          |
| 1.8581        | 50.8013 | 84000  | 2.0263          |
| 1.8581        | 52.0109 | 86000  | 2.0064          |
| 1.8359        | 53.2204 | 88000  | 2.0034          |
| 1.8359        | 54.4300 | 90000  | 1.9978          |
| 1.8159        | 55.6396 | 92000  | 1.9832          |
| 1.8159        | 56.8491 | 94000  | 1.9772          |
| 1.8051        | 58.0587 | 96000  | 1.9930          |
| 1.8051        | 59.2682 | 98000  | 1.9752          |
| 1.794         | 60.4778 | 100000 | 1.9684          |
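
The repeated training-loss values suggest training loss was logged every 4,000 steps while evaluation ran every 2,000; "No log" indicates no training loss had been recorded before the first evaluation.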

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1