de_mlm_child_30_new

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9961
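
Assuming this loss is the mean masked-language-modeling cross-entropy (suggested by the model name but not stated on the card), it converts to perplexity via exp(loss). A minimal check:

```python
import math

# Assumption: the reported validation loss is mean cross-entropy over
# masked tokens, so perplexity is simply its exponential.
perplexity = math.exp(1.9961)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 7.36
```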

Model description

More information needed

Intended uses & limitations

More information needed
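
The card leaves intended uses unspecified, but the name ("de_mlm_...") and the <mask> token listed at the end suggest a German masked language model. A minimal fill-mask sketch, assuming the model is published under a namespace-qualified repo id (the bare name below is a placeholder) and that the training language is indeed German:

```python
from transformers import pipeline

# Placeholder repo id: replace with the actual "<namespace>/de_mlm_child_30_new".
fill = pipeline("fill-mask", model="de_mlm_child_30_new")

# "<mask>" is the mask token listed on the card; the German example sentence
# is an assumption based on the "de_" prefix.
for pred in fill("Das ist ein <mask> Satz."):
    print(pred["token_str"], round(pred["score"], 4))
```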

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 30
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
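
As a rough sketch, these settings map onto transformers TrainingArguments as below; the output directory is a placeholder, and the optimizer corresponds to the Trainer default (AdamW with the listed betas and epsilon):

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
args = TrainingArguments(
    output_dir="out",                # placeholder path, not from the card
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=30,
    gradient_accumulation_steps=2,   # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
)
```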

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.2096  | 2000   | 7.0624          |
| 7.0103        | 2.4191  | 4000   | 5.9907          |
| 7.0103        | 3.6287  | 6000   | 5.8268          |
| 5.5573        | 4.8382  | 8000   | 5.7135          |
| 5.5573        | 6.0478  | 10000  | 5.6182          |
| 5.3684        | 7.2573  | 12000  | 5.5548          |
| 5.3684        | 8.4669  | 14000  | 5.4931          |
| 5.2311        | 9.6764  | 16000  | 5.4276          |
| 5.2311        | 10.8860 | 18000  | 5.3897          |
| 5.1257        | 12.0956 | 20000  | 5.3456          |
| 5.1257        | 13.3051 | 22000  | 5.3070          |
| 5.0524        | 14.5147 | 24000  | 5.2622          |
| 5.0524        | 15.7242 | 26000  | 5.2159          |
| 4.9502        | 16.9338 | 28000  | 5.0671          |
| 4.9502        | 18.1433 | 30000  | 4.2052          |
| 4.0862        | 19.3529 | 32000  | 3.7250          |
| 4.0862        | 20.5624 | 34000  | 3.4164          |
| 3.2461        | 21.7720 | 36000  | 3.2153          |
| 3.2461        | 22.9816 | 38000  | 2.9981          |
| 2.8492        | 24.1911 | 40000  | 2.8842          |
| 2.8492        | 25.4007 | 42000  | 2.7470          |
| 2.5828        | 26.6102 | 44000  | 2.6714          |
| 2.5828        | 27.8198 | 46000  | 2.5692          |
| 2.4091        | 29.0293 | 48000  | 2.5008          |
| 2.4091        | 30.2389 | 50000  | 2.4384          |
| 2.2855        | 31.4484 | 52000  | 2.3737          |
| 2.2855        | 32.6580 | 54000  | 2.3480          |
| 2.1975        | 33.8676 | 56000  | 2.3211          |
| 2.1975        | 35.0771 | 58000  | 2.2830          |
| 2.1246        | 36.2867 | 60000  | 2.2453          |
| 2.1246        | 37.4962 | 62000  | 2.2185          |
| 2.0701        | 38.7058 | 64000  | 2.1977          |
| 2.0701        | 39.9153 | 66000  | 2.1722          |
| 2.0222        | 41.1249 | 68000  | 2.1386          |
| 2.0222        | 42.3344 | 70000  | 2.1359          |
| 1.9823        | 43.5440 | 72000  | 2.1266          |
| 1.9823        | 44.7536 | 74000  | 2.1028          |
| 1.9462        | 45.9631 | 76000  | 2.0943          |
| 1.9462        | 47.1727 | 78000  | 2.0773          |
| 1.9189        | 48.3822 | 80000  | 2.0864          |
| 1.9189        | 49.5918 | 82000  | 2.0530          |
| 1.8958        | 50.8013 | 84000  | 2.0451          |
| 1.8958        | 52.0109 | 86000  | 2.0476          |
| 1.8715        | 53.2204 | 88000  | 2.0231          |
| 1.8715        | 54.4300 | 90000  | 2.0168          |
| 1.8562        | 55.6396 | 92000  | 2.0104          |
| 1.8562        | 56.8491 | 94000  | 2.0088          |
| 1.8417        | 58.0587 | 96000  | 2.0087          |
| 1.8417        | 59.2858 | 98000  | 2.0008          |
| 1.8314        | 60.4953 | 100000 | 1.9961          |
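
The validation loss plateaus near 5.0 until about step 28,000, drops sharply to 3.73 by step 32,000 (still within the 40,000-step warmup), and then declines smoothly to 1.9961. A minimal matplotlib sketch of the curve using a subset of the rows above:

```python
import matplotlib.pyplot as plt

# Representative (step, validation loss) pairs taken from the table above.
points = [
    (2_000, 7.0624), (10_000, 5.6182), (20_000, 5.3456), (28_000, 5.0671),
    (32_000, 3.7250), (40_000, 2.8842), (60_000, 2.2453), (80_000, 2.0864),
    (100_000, 1.9961),
]
steps, val_loss = zip(*points)

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Training step")
plt.ylabel("Validation loss")
plt.title("de_mlm_child_30_new: validation loss")
plt.show()
```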

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
Model details

  • Model size: 14.9M parameters
  • Tensor type: F32 (Safetensors)
  • Mask token: <mask>