child_42

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9591

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
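
For reference, these settings map onto Hugging Face `TrainingArguments` roughly as sketched below. This is a minimal sketch, not the card's actual training script: `output_dir` is a placeholder, and any field not listed above is assumed to keep its library default.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="child_42",          # placeholder; not stated in the card
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,  # 16 * 2 = 32 total train batch size
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,              # "training_steps" above
    adam_beta1=0.9,                 # Adam betas/epsilon as reported above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```

Note that the total train batch size of 32 is the per-device batch size of 16 multiplied by the 2 gradient-accumulation steps.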

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.2096  | 2000   | 7.0127          |
| 6.9808        | 2.4191  | 4000   | 5.9681          |
| 6.9808        | 3.6287  | 6000   | 5.8316          |
| 5.5564        | 4.8382  | 8000   | 5.7332          |
| 5.5564        | 6.0478  | 10000  | 5.6365          |
| 5.377         | 7.2573  | 12000  | 5.5691          |
| 5.377         | 8.4669  | 14000  | 5.4849          |
| 5.2322        | 9.6764  | 16000  | 5.4389          |
| 5.2322        | 10.8860 | 18000  | 5.3896          |
| 5.1296        | 12.0956 | 20000  | 5.3329          |
| 5.1296        | 13.3051 | 22000  | 5.3073          |
| 5.0584        | 14.5147 | 24000  | 5.2727          |
| 5.0584        | 15.7242 | 26000  | 5.2309          |
| 4.9626        | 16.9338 | 28000  | 5.1419          |
| 4.9626        | 18.1433 | 30000  | 4.6512          |
| 4.3128        | 19.3529 | 32000  | 3.7585          |
| 4.3128        | 20.5624 | 34000  | 3.4438          |
| 3.2698        | 21.7720 | 36000  | 3.2057          |
| 3.2698        | 22.9816 | 38000  | 3.0099          |
| 2.8429        | 24.1911 | 40000  | 2.8444          |
| 2.8429        | 25.4007 | 42000  | 2.7175          |
| 2.5687        | 26.6102 | 44000  | 2.6260          |
| 2.5687        | 27.8198 | 46000  | 2.5344          |
| 2.3909        | 29.0293 | 48000  | 2.4779          |
| 2.3909        | 30.2389 | 50000  | 2.4240          |
| 2.2679        | 31.4484 | 52000  | 2.3533          |
| 2.2679        | 32.6580 | 54000  | 2.3366          |
| 2.1814        | 33.8676 | 56000  | 2.2728          |
| 2.1814        | 35.0771 | 58000  | 2.2539          |
| 2.1098        | 36.2867 | 60000  | 2.2392          |
| 2.1098        | 37.4962 | 62000  | 2.2166          |
| 2.0509        | 38.7058 | 64000  | 2.1762          |
| 2.0509        | 39.9153 | 66000  | 2.1642          |
| 2.0029        | 41.1249 | 68000  | 2.1580          |
| 2.0029        | 42.3344 | 70000  | 2.1081          |
| 1.9585        | 43.5440 | 72000  | 2.1156          |
| 1.9585        | 44.7536 | 74000  | 2.0962          |
| 1.9297        | 45.9631 | 76000  | 2.0839          |
| 1.9297        | 47.1727 | 78000  | 2.0424          |
| 1.8972        | 48.3822 | 80000  | 2.0514          |
| 1.8972        | 49.5918 | 82000  | 2.0307          |
| 1.8746        | 50.8013 | 84000  | 2.0233          |
| 1.8746        | 52.0109 | 86000  | 2.0000          |
| 1.8496        | 53.2204 | 88000  | 2.0079          |
| 1.8496        | 54.4300 | 90000  | 2.0084          |
| 1.8335        | 55.6396 | 92000  | 1.9988          |
| 1.8335        | 56.8491 | 94000  | 1.9736          |
| 1.8211        | 58.0587 | 96000  | 1.9785          |
| 1.8211        | 59.2682 | 98000  | 1.9865          |
| 1.8094        | 60.4778 | 100000 | 1.9591          |
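
As a quick sanity check on the final number: if the reported validation loss is the mean token-level cross-entropy (the Trainer's default for language-modeling objectives, which is an assumption here), the corresponding perplexity is its exponential:

```python
import math

final_eval_loss = 1.9591  # final validation loss from the table above
print(f"validation perplexity ≈ {math.exp(final_eval_loss):.2f}")  # ≈ 7.09
```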

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model details

  • Model size: 14.9M params (F32, safetensors)
  • Mask token: <mask>
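
Since the card lists `<mask>` as the mask token, the checkpoint appears to be a masked language model. A minimal inference sketch under that assumption; `user/child_42` is a placeholder repo id, not a path confirmed by the card:

```python
from transformers import pipeline

# Substitute the actual Hub path of this checkpoint for the placeholder id.
fill_mask = pipeline("fill-mask", model="user/child_42")

# The card lists <mask> as the mask token.
for pred in fill_mask("The child picked up the <mask>."):
    print(f"{pred['token_str']!r}: {pred['score']:.4f}")
```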