4053ccba4766f3012b943e0f2a6ba1cc

This model is a fine-tuned version of google/mt5-base on the English–Norwegian (en-no) configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1552
  • Data Size: 1.0 (fraction of the training set used)
  • Epoch Runtime: 21.0298 seconds
  • Bleu: 7.9534
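The card itself includes no usage snippet, so the following is a minimal inference sketch using the standard transformers seq2seq API. The repo id is the one shown on this page; whether a task prefix was used during fine-tuning is not documented, so plain English source text is assumed.

```python
# Minimal inference sketch for this checkpoint. Assumptions: the model is
# published on the Hugging Face Hub under the repo id shown on this page,
# and no task prefix was used during fine-tuning (the card does not say).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/4053ccba4766f3012b943e0f2a6ba1cc"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate one English sentence to Norwegian.
inputs = tokenizer("The book lay open on the table.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```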

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a matching configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
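A hypothetical reconstruction of these hyperparameters with Seq2SeqTrainingArguments is shown below. Values are taken from the list above; output_dir and predict_with_generate are illustrative additions, not confirmed by the card. With 4 GPUs and a per-device batch size of 8, the effective total batch size is 32.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-opus-books-en-no",  # illustrative name, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumption: needed to compute BLEU at eval time
)
```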

Training results

The Data Size column appears to give the fraction of the training set used in each epoch: the schedule doubles the fraction every epoch until the full set (1.0) is reached at epoch 8. Epoch Runtime is reported in seconds.

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-----------------:|:------:|
| No log        | 0     | 0    | 16.3903         | 0         | 2.4919            | 0.0012 |
| No log        | 1     | 87   | 16.2652         | 0.0078    | 3.0402            | 0.0020 |
| No log        | 2     | 174  | 15.7682         | 0.0156    | 3.1024            | 0.0020 |
| No log        | 3     | 261  | 15.1807         | 0.0312    | 3.5727            | 0.0012 |
| No log        | 4     | 348  | 16.0009         | 0.0625    | 4.7808            | 0.0012 |
| 0.6847        | 5     | 435  | 13.4265         | 0.125     | 5.6736            | 0.0012 |
| 3.4724        | 6     | 522  | 9.1619          | 0.25      | 7.9955            | 0.0045 |
| 4.0559        | 7     | 609  | 7.2011          | 0.5       | 12.7481           | 0.0064 |
| 5.0514        | 8     | 696  | 5.6457          | 1.0       | 21.7061           | 0.0198 |
| 5.5651        | 9     | 783  | 3.4591          | 1.0       | 20.4395           | 1.1073 |
| 3.9865        | 10    | 870  | 2.7854          | 1.0       | 21.3194           | 2.8684 |
| 3.436         | 11    | 957  | 2.5479          | 1.0       | 21.7505           | 4.3926 |
| 3.2065        | 12    | 1044 | 2.4041          | 1.0       | 22.9681           | 5.1570 |
| 2.9505        | 13    | 1131 | 2.3554          | 1.0       | 21.5941           | 5.3145 |
| 2.8146        | 14    | 1218 | 2.3420          | 1.0       | 20.0549           | 5.6208 |
| 2.6455        | 15    | 1305 | 2.2882          | 1.0       | 19.9195           | 5.8671 |
| 2.5617        | 16    | 1392 | 2.2256          | 1.0       | 20.2517           | 6.4782 |
| 2.4692        | 17    | 1479 | 2.2113          | 1.0       | 20.2933           | 6.7027 |
| 2.3672        | 18    | 1566 | 2.1917          | 1.0       | 21.1114           | 6.9196 |
| 2.3172        | 19    | 1653 | 2.1784          | 1.0       | 20.8452           | 6.9665 |
| 2.2092        | 20    | 1740 | 2.1631          | 1.0       | 21.1128           | 6.9706 |
| 2.1428        | 21    | 1827 | 2.1567          | 1.0       | 20.8195           | 7.2431 |
| 2.109         | 22    | 1914 | 2.1541          | 1.0       | 21.3080           | 7.2208 |
| 2.0272        | 23    | 2001 | 2.1516          | 1.0       | 22.4732           | 7.4292 |
| 1.9716        | 24    | 2088 | 2.1522          | 1.0       | 23.0559           | 7.4031 |
| 1.9407        | 25    | 2175 | 2.1517          | 1.0       | 21.1709           | 7.6519 |
| 1.8952        | 26    | 2262 | 2.1393          | 1.0       | 21.6689           | 7.7828 |
| 1.8579        | 27    | 2349 | 2.1565          | 1.0       | 22.4174           | 7.7715 |
| 1.7953        | 28    | 2436 | 2.1438          | 1.0       | 22.9117           | 7.7857 |
| 1.7689        | 29    | 2523 | 2.1526          | 1.0       | 24.4328           | 7.9907 |
| 1.7437        | 30    | 2610 | 2.1552          | 1.0       | 21.0298           | 7.9534 |
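The Bleu values are consistent with the usual 0–100 scale. The card does not name the exact metric implementation, so the following is a sketch of how such scores can be computed with the evaluate library, assuming sacreBLEU; the sentences are illustrative.

```python
import evaluate

# Assumption: sacreBLEU on the 0-100 scale; the card does not confirm this.
metric = evaluate.load("sacrebleu")
predictions = ["Boken la apen pa bordet."]    # model output (illustrative)
references = [["Boken lå åpen på bordet."]]   # one gold reference per prediction
result = metric.compute(predictions=predictions, references=references)
print(round(result["score"], 4))
```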

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1