mms-1b-allFT-mixat-tri-ara

This model is a fine-tuned version of facebook/mms-1b-all on an unknown dataset. It achieves the following results on the evaluation set (a minimal inference sketch follows the results):

  • Loss: 2.6956
  • WER: 1.0091 (a WER above 1.0 means the total number of edits exceeds the number of reference words)
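
No usage code is provided in this card. Below is a minimal inference sketch, assuming the checkpoint follows the standard Wav2Vec2-style CTC interface of facebook/mms-1b-all; the audio file name is a placeholder and the input must be 16 kHz mono audio:

```python
import torch
import librosa
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "sqrk/mms-1b-allFT-mixat-tri-ara"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS/Wav2Vec2 models expect 16 kHz mono audio; "sample.wav" is a placeholder path.
audio, _ = librosa.load("sample.wav", sr=16_000, mono=True)
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```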

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a code sketch of this configuration follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 30
  • mixed_precision_training: Native AMP
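
A hedged reconstruction of these settings as a transformers TrainingArguments object is sketched below; the output directory is a placeholder, and dataset, model, and data-collator setup are omitted:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters; output_dir is a placeholder name.
training_args = TrainingArguments(
    output_dir="mms-1b-allFT-mixat-tri-ara",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,   # 2 per device x 8 accumulation steps -> effective batch of 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=30,
    fp16=True,                       # Native AMP mixed-precision training
)
```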

Training results

| Training Loss | Epoch   | Step | Validation Loss | WER    |
|---------------|---------|------|-----------------|--------|
| 31.0714       | 1.0     | 210  | 24.4350         | 1.0003 |
| 26.8056       | 2.0     | 420  | 18.3719         | 1.0005 |
| 14.3797       | 3.0     | 630  | 5.9675          | 1.0    |
| 4.878         | 4.0     | 840  | 4.0668          | 1.0    |
| 4.0485        | 5.0     | 1050 | 3.8871          | 1.0    |
| 3.9462        | 6.0     | 1260 | 3.8183          | 1.0    |
| 3.8848        | 7.0     | 1470 | 3.7649          | 1.0    |
| 3.8365        | 8.0     | 1680 | 3.7236          | 1.0    |
| 3.7944        | 9.0     | 1890 | 3.6770          | 0.9998 |
| 3.7476        | 10.0    | 2100 | 3.6489          | 0.9995 |
| 3.7137        | 11.0    | 2310 | 3.6138          | 0.9995 |
| 3.6835        | 12.0    | 2520 | 3.5817          | 0.9995 |
| 3.6532        | 13.0    | 2730 | 3.5456          | 0.9994 |
| 3.6131        | 14.0    | 2940 | 3.4966          | 0.9995 |
| 3.5702        | 15.0    | 3150 | 3.4312          | 0.9994 |
| 3.5056        | 16.0    | 3360 | 3.3456          | 0.9994 |
| 3.4263        | 17.0    | 3570 | 3.2426          | 0.9994 |
| 3.3568        | 18.0    | 3780 | 3.1248          | 0.9997 |
| 3.2472        | 19.0    | 3990 | 3.0322          | 1.0011 |
| 3.1795        | 20.0    | 4200 | 2.9561          | 1.0105 |
| 3.1379        | 21.0    | 4410 | 2.8892          | 1.0114 |
| 3.0804        | 22.0    | 4620 | 2.8381          | 1.0106 |
| 3.0336        | 23.0    | 4830 | 2.8054          | 1.0148 |
| 3.0099        | 24.0    | 5040 | 2.7714          | 1.0117 |
| 2.9848        | 25.0    | 5250 | 2.7454          | 1.0114 |
| 2.9579        | 26.0    | 5460 | 2.7276          | 1.0111 |
| 2.9503        | 27.0    | 5670 | 2.7132          | 1.0112 |
| 2.934         | 28.0    | 5880 | 2.7030          | 1.0103 |
| 2.9293        | 29.0    | 6090 | 2.6973          | 1.0089 |
| 2.9726        | 29.8597 | 6270 | 2.6956          | 1.0091 |
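
The WER column above can be reproduced with the evaluate library; this is a minimal sketch, assuming lists of predicted and reference transcriptions for the evaluation set are already available (the example strings below are placeholders):

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder transcriptions; in practice, predictions come from running the
# model over the evaluation set and references are the ground-truth labels.
predictions = ["مرحبا بالعالم"]
references = ["مرحبا بكم في العالم"]

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")
```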

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 2.16.1
  • Tokenizers 0.21.1