# wav2vec2-base-librispeech-model

This model is a fine-tuned version of [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h) on the LIBRI10H - ENG dataset. It achieves the following results on the evaluation set:

- Loss: 0.8515
- Wer: 0.7226
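
For reference, below is a minimal inference sketch with 🤗 Transformers. It assumes the checkpoint is published as `csikasote/wav2vec2-base-librispeech-model` (the repo id on this page) and that the input file `sample.wav` is speech audio; the 16 kHz mono resampling matches the wav2vec2-base-960h preprocessing.

```python
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Repo id taken from this model card's page; adjust if the checkpoint moves.
model_id = "csikasote/wav2vec2-base-librispeech-model"

processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# wav2vec2-base-960h expects 16 kHz mono audio.
speech, _ = librosa.load("sample.wav", sr=16_000, mono=True)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: argmax per frame, then batch_decode collapses
# repeated tokens and blanks into text.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```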

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 100.0
- mixed_precision_training: Native AMP
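
As a sketch, these settings map roughly onto `transformers.TrainingArguments` as below. This is a reconstruction from the list above, not the author's actual training script; `output_dir` is an assumption, and the model/dataset wiring is omitted.

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameter list above; output_dir is assumed.
training_args = TrainingArguments(
    output_dir="wav2vec2-base-librispeech-model",
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 8 * 2 = 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=100.0,
    fp16=True,                       # "Native AMP" mixed precision
)
```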

### Training results

| Training Loss | Epoch   | Step  | Validation Loss | Wer    |
|:-------------:|:-------:|:-----:|:---------------:|:------:|
| 4.7426        | 1.1565  | 200   | 2.8968          | 1.0    |
| 2.7493        | 2.3130  | 400   | 2.2712          | 0.9987 |
| 2.0118        | 3.4696  | 600   | 1.6905          | 0.9768 |
| 1.7815        | 4.6261  | 800   | 1.5406          | 0.9588 |
| 1.667         | 5.7826  | 1000  | 1.4410          | 0.9385 |
| 1.5898        | 6.9391  | 1200  | 1.3799          | 0.9282 |
| 1.5366        | 8.0928  | 1400  | 1.3415          | 0.9165 |
| 1.4917        | 9.2493  | 1600  | 1.3144          | 0.9205 |
| 1.455         | 10.4058 | 1800  | 1.2746          | 0.9068 |
| 1.4266        | 11.5623 | 2000  | 1.2521          | 0.9102 |
| 1.3925        | 12.7188 | 2200  | 1.2213          | 0.8971 |
| 1.3754        | 13.8754 | 2400  | 1.2028          | 0.8938 |
| 1.3452        | 15.0290 | 2600  | 1.1931          | 0.8826 |
| 1.3265        | 16.1855 | 2800  | 1.1682          | 0.8860 |
| 1.3106        | 17.3420 | 3000  | 1.1645          | 0.8752 |
| 1.2917        | 18.4986 | 3200  | 1.1686          | 0.8780 |
| 1.2745        | 19.6551 | 3400  | 1.1385          | 0.8670 |
| 1.2639        | 20.8116 | 3600  | 1.1301          | 0.8666 |
| 1.2432        | 21.9681 | 3800  | 1.1173          | 0.8670 |
| 1.2294        | 23.1217 | 4000  | 1.1098          | 0.8620 |
| 1.2203        | 24.2783 | 4200  | 1.1077          | 0.8711 |
| 1.2037        | 25.4348 | 4400  | 1.0964          | 0.8626 |
| 1.1965        | 26.5913 | 4600  | 1.0910          | 0.8581 |
| 1.181         | 27.7478 | 4800  | 1.0842          | 0.8533 |
| 1.1711        | 28.9043 | 5000  | 1.0692          | 0.8465 |
| 1.1573        | 30.0580 | 5200  | 1.0724          | 0.8464 |
| 1.1472        | 31.2145 | 5400  | 1.0529          | 0.8404 |
| 1.1375        | 32.3710 | 5600  | 1.0506          | 0.8403 |
| 1.1276        | 33.5275 | 5800  | 1.0432          | 0.8398 |
| 1.1149        | 34.6841 | 6000  | 1.0371          | 0.8330 |
| 1.1099        | 35.8406 | 6200  | 1.0372          | 0.8341 |
| 1.0959        | 36.9971 | 6400  | 1.0296          | 0.8370 |
| 1.0838        | 38.1507 | 6600  | 1.0136          | 0.8232 |
| 1.0761        | 39.3072 | 6800  | 1.0355          | 0.8288 |
| 1.069         | 40.4638 | 7000  | 1.0072          | 0.8211 |
| 1.0624        | 41.6203 | 7200  | 1.0019          | 0.8217 |
| 1.0502        | 42.7768 | 7400  | 1.0021          | 0.8329 |
| 1.0423        | 43.9333 | 7600  | 0.9960          | 0.8153 |
| 1.0334        | 45.0870 | 7800  | 0.9903          | 0.8134 |
| 1.0203        | 46.2435 | 8000  | 0.9787          | 0.8116 |
| 1.0212        | 47.4    | 8200  | 0.9690          | 0.8029 |
| 1.0062        | 48.5565 | 8400  | 0.9864          | 0.8030 |
| 1.0029        | 49.7130 | 8600  | 0.9658          | 0.8000 |
| 0.9922        | 50.8696 | 8800  | 0.9552          | 0.7964 |
| 0.9784        | 52.0232 | 9000  | 0.9563          | 0.7978 |
| 0.9761        | 53.1797 | 9200  | 0.9442          | 0.7898 |
| 0.9649        | 54.3362 | 9400  | 0.9495          | 0.7898 |
| 0.9567        | 55.4928 | 9600  | 0.9448          | 0.7927 |
| 0.9556        | 56.6493 | 9800  | 0.9303          | 0.7851 |
| 0.9454        | 57.8058 | 10000 | 0.9304          | 0.7784 |
| 0.9356        | 58.9623 | 10200 | 0.9202          | 0.7718 |
| 0.927         | 60.1159 | 10400 | 0.9264          | 0.7730 |
| 0.9172        | 61.2725 | 10600 | 0.9252          | 0.7736 |
| 0.9177        | 62.4290 | 10800 | 0.9087          | 0.7682 |
| 0.9107        | 63.5855 | 11000 | 0.9119          | 0.7663 |
| 0.9017        | 64.7420 | 11200 | 0.9014          | 0.7609 |
| 0.899         | 65.8986 | 11400 | 0.8962          | 0.7597 |
| 0.8854        | 67.0522 | 11600 | 0.8976          | 0.7533 |
| 0.8841        | 68.2087 | 11800 | 0.8952          | 0.7554 |
| 0.8792        | 69.3652 | 12000 | 0.8951          | 0.7535 |
| 0.8697        | 70.5217 | 12200 | 0.8913          | 0.7513 |
| 0.8677        | 71.6783 | 12400 | 0.8820          | 0.7496 |
| 0.862         | 72.8348 | 12600 | 0.8834          | 0.7447 |
| 0.8573        | 73.9913 | 12800 | 0.8824          | 0.7437 |
| 0.8527        | 75.1449 | 13000 | 0.8747          | 0.7388 |
| 0.8451        | 76.3014 | 13200 | 0.8806          | 0.7399 |
| 0.8435        | 77.4580 | 13400 | 0.8713          | 0.7401 |
| 0.8393        | 78.6145 | 13600 | 0.8734          | 0.7387 |
| 0.8353        | 79.7710 | 13800 | 0.8702          | 0.7367 |
| 0.834         | 80.9275 | 14000 | 0.8661          | 0.7335 |
| 0.8265        | 82.0812 | 14200 | 0.8642          | 0.7312 |
| 0.8183        | 83.2377 | 14400 | 0.8638          | 0.7334 |
| 0.8238        | 84.3942 | 14600 | 0.8643          | 0.7311 |
| 0.8176        | 85.5507 | 14800 | 0.8640          | 0.7309 |
| 0.8183        | 86.7072 | 15000 | 0.8603          | 0.7294 |
| 0.8121        | 87.8638 | 15200 | 0.8586          | 0.7270 |
| 0.8033        | 89.0174 | 15400 | 0.8585          | 0.7265 |
| 0.8116        | 90.1739 | 15600 | 0.8560          | 0.7254 |
| 0.8058        | 91.3304 | 15800 | 0.8553          | 0.7262 |
| 0.7992        | 92.4870 | 16000 | 0.8548          | 0.7263 |
| 0.7979        | 93.6435 | 16200 | 0.8528          | 0.7236 |
| 0.7979        | 94.8    | 16400 | 0.8529          | 0.7235 |
| 0.7978        | 95.9565 | 16600 | 0.8526          | 0.7242 |
| 0.7934        | 97.1101 | 16800 | 0.8519          | 0.7238 |
| 0.7915        | 98.2667 | 17000 | 0.8520          | 0.7233 |
| 0.7996        | 99.4232 | 17200 | 0.8515          | 0.7230 |
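
For context, the Wer column is word error rate (lower is better). A minimal sketch of how such a score can be computed with the `evaluate` library follows; the strings are toy examples, not this card's evaluation data.

```python
import evaluate

# Word error rate = (substitutions + insertions + deletions) / reference words.
wer_metric = evaluate.load("wer")

# Toy example: one deletion ("the") over 6 reference words -> 1/6 ~= 0.1667.
predictions = ["the cat sat on mat"]
references = ["the cat sat on the mat"]

print(wer_metric.compute(predictions=predictions, references=references))
```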

### Framework versions

- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0