language:
  - it
license: mit
tags:
  - generated_from_trainer
datasets:
  - facebook/voxpopuli
pipeline_tag: text-to-speech
base_model: microsoft/speecht5_tts
model-index:
  - name: SpeechT5-it
    results:
      - task:
          type: text-to-speech
          name: Text to Speech
        dataset:
          name: VOXPOPULI
          type: facebook/voxpopuli
          config: it
          split: validation
          args: it
        metrics:
          - type: loss
            value: 0.46
            name: Loss

SpeechT5-it

This model is a fine-tuned version of microsoft/speecht5_tts on the Italian (it) configuration of the facebook/voxpopuli dataset. It achieves the following results on the evaluation set (the validation split):

  • Loss: 0.4600

Model description

More information needed

Intended uses & limitations

More information needed
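As a rough orientation for text-to-speech use, the sketch below shows how a SpeechT5 checkpoint like this one could be run with the Transformers library. Several things here are assumptions rather than facts from this card: the repository id is not stated, so it is passed in as `model_id`; the checkpoint repository is assumed to ship the processor files; `microsoft/speecht5_hifigan` is assumed as the vocoder; and the caller is assumed to supply a speaker embedding of shape (1, 512), since SpeechT5 conditions on an x-vector. The helper name `synthesize` is hypothetical.

```python
def synthesize(text, model_id, speaker_embedding,
               vocoder_id="microsoft/speecht5_hifigan"):
    """Return a 1-D waveform tensor (16 kHz) for Italian `text`.

    model_id:          repository id or local path of the fine-tuned checkpoint
                       (not stated in this card, so the caller must provide it).
    speaker_embedding: torch tensor of shape (1, 512) (assumed x-vector input).
    vocoder_id:        assumed HiFi-GAN vocoder for SpeechT5.
    """
    # Heavy dependencies are imported lazily so that defining the helper
    # does not require them to be installed.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained(model_id)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
    vocoder = SpeechT5HifiGan.from_pretrained(vocoder_id)

    # Tokenize the text and generate a waveform through the vocoder.
    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        waveform = model.generate_speech(
            inputs["input_ids"], speaker_embedding, vocoder=vocoder
        )
    return waveform
```

The returned tensor can be converted with `.numpy()` and written to a 16 kHz audio file with any audio library. Output quality depends heavily on the speaker embedding, which this card does not describe.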

Training and evaluation data

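According to the model card metadata, the model was fine-tuned on the Italian (it) configuration of facebook/voxpopuli, and the reported loss is computed on its validation split. More information needed on preprocessing and data selection.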

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 40
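To make the relationship between these values concrete, the following sketch recomputes the effective batch size and the learning rate at a given optimizer step. It assumes a single training device (which is what 4 x 4 = 16 implies), takes the 712 steps per epoch from the training results table, and assumes the linear schedule warms up from 0 and then decays to 0 at the last step. The function names are illustrative only.

```python
# Values taken from the hyperparameter list.
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
LEARNING_RATE = 1e-05
WARMUP_STEPS = 100
NUM_EPOCHS = 40

# Taken from the training results table (optimizer steps per epoch).
STEPS_PER_EPOCH = 712

# Assumption: one device, consistent with total_train_batch_size = 16.
NUM_DEVICES = 1


def effective_batch_size():
    """Samples contributing to each optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def total_steps():
    """Total optimizer steps over the whole run."""
    return NUM_EPOCHS * STEPS_PER_EPOCH


def linear_lr(step):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * (step / WARMUP_STEPS)
    remaining = max(0, total_steps() - step)
    return LEARNING_RATE * (remaining / (total_steps() - WARMUP_STEPS))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    print("total optimizer steps:", total_steps())
    for step in (0, 50, 100, 14240, 28480):
        print(f"step {step:>5}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the peak learning rate of 1e-05 is reached at step 100, early in the first epoch, and the rate then falls linearly for the rest of the 28480 steps.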

Training results

Training Loss   Epoch   Step    Validation Loss
0.5641          1.0     712     0.5090
0.5394          2.0     1424    0.4915
0.5277          3.0     2136    0.4819
0.5136          4.0     2848    0.4798
0.5109          5.0     3560    0.4733
0.5078          6.0     4272    0.4731
0.5033          7.0     4984    0.4692
0.5021          8.0     5696    0.4691
0.4984          9.0     6408    0.4670
0.4880          10.0    7120    0.4641
0.4910          11.0    7832    0.4641
0.4918          12.0    8544    0.4647
0.4933          13.0    9256    0.4622
0.4990          14.0    9968    0.4619
0.4906          15.0    10680   0.4608
0.4884          16.0    11392   0.4622
0.4847          17.0    12104   0.4616
0.4916          18.0    12816   0.4592
0.4845          19.0    13528   0.4600
0.4788          20.0    14240   0.4594
0.4746          21.0    14952   0.4607
0.4875          22.0    15664   0.4615
0.4831          23.0    16376   0.4597
0.4798          24.0    17088   0.4595
0.4727          25.0    17800   0.4592
0.4736          26.0    18512   0.4598
0.4746          27.0    19224   0.4608
0.4728          28.0    19936   0.4589
0.4771          29.0    20648   0.4593
0.4743          30.0    21360   0.4588
0.4785          31.0    22072   0.4601
0.4757          32.0    22784   0.4597
0.4731          33.0    23496   0.4598
0.4746          34.0    24208   0.4593
0.4715          35.0    24920   0.4599
0.4769          36.0    25632   0.4622
0.4778          37.0    26344   0.4605
0.4798          38.0    27056   0.4594
0.4694          39.0    27768   0.4607
0.4680          40.0    28480   0.4600
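The validation losses above can be summarized with a few lines of Python. This sketch copies the per-epoch validation losses from the table and reports the epoch with the lowest loss and how much the loss varies from epoch 18 onward; the cutoff at epoch 18 is an illustrative choice, not something stated by the authors.

```python
# Validation loss per epoch (epochs 1..40), copied from the table above.
VALIDATION_LOSS = [
    0.5090, 0.4915, 0.4819, 0.4798, 0.4733, 0.4731, 0.4692, 0.4691, 0.4670, 0.4641,
    0.4641, 0.4647, 0.4622, 0.4619, 0.4608, 0.4622, 0.4616, 0.4592, 0.4600, 0.4594,
    0.4607, 0.4615, 0.4597, 0.4595, 0.4592, 0.4598, 0.4608, 0.4589, 0.4593, 0.4588,
    0.4601, 0.4597, 0.4598, 0.4593, 0.4599, 0.4622, 0.4605, 0.4594, 0.4607, 0.4600,
]


def best_epoch(losses):
    """Return (epoch, loss) for the lowest validation loss; epochs are 1-based."""
    best_loss = min(losses)
    return losses.index(best_loss) + 1, best_loss


def spread_from(losses, first_epoch):
    """Difference between max and min loss from `first_epoch` to the end."""
    tail = losses[first_epoch - 1:]
    return max(tail) - min(tail)


if __name__ == "__main__":
    epoch, loss = best_epoch(VALIDATION_LOSS)
    print(f"lowest validation loss: {loss:.4f} at epoch {epoch}")
    print(f"final validation loss:  {VALIDATION_LOSS[-1]:.4f}")
    print(f"spread from epoch 18:   {spread_from(VALIDATION_LOSS, 18):.4f}")
```

The lowest validation loss in the table is 0.4588 at epoch 30, while the reported 0.4600 is the value after the final epoch. From epoch 18 onward the loss stays within roughly 0.0034 of that minimum, which suggests most of the improvement happened in the first half of training.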

Framework versions

  • Transformers 4.30.0.dev0
  • PyTorch 2.0.1+cu117
  • Datasets 2.13.1
  • Tokenizers 0.13.3