SpeechT5 TTS MT V2

This model is a fine-tuned version of microsoft/speecht5_tts on the maguette dataset. It achieves the following results on the evaluation set (taken from the final row, step 200, of the training results table below):

Loss: 0.4931

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 10
training_steps: 200
mixed_precision_training: Native AMP
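
As a rough illustration of how these settings fit together, the sketch below restates them under the corresponding transformers.TrainingArguments field names and recomputes two derived quantities: the total train batch size and the learning rate produced by a linear schedule with warmup. It is a minimal sketch, not the original training script. The single-device assumption, the mapping of "Native AMP" to fp16, and the helper names total_train_batch_size and linear_lr are assumptions made here for illustration.

```python
# Hyperparameters copied from the list above, keyed by the matching
# transformers.TrainingArguments field names.
hparams = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 10,
    "max_steps": 200,
    "fp16": True,  # assumption: "Native AMP" meant float16, not bfloat16
}

NUM_DEVICES = 1  # assumption: training ran on a single device


def total_train_batch_size(h, num_devices=NUM_DEVICES):
    """Examples consumed per optimizer step."""
    return (
        h["per_device_train_batch_size"]
        * h["gradient_accumulation_steps"]
        * num_devices
    )


def linear_lr(step, h):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    warmup = h["warmup_steps"]
    total = h["max_steps"]
    peak = h["learning_rate"]
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size(hparams))
    for step in (0, 5, 10, 105, 200):
        print(step, linear_lr(step, hparams))
```

Under these assumptions, 16 examples per device times 2 accumulation steps reproduces the reported total_train_batch_size of 32. The learning rate peaks at 1e-05 after the 10 warmup steps and decays linearly to zero at step 200. The dictionary could be passed as keyword arguments to Seq2SeqTrainingArguments together with an output_dir, though whether the original run was configured exactly this way is not stated in this card.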

Training Loss	Epoch	Step	Validation Loss
No log	6.6667	10	1.1093
No log	13.3333	20	0.9983
1.1116	20.0	30	0.9168
1.1116	26.6667	40	0.8119
0.8005	33.3333	50	0.7925
0.8005	40.0	60	0.7629
0.8005	46.6667	70	0.7380
0.738	53.3333	80	0.6968
0.738	60.0	90	0.6487
0.677	66.6667	100	0.6119
0.677	73.3333	110	0.5832
0.677	80.0	120	0.5555
0.5968	86.6667	130	0.5425
0.5968	93.3333	140	0.5329
0.5626	100.0	150	0.5180
0.5626	106.6667	160	0.5074
0.5626	113.3333	170	0.5100
0.5461	120.0	180	0.5035
0.5461	126.6667	190	0.4918
0.5208	133.3333	200	0.4931
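
The table can be summarized with a short script. The sketch below copies the (epoch, step, validation loss) columns and looks up the best checkpoint, the evaluation steps at which validation loss rose, and the number of optimizer steps per epoch implied by the Epoch column. The variable names are illustrative only, and the reading of the steps-per-epoch figure is an inference rather than something this card states.

```python
# (epoch, step, validation_loss) rows copied from the table above.
rows = [
    (6.6667, 10, 1.1093), (13.3333, 20, 0.9983), (20.0, 30, 0.9168),
    (26.6667, 40, 0.8119), (33.3333, 50, 0.7925), (40.0, 60, 0.7629),
    (46.6667, 70, 0.7380), (53.3333, 80, 0.6968), (60.0, 90, 0.6487),
    (66.6667, 100, 0.6119), (73.3333, 110, 0.5832), (80.0, 120, 0.5555),
    (86.6667, 130, 0.5425), (93.3333, 140, 0.5329), (100.0, 150, 0.5180),
    (106.6667, 160, 0.5074), (113.3333, 170, 0.5100), (120.0, 180, 0.5035),
    (126.6667, 190, 0.4918), (133.3333, 200, 0.4931),
]

# Row with the lowest validation loss, and the last logged row.
best = min(rows, key=lambda r: r[2])
final = rows[-1]

# Steps at which validation loss was higher than at the previous evaluation.
regressions = [
    cur[1] for prev, cur in zip(rows, rows[1:]) if cur[2] > prev[2]
]

# Optimizer steps per epoch implied by the first row (step / epoch).
steps_per_epoch = rows[0][1] / rows[0][0]

if __name__ == "__main__":
    print("best step:", best[1], "val loss:", best[2])
    print("final step:", final[1], "val loss:", final[2])
    print("val loss increased at steps:", regressions)
    print("optimizer steps per epoch:", round(steps_per_epoch, 2))
```

The lowest validation loss in the table is 0.4918 at step 190, slightly below the final value of 0.4931 at step 200, and validation loss also ticked up once earlier, at step 170. The Epoch column implies about 1.5 optimizer steps per epoch. With a total train batch size of 32, that suggests a very small training set, on the order of a few dozen examples, although the dataset size is not given here.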

Model size: 0.1B params (Safetensors)

Tensor type: F32