
mms-1b-fl102-xho-5

This model is a fine-tuned version of facebook/mms-1b-fl102 on the lelapa/isixhosa_community_dataset for 5 epochs.

It achieves the following results on the evaluation set:

  • Loss: 0.2675
  • WER: 0.3656

Model description

Massively Multilingual Speech (MMS) - Finetuned ASR - FL102

We fine-tune the MMS-FL102 checkpoint on 7 hours of isiXhosa speech data. MMS-FL102 is a model fine-tuned for multilingual ASR and is part of Facebook's Massively Multilingual Speech (MMS) project. The checkpoint is based on the Wav2Vec2 architecture and makes use of adapter modules to transcribe 100+ languages. It consists of 1 billion parameters and was fine-tuned from facebook/mms-1b on the 102 languages of FLEURS.
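As a rough sketch of how this checkpoint might be used for transcription, assuming the fine-tuned isiXhosa weights load directly through the standard Wav2Vec2ForCTC/AutoProcessor API (the file path and audio handling below are illustrative, not taken from this card):

```python
import torch
import librosa
from transformers import AutoProcessor, Wav2Vec2ForCTC

# Hypothetical inference sketch; assumes the checkpoint loads with no extra adapter steps.
model_id = "lelapa/mms-1b-fl102-xho-5"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS models expect 16 kHz mono audio; "speech.wav" is a placeholder path.
audio, _ = librosa.load("speech.wav", sr=16_000, mono=True)
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(predicted_ids))
```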

Intended uses & limitations

The datasets created and used for the model benchmarks are taken solely from South African government magazine resources. It should therefore be highlighted that this model may overlook certain social/societal structures and will reflect the dominant political views at the time the dataset was sourced.

Training and evaluation data

lelapa/isixhosa_community_dataset
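If the dataset is hosted on the Hugging Face Hub under that identifier, it can presumably be loaded with the datasets library; split names and access conditions are assumptions, not stated in this card:

```python
from datasets import load_dataset

# Hypothetical: load the isiXhosa community dataset used for fine-tuning and evaluation.
ds = load_dataset("lelapa/isixhosa_community_dataset")
print(ds)  # inspect available splits and columns
```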

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 8
  • num_epochs: 5
  • mixed_precision_training: Native AMP
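For reproducibility, these values roughly correspond to the following transformers TrainingArguments. This is a hedged sketch: output_dir and the per-epoch evaluation strategy are assumptions inferred from the results table, not stated above.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the listed hyperparameters.
training_args = TrainingArguments(
    output_dir="mms-1b-fl102-xho-5",  # assumed output directory
    learning_rate=1e-3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,   # effective train batch size of 32
    num_train_epochs=5,
    warmup_steps=8,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,                        # "Native AMP" mixed precision
    eval_strategy="epoch",            # assumed from the per-epoch validation results
)
```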

Training results

| Training Loss | Epoch | Step | Validation Loss | WER    |
|---------------|-------|------|-----------------|--------|
| 9.1848        | 1.0   | 10   | 3.3243          | 0.9999 |
| 1.8736        | 2.0   | 20   | 0.5765          | 0.5415 |
| 0.3856        | 3.0   | 30   | 0.3047          | 0.3908 |
| 0.2553        | 4.0   | 40   | 0.2675          | 0.3656 |
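The WER values above were presumably computed with a word-error-rate metric such as the one in the evaluate library; the strings below are made up for illustration:

```python
import evaluate

wer_metric = evaluate.load("wer")
predictions = ["molo unjani"]        # hypothetical model transcriptions
references = ["molo unjani namhlanje"]  # hypothetical ground-truth transcripts
print(wer_metric.compute(predictions=predictions, references=references))
```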

Framework versions

  • Transformers 4.49.0.dev0
  • Pytorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0