metadata

library_name: transformers
language:
  - twi
license: apache-2.0
base_model: openai/whisper-tiny
tags:
  - custom-dataset
  - local-dataset
  - whisper
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: T6-Whisper-FineTuned-DL-Twi
    results: []

T6-Whisper-FineTuned-DL-Twi

This model is a fine-tuned version of openai/whisper-tiny on a custom dataset of Twi, a native Ghanaian language. It achieves the following results on the evaluation set (final checkpoint, step 4000):

Loss: 0.0063
Wer: 23.4562
Cer: 21.7611
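
Wer and Cer are the word and character error rates; the values appear to be percentages. The sketch below shows how such scores are commonly derived from a Levenshtein edit distance. It is a plain-Python illustration under that assumption, not the evaluation script used for this model, and the example strings are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference token
                curr[j - 1] + 1,     # insert a hypothesis token
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, relative to the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up example: one dropped word out of five.
reference = "send money to my account"
hypothesis = "send money to account"
example_wer = wer(reference, hypothesis)
example_cer = cer(reference, hypothesis)
```

Because both rates are normalized by the reference length, a hypothesis with many insertions can score above 100.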

Model description

T6-Whisper-FineTuned-DL-Twi is a fine-tuned version of openai/whisper-tiny focused specifically on the Twi language, a widely spoken native language in Ghana. This model adapts Whisper’s multilingual speech recognition capabilities to better understand and transcribe Twi speech, especially in financial contexts.

It was developed as part of a project to support accessibility in financial systems, aiming to make digital financial services more inclusive for Ghanaian communities that primarily communicate in Twi.

Intended uses & limitations

Intended uses:

Automatic Speech Recognition (ASR) for Twi and English-Twi mixed audio.
Enhancing voice interfaces in fintech platforms (e.g., mobile banking, customer support).
Increasing accessibility for low-literate or visually impaired users in financial contexts.
Supporting research in code-switched speech and low-resource African languages.

Limitations:

May not perform optimally outside the financial domain (e.g., health or legal speech).
Performance can degrade in noisy environments or with heavy accents not represented in the training data.
While it handles code-switching, rapid or highly irregular switches may still reduce accuracy.
Based on Whisper-tiny, which is optimized for speed and size rather than peak accuracy.
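
For the ASR use case listed above, a minimal inference sketch with the Transformers pipeline API might look like the following. The repository id and the audio file name are placeholders, since this card does not state the full Hub path; replace them with the real values. Decoding an audio file from a path also assumes ffmpeg is installed.

```python
# Placeholder: replace with the actual Hub repo id or a local checkpoint directory.
MODEL_ID = "your-namespace/T6-Whisper-FineTuned-DL-Twi"


def build_transcriber(model_id=MODEL_ID, device=-1):
    """Create an ASR pipeline. device=-1 runs on CPU; pass 0 for the first GPU."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper processes audio in 30-second windows
        device=device,
    )


def transcribe(transcriber, audio_path):
    """Run the pipeline on one audio file and return the cleaned transcript."""
    result = transcriber(audio_path)
    return result["text"].strip()


# Example usage (downloads the model on first call):
# asr = build_transcriber()
# print(transcribe(asr, "sample_twi_audio.wav"))  # hypothetical file name
```

Given the limitations above, it is worth spot-checking transcripts on audio from the target environment before relying on them.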

Training and evaluation data

The model was fine-tuned using a custom dataset containing Twi and English-Twi code-switched audio, primarily from the financial domain. This includes content like:

Mobile money instructions
Banking app voice interactions
Financial literacy radio shows and interviews
Call center conversations involving customer queries

Dataset size: ~50 hours
Language mix: Twi + English (code-switched)
Transcript quality: Manually verified by native speakers
Train/validation split: [e.g., 80/20]

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 4000
mixed_precision_training: Native AMP
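
As a rough guide to reproducing this setup, the hyperparameters above map onto Seq2SeqTrainingArguments roughly as sketched below. The output directory is hypothetical, the batch sizes are assumed to be per-device values on a single GPU, and "Native AMP" is assumed to mean fp16; none of these are confirmed by this card. The linear_lr helper is an illustrative re-implementation of a linear schedule with warmup, not the Trainer's own code.

```python
TRAINING_KWARGS = {
    "output_dir": "./whisper-tiny-twi",   # hypothetical path
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,    # assumes a single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "max_steps": 4000,
    "fp16": True,                         # assumption for "Native AMP"; needs a CUDA GPU
}


def build_training_args(**overrides):
    """Build Seq2SeqTrainingArguments from the values above, with optional overrides."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**{**TRAINING_KWARGS, **overrides})


def linear_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=4000):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))
```

Under this schedule the learning rate peaks at 1e-05 at step 500 and reaches zero at step 4000.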

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
0.025	0.6333	1000	0.0285	27.9879	21.3775
0.0083	1.2666	2000	0.0094	20.4318	17.7329
0.0058	1.8999	3000	0.0072	19.5177	17.5028
0.0012	2.5332	4000	0.0063	23.4562	21.7611
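
The table shows that validation loss keeps falling through step 4000, while Wer and Cer are lowest at step 3000. The short sketch below reads this off the table and estimates the number of steps per epoch from the Epoch and Step columns. The training-set size estimate assumes one device, a batch size of 16, and no gradient accumulation, which this card does not confirm.

```python
# (training_loss, epoch, step, validation_loss, wer, cer), copied from the table above
RESULTS = [
    (0.025, 0.6333, 1000, 0.0285, 27.9879, 21.3775),
    (0.0083, 1.2666, 2000, 0.0094, 20.4318, 17.7329),
    (0.0058, 1.8999, 3000, 0.0072, 19.5177, 17.5028),
    (0.0012, 2.5332, 4000, 0.0063, 23.4562, 21.7611),
]

EPOCH, STEP, VAL_LOSS, WER, CER = 1, 2, 3, 4, 5


def best_step(rows, column):
    """Return the step whose value in the given column is lowest."""
    return min(rows, key=lambda row: row[column])[STEP]


best_wer_step = best_step(RESULTS, WER)        # checkpoint with the lowest Wer
best_cer_step = best_step(RESULTS, CER)        # checkpoint with the lowest Cer
best_loss_step = best_step(RESULTS, VAL_LOSS)  # checkpoint with the lowest loss

# Steps per epoch, inferred from the first row (step / epoch).
steps_per_epoch = RESULTS[0][STEP] / RESULTS[0][EPOCH]

# Rough training-set size; assumes batch size 16 on one device, no accumulation.
approx_train_examples = steps_per_epoch * 16

print(best_wer_step, best_cer_step, best_loss_step, round(steps_per_epoch))
```

If Wer matters more than loss for the target application, selecting the checkpoint by Wer (for example with the Trainer's metric_for_best_model option) would have favored step 3000.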

Framework versions

Transformers 4.48.0.dev0
Pytorch 2.5.1+cu121
Datasets 3.2.0
Tokenizers 0.21.0