metadata
library_name: transformers
language:
  - twi
license: apache-2.0
base_model: openai/whisper-tiny
tags:
  - custom-dataset
  - local-dataset
  - whisper
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: T6-Whisper-FineTuned-DL-Twi
    results: []

T6-Whisper-FineTuned-DL-Twi

This model is a fine-tuned version of openai/whisper-tiny on a custom dataset of Twi, a native Ghanaian language. It achieves the following results on the evaluation set at the final training step (4000):

  • Loss: 0.0063
  • Wer: 23.4562
  • Cer: 21.7611
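For readers unfamiliar with the two error metrics, the sketch below shows how WER and CER are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, at word level and character level respectively. It assumes the figures above are percentages and that no text normalization is applied; the card does not state which scoring tool or normalization was used, and the example strings are placeholders, not data from this model.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a percentage of reference words."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a percentage of reference characters."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Placeholder strings for illustration only.
reference = "send money to my account"
hypothesis = "send money to account"
print(wer(reference, hypothesis), cer(reference, hypothesis))
```

In this toy case one of five reference words is dropped, so the word-level rate is higher than the character-level rate.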

Model description

T6-Whisper-FineTuned-DL-Twi is a fine-tuned version of openai/whisper-tiny focused specifically on the Twi language, a widely spoken native language in Ghana. This model adapts Whisper’s multilingual speech recognition capabilities to better understand and transcribe Twi speech, especially in financial contexts.

It was developed as part of a project to support accessibility in financial systems, aiming to make digital financial services more inclusive for Ghanaian communities that primarily communicate in Twi.

Intended uses & limitations

Intended uses:

  • Automatic Speech Recognition (ASR) for Twi and English-Twi mixed audio.
  • Enhancing voice interfaces in fintech platforms (e.g., mobile banking, customer support).
  • Increasing accessibility for low-literate or visually impaired users in financial contexts.
  • Supporting research in code-switched speech and low-resource African languages.

Limitations:

  • May not perform optimally outside the financial domain (e.g., health or legal speech).
  • Performance can degrade in noisy environments or with heavy accents not represented in the training data.
  • While it handles code-switching, rapid or highly irregular switches may still reduce accuracy.
  • Based on the Whisper-tiny model, which is optimized for speed and size, not peak performance.
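
A minimal sketch of how the model might be used for the ASR use case listed above, via the Transformers `pipeline` API. The repository ID below is a hypothetical placeholder (the card does not state the hosting namespace), and the 30-second chunk length is an assumption chosen to match Whisper's input window rather than a setting taken from this card.

```python
# Hypothetical repository ID: replace "your-namespace" with the real owner.
MODEL_ID = "your-namespace/T6-Whisper-FineTuned-DL-Twi"


def build_transcriber(model_id=MODEL_ID):
    """Create an ASR pipeline; imported lazily so this module loads without transformers."""
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # assumption: split long recordings into 30 s chunks
    )


def transcribe(audio_path, transcriber=None):
    """Return the transcript for one audio file.

    Passing a file path requires ffmpeg to be installed for decoding.
    """
    if transcriber is None:
        transcriber = build_transcriber()
    return transcriber(audio_path)["text"]
```

Calling `transcribe("recording.wav")` (a placeholder file name) would download the model on first use. Given the limitations above, it may be worth spot-checking transcripts on noisy or non-financial audio before relying on them.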

Training and evaluation data

The model was fine-tuned using a custom dataset containing Twi and English-Twi code-switched audio, primarily from the financial domain. This includes content like:

  • Mobile money instructions
  • Banking app voice interactions
  • Financial literacy radio shows and interviews
  • Call center conversations involving customer queries

Dataset details:

  • Dataset size: ~50 hours
  • Language mix: Twi + English (code-switched)
  • Transcript quality: Manually verified by native speakers
  • Train/validation split: [e.g., 80/20]

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 4000
  • mixed_precision_training: Native AMP
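
The hyperparameters above can be sketched as keyword arguments for `Seq2SeqTrainingArguments`, together with the learning rate that a linear schedule with warmup would produce at a given step. This is an illustration under assumptions, not the author's training script: it assumes the batch sizes are per device, that "Native AMP" means `fp16`, and that the schedule ramps linearly from 0 to the peak over the warmup steps and then decays linearly to 0 at the final step.

```python
# Keyword arguments mirroring the list above (assumed mapping to
# transformers.Seq2SeqTrainingArguments; output_dir etc. would still be needed).
training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,  # assumption: 16 is per device
    "per_device_eval_batch_size": 8,    # assumption: 8 is per device
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "max_steps": 4000,
    "fp16": True,                       # assumption: "Native AMP" means fp16
}


def linear_lr(step, peak_lr=1e-5, warmup_steps=500, total_steps=4000):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


for step in (0, 250, 500, 2250, 4000):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at step 500 and is back to half the peak by step 2250.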

Training results

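| Training Loss | Epoch  | Step | Validation Loss | Wer     | Cer     |
|---------------|--------|------|-----------------|---------|---------|
| 0.025         | 0.6333 | 1000 | 0.0285          | 27.9879 | 21.3775 |
| 0.0083        | 1.2666 | 2000 | 0.0094          | 20.4318 | 17.7329 |
| 0.0058        | 1.8999 | 3000 | 0.0072          | 19.5177 | 17.5028 |
| 0.0012        | 2.5332 | 4000 | 0.0063          | 23.4562 | 21.7611 |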
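
The logged results show that the final step has the lowest validation loss but not the lowest WER or CER. The short sketch below, using only the numbers reported above, picks the checkpoint with the lowest WER and compares it with the final one. Whether the step-3000 checkpoint was saved or published is not stated in this card, so this is only a way of reading the table.

```python
# Evaluation rows copied from the training results above.
results = [
    {"step": 1000, "val_loss": 0.0285, "wer": 27.9879, "cer": 21.3775},
    {"step": 2000, "val_loss": 0.0094, "wer": 20.4318, "cer": 17.7329},
    {"step": 3000, "val_loss": 0.0072, "wer": 19.5177, "cer": 17.5028},
    {"step": 4000, "val_loss": 0.0063, "wer": 23.4562, "cer": 21.7611},
]

best_by_wer = min(results, key=lambda row: row["wer"])
best_by_loss = min(results, key=lambda row: row["val_loss"])
final = results[-1]

print("lowest WER at step", best_by_wer["step"])
print("lowest validation loss at step", best_by_loss["step"])
print("WER gap, final vs best:", round(final["wer"] - best_by_wer["wer"], 4))
```

A gap like this between loss and error rate can be a sign of overfitting late in training, which is one reason a run may select checkpoints by WER rather than by loss.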

Framework versions

  • Transformers 4.48.0.dev0
  • PyTorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0