Whisper large-v3 turbo model for Kinyarwanda

This repository contains the fine-tuned model leophill/whisper-large-v3-turbo-sw-kinyarwanda, which was built as part of the Kinyarwanda Automatic Speech Recognition Track A challenge organized on Kaggle by Digital Umuganda. The dataset comprises 500 hours of labeled Kinyarwanda speech data spanning five high-impact domains—Health, Government, Financial Services, Education, and Agriculture—to support robust ASR model development in both conversational and formal contexts.

This model uses Swahili (sw) as a proxy language because Kinyarwanda is not among the languages supported by the pretrained Whisper models.

The model outputs capitalization and punctuation. It ranked 3rd on the Track A leaderboard, which scored lowercase transcriptions without punctuation.
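Because the model emits cased, punctuated text while the leaderboard scored lowercase text without punctuation, outputs may need to be normalized before comparison. The following is a minimal sketch of one possible normalization. The function name `normalize_for_scoring` and the choice to keep ASCII apostrophes are assumptions made for illustration; the challenge's exact normalization rules are not stated here.

```python
import unicodedata


def normalize_for_scoring(text: str, keep_apostrophes: bool = True) -> str:
    """Lowercase the text, drop punctuation, and collapse whitespace.

    Assumption: this only approximates "lowercase with no punctuation";
    it is not the official scoring normalization of the challenge.
    """
    text = unicodedata.normalize("NFKC", text).lower()
    kept = []
    for ch in text:
        is_punct = unicodedata.category(ch).startswith("P")
        if is_punct and not (keep_apostrophes and ch == "'"):
            # Replace punctuation with a space so adjacent words stay separated.
            kept.append(" ")
        else:
            kept.append(ch)
    # split() without arguments collapses any run of whitespace.
    return " ".join("".join(kept).split())


print(normalize_for_scoring("Muraho, Neza!  Amakuru?"))
```

Set `keep_apostrophes=False` if the reference transcriptions also drop apostrophes.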

Usage

To run the model, first install the torch and transformers libraries.

The model can be used with the pipeline class to transcribe audio of arbitrary length, including local audio files:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available, otherwise CPU with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the fine-tuned model and its processor (tokenizer + feature extractor).
model_id = "leophill/whisper-large-v3-turbo-sw-kinyarwanda"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Build the speech recognition pipeline.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe a local file; Swahili is passed as the proxy language for Kinyarwanda.
audio_file = "audio.wav"
result = pipe(audio_file, generate_kwargs={"language": "swahili", "task": "transcribe"})
print(result["text"])
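For long recordings it can help to see where each part of the transcription falls in the audio. When the transformers pipeline is called with `return_timestamps=True`, the result also carries a `chunks` list whose items hold a `timestamp` pair and a `text` string. The sketch below formats such a result; `format_chunks` is a hypothetical helper, and the sample result is made-up data used only to show the structure.

```python
def format_chunks(result: dict) -> list[str]:
    """Turn a pipeline result with timestamps into readable lines."""

    def label(value):
        # A boundary can be None, for example when the audio is cut off.
        return f"{value:.2f}" if value is not None else "?"

    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"[{label(start)} - {label(end)}] {chunk['text'].strip()}")
    return lines


# Made-up result with the same structure the pipeline returns. With the
# snippet above, it would come from something like:
#   result = pipe(audio_file, return_timestamps=True,
#                 generate_kwargs={"language": "swahili", "task": "transcribe"})
sample = {
    "text": " Muraho. Amakuru?",
    "chunks": [
        {"timestamp": (0.0, 1.5), "text": " Muraho."},
        {"timestamp": (1.5, None), "text": " Amakuru?"},
    ],
}
for line in format_chunks(sample):
    print(line)
```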

More information

For more information about the original Whisper large-v3 turbo model, see its model card.

CTranslate2 version

A CTranslate2 version of this model is available on a dedicated model page.
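CTranslate2 Whisper models are commonly run through the faster-whisper library. The sketch below assumes that faster-whisper is installed and that the CTranslate2 files have been downloaded to a local directory in a layout faster-whisper can load; neither is confirmed by this model card. The directory path, `transcribe_ct2`, and `segments_to_text` are hypothetical names used for illustration.

```python
def segments_to_text(segments) -> str:
    """Join the text of transcription segments into one string."""
    return " ".join(seg.text.strip() for seg in segments)


def transcribe_ct2(model_dir: str, audio_path: str) -> str:
    """Transcribe one file with a local CTranslate2 Whisper model.

    Assumption: `model_dir` points to a faster-whisper compatible
    conversion of this model.
    """
    # Imported here so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_dir, device="cpu", compute_type="int8")
    # Swahili ("sw") is the proxy language, as in the transformers example.
    segments, _info = model.transcribe(audio_path, language="sw", task="transcribe")
    return segments_to_text(segments)


# Hypothetical usage (the directory name is a placeholder):
# print(transcribe_ct2("path/to/ct2-model", "audio.wav"))
```

On a GPU, `device="cuda"` with `compute_type="float16"` is a common alternative to the CPU settings shown here.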

Citation

@misc{whisper_lv3_turbo_kinyarwanda_asr,
  author = {Leopold Hillah},
  title = {Finetuning Whisper Large V3 Turbo for Kinyarwanda ASR using Swahili as Proxy Language},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/leophill/whisper-large-v3-turbo-sw-kinyarwanda}
}
@misc{kinyarwanda-automatic-speech-recognition-track-a,
  author = {Digital Umuganda},
  title = {Kinyarwanda Automatic Speech Recognition Track A},
  year = {2025},
  howpublished = {\url{https://kaggle.com/competitions/kinyarwanda-automatic-speech-recognition-track-a}},
  note = {Kaggle}
}