AI & ML interests

TTS, ASR, VLMs

Uganda Open Source AI Lab (USOAL)

USOAL is an open initiative created by passionate machine learning researchers in Uganda to share exciting models and findings from the local research community.

We aim to contribute to the growth of African language technologies by building open, high-quality models for speech, language, and other AI applications.


1. Uganda Text-to-Speech (TTS)

A collection of fine-tuned Orpheus 3B models that generate natural-sounding speech in several languages spoken in Uganda, including English, Luganda, Runyankole, Teso, and Acholi. These models were trained on open datasets from Sunbird AI, Yogera, and Mozilla's Common Voice.

🔉 Audio Examples (With Prompts)

| Language | Voice | Prompt |
| --- | --- | --- |
| English | Christopher | Hello I can speak in English as Christopher, one of the voices I can speak. |
| English | Barbara | Or as Barbara, this is one of my female voices. Pretty cool, right? |
| English | Mary | I can also speak as Mary as well. |
| English | James | Or I can speak as James, as you can see. |
| English | Jessica | This is my other voice called Jessica. I have more voices like Jennifer, Susan, Linda, Patricia, and Elizabeth, which I’ll share when they’re ready. |
| Luganda | Christopher | Nsobola okwo’geranga Christopher nga wowulila kati. |
| Luganda | Charles | Oba neenjogela nga Charles wenti. |
| Luganda | Sandra | Nina neddoboozi lya Sandra bweliti. |
| Luganda | Michelle | Nsobola ogwogella bwenti mulino eddoboozi. |
| Luganda | Daniel | Oba nemulino elye’kisajja nga woowulira. |
| Runyankole | Christopher | Nimbasa kugamba nka Christopher omwiraka eri. |
| Runyankole | Patricia | Bimwe ebirikugambwa aha reediyo nibihwera abantu kumanya obutare burungi bw’amasharuura gaabo. |
| Runyankole | Elizabeth | Omu disiturikiti ya Kayunga emisiri erikukira obwngi ekashangwa erimu ebicoori ebiine oburwaire. |
| Runyankole | Michelle | Nimbasa kugamba nka Michelle omwiraka eri. |
| Runyankole | James | Uganda eteire amaani aha buhingi n’oburiisa. |
| Teso | Christopher | Epedorete akoriok aimedaun ejok kanejaas aicoreta nu itikitikere adeka. |
| Teso | Jessica | Akoru ikorion luegelegela nes ingarakini itunganan. |
| Teso | James | Iraasit yen emunaara aticepak ikur enyamitos. |
| Teso | Daniel | Aipagisanar nes ewai ecie lo ibwaikinet iboro toma aswam. |
| Teso | Barbara | Isisianakinete isomeroi kwana asiomak eipone lo isubusaere. |
| Acholi | Mark | Uganda tye ka keme ki lok me pur. |
| Acholi | Barbara | Lupur twero nongo kony ma dit ka gunongo ngec me gengo onyo cango two ma balo jami ma i poto. |
| Acholi | Michelle | Gum madwong me timo biacara tye i te yub ma pe jenge i kom gamente. |
| Acholi | James | Ler ma pe gidodo ma woto ka yenyo cam i dye poto obalo cam weng ma tye i poto. |

🛠️ How it Works

These TTS models are built using a two-stage architecture:

  1. Audio Token Generation
    SNAC (Multi-Scale Neural Audio Codec) converts recorded speech into discrete audio tokens and decodes generated tokens back into a waveform.

  2. TTS Model Fine-Tuning
    Fine-tuned versions of the Orpheus 3B model generate these audio tokens from input text in multiple Ugandan languages; the SNAC decoder then turns them into realistic speech.
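
To make the link between the two stages more concrete, here is a minimal sketch of how a multi-level codec output could be flattened into a single token stream for a language model and recovered again before decoding. It assumes three codebook levels in a 1:2:4 ratio interleaved into 7 tokens per frame, with each slot offset by an assumed codebook size of 4096, which is how the upstream Orpheus examples are commonly described. The layout, the constants, and the function names `flatten_codes` and `unflatten_codes` are illustrative assumptions, not taken from the Uganda TTS repository, and the ids are relative to the start of the audio-token range.

```python
from typing import List, Tuple

CODEBOOK_SIZE = 4096   # assumption: number of entries in each codebook
TOKENS_PER_FRAME = 7   # assumption: 1 coarse + 2 medium + 4 fine codes per frame

Levels = Tuple[List[int], List[int], List[int]]


def flatten_codes(coarse: List[int], medium: List[int], fine: List[int]) -> List[int]:
    """Interleave three codebook levels into one token stream (hypothetical layout)."""
    if not (len(medium) == 2 * len(coarse) and len(fine) == 4 * len(coarse)):
        raise ValueError("expected 1:2:4 code counts across the three levels")
    tokens: List[int] = []
    for i in range(len(coarse)):
        frame = [
            coarse[i],
            medium[2 * i],
            fine[4 * i],
            fine[4 * i + 1],
            medium[2 * i + 1],
            fine[4 * i + 2],
            fine[4 * i + 3],
        ]
        # Offset each slot so the seven positions use disjoint id ranges.
        tokens.extend(code + slot * CODEBOOK_SIZE for slot, code in enumerate(frame))
    return tokens


def unflatten_codes(tokens: List[int]) -> Levels:
    """Invert flatten_codes; any trailing partial frame is ignored."""
    coarse: List[int] = []
    medium: List[int] = []
    fine: List[int] = []
    n_frames = len(tokens) // TOKENS_PER_FRAME
    for i in range(n_frames):
        frame = tokens[i * TOKENS_PER_FRAME:(i + 1) * TOKENS_PER_FRAME]
        # Remove the per-slot offsets to get back raw codebook indices.
        codes = [tok - slot * CODEBOOK_SIZE for slot, tok in enumerate(frame)]
        coarse.append(codes[0])
        medium.extend([codes[1], codes[4]])
        fine.extend([codes[2], codes[3], codes[5], codes[6]])
    return coarse, medium, fine


# Round trip on two toy frames.
coarse, medium, fine = [5, 9], [1, 2, 3, 4], [10, 11, 12, 13, 14, 15, 16, 17]
tokens = flatten_codes(coarse, medium, fine)
assert unflatten_codes(tokens) == (coarse, medium, fine)
```

In a real pipeline the three recovered lists would be handed to the codec's decoder to produce a waveform; that step is left out here because it depends on the installed codec library.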

⚠️ Note: Some non-English outputs may sound lower in quality because SNAC was not pretrained on speech in local African languages.

The GitHub repository has more details about the models.
📦 Code: GitHub Repository for Uganda TTS


2. Uganda Text Generation (Advanced)

We are also training models that can understand and generate text in low-resource Ugandan languages.

📦 Code: GitHub Repository for Uganda Text Generation