Whisper-Tiny Welsh / Cy

This is a whisper-tiny model based on techiaith/whisper-tiny-ft-cy-en, fine-tuned using the ACFT method for use on Android phones.

This model can be loaded into FUTO Keyboard, and most likely into other similar keyboards (HeliBoard, FlorisBoard, AnySoftKeyboard, possibly even SwiftKey). More info on this can be found here.

Android Installation Instructions

To use this model with FUTO Keyboard:

  1. Download the .bin file from download/whisper-tiny-welsh.bin onto your Android phone.
  2. Either tap the file and choose 'Open with FUTO keyboard', or open FUTO Keyboard, go to Languages & Models > Add Language > Welsh, and under 'Voice Input Model' tap to open the .bin file.
  3. In FUTO Keyboard, tap the microphone, 'siarad Cymraeg' (speak Welsh), and see the results.

screenshot

Training and evaluation data

Trained and evaluated on welsh-transcription-samples, a subset of Mozilla's Common Voice CY dataset. The subset is smaller than the full corpus, which makes it more practical for low-budget training without a GPU. Training on the full Mozilla Common Voice corpus may give better results.

Model Setup

# Training hyperparameters
LEARNING_RATE = 1e-6
NUM_EPOCHS = 8
# (Note: only recordings < 29.0s were used)
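The duration cut-off in the note above can be sketched as a simple filter. This is a minimal illustration, not the actual training script: the `Clip` class, its fields, the 16 kHz sample rate, and the example file names are all assumptions made for the sketch. Only the 29.0 s threshold comes from the note.

```python
from dataclasses import dataclass

MAX_DURATION_S = 29.0  # threshold taken from the note above


@dataclass
class Clip:
    """Hypothetical record for one recording (not part of the original card)."""
    path: str
    sentence: str
    num_samples: int
    sample_rate: int = 16_000  # assumed sample rate

    @property
    def duration_s(self) -> float:
        # Duration in seconds = number of samples / samples per second.
        return self.num_samples / self.sample_rate


def keep_short_clips(clips, max_duration_s=MAX_DURATION_S):
    """Keep only clips strictly shorter than the threshold."""
    return [clip for clip in clips if clip.duration_s < max_duration_s]


# Made-up example clips: 5 s, exactly 29 s, and 40 s.
clips = [
    Clip("a.wav", "bore da", 16_000 * 5),
    Clip("b.wav", "noswaith dda", 16_000 * 29),
    Clip("c.wav", "diolch yn fawr", 16_000 * 40),
]
kept = keep_short_clips(clips)
print([clip.path for clip in kept])
```

The comparison is strict, so a clip of exactly 29.0 s is dropped, matching the "< 29.0s" wording of the note.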

WER & CER

| Dataset | WER | CER |
|---|---|---|
| CommonVoice CY (ClemSummer, validation split) | 62.99 | 21.46 |
| techiaith/banc-trawsgrifiadau-bangor | TODO: no access | TODO: no access |
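For readers unfamiliar with the two metrics, the sketch below shows how WER (word error rate) and CER (character error rate) are commonly computed from the Levenshtein edit distance between a reference transcript and a model hypothesis. It is an illustrative, per-utterance implementation, not the evaluation script behind the figures above. It assumes the scores are reported as percentages, and the example sentences are made up. Corpus-level scores are usually obtained by summing errors and reference lengths over all utterances before dividing.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent for a single utterance."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent for a single utterance."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up example: two of the four words are wrong, by one character each.
reference = "mae hi'n braf heddiw"
hypothesis = "mae hin braf heddi"
print(f"WER: {wer(reference, hypothesis):.2f}  CER: {cer(reference, hypothesis):.2f}")
```

The example also shows why CER is typically much lower than WER: a single wrong character makes a whole word count as an error for WER, but only one character counts for CER.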

Thanks to techiaith and ClemSummer for their prior work. Diolch

Model size: 57.7M params (Safetensors, tensor type F32)