---
license: apache-2.0
datasets:
- ClemSummer/welsh-transcription-samples-7k
language:
- cy
base_model:
- openai/whisper-tiny
- techiaith/whisper-tiny-ft-cy-en
---

# Whisper-Tiny Welsh / Cy

This is a whisper-tiny model based on techiaith/whisper-tiny-ft-cy-en and fine-tuned with the ACFT method for use on Android phones.

The model can be loaded into FUTO Keyboard, and most likely into other similar keyboards (HeliBoard, FlorisBoard, AnySoftKeyboard, possibly even SwiftKey). For more information on ACFT, see the [whisper-acft repository](https://github.com/futo-org/whisper-acft).

## Android Installation Instructions

To use this model with FUTO Keyboard:

1. Download the .bin file from [download/whisper-tiny-welsh.bin](download/whisper-tiny-welsh.bin) onto your Android phone.
2. Either tap the file and choose 'Open with FUTO keyboard', or open FUTO Keyboard, go to Languages & Models > Add Language > Welsh, and under 'Voice Input Model' tap to open the .bin file.
3. In FUTO Keyboard, tap the microphone, speak Welsh ('siarad Cymraeg'), and check the results.

![screenshot](screenshot.jpg)

## Training and evaluation data

The model was trained and evaluated on [welsh-transcription-samples](https://huggingface.co/datasets/ClemSummer/welsh-transcription-samples-7k/), a subset of Mozilla's Common Voice CY dataset. The subset is smaller, which makes it more practical for low-budget training without a GPU. Training on the full Mozilla Common Voice corpus may provide better results.

## Model Setup

```python
# Training hyperparameters
LEARNING_RATE = 1e-6
NUM_EPOCHS = 8
# (Note: only recordings < 29.0s were used)
```

## WER & CER

| Dataset | WER | CER |
| ------- | --- | --- |
| CommonVoice CY (ClemSummer, validation split) | 62.99 | 21.46 |
| techiaith/banc-trawsgrifiadau-bangor | TODO: *no access* | TODO: *no access* |

-----

Thanks to techiaith and ClemSummer for their prior work. Diolch (thank you).
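For readers who want to reproduce or extend the WER & CER table above, the following is a minimal, dependency-free sketch of how word and character error rates are commonly computed. It is not the script used to produce the reported scores: the function names `edit_distance` and `error_rate` are hypothetical, the sample sentences are made up, and the sketch assumes no text normalization, counts spaces as characters for CER, and reports rates as percentages.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate in percent; unit is "word" (WER) or "char" (CER)."""
    total_errors = 0
    total_units = 0
    for ref, hyp in zip(references, hypotheses):
        # Assumption: whitespace tokenization for words, raw characters for CER.
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        total_errors += edit_distance(ref_units, hyp_units)
        total_units += len(ref_units)
    return 100.0 * total_errors / total_units


# Made-up example pair, not taken from the evaluation data.
references = ["mae hi'n braf heddiw"]
hypotheses = ["mae hin braf heddi"]

wer = error_rate(references, hypotheses, unit="word")  # 2 of 4 words wrong -> 50.0
cer = error_rate(references, hypotheses, unit="char")  # 2 of 20 chars wrong -> 10.0
print(f"WER: {wer:.2f}  CER: {cer:.2f}")
```

Summing errors and reference lengths over the whole corpus before dividing, rather than averaging per-utterance rates, keeps short utterances from dominating the score. Scores computed this way are only comparable with the table if the same normalization (casing, punctuation) is applied, which the card does not specify.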