Patrick Robertson committed · Commit 7b46f36 · Parent(s): 3f9a3e4

Add more info on training method and WER+CER scores

README.md CHANGED

This is a whisper-tiny model based on techiaith/whisper-tiny-ft-cy-en, fine tuned …

This model can be loaded into the FUTO Keyboard, and most likely other similar keyboards (Heliboard, Florisboard, AnySoftKeyboard, possibly even Swiftkey).
More info on this can be found [here](https://github.com/futo-org/whisper-acft).

## Android Installation Instructions

To use this model with FUTO keyboard:

1. Download the .bin file from [download/whisper-tiny-welsh.bin](download/whisper-tiny-welsh.bin) onto your Android phone
…
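
For trying the checkpoint outside the keyboard, a minimal sketch using the `transformers` ASR pipeline is shown below. The base-model id comes from this card; the fine-tuned repo id and the audio file name are placeholders to substitute with your own.

```python
# Minimal, illustrative sketch: transcribing a Welsh clip with the transformers ASR pipeline.
# "techiaith/whisper-tiny-ft-cy-en" is the base model named in this card; point `model=` at
# this fine-tuned repo instead once you know its id. "clip.wav" is a placeholder file name,
# and ffmpeg must be installed for audio decoding.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="techiaith/whisper-tiny-ft-cy-en")
print(asr("clip.wav")["text"])
```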
## Training and evaluation data

Trained and evaluated using [welsh-transcription-samples](https://huggingface.co/datasets/ClemSummer/welsh-transcription-samples-7k/), a subset of Mozilla's Common Voice CY dataset. It is smaller and therefore more practical for "poor man's" training without a GPU; training on the full Mozilla Common Voice corpus may give better results.
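
As a rough illustration (not the author's actual preprocessing), the subset can be pulled with the `datasets` library roughly as follows; the available splits and the `audio` column name are assumptions to verify against the dataset card.

```python
# Sketch of loading the welsh-transcription-samples-7k subset with Hugging Face `datasets`.
# Split and column names ("audio" etc.) are assumptions; check the dataset card first.
from datasets import Audio, load_dataset

ds = load_dataset("ClemSummer/welsh-transcription-samples-7k")
print(ds)  # inspect the splits and columns that are actually available

# Whisper models expect 16 kHz input, so resample the audio column on the fly.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
```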
## Model Setup

```python
# Training hyperparameters
LEARNING_RATE = 1e-6
NUM_EPOCHS = 8
# (Note: only recordings < 29.0s were used)
```
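
To show where these values would plug in, here is a hedged sketch of a standard `transformers` Seq2Seq fine-tuning setup; apart from the learning rate, epoch count, duration cut-off, and the base model named above, everything else (batch size, output directory, the omitted feature extraction and collator) is an assumption rather than the script actually used.

```python
# Illustrative sketch only: a conventional Whisper fine-tuning setup around the
# hyperparameters listed above. Batch size, output dir and the omitted preprocessing
# are assumptions, not taken from this model card.
from transformers import (
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

LEARNING_RATE = 1e-6
NUM_EPOCHS = 8
MAX_AUDIO_SECONDS = 29.0  # only recordings shorter than this were used

processor = WhisperProcessor.from_pretrained("techiaith/whisper-tiny-ft-cy-en")
model = WhisperForConditionalGeneration.from_pretrained("techiaith/whisper-tiny-ft-cy-en")

def short_enough(example):
    # Keep only clips under MAX_AUDIO_SECONDS (assumes a decoded "audio" column).
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"] < MAX_AUDIO_SECONDS

# ds = ds.filter(short_enough)  # applied to the dataset loaded earlier

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-tiny-welsh",   # assumed name
    learning_rate=LEARNING_RATE,
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=8,     # assumption; not stated in the card
    predict_with_generate=True,
)

# A Seq2SeqTrainer would then be built from `model`, `training_args`, the filtered and
# feature-extracted datasets, and a data collator, and run with trainer.train().
```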
## WER & CER

| Dataset | WER (%) | CER (%) |
| -------- | ------- | ------- |
| CommonVoice CY (ClemSummer, validation split) | 62.99 | 21.46 |
| techiaith/banc-trawsgrifiadau-bangor | TODO: *no access* | TODO: *no access* |
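
For reference, figures like these are typically produced with the `evaluate` library (backed by `jiwer`); a small sketch follows, with placeholder transcripts rather than real validation examples.

```python
# Sketch of computing WER and CER with the `evaluate` library (requires `jiwer`).
# The reference/prediction strings are placeholders, not real validation examples.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["mae'r tywydd yn braf heddiw"]   # ground-truth transcripts
predictions = ["mae tywydd yn braf heddi"]     # model outputs

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
cer = 100 * cer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}%  CER: {cer:.2f}%")
```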

-----