Patrick Robertson committed on
Commit
7b46f36
·
1 Parent(s): 3f9a3e4

Add more info on training method and WER+CER scores

Files changed (1)
  1. README.md +22 -0
README.md CHANGED
@@ -17,6 +17,8 @@ This is a whisper-tiny model based on techiaith/whisper-tiny-ft-cy-en, fine tune
  This model can be loaded into the FUTO Keyboard, and most likely other similar keyboards (Heliboard, Florisboard, AnySoftKeyboard, possibly even Swiftkey).
  More info on this can be found [here](https://github.com/futo-org/whisper-acft).
 
+ ## Android Installation Instructions
+
  To use this model with FUTO keyboard:
 
  1. Download the .bin file from [download/whisper-tiny-welsh.bin](download/whisper-tiny-welsh.bin) onto your Android phone
@@ -26,6 +28,26 @@ To use this model with FUTO keyboard:
  ![screenshot](screenshot.jpg)
 
 
+ ## Training and evaluation data
+
+ Trained/evaluated using [welsh-transcription-samples](https://huggingface.co/datasets/ClemSummer/welsh-transcription-samples-7k/), a subset of Mozilla's Common Voice CY dataset. Smaller and more useful for poor-man's training without a GPU. Training on the full Mozilla Common Voice corpus may provide better results.
+
+ ## Model Setup
+
+ ```python
+ # Training hyperparameters
+ LEARNING_RATE = 1e-6
+ NUM_EPOCHS = 8
+ # (Note: only recordings < 29.0s were used)
+ ```
+
+ ## WER & CER
+
+ | Dataset | WER | CER |
+ | -------- | --- | ----- |
+ | CommonVoice CY (ClemSummer, validation split) | 62.99 | 21.46 |
+ | techiaith/banc-trawsgrifiadau-bangor | TODO: *no access* | TODO: *no access* |
+
  -----
 
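The README lines added in this commit report WER and CER but do not say how they were computed. As a rough illustration only, the sketch below computes corpus-level word and character error rates from Levenshtein edit distance, reported as percentages. The function names `edit_distance` and `error_rate` are hypothetical, the percentage scale is an assumption about the table, and no text normalization (casing, punctuation) is applied, so the results would not necessarily match the scores in the commit.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs: Sequence[str], hyps: Sequence[str], unit: str = "word") -> float:
    """Corpus-level error rate in percent: total edits / total reference units.

    unit="word" gives WER, unit="char" gives CER (spaces count as characters).
    """
    split = str.split if unit == "word" else list
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = split(ref), split(hyp)
        edits += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("reference text is empty")
    return 100.0 * edits / total


# Illustrative sentences only; not taken from the evaluation data.
references = ["mae hi yn braf heddiw"]
hypotheses = ["mae hi yn oer heddiw"]
print(f"WER: {error_rate(references, hypotheses, unit='word'):.2f}")
print(f"CER: {error_rate(references, hypotheses, unit='char'):.2f}")
```

Summing edits over the whole corpus before dividing weights long utterances more heavily than averaging per-utterance rates would; which of the two the reported scores use is not stated in the commit.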