MarieAlvenir committed on
Commit 7620057 · 1 Parent(s): 770e9a4

Whisper results removed

Files changed (3)
  1. README.md +39 -38
  2. images/cer.png +0 -0
  3. images/wer.png +0 -0
README.md CHANGED
@@ -171,7 +171,6 @@ The model was evaluated using the following metrics:
171
  | Model | Number of parameters | Finetuned on data of type | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) WER |
172
  | :----------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
173
  | [CoRal-dataset/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-dataset/roest-whisper-large) | 315M | Read-aloud and conversation | 6.5% ± 0.2% | 16.3% ± 0.4% |
174
- | [CoRal-dataset/roest-whisper-large-v2](https://huggingface.co/CoRal-dataset/roest-whisper-large) | 1540M | Read-aloud and conversation | 5.3% ± 0.2% | 12.0% ± 0.4% |
175
  | [Alvenir/roest-whisper-large-v1](https://huggingface.co/Alvenir/coral-1-whisper-large) | 1540M | Read-aloud | **4.3% ± 0.2%** | **10.4% ± 0.3%** |
176
  | [alexandrainst/roest-wav2vec2-315M-v1](https://huggingface.co/alexandrainst/roest-315m) | 315M | Read-aloud | 6.6% ± 0.2% | 17.0% ± 0.4% |
177
  | [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2) | 1540M | Read-aloud | 4.7% ± 0.2% | 11.8% ± 0.3% |
@@ -185,45 +184,44 @@ The model was evaluated using the following metrics:
185
  <img src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/images/cer.png">
186
 
187
  ### Table CER scores in % of evaluation across demographics on the CoRal test data
188
- | Category | roest-wav2vec2-315m-v2 | roest-wav2vec2-315m-v1 | roest-whisper-large-v2 | roest-whisper-large-v1 |
189
- |:---:|:---:|:---:|:---:|:---:|
190
- | female | 7.2 | 7.4 | 6.9 | 5.1 |
191
- | male | 5.7 | 5.8 | 3.7 | 3.6 |
192
- | 0-25 | 5.3 | 5.4 | 3.3 | 3.4 |
193
- | 25-50 | 6.0 | 6.2 | 6.5 | 4.0 |
194
- | 50+ | 7.4 | 7.5 | 5.1 | 5.0 |
195
- | Bornholmsk | 6.1 | 6.8 | 3.4 | 3.8 |
196
- | Fynsk | 7.2 | 7.4 | 13.8 | 5.1 |
197
- | Københavnsk | 3.2 | 3.3 | 2.1 | 1.9 |
198
- | Non-native | 7.5 | 7.8 | 4.9 | 4.8 |
199
- | Nordjysk | 2.8 | 2.6 | 1.7 | 1.6 |
200
- | Sjællandsk | 4.5 | 4.4 | 2.9 | 3.0 |
201
- | Sydømål | 6.4 | 6.4 | 4.1 | 4.1 |
202
- | Sønderjysk | 11.6 | 11.9 | 8.8 | 8.8 |
203
- | Vestjysk | 9.8 | 10.1 | 6.9 | 6.4 |
204
- | Østjysk | 4.1 | 4.0 | 2.8 | 2.6 |
205
- | Overall | 6.5 | 6.6 | 5.3 | 4.3 |
206
 
207
  ### Table WER scores in % of evaluation across demographics on the CoRal test data
208
- | Category | roest-wav2vec2-315m-v2 | roest-wav2vec2-315m-v1 | roest-whisper-large-v2 | roest-whisper-large-v1 |
209
- |:---:|:---:|:---:|:---:|:---:|
210
- | female | 17.7 | 18.5 | 14.2 | 11.5 |
211
- | male | 14.9 | 15.5 | 9.9 | 9.4 |
212
- | 0-25 | 14.0 | 14.7 | 9.0 | 9.0 |
213
- | 25-50 | 15.8 | 16.6 | 14.1 | 10.1 |
214
- | 50+ | 17.7 | 18.2 | 11.5 | 11.3 |
215
- | Bornholmsk | 15.7 | 17.7 | 9.3 | 9.8 |
216
- | Fynsk | 17.7 | 18.3 | 24.9 | 12.1 |
217
- | Københavnsk | 10.0 | 10.2 | 6.7 | 5.9 |
218
- | Non-native | 19.4 | 20.9 | 13.0 | 12.2 |
219
- | Nordjysk | 7.5 | 7.7 | 4.9 | 4.5 |
220
- | Sjællandsk | 12.7 | 12.6 | 7.5 | 7.6 |
221
- | Sydømål | 15.3 | 14.9 | 10.3 | 10.0 |
222
- | Sønderjysk | 25.4 | 26.0 | 17.4 | 17.5 |
223
- | Vestjysk | 25.2 | 26.3 | 16.3 | 15.0 |
224
- | Østjysk | 11.3 | 11.7 | 8.0 | 7.5 |
225
- | Overall | 16.3 | 17.0 | 12.0 | 10.4 |
226
-
227
 
228
  ### Roest-wav2vec2-315M with and without language model
229
  The inclusion of a post-processing language model can affect the performance significantly. The Roest-v1 and Roest-v2 models are using the same Language Model (LM). The utilized LM is the one trained and used by [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/alexandrainst/roest-315m).
@@ -267,6 +265,9 @@ The model was also tested against other datasets to evaluate generalizability:
267
  | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) | 27.3 | 7.9 | **26.4** | **7.7** |
268
  | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) Normed | 16.6 | 6.3 | **15.6** | **6.1** |
269
 
 
 
 
270
  ---
271
 
272
  ## Training curves
 
171
  | Model | Number of parameters | Finetuned on data of type | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) WER |
172
  | :----------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
173
  | [CoRal-dataset/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-dataset/roest-whisper-large) | 315M | Read-aloud and conversation | 6.5% ± 0.2% | 16.3% ± 0.4% |
 
174
  | [Alvenir/roest-whisper-large-v1](https://huggingface.co/Alvenir/coral-1-whisper-large) | 1540M | Read-aloud | **4.3% ± 0.2%** | **10.4% ± 0.3%** |
175
  | [alexandrainst/roest-wav2vec2-315M-v1](https://huggingface.co/alexandrainst/roest-315m) | 315M | Read-aloud | 6.6% ± 0.2% | 17.0% ± 0.4% |
176
  | [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2) | 1540M | Read-aloud | 4.7% ± 0.2% | 11.8% ± 0.3% |
 
184
  <img src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/images/cer.png">
185
 
186
  ### Table CER scores in % of evaluation across demographics on the CoRal test data
187
+ | Category | roest-whisper-large-v1 | roest-wav2vec2-315m-v1 | roest-wav2vec2-315m-v2 |
188
+ |:---:|:---:|:---:|:---:|
189
+ | female | 5.1 | 7.4 | 7.2 |
190
+ | male | 3.6 | 5.8 | 5.7 |
191
+ | 0-25 | 3.4 | 5.4 | 5.3 |
192
+ | 25-50 | 4.0 | 6.2 | 6.0 |
193
+ | 50+ | 5.0 | 7.5 | 7.4 |
194
+ | Bornholmsk | 3.8 | 6.8 | 6.1 |
195
+ | Fynsk | 5.1 | 7.4 | 7.2 |
196
+ | Københavnsk | 1.9 | 3.3 | 3.2 |
197
+ | Non-native | 4.8 | 7.8 | 7.5 |
198
+ | Nordjysk | 1.6 | 2.6 | 2.8 |
199
+ | Sjællandsk | 3.0 | 4.4 | 4.5 |
200
+ | Sydømål | 4.1 | 6.4 | 6.4 |
201
+ | Sønderjysk | 8.8 | 11.9 | 11.6 |
202
+ | Vestjysk | 6.4 | 10.1 | 9.8 |
203
+ | Østjysk | 2.6 | 4.0 | 4.1 |
204
+ | Overall | 4.3 | 6.6 | 6.5 |
205
 
206
  ### Table WER scores in % of evaluation across demographics on the CoRal test data
207
+ | Category | roest-whisper-large-v1 | roest-wav2vec2-315m-v1 | roest-wav2vec2-315m-v2 |
208
+ |:---:|:---:|:---:|:---:|
209
+ | female | 11.5 | 18.5 | 17.7 |
210
+ | male | 9.4 | 15.5 | 14.9 |
211
+ | 0-25 | 9.0 | 14.7 | 14.0 |
212
+ | 25-50 | 10.1 | 16.6 | 15.8 |
213
+ | 50+ | 11.3 | 18.2 | 17.7 |
214
+ | Bornholmsk | 9.8 | 17.7 | 15.7 |
215
+ | Fynsk | 12.1 | 18.3 | 17.7 |
216
+ | Københavnsk | 5.9 | 10.2 | 10.0 |
217
+ | Non-native | 12.2 | 20.9 | 19.4 |
218
+ | Nordjysk | 4.5 | 7.7 | 7.5 |
219
+ | Sjællandsk | 7.6 | 12.6 | 12.7 |
220
+ | Sydømål | 10.0 | 14.9 | 15.3 |
221
+ | Sønderjysk | 17.5 | 26.0 | 25.4 |
222
+ | Vestjysk | 15.0 | 26.3 | 25.2 |
223
+ | Østjysk | 7.5 | 11.7 | 11.3 |
224
+ | Overall | 10.4 | 17.0 | 16.3 |
 
225
 
226
  ### Roest-wav2vec2-315M with and without language model
227
  The inclusion of a post-processing language model can affect the performance significantly. The Roest-v1 and Roest-v2 models are using the same Language Model (LM). The utilized LM is the one trained and used by [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/alexandrainst/roest-315m).
 
265
  | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) | 27.3 | 7.9 | **26.4** | **7.7** |
266
  | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) Normed | 16.6 | 6.3 | **15.6** | **6.1** |
267
 
268
+
269
+ **Note:** The vocab used for training includes numerals (0, 1, 2, ..., 9), which are translated to text in a post-processing step. If the model misses a space between numerals, they are interpreted as a single number, which especially affects the NST score, as this dataset contains many numerals.
270
+
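The numeral issue described in the note above can be illustrated with a minimal sketch. It is not the post-processing code used for these models: `NUMBER_WORDS`, `spell_out_numbers`, and `word_error_rate` are hypothetical names, and the lookup table covers only the three numbers needed for the example. It shows how a single missing space between two digits changes the spelled-out text and raises the WER.

```python
import re

# Hypothetical lookup table for illustration only; a real post-processing
# step would need a full Danish number-to-text converter.
NUMBER_WORDS = {"2": "to", "5": "fem", "25": "femogtyve"}


def spell_out_numbers(text: str) -> str:
    """Replace each run of digits with its word form from NUMBER_WORDS."""
    return re.sub(r"\d+", lambda m: NUMBER_WORDS[m.group()], text)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


reference = "ring to fem"
# Model output with the space between the digits preserved.
with_space = spell_out_numbers("ring 2 5")
# Model output where the space was missed, so "2 5" became "25".
without_space = spell_out_numbers("ring 25")

print(with_space, word_error_rate(reference, with_space))
print(without_space, word_error_rate(reference, without_space))
```

In this toy case the missing space turns two correct words into one wrong word, which counts as one substitution plus one deletion against a three-word reference.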
271
  ---
272
 
273
  ## Training curves
images/cer.png CHANGED
images/wer.png CHANGED