Commit 7620057 · 1 Parent(s): 770e9a4
Whisper results removed
- README.md +39 -38
- images/cer.png +0 -0
- images/wer.png +0 -0
README.md
CHANGED
@@ -171,7 +171,6 @@ The model was evaluated using the following metrics:
|
|
171 |
| Model | Number of parameters | Finetuned on data of type | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) WER |
|
172 |
| :----------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
|
173 |
| [CoRal-dataset/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-dataset/roest-whisper-large) | 315M | Read-aloud and conversation | 6.5% ± 0.2% | 16.3% ± 0.4% |
|
174 |
-
| [CoRal-dataset/roest-whisper-large-v2](https://huggingface.co/CoRal-dataset/roest-whisper-large) | 1540M | Read-aloud and conversation | 5.3% ± 0.2% | 12.0% ± 0.4% |
|
175 |
| [Alvenir/roest-whisper-large-v1](https://huggingface.co/Alvenir/coral-1-whisper-large) | 1540M | Read-aloud | **4.3% ± 0.2%** | **10.4% ± 0.3%** |
|
176 |
| [alexandrainst/roest-wav2vec2-315M-v1](https://huggingface.co/alexandrainst/roest-315m) | 315M | Read-aloud | 6.6% ± 0.2% | 17.0% ± 0.4% |
|
177 |
| [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2) | 1540M | Read-aloud | 4.7% ± 0.2% | 11.8% ± 0.3% |
|
@@ -185,45 +184,44 @@ The model was evaluated using the following metrics:
|
|
185 |
<img src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/images/cer.png">
|
186 |
|
187 |
### Table CER scores in % of evaluation across demographics on the CoRal test data
-| Category | roest-
-
-| female |
-| male |
-| 0-25 |
-| 25-50 |
-| 50+ |
-| Bornholmsk |
-| Fynsk |
-| Københavnsk |
-| Non-native |
-| Nordjysk |
-| Sjællandsk |
-| Sydømål |
-| Sønderjysk |
-| Vestjysk |
-| Østjysk |
-| Overall |
|
206 |
|
207 |
### Table WER scores in % of evaluation across demographics on the CoRal test data
-| Category | roest-
-
-| female |
-| male |
-| 0-25 |
-| 25-50 |
-| 50+ |
-| Bornholmsk |
-| Fynsk |
-| Københavnsk |
-| Non-native |
-| Nordjysk |
-| Sjællandsk |
-| Sydømål |
-| Sønderjysk |
-| Vestjysk |
-| Østjysk |
-| Overall |
-
|
227 |
|
228 |
### Roest-wav2vec2-315M with and without language model
|
229 |
Including a post-processing language model (LM) can significantly affect performance. The Roest-v1 and Roest-v2 models use the same LM: the one trained and used by [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/alexandrainst/roest-315m).
|
@@ -267,6 +265,9 @@ The model was also tested against other datasets to evaluate generalizability:
|
|
267 |
| [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) | 27.3 | 7.9 | **26.4** | **7.7** |
|
268 |
| [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) Normed | 16.6 | 6.3 | **15.6** | **6.1** |
|
269 |
|
|
|
|
|
|
|
270 |
---
|
271 |
|
272 |
## Training curves
|
|
|
171 |
| Model | Number of parameters | Finetuned on data of type | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) WER |
|
172 |
| :----------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
|
173 |
| [CoRal-dataset/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-dataset/roest-whisper-large) | 315M | Read-aloud and conversation | 6.5% ± 0.2% | 16.3% ± 0.4% |
|
|
|
174 |
| [Alvenir/roest-whisper-large-v1](https://huggingface.co/Alvenir/coral-1-whisper-large) | 1540M | Read-aloud | **4.3% ± 0.2%** | **10.4% ± 0.3%** |
|
175 |
| [alexandrainst/roest-wav2vec2-315M-v1](https://huggingface.co/alexandrainst/roest-315m) | 315M | Read-aloud | 6.6% ± 0.2% | 17.0% ± 0.4% |
|
176 |
| [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2) | 1540M | Read-aloud | 4.7% ± 0.2% | 11.8% ± 0.3% |
|
|
|
184 |
<img src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/images/cer.png">
|
185 |
|
186 |
### Table CER scores in % of evaluation across demographics on the CoRal test data
+| Category | roest-whisper-large-v1 | roest-wav2vec2-315m-v1 | roest-wav2vec2-315m-v2 |
+|:---:|:---:|:---:|:---:|
+| female | 5.1 | 7.4 | 7.2 |
+| male | 3.6 | 5.8 | 5.7 |
+| 0-25 | 3.4 | 5.4 | 5.3 |
+| 25-50 | 4.0 | 6.2 | 6.0 |
+| 50+ | 5.0 | 7.5 | 7.4 |
+| Bornholmsk | 3.8 | 6.8 | 6.1 |
+| Fynsk | 5.1 | 7.4 | 7.2 |
+| Københavnsk | 1.9 | 3.3 | 3.2 |
+| Non-native | 4.8 | 7.8 | 7.5 |
+| Nordjysk | 1.6 | 2.6 | 2.8 |
+| Sjællandsk | 3.0 | 4.4 | 4.5 |
+| Sydømål | 4.1 | 6.4 | 6.4 |
+| Sønderjysk | 8.8 | 11.9 | 11.6 |
+| Vestjysk | 6.4 | 10.1 | 9.8 |
+| Østjysk | 2.6 | 4.0 | 4.1 |
+| Overall | 4.3 | 6.6 | 6.5 |
|
205 |
|
206 |
### Table WER scores in % of evaluation across demographics on the CoRal test data
+| Category | roest-whisper-large-v1 | roest-wav2vec2-315m-v1 | roest-wav2vec2-315m-v2 |
+|:---:|:---:|:---:|:---:|
+| female | 11.5 | 18.5 | 17.7 |
+| male | 9.4 | 15.5 | 14.9 |
+| 0-25 | 9.0 | 14.7 | 14.0 |
+| 25-50 | 10.1 | 16.6 | 15.8 |
+| 50+ | 11.3 | 18.2 | 17.7 |
+| Bornholmsk | 9.8 | 17.7 | 15.7 |
+| Fynsk | 12.1 | 18.3 | 17.7 |
+| Københavnsk | 5.9 | 10.2 | 10.0 |
+| Non-native | 12.2 | 20.9 | 19.4 |
+| Nordjysk | 4.5 | 7.7 | 7.5 |
+| Sjællandsk | 7.6 | 12.6 | 12.7 |
+| Sydømål | 10.0 | 14.9 | 15.3 |
+| Sønderjysk | 17.5 | 26.0 | 25.4 |
+| Vestjysk | 15.0 | 26.3 | 25.2 |
+| Østjysk | 7.5 | 11.7 | 11.3 |
+| Overall | 10.4 | 17.0 | 16.3 |
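
The CER and WER figures in the tables above are edit-distance ratios: the number of character (or word) insertions, deletions, and substitutions divided by the length of the reference. As a rough illustration only, the sketch below computes both with a plain Levenshtein distance. The helper names (`edit_distance`, `wer`, `cer`) and the example sentences are assumptions made for this sketch, and the actual evaluation may apply text normalization that is not shown here.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative sentences only; one word ("en") is dropped in the hypothesis.
reference = "det er en god dag"
hypothesis = "det er god dag"
print(f"WER: {wer(reference, hypothesis):.1%}")
print(f"CER: {cer(reference, hypothesis):.1%}")
```

Because a single dropped word counts as one error out of five words but only three errors out of seventeen characters, WER is typically higher than CER on the same transcript, which matches the pattern in the tables.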
|
|
|
225 |
|
226 |
### Roest-wav2vec2-315M with and without language model
|
227 |
Including a post-processing language model (LM) can significantly affect performance. The Roest-v1 and Roest-v2 models use the same LM: the one trained and used by [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/alexandrainst/roest-315m).
|
|
|
265 |
| [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) | 27.3 | 7.9 | **26.4** | **7.7** |
|
266 |
| [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) Normed | 16.6 | 6.3 | **15.6** | **6.1** |
|
267 |
+
+**Note:** The vocabulary used for training includes numerals (0, 1, 2, ..., 9), which are converted to text in a post-processing step. If the model omits a space between digits, the adjacent digits are interpreted as a single number. This especially affects the NST score, as that dataset contains many numerals.
+
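
The effect of a missed space on the numeral post-processing can be sketched as follows. This is a minimal illustration, not the repository's actual post-processing code: `spell_out_numbers` and `tag_number` are hypothetical names, and `tag_number` is only a stand-in for a real number-to-words converter.

```python
import re
from typing import Callable


def spell_out_numbers(transcript: str, spell: Callable[[int], str]) -> str:
    """Replace every maximal run of digits with spell(number)."""
    return re.sub(r"\d+", lambda m: spell(int(m.group())), transcript)


def tag_number(n: int) -> str:
    """Hypothetical stand-in for a number-to-words converter."""
    return f"<NUM:{n}>"


# With spaces, each digit is converted on its own.
print(spell_out_numbers("linje 4 2 7", tag_number))
# Without spaces, the same digits are read as the single number 427.
print(spell_out_numbers("linje 427", tag_number))
```

In the second case the three reference words for the digits are replaced by the words for one larger number, so a single missed space can produce several word errors at once.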
|
271 |
---
|
272 |
|
273 |
## Training curves
|
images/cer.png
CHANGED
images/wer.png
CHANGED