CoRal-project/roest-wav2vec2-315m-v2

Automatic Speech Recognition · Safetensors · Danish · wav2vec2 · Eval Results


MarieAlvenir committed on Apr 7

Commit 7620057 (1 parent: 770e9a4)

Whisper results removed


Files changed (3)

README.md +39 -38
images/cer.png +0 -0
images/wer.png +0 -0

README.md CHANGED

@@ -171,7 +171,6 @@ The model was evaluated using the following metrics:
 | Model                                                                                            | Number of parameters |   Finetuned on data of type | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) WER |
 | :----------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
 | [CoRal-dataset/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-dataset/roest-whisper-large) |                 315M | Read-aloud and conversation |                                                                             6.5% ± 0.2% |                                                                            16.3% ± 0.4% |
-| [CoRal-dataset/roest-whisper-large-v2](https://huggingface.co/CoRal-dataset/roest-whisper-large) |                1540M | Read-aloud and conversation |                                                                             5.3%  ± 0.2%            |                                                                               12.0% ± 0.4%          |
 | [Alvenir/roest-whisper-large-v1](https://huggingface.co/Alvenir/coral-1-whisper-large)            |                1540M |                  Read-aloud |                                                                         **4.3% ± 0.2%** |                                                                        **10.4% ± 0.3%** |
 | [alexandrainst/roest-wav2vec2-315M-v1](https://huggingface.co/alexandrainst/roest-315m)                      |                 315M |                  Read-aloud |                                                                             6.6% ± 0.2% |                                                                            17.0% ± 0.4% |
 | [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2)                                  |                1540M |                  Read-aloud |                                                                             4.7% ± 0.2% |                                                                            11.8% ± 0.3% |
@@ -185,45 +184,44 @@ The model was evaluated using the following metrics:
 <img src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/images/cer.png">
 ### Table CER scores in % of evaluation across demographics on the CoRal test data
-| Category | roest-wav2vec2-315m-v2 | roest-wav2vec2-315m-v1 | roest-whisper-large-v2 | roest-whisper-large-v1 |
-|:---:|:---:|:---:|:---:|:---:|
-| female | 7.2 | 7.4 | 6.9 | 5.1 |
-| male | 5.7 | 5.8 | 3.7 | 3.6 |
-| 0-25 | 5.3 | 5.4 | 3.3 | 3.4 |
-| 25-50 | 6.0 | 6.2 | 6.5 | 4.0 |
-| 50+ | 7.4 | 7.5 | 5.1 | 5.0 |
-| Bornholmsk | 6.1 | 6.8 | 3.4 | 3.8 |
-| Fynsk | 7.2 | 7.4 | 13.8 | 5.1 |
-| Københavnsk | 3.2 | 3.3 | 2.1 | 1.9 |
-| Non-native | 7.5 | 7.8 | 4.9 | 4.8 |
-| Nordjysk | 2.8 | 2.6 | 1.7 | 1.6 |
-| Sjællandsk | 4.5 | 4.4 | 2.9 | 3.0 |
-| Sydømål | 6.4 | 6.4 | 4.1 | 4.1 |
-| Sønderjysk | 11.6 | 11.9 | 8.8 | 8.8 |
-| Vestjysk | 9.8 | 10.1 | 6.9 | 6.4 |
-| Østjysk | 4.1 | 4.0 | 2.8 | 2.6 |
-| Overall | 6.5 | 6.6 | 5.3 | 4.3 |
 ### Table WER scores in % of evaluation across demographics on the CoRal test data
-| Category | roest-wav2vec2-315m-v2 | roest-wav2vec2-315m-v1 | roest-whisper-large-v2 | roest-whisper-large-v1 |
-|:---:|:---:|:---:|:---:|:---:|
-| female | 17.7 | 18.5 | 14.2 | 11.5 |
-| male | 14.9 | 15.5 | 9.9 | 9.4 |
-| 0-25 | 14.0 | 14.7 | 9.0 | 9.0 |
-| 25-50 | 15.8 | 16.6 | 14.1 | 10.1 |
-| 50+ | 17.7 | 18.2 | 11.5 | 11.3 |
-| Bornholmsk | 15.7 | 17.7 | 9.3 | 9.8 |
-| Fynsk | 17.7 | 18.3 | 24.9 | 12.1 |
-| Københavnsk | 10.0 | 10.2 | 6.7 | 5.9 |
-| Non-native | 19.4 | 20.9 | 13.0 | 12.2 |
-| Nordjysk | 7.5 | 7.7 | 4.9 | 4.5 |
-| Sjællandsk | 12.7 | 12.6 | 7.5 | 7.6 |
-| Sydømål | 15.3 | 14.9 | 10.3 | 10.0 |
-| Sønderjysk | 25.4 | 26.0 | 17.4 | 17.5 |
-| Vestjysk | 25.2 | 26.3 | 16.3 | 15.0 |
-| Østjysk | 11.3 | 11.7 | 8.0 | 7.5 |
-| Overall | 16.3 | 17.0 | 12.0 | 10.4 |
 ### Roest-wav2vec2-315M with and without language model
 The inclusion of a post-processing language model can affect the performance significantly. The Roest-v1 and Roest-v2 models are using the same Language Model (LM). The utilized LM is the one trained and used by [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/alexandrainst/roest-315m).
@@ -267,6 +265,9 @@ The model was also tested against other datasets to evaluate generalizability:
 | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs)                         | 27.3        | 7.9   | **26.4**    | **7.7**  |
 | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) Normed                  | 16.6        | 6.3   | **15.6**    | **6.1**  |
 ---
 ## Training curves

 | Model                                                                                            | Number of parameters |   Finetuned on data of type | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) WER |
 | :----------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
 | [CoRal-dataset/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-dataset/roest-whisper-large) |                 315M | Read-aloud and conversation |                                                                             6.5% ± 0.2% |                                                                            16.3% ± 0.4% |
 | [Alvenir/roest-whisper-large-v1](https://huggingface.co/Alvenir/coral-1-whisper-large)            |                1540M |                  Read-aloud |                                                                         **4.3% ± 0.2%** |                                                                        **10.4% ± 0.3%** |
 | [alexandrainst/roest-wav2vec2-315M-v1](https://huggingface.co/alexandrainst/roest-315m)                      |                 315M |                  Read-aloud |                                                                             6.6% ± 0.2% |                                                                            17.0% ± 0.4% |
 | [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2)                                  |                1540M |                  Read-aloud |                                                                             4.7% ± 0.2% |                                                                            11.8% ± 0.3% |
 <img src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/images/cer.png">
 ### Table CER scores in % of evaluation across demographics on the CoRal test data
+| Category | roest-whisper-large-v1 | roest-wav2vec2-315m-v1 | roest-wav2vec2-315m-v2 |
+|:---:|:---:|:---:|:---:|
+| female | 5.1 | 7.4 | 7.2 |
+| male | 3.6 | 5.8 | 5.7 |
+| 0-25 | 3.4 | 5.4 | 5.3 |
+| 25-50 | 4.0 | 6.2 | 6.0 |
+| 50+ | 5.0 | 7.5 | 7.4 |
+| Bornholmsk | 3.8 | 6.8 | 6.1 |
+| Fynsk | 5.1 | 7.4 | 7.2 |
+| Københavnsk | 1.9 | 3.3 | 3.2 |
+| Non-native | 4.8 | 7.8 | 7.5 |
+| Nordjysk | 1.6 | 2.6 | 2.8 |
+| Sjællandsk | 3.0 | 4.4 | 4.5 |
+| Sydømål | 4.1 | 6.4 | 6.4 |
+| Sønderjysk | 8.8 | 11.9 | 11.6 |
+| Vestjysk | 6.4 | 10.1 | 9.8 |
+| Østjysk | 2.6 | 4.0 | 4.1 |
+| Overall | 4.3 | 6.6 | 6.5 |
 ### Table WER scores in % of evaluation across demographics on the CoRal test data
+| Category | roest-whisper-large-v1 | roest-wav2vec2-315m-v1 | roest-wav2vec2-315m-v2 |
+|:---:|:---:|:---:|:---:|
+| female | 11.5 | 18.5 | 17.7 |
+| male | 9.4 | 15.5 | 14.9 |
+| 0-25 | 9.0 | 14.7 | 14.0 |
+| 25-50 | 10.1 | 16.6 | 15.8 |
+| 50+ | 11.3 | 18.2 | 17.7 |
+| Bornholmsk | 9.8 | 17.7 | 15.7 |
+| Fynsk | 12.1 | 18.3 | 17.7 |
+| Københavnsk | 5.9 | 10.2 | 10.0 |
+| Non-native | 12.2 | 20.9 | 19.4 |
+| Nordjysk | 4.5 | 7.7 | 7.5 |
+| Sjællandsk | 7.6 | 12.6 | 12.7 |
+| Sydømål | 10.0 | 14.9 | 15.3 |
+| Sønderjysk | 17.5 | 26.0 | 25.4 |
+| Vestjysk | 15.0 | 26.3 | 25.2 |
+| Østjysk | 7.5 | 11.7 | 11.3 |
+| Overall | 10.4 | 17.0 | 16.3 |
 ### Roest-wav2vec2-315M with and without language model
 The inclusion of a post-processing language model can affect the performance significantly. The Roest-v1 and Roest-v2 models are using the same Language Model (LM). The utilized LM is the one trained and used by [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/alexandrainst/roest-315m).
 | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs)                         | 27.3        | 7.9   | **26.4**    | **7.7**  |
 | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) Normed                  | 16.6        | 6.3   | **15.6**    | **6.1**  |
+**Note:** The vocabulary used for training includes numerals (0, 1, 2, ..., 9), which are converted to text in a post-processing step. If the model omits the space between two numbers, they are interpreted as a single number. This especially affects the NST score, as that dataset contains many numerals.
 ---
 ## Training curves

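The README note on numerals says digits are converted to text only after decoding, so a space the model fails to emit merges two numbers into one. The sketch below illustrates just that grouping effect under stated assumptions: `parse_numbers` is a hypothetical helper, not the model's actual post-processing code, and the number-to-text conversion itself is not shown. The example strings are made up.

```python
import re


def parse_numbers(transcript: str) -> list[int]:
    """Group each run of consecutive digit characters into one integer.

    Hypothetical illustration only: it mimics a post-processing step that
    reads every digit run as a single number before any number-to-text
    conversion (which is not implemented here).
    """
    return [int(run) for run in re.findall(r"\d+", transcript)]


# Spaces kept between the digit groups: four separate numbers.
print(parse_numbers("telefon 12 34 56 78"))  # [12, 34, 56, 78]
# Two spaces missed by the model: the groups merge into two larger numbers.
print(parse_numbers("telefon 1234 5678"))    # [1234, 5678]
```

Once such merged numbers are spelled out in words, the hypothesis no longer lines up word for word with the reference, which is why a dataset rich in numerals is hit hardest.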
images/cer.png CHANGED

images/wer.png CHANGED
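The README tables report character error rate (CER) and word error rate (WER). Both are the Levenshtein edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the number of reference characters or words. The plain-Python sketch below shows the definition on a single utterance. It is not the evaluation code behind the reported numbers: text normalization, aggregation over the whole test set, and the ± confidence intervals are not reproduced, and the example strings are made up.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance where substitution, insertion and deletion each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (or free match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance (whitespace tokenization, no normalization)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance (spaces count as characters)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


ref, hyp = "telefon 12 34", "telefon 1234"
print(f"WER: {wer(ref, hyp):.3f}")  # WER: 0.667
print(f"CER: {cer(ref, hyp):.3f}")  # CER: 0.077
```

In this example a single dropped space costs one character edit out of 13, but two word errors out of 3, which illustrates why WER reacts much more strongly than CER to segmentation mistakes.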