CoRal-project/roest-wav2vec2-315m-v2

MarieAlvenir committed 12 days ago

Commit 8b23e47 (1 parent: 30f680f)

Small change of reference to this model

Files changed (1)

README.md +3 -3

@@ -182,7 +182,7 @@ The model was firstly evaluated on a tentative version of the coral-v2 conversat
 The results are tentative as the test set only includes 5 unique speakers, of which 4 are women.
 The test set includes 2 speakers with 'Fynsk' dialect, 1 with 'Sønderjysk', 1 with 'Non-native' and 1 'Nordjysk'.
-Note that the high generelization error on conversation data for models trained on read-aloud data is still being analyzed.
+Note that the high generalization error on conversation data for models trained on read-aloud data is still being analyzed.
 | Model                                                                                               | Number of parameters |   Finetuned on data of type | [CoRal-v2::conversation](https://huggingface.co/datasets/CoRal-project/coral-v2/viewer/conversation/test) CER | [CoRal-v2::conversation](https://huggingface.co/datasets/CoRal-project/coral-v2/viewer/conversation/test) WER |
 | :-------------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | ------------------------------------------------------------------------------------------------------------: | ------------------------------------------------------------------------------------------------------------: |
@@ -271,7 +271,7 @@ Note that the high generelization error on conversation data for models trained
 <details>
   <summary>
-    <b>Experiments with Røst-wav2vec2-315M with and without language model</b>
+    <b>Experiments with Røst-wav2vec2 with and without language model</b>
   </summary>
   The inclusion of a post-processing language model can affect the performance significantly.
@@ -283,7 +283,7 @@ Note that the high generelization error on conversation data for models trained
   | [CoRal-project/roest-wav2vec2-1B-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-1B-v2)     |                   1B | Read-aloud and conversation |                               Yes |                                                                         **6.5% ± 0.2%** |                                                                         **16.4% ± 0.4%** |
   | [CoRal-project/roest-wav2vec2-1B-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-1B-v2)     |                   1B | Read-aloud and conversation |                                No |                                                                             8.1% ± 0.2% |                                                                             23.9% ± 0.4% |
   | CoRal-project/roest-wav2vec2-315M-v2 (This model) |                 315M | Read-aloud and conversation |                               Yes |                                                                         **6.5% ± 0.2%** |                                                                         **16.3% ± 0.4%** |
-  | CoRal-project/roest-wav2vec2-315M-v2 (This model) |                 315M | Read-aloud and conversation |                                No |                                                                             8.2% ± 0.2% |                                                                             25.1% ± 0.4% |
+  | CoRal-project/roest-wav2vec2-315M-v2  |                 315M | Read-aloud and conversation |                                No |                                                                             8.2% ± 0.2% |                                                                             25.1% ± 0.4% |
   | [CoRal-project/roest-wav2vec2-315m-v1](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1) |                 315M |                  Read-aloud |                               Yes |                                                                             6.6% ± 0.2% |                                                                             17.0% ± 0.4% |
   | [CoRal-project/roest-wav2vec2-315m-v1](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1) |                 315M |                  Read-aloud |                                No |                                                                             8.6% ± 0.2% |                                                                             26.3% ± 0.5% |
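
Reviewer note: the README says the post-processing language model "can affect the performance significantly". As a rough illustration, the sketch below computes the relative reduction in the last metric column for each model in the table rows of the final hunk. It assumes that column is WER, by analogy with the CER/WER header in the first hunk; the header of this particular table is not visible in the diff. The dictionary layout and the `relative_reduction` helper are hypothetical names chosen for this example, and the confidence intervals from the table are ignored.

```python
# Values (in percent) copied from the last column of the table rows in the diff.
# Assumption: that column is WER; the table header is not shown in this diff.
wer = {
    "roest-wav2vec2-1B-v2": {"with_lm": 16.4, "without_lm": 23.9},
    "roest-wav2vec2-315M-v2": {"with_lm": 16.3, "without_lm": 25.1},
    "roest-wav2vec2-315m-v1": {"with_lm": 17.0, "without_lm": 26.3},
}


def relative_reduction(without_lm: float, with_lm: float) -> float:
    """Fraction of the no-LM error rate that is removed by adding the LM."""
    return (without_lm - with_lm) / without_lm


# Relative reduction per model, as a fraction between 0 and 1.
reductions = {
    name: relative_reduction(scores["without_lm"], scores["with_lm"])
    for name, scores in wer.items()
}

for name, reduction in reductions.items():
    print(f"{name}: {reduction:.1%} relative reduction with the language model")
```

Under these assumptions, each of the three models loses roughly a third of its error rate when the language model is added, which is consistent with the README's wording.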