sorenmulli committed on
Commit 948bdc2 · verified · 1 Parent(s): 867b442

Update README.md

Files changed (1)
  1. README.md +8 -7
README.md CHANGED
@@ -31,9 +31,9 @@ model-index:
  ---
 
  # Røst-wav2vec2-315m-v2
- This is a Danish state-of-the-art speech recognition model, trained as part of the CoRal project by [Alvenir](https://www.alvenir.ai/).
+ This is a pre-release of a Danish state-of-the-art speech recognition model, trained as part of the CoRal project by [Alvenir](https://www.alvenir.ai/).
 
- This repository contains a Wav2Vec2 model trained on the [CoRal-v2 dataset](https://huggingface.co/datasets/CoRal-project/coral-v2/tree/main).
+ This repository contains a Wav2Vec2 model trained on the [CoRal-v2 dataset](https://huggingface.co/datasets/CoRal-project/coral-v2/tree/main) soon to be released.
  The CoRal-v2 dataset includes a rich variety of Danish conversational and read-aloud data, distributed across diverse age groups, genders, and dialects.
  The model is designed for automatic speech recognition (ASR).
 
@@ -181,8 +181,8 @@ The model was firstly evaluated on a tentative version of the coral-v2 conversat
 
  The results are tentative as the test set only includes 5 unique speakers, of which 4 are women.
  The test set includes 2 speakers with 'Fynsk' dialect, 1 with 'Sønderjysk', 1 with 'Non-native' and 1 'Nordjysk'.
- The Whisper model is performing very poorly on the test set. An explanation could be hallucinations during silence and short sentences, a known whisper issue.
- Furthermore, both v1 models have not been trained on any conversation data, giving the models an obvious disadvantage.
+
+ Note that the high generelization error on conversation data for models trained on read-aloud data is still being analyzed.
 
  | Model | Number of parameters | Finetuned on data of type | [CoRal-v2::conversation](https://huggingface.co/datasets/CoRal-project/coral-v2/viewer/conversation/test) CER | [CoRal-v2::conversation](https://huggingface.co/datasets/CoRal-project/coral-v2/viewer/conversation/test) WER |
  | :-------------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | ------------------------------------------------------------------------------------------------------------: | ------------------------------------------------------------------------------------------------------------: |
@@ -193,6 +193,10 @@ Furthermore, both v1 models have not been trained on any conversation data, givi
  | [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2) | 1540M | Read-aloud | 78.2% | 72.6% |
  | [openai/whisper-large-v3](https://hf.co/openai/whisper-large-v3) | 1540M | - | 46.4 % | 57.4% |
 
+ <img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/comparison-conversation-cer.png">
+
+ <img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/comparison-conversation-wer.png">
+
 
  ### Read-aloud CoRal Performance
 
@@ -207,9 +211,6 @@ Furthermore, both v1 models have not been trained on any conversation data, givi
 
  **OBS!** Benchmark for hviske-v2 has been re-evaluated and the confidence interval is larger than reported in the model card.
 
- <img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/comparison-conversation-cer.png">
-
- <img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/comparison-conversation-wer.png">
 
  <img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/cer.png">
 