Update README.md
README.md
CHANGED
@@ -31,9 +31,9 @@ model-index:
|
|
31 |
---
|
32 |
|
33 |
# Røst-wav2vec2-315m-v2
|
34 |
-
This is a Danish state-of-the-art speech recognition model, trained as part of the CoRal project by [Alvenir](https://www.alvenir.ai/).
|
35 |
|
36 |
-
This repository contains a Wav2Vec2 model trained on the [CoRal-v2 dataset](https://huggingface.co/datasets/CoRal-project/coral-v2/tree/main).
|
37 |
The CoRal-v2 dataset includes a rich variety of Danish conversational and read-aloud data, distributed across diverse age groups, genders, and dialects.
|
38 |
The model is designed for automatic speech recognition (ASR).
|
39 |
|
@@ -181,8 +181,8 @@ The model was firstly evaluated on a tentative version of the coral-v2 conversat
|
|
181 |
|
182 |
The results are tentative as the test set only includes 5 unique speakers, of which 4 are women.
|
183 |
The test set includes 2 speakers with 'Fynsk' dialect, 1 with 'Sønderjysk', 1 with 'Non-native' and 1 'Nordjysk'.
|
184 |
-
|
185 |
-
|
186 |
|
187 |
| Model | Number of parameters | Finetuned on data of type | [CoRal-v2::conversation](https://huggingface.co/datasets/CoRal-project/coral-v2/viewer/conversation/test) CER | [CoRal-v2::conversation](https://huggingface.co/datasets/CoRal-project/coral-v2/viewer/conversation/test) WER |
|
188 |
| :-------------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | ------------------------------------------------------------------------------------------------------------: | ------------------------------------------------------------------------------------------------------------: |
|
@@ -193,6 +193,10 @@ Furthermore, both v1 models have not been trained on any conversation data, givi
|
|
193 |
| [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2) | 1540M | Read-aloud | 78.2% | 72.6% |
|
194 |
| [openai/whisper-large-v3](https://hf.co/openai/whisper-large-v3) | 1540M | - | 46.4 % | 57.4% |
|
195 |
|
196 |
|
197 |
### Read-aloud CoRal Performance
|
198 |
|
@@ -207,9 +211,6 @@ Furthermore, both v1 models have not been trained on any conversation data, givi
|
|
207 |
|
208 |
**OBS!** Benchmark for hviske-v2 has been re-evaluated and the confidence interval is larger than reported in the model card.
|
209 |
|
210 |
-
<img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/comparison-conversation-cer.png">
|
211 |
-
|
212 |
-
<img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/comparison-conversation-wer.png">
|
213 |
|
214 |
<img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/cer.png">
|
215 |
|
|
|
31 |
---
|
32 |
|
33 |
# Røst-wav2vec2-315m-v2
|
34 |
+
This is a pre-release of a Danish state-of-the-art speech recognition model, trained as part of the CoRal project by [Alvenir](https://www.alvenir.ai/).
|
35 |
|
36 |
+
This repository contains a Wav2Vec2 model trained on the [CoRal-v2 dataset](https://huggingface.co/datasets/CoRal-project/coral-v2/tree/main) soon to be released.
|
37 |
The CoRal-v2 dataset includes a rich variety of Danish conversational and read-aloud data, distributed across diverse age groups, genders, and dialects.
|
38 |
The model is designed for automatic speech recognition (ASR).
|
39 |
|
|
|
181 |
|
182 |
The results are tentative as the test set only includes 5 unique speakers, of which 4 are women.
|
183 |
The test set includes 2 speakers with 'Fynsk' dialect, 1 with 'Sønderjysk', 1 with 'Non-native' and 1 'Nordjysk'.
|
184 |
+
|
185 |
+
Note that the high generalization error on conversation data for models trained on read-aloud data is still being analyzed.
|
186 |
|
187 |
| Model | Number of parameters | Finetuned on data of type | [CoRal-v2::conversation](https://huggingface.co/datasets/CoRal-project/coral-v2/viewer/conversation/test) CER | [CoRal-v2::conversation](https://huggingface.co/datasets/CoRal-project/coral-v2/viewer/conversation/test) WER |
|
188 |
| :-------------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | ------------------------------------------------------------------------------------------------------------: | ------------------------------------------------------------------------------------------------------------: |
|
|
|
193 |
| [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2) | 1540M | Read-aloud | 78.2% | 72.6% |
|
194 |
| [openai/whisper-large-v3](https://hf.co/openai/whisper-large-v3) | 1540M | - | 46.4 % | 57.4% |
|
195 |
|
196 |
+
<img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/comparison-conversation-cer.png">
|
197 |
+
|
198 |
+
<img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/comparison-conversation-wer.png">
|
199 |
+
|
200 |
|
201 |
### Read-aloud CoRal Performance
|
202 |
|
|
|
211 |
|
212 |
**OBS!** Benchmark for hviske-v2 has been re-evaluated and the confidence interval is larger than reported in the model card.
|
213 |
|
214 |
|
215 |
<img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/cer.png">
|
216 |
|