CoRal-project/roest-wav2vec2-315m-v2

Automatic Speech Recognition · Safetensors · Danish · wav2vec2 · Eval Results


sorenmulli committed 12 days ago

Commit 948bdc2 (verified) · 1 Parent(s): 867b442

Update README.md


Files changed (1)

README.md +8 -7

README.md CHANGED

@@ -31,9 +31,9 @@ model-index:
 ---
 # Røst-wav2vec2-315m-v2
-This is a Danish state-of-the-art speech recognition model, trained as part of the CoRal project by [Alvenir](https://www.alvenir.ai/).
-This repository contains a Wav2Vec2 model trained on the [CoRal-v2 dataset](https://huggingface.co/datasets/CoRal-project/coral-v2/tree/main).
+This is a pre-release of a Danish state-of-the-art speech recognition model, trained as part of the CoRal project by [Alvenir](https://www.alvenir.ai/).
+This repository contains a Wav2Vec2 model trained on the [CoRal-v2 dataset](https://huggingface.co/datasets/CoRal-project/coral-v2/tree/main) soon to be released.
 The CoRal-v2 dataset includes a rich variety of Danish conversational and read-aloud data, distributed across diverse age groups, genders, and dialects.
 The model is designed for automatic speech recognition (ASR).
@@ -181,8 +181,8 @@ The model was firstly evaluated on a tentative version of the coral-v2 conversat
 The results are tentative as the test set only includes 5 unique speakers, of which 4 are women.
 The test set includes 2 speakers with 'Fynsk' dialect, 1 with 'Sønderjysk', 1 with 'Non-native' and 1 'Nordjysk'.
-The Whisper model is performing very poorly on the test set. An explanation could be hallucinations during silence and short sentences, a known whisper issue.
-Furthermore, both v1 models have not been trained on any conversation data, giving the models an obvious disadvantage.
+Note that the high generalization error on conversation data for models trained on read-aloud data is still being analyzed.
 | Model                                                                                               | Number of parameters |   Finetuned on data of type | [CoRal-v2::conversation](https://huggingface.co/datasets/CoRal-project/coral-v2/viewer/conversation/test) CER | [CoRal-v2::conversation](https://huggingface.co/datasets/CoRal-project/coral-v2/viewer/conversation/test) WER |
 | :-------------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | ------------------------------------------------------------------------------------------------------------: | ------------------------------------------------------------------------------------------------------------: |
@@ -193,6 +193,10 @@ Furthermore, both v1 models have not been trained on any conversation data, givi
 | [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2)                                     |                1540M |                  Read-aloud |                                                                                                         78.2% |                                                                                                         72.6% |
 | [openai/whisper-large-v3](https://hf.co/openai/whisper-large-v3)                                    |                1540M |                           - |                                                                                                        46.4 % |                                                                                                         57.4% |
+<img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/comparison-conversation-cer.png">
+<img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/comparison-conversation-wer.png">
 ### Read-aloud CoRal Performance
@@ -207,9 +211,6 @@ Furthermore, both v1 models have not been trained on any conversation data, givi
 **OBS!** Benchmark for hviske-v2 has been re-evaluated and the confidence interval is larger than reported in the model card.
-<img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/comparison-conversation-cer.png">
-<img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/comparison-conversation-wer.png">
 <img src="https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2/resolve/main/images/cer.png">
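The conversation table above reports character error rate (CER) and word error rate (WER). The sketch below is not the CoRal evaluation code. It assumes the common definition: the Levenshtein edit distance between reference and hypothesis, counting substitutions, insertions, and deletions, divided by the total reference length, and aggregated over the whole test set. It applies no text normalization, while the actual benchmark may lowercase text or strip punctuation first. The function names `edit_distance`, `wer`, and `cer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of ref[i-1]
                curr[j - 1] + 1,         # insertion of hyp[j-1]
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level WER: total word edits / total reference words (no normalization)."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    return errors / sum(len(r.split()) for r in references)


def cer(references, hypotheses):
    """Corpus-level CER: total character edits / total reference characters."""
    errors = sum(edit_distance(list(r), list(h)) for r, h in zip(references, hypotheses))
    return errors / sum(len(r) for r in references)
```

For example, `wer(["hej med dig"], ["hej mig"])` counts one substitution and one deletion against three reference words, giving 2/3. Because insertions also count as errors, WER and CER under this definition can exceed 100%.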