CoRal-project/roest-wav2vec2-315m-v1

@@ -45,7 +45,7 @@ model-index:
 pipeline_tag: automatic-speech-recognition
 ---
-# Røst-315m
 This is a Danish state-of-the-art speech recognition model, trained by [the Alexandra
 Institute](https://alexandra.dk/).
@@ -65,7 +65,7 @@ Next you can use the model using the `transformers` Python package as follows:
 ```python
 >>> from transformers import pipeline
 >>> audio = get_audio()  # 16kHz raw audio array
->>> transcriber = pipeline(model="alexandrainst/roest-315m")
 >>> transcriber(audio)
 {'text': 'your transcription'}
 ```
@@ -79,9 +79,9 @@ bootstrapped the results 1000 times and report here the mean scores along with a
 confidence interval (lower is better; best scores in **bold**, second-best in
 *italics*):
-| Model | Number of parameters | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) WER | [Danish Common Voice 17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/viewer/da/test) CER | [Danish Common Voice 17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/viewer/da/test) WER |
 |:---|---:|---:|---:|---:|---:|
-| Røst-315m (this model) | 315M | **6.6% ± 0.2%** | **17.0% ± 0.4%** | 6.6% ± 0.6% | 16.7% ± 0.8% |
 | [chcaa/xls-r-300m-danish-nst-cv9](https://hf.co/chcaa/xls-r-300m-danish-nst-cv9) | 315M | 14.4% ± 0.3% | 36.5% ± 0.6% | **4.1% ± 0.5%** | **12.0% ± 0.8%** |
 | [mhenrichsen/hviske](https://hf.co/mhenrichsen/hviske) | 1540M | 14.2% ± 0.5% | 33.2% ± 0.7% | *5.2% ± 0.4%* | *14.2% ± 0.8%* |
 | [openai/whisper-large-v3](https://hf.co/openai/whisper-large-v3) | 1540M | *11.4% ± 0.3%* | *28.3% ± 0.6%* | *5.5% ± 0.4%* | *14.8% ± 0.8%* |
@@ -127,7 +127,7 @@ This model is the result of four different stages of training:
      both audio and transcriptions to perform the speech-to-text task (also known as
      automatic speech recognition). The finetuning data is as follows:
      - The read-aloud training split of the [CoRal
-       dataset](https://huggingface.co/datasets/alexandrainst/coral) (revision
        fb20199b3966d3373e0d3a5ded2c5920c70de99c), consisting of 361 hours of Danish
        read-aloud speech, diverse across dialects, accents, ages and genders.
   3. An n-gram language model has been trained separately, and is used to guide the
@@ -147,7 +147,7 @@ This model is the result of four different stages of training:
 The first step was trained by [Babu et al.
 (2021)](https://doi.org/10.48550/arXiv.2111.09296) and the second and third step by
-[Nielsen et al. (2024)](https://huggingface.co/alexandrainst/roest-315m).
 The final product is then the combination of the finetuned model along with the n-gram
 model, and this is what is used when you use the model as mentioned in the Quick Start
@@ -160,7 +160,7 @@ This model is intended to be used for Danish automatic speech recognition.
 Note that Biometric Identification is not allowed using the CoRal dataset and/or derived
 models. For more information, see addition 4 in our
-[license](https://huggingface.co/alexandrainst/roest-315m/blob/main/LICENSE).
 ## Why the name Røst?
@@ -175,7 +175,7 @@ Scandinavia](https://da.wikipedia.org/wiki/Koralrev#Koldtvandskoralrev).
 The dataset is licensed under a custom license, adapted from OpenRAIL-M, which allows
 commercial use with a few restrictions (speech synthesis and biometric identification).
 See
-[license](https://huggingface.co/alexandrainst/roest-315m/blob/main/LICENSE).
 ## Creators and Funders

 pipeline_tag: automatic-speech-recognition
 ---
+# Røst-Wav2Vec2-315m-v1
 This is a Danish state-of-the-art speech recognition model, trained by [the Alexandra
 Institute](https://alexandra.dk/).
 ```python
 >>> from transformers import pipeline
 >>> audio = get_audio()  # 16kHz raw audio array
+>>> transcriber = pipeline(model="CoRal-project/roest-wav2vec2-315m-v1")
 >>> transcriber(audio)
 {'text': 'your transcription'}
 ```
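The snippet above leaves `get_audio()` undefined. A minimal sketch of such a helper follows, using only the Python standard library. It assumes the input is a mono, 16-bit PCM WAV file already sampled at 16 kHz; the `path` parameter and the strict format checks are illustrative choices, not part of the model card.

```python
import array
import sys
import wave


def get_audio(path):
    """Read a mono, 16-bit PCM, 16 kHz WAV file into floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        # Assumption: reject anything that is not already 16 kHz rather than resample.
        if wav.getframerate() != 16_000:
            raise ValueError(f"expected 16 kHz audio, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian
    # Scale integer samples to floating point values.
    return [sample / 32768.0 for sample in samples]
```

The pipeline expects a NumPy array, so the returned list would typically be wrapped before use, for example `transcriber(np.asarray(get_audio("speech.wav"), dtype=np.float32))`, where `speech.wav` is a placeholder file name.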
 confidence interval (lower is better; best scores in **bold**, second-best in
 *italics*):
+| Model | Number of parameters | [CoRal](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) WER | [Danish Common Voice 17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/viewer/da/test) CER | [Danish Common Voice 17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/viewer/da/test) WER |
 |:---|---:|---:|---:|---:|---:|
+| CoRal-project/roest-wav2vec2-315m-v1 (this model) | 315M | **6.6% ± 0.2%** | **17.0% ± 0.4%** | 6.6% ± 0.6% | 16.7% ± 0.8% |
 | [chcaa/xls-r-300m-danish-nst-cv9](https://hf.co/chcaa/xls-r-300m-danish-nst-cv9) | 315M | 14.4% ± 0.3% | 36.5% ± 0.6% | **4.1% ± 0.5%** | **12.0% ± 0.8%** |
 | [mhenrichsen/hviske](https://hf.co/mhenrichsen/hviske) | 1540M | 14.2% ± 0.5% | 33.2% ± 0.7% | *5.2% ± 0.4%* | *14.2% ± 0.8%* |
 | [openai/whisper-large-v3](https://hf.co/openai/whisper-large-v3) | 1540M | *11.4% ± 0.3%* | *28.3% ± 0.6%* | *5.5% ± 0.4%* | *14.8% ± 0.8%* |
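The "mean ± interval" figures in the table come from bootstrapping the test results 1000 times. The sketch below shows one way such numbers could be computed; it is not the authors' evaluation code. It assumes corpus-level CER/WER (total edits divided by total reference length) and a 95% normal-approximation interval, neither of which is stated in this excerpt. The function names, the seed, and the toy transcripts are hypothetical.

```python
import random
import statistics


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def error_rate(pairs, unit):
    """Total edits over total reference length; unit is "char" (CER) or "word" (WER)."""
    split = (lambda s: s.split()) if unit == "word" else list
    edits = sum(edit_distance(split(ref), split(hyp)) for ref, hyp in pairs)
    length = sum(len(split(ref)) for ref, _ in pairs)
    return edits / length


def bootstrap(pairs, unit, n_resamples=1000, seed=0):
    """Return (mean, half_width) of the error rate over bootstrap resamples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_resamples):
        sample = rng.choices(pairs, k=len(pairs))  # resample with replacement
        scores.append(error_rate(sample, unit))
    mean = statistics.mean(scores)
    half_width = 1.96 * statistics.stdev(scores)  # assumed 95% normal interval
    return mean, half_width


if __name__ == "__main__":
    # Hypothetical (reference, hypothesis) transcript pairs.
    pairs = [
        ("det er en god dag", "det er en god dag"),
        ("hvad hedder du", "hvad heder du"),
        ("jeg bor i aarhus", "jeg bor aarhus"),
    ]
    for unit, name in (("char", "CER"), ("word", "WER")):
        mean, half_width = bootstrap(pairs, unit)
        print(f"{name}: {mean:.1%} ± {half_width:.1%}")
```

Resampling whole utterances, rather than individual words or characters, keeps each transcript intact, so the interval reflects variation across recordings.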
      both audio and transcriptions to perform the speech-to-text task (also known as
      automatic speech recognition). The finetuning data is as follows:
      - The read-aloud training split of the [CoRal
+       dataset](https://huggingface.co/datasets/CoRal-project/coral) (revision
        fb20199b3966d3373e0d3a5ded2c5920c70de99c), consisting of 361 hours of Danish
        read-aloud speech, diverse across dialects, accents, ages and genders.
   3. An n-gram language model has been trained separately, and is used to guide the
 The first step was trained by [Babu et al.
 (2021)](https://doi.org/10.48550/arXiv.2111.09296) and the second and third step by
+[Nielsen et al. (2024)](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1).
 The final product is then the combination of the finetuned model along with the n-gram
 model, and this is what is used when you use the model as mentioned in the Quick Start
 Note that Biometric Identification is not allowed using the CoRal dataset and/or derived
 models. For more information, see addition 4 in our
+[license](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1/blob/main/LICENSE).
 ## Why the name Røst?
 The dataset is licensed under a custom license, adapted from OpenRAIL-M, which allows
 commercial use with a few restrictions (speech synthesis and biometric identification).
 See
+[license](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1/blob/main/LICENSE).
 ## Creators and Funders