MarieAlvenir committed · Commit 6766a9b · verified · 1 Parent(s): 33266be

Updated paths to match move to CoRal-project

Files changed (1):
  1. README.md +8 -8
README.md CHANGED
@@ -45,7 +45,7 @@ model-index:
 pipeline_tag: automatic-speech-recognition
 ---
 
-# Røst-315m
+# Røst-Wav2Vec2-315m-v1
 
 This is a Danish state-of-the-art speech recognition model, trained by [the Alexandra
 Institute](https://alexandra.dk/).
@@ -65,7 +65,7 @@ Next you can use the model using the `transformers` Python package as follows:
 ```python
 >>> from transformers import pipeline
 >>> audio = get_audio() # 16kHz raw audio array
->>> transcriber = pipeline(model="alexandrainst/roest-315m")
+>>> transcriber = pipeline(model="CoRal-project/roest-wav2vec2-315m-v1")
 >>> transcriber(audio)
 {'text': 'your transcription'}
 ```
@@ -79,9 +79,9 @@ bootstrapped the results 1000 times and report here the mean scores along with a
 confidence interval (lower is better; best scores in **bold**, second-best in
 *italics*):
 
-| Model | Number of parameters | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) WER | [Danish Common Voice 17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/viewer/da/test) CER | [Danish Common Voice 17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/viewer/da/test) WER |
+| Model | Number of parameters | [CoRal](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) WER | [Danish Common Voice 17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/viewer/da/test) CER | [Danish Common Voice 17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/viewer/da/test) WER |
 |:---|---:|---:|---:|---:|---:|
-| Røst-315m (this model) | 315M | **6.6% ± 0.2%** | **17.0% ± 0.4%** | 6.6% ± 0.6% | 16.7% ± 0.8% |
+|CoRal-project/roest-wav2vec2-315m-v1 (this model) | 315M | **6.6% ± 0.2%** | **17.0% ± 0.4%** | 6.6% ± 0.6% | 16.7% ± 0.8% |
 | [chcaa/xls-r-300m-danish-nst-cv9](https://hf.co/chcaa/xls-r-300m-danish-nst-cv9) | 315M | 14.4% ± 0.3% | 36.5% ± 0.6% | **4.1% ± 0.5%** | **12.0% ± 0.8%** |
 | [mhenrichsen/hviske](https://hf.co/mhenrichsen/hviske) | 1540M | 14.2% ± 0.5% | 33.2% ± 0.7% | *5.2% ± 0.4%* | *14.2% ± 0.8%* |
 | [openai/whisper-large-v3](https://hf.co/openai/whisper-large-v3) | 1540M | *11.4% ± 0.3%* | *28.3% ± 0.6%* | *5.5% ± 0.4%* | *14.8% ± 0.8%* |
@@ -127,7 +127,7 @@ This model is the result of four different stages of training:
 both audio and transcriptions to perform the speech-to-text task (also known as
 automatic speech recognition). The finetuning data is as follows:
 - The read-aloud training split of the [CoRal
-dataset](https://huggingface.co/datasets/alexandrainst/coral) (revision
+dataset](https://huggingface.co/datasets/CoRal-project/coral) (revision
 fb20199b3966d3373e0d3a5ded2c5920c70de99c), consisting of 361 hours of Danish
 read-aloud speech, diverse across dialects, accents, ages and genders.
 3. An n-gram language model has been trained separately, and is used to guide the
@@ -147,7 +147,7 @@ This model is the result of four different stages of training:
 
 The first step was trained by [Babu et al.
 (2021)](https://doi.org/10.48550/arXiv.2111.09296) and the second and third step by
-[Nielsen et al. (2024)](https://huggingface.co/alexandrainst/roest-315m).
+[Nielsen et al. (2024)](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1).
 
 The final product is then the combination of the finetuned model along with the n-gram
 model, and this is what is used when you use the model as mentioned in the Quick Start
@@ -160,7 +160,7 @@ This model is intended to be used for Danish automatic speech recognition.
 
 Note that Biometric Identification is not allowed using the CoRal dataset and/or derived
 models. For more information, see addition 4 in our
-[license](https://huggingface.co/alexandrainst/roest-315m/blob/main/LICENSE).
+[license](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1/blob/main/LICENSE).
 
 
 ## Why the name Røst?
@@ -175,7 +175,7 @@ Scandinavia](https://da.wikipedia.org/wiki/Koralrev#Koldtvandskoralrev).
 The dataset is licensed under a custom license, adapted from OpenRAIL-M, which allows
 commercial use with a few restrictions (speech synthesis and biometric identification).
 See
-[license](https://huggingface.co/alexandrainst/roest-315m/blob/main/LICENSE).
+[license](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1/blob/main/LICENSE).
 
 
 ## Creators and Funders
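
For anyone following the renamed checkpoint in this diff, here is a minimal sketch of what the README's `get_audio()` placeholder could look like in practice. It assumes `librosa` for loading and resampling a local recording to the 16 kHz mono array the pipeline expects; `librosa` and the file name `speech_sample.wav` are illustrative assumptions and not part of the commit — only the model id, the 16 kHz requirement and the `automatic-speech-recognition` pipeline tag come from the README itself.

```python
# Minimal sketch (not part of the commit): obtain a 16 kHz mono array and run the
# renamed checkpoint. librosa and "speech_sample.wav" are assumptions; the model id
# and the 16 kHz requirement come from the README shown in the diff above.
import librosa
from transformers import pipeline

# Load and resample a local recording to 16 kHz mono, as the model expects.
audio, _ = librosa.load("speech_sample.wav", sr=16_000, mono=True)

# The pipeline task matches the README's pipeline_tag.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="CoRal-project/roest-wav2vec2-315m-v1",
)
print(transcriber(audio)["text"])
```

Passing the raw array directly, as the README snippet does, sidesteps any audio decoding inside the pipeline; passing a file path instead would rely on the pipeline's own (ffmpeg-based) decoding.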