Updated paths to match move to CoRal-project
README.md
CHANGED
@@ -45,7 +45,7 @@ model-index:
 pipeline_tag: automatic-speech-recognition
 ---
 
-# Røst-315m
+# Røst-Wav2Vec2-315m-v1
 
 This is a Danish state-of-the-art speech recognition model, trained by [the Alexandra
 Institute](https://alexandra.dk/).
@@ -65,7 +65,7 @@ Next you can use the model using the `transformers` Python package as follows:
 ```python
 >>> from transformers import pipeline
 >>> audio = get_audio() # 16kHz raw audio array
->>> transcriber = pipeline(model="
+>>> transcriber = pipeline(model="CoRal-project/roest-wav2vec2-315m-v1")
 >>> transcriber(audio)
 {'text': 'your transcription'}
 ```
@@ -79,9 +79,9 @@ bootstrapped the results 1000 times and report here the mean scores along with a
 confidence interval (lower is better; best scores in **bold**, second-best in
 *italics*):
 
-| Model | Number of parameters | [CoRal](https://huggingface.co/datasets/
+| Model | Number of parameters | [CoRal](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) WER | [Danish Common Voice 17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/viewer/da/test) CER | [Danish Common Voice 17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/viewer/da/test) WER |
 |:---|---:|---:|---:|---:|---:|
-|
+|CoRal-project/roest-wav2vec2-315m-v1 (this model) | 315M | **6.6% ± 0.2%** | **17.0% ± 0.4%** | 6.6% ± 0.6% | 16.7% ± 0.8% |
 | [chcaa/xls-r-300m-danish-nst-cv9](https://hf.co/chcaa/xls-r-300m-danish-nst-cv9) | 315M | 14.4% ± 0.3% | 36.5% ± 0.6% | **4.1% ± 0.5%** | **12.0% ± 0.8%** |
 | [mhenrichsen/hviske](https://hf.co/mhenrichsen/hviske) | 1540M | 14.2% ± 0.5% | 33.2% ± 0.7% | *5.2% ± 0.4%* | *14.2% ± 0.8%* |
 | [openai/whisper-large-v3](https://hf.co/openai/whisper-large-v3) | 1540M | *11.4% ± 0.3%* | *28.3% ± 0.6%* | *5.5% ± 0.4%* | *14.8% ± 0.8%* |
@@ -127,7 +127,7 @@ This model is the result of four different stages of training:
 both audio and transcriptions to perform the speech-to-text task (also known as
 automatic speech recognition). The finetuning data is as follows:
 - The read-aloud training split of the [CoRal
-dataset](https://huggingface.co/datasets/
+dataset](https://huggingface.co/datasets/CoRal-project/coral) (revision
 fb20199b3966d3373e0d3a5ded2c5920c70de99c), consisting of 361 hours of Danish
 read-aloud speech, diverse across dialects, accents, ages and genders.
 3. An n-gram language model has been trained separately, and is used to guide the
@@ -147,7 +147,7 @@ This model is the result of four different stages of training:
 
 The first step was trained by [Babu et al.
 (2021)](https://doi.org/10.48550/arXiv.2111.09296) and the second and third step by
-[Nielsen et al. (2024)](https://huggingface.co/
+[Nielsen et al. (2024)](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1).
 
 The final product is then the combination of the finetuned model along with the n-gram
 model, and this is what is used when you use the model as mentioned in the Quick Start
@@ -160,7 +160,7 @@ This model is intended to be used for Danish automatic speech recognition.
 
 Note that Biometric Identification is not allowed using the CoRal dataset and/or derived
 models. For more information, see addition 4 in our
-[license](https://huggingface.co/
+[license](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1/blob/main/LICENSE).
 
 
 ## Why the name Røst?
@@ -175,7 +175,7 @@ Scandinavia](https://da.wikipedia.org/wiki/Koralrev#Koldtvandskoralrev).
 The dataset is licensed under a custom license, adapted from OpenRAIL-M, which allows
 commercial use with a few restrictions (speech synthesis and biometric identification).
 See
-[license](https://huggingface.co/
+[license](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1/blob/main/LICENSE).
 
 
 ## Creators and Funders
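
The evaluation table touched by this change reports scores that were bootstrapped 1000 times, shown as a mean with a confidence interval. The README does not say how the interval is computed, so the following is only a minimal sketch of one plausible procedure. It assumes a corpus-level error rate, a 95% normal-approximation interval, and a handful of made-up transcription pairs; the helper names `edit_distance`, `error_rate` and `bootstrap` are hypothetical and not taken from the CoRal code base.

```python
import random
import statistics


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1]


def error_rate(pairs, tokenize):
    """Corpus-level error rate: total edits divided by total reference length."""
    edits = sum(edit_distance(tokenize(ref), tokenize(hyp)) for ref, hyp in pairs)
    length = sum(len(tokenize(ref)) for ref, _ in pairs)
    return edits / length


def bootstrap(pairs, tokenize, n_resamples=1000, seed=0):
    """Resample pairs with replacement; return (mean score, interval half-width)."""
    rng = random.Random(seed)
    scores = [
        error_rate(rng.choices(pairs, k=len(pairs)), tokenize)
        for _ in range(n_resamples)
    ]
    mean = statistics.mean(scores)
    # Assumption: 95% interval via a normal approximation of the bootstrap scores.
    half_width = 1.96 * statistics.stdev(scores)
    return mean, half_width


# Hypothetical (reference, prediction) pairs, for illustration only.
pairs = [
    ("jeg hedder anna", "jeg hedder anna"),
    ("det regner i dag", "det regner idag"),
    ("hvor er stationen", "hvor er station"),
    ("tak for mad", "tak for mad"),
]

cer_mean, cer_hw = bootstrap(pairs, tokenize=list)       # character error rate
wer_mean, wer_hw = bootstrap(pairs, tokenize=str.split)  # word error rate
print(f"CER: {cer_mean:.1%} ± {cer_hw:.1%}")
print(f"WER: {wer_mean:.1%} ± {wer_hw:.1%}")
```

Passing `list` as the tokenizer splits each string into characters and yields a CER; passing `str.split` yields a WER. A percentile-based interval would be an equally reasonable choice and may differ from what the authors used.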