Commit
·
c32fc04
1
Parent(s):
a4da82e
Paths updated for roest-wav2vec2-315m-v1
Browse files
README.md
CHANGED
@@ -141,7 +141,7 @@ Wav2Vec2 is a state-of-the-art model architecture for speech recognition, levera
|
|
141 |
```
|
142 |
python src/scripts/finetune_asr_model.py model=wav2vec2-small max_steps=30000 datasets.coral_conversation_internal.id=CoRal-project/coral-v2 datasets.coral_readaloud_internal.id=CoRal-project/coral-v2
|
143 |
```
|
144 |
-
The model is evaluated using a Language Model (LM) as post-processing. The utilized LM is the one trained and used by [
|
145 |
|
146 |
---
|
147 |
|
@@ -169,9 +169,9 @@ The model was evaluated using the following metrics:
|
|
169 |
|
170 |
| Model | Number of parameters | Finetuned on data of type | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) WER |
|
171 |
| :----------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
|
172 |
-
| [CoRal-project/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-project/roest-
|
173 |
-
| [
|
174 |
-
| [
|
175 |
| [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2) | 1540M | Read-aloud | 4.7% ± 0.2% | 11.8% ± 0.3% |
|
176 |
| [openai/whisper-large-v3](https://hf.co/openai/whisper-large-v3) | 1540M | - | 11.4% ± 0.3% | 28.3% ± 0.6% |
|
177 |
|
@@ -231,8 +231,8 @@ The inclusion of a post-processing language model can affect the performance sig
|
|
231 |
| :-------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
|
232 |
| [CoRal-project/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2) | 315M | Read-aloud and conversation | Yes | **6.5% ± 0.2%** | **16.3% ± 0.4%** |
|
233 |
| [CoRal-project/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2) | 315M | Read-aloud and conversation | No | 8.2% ± 0.2% | 25.1% ± 0.4% |
|
234 |
-
| [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/
|
235 |
-
| [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/
|
236 |
|
237 |
### Detailed Roest-wav2vec2-315M with and without language model on different dialects
|
238 |
Here are the results of the model on different danish dialects in the test set:
|
|
|
141 |
```
|
142 |
python src/scripts/finetune_asr_model.py model=wav2vec2-small max_steps=30000 datasets.coral_conversation_internal.id=CoRal-project/coral-v2 datasets.coral_readaloud_internal.id=CoRal-project/coral-v2
|
143 |
```
|
144 |
+
The model is evaluated using a Language Model (LM) as post-processing. The utilized LM is the one trained and used by [CoRal-project/roest-wav2vec2-315m-v1](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2).
|
145 |
|
146 |
---
|
147 |
|
|
|
169 |
|
170 |
| Model | Number of parameters | Finetuned on data of type | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) WER |
|
171 |
| :----------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
|
172 |
+
| [CoRal-project/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2) | 315M | Read-aloud and conversation | 6.5% ± 0.2% | 16.3% ± 0.4% |
|
173 |
+
| [CoRal-project/roest-whisper-large-v1](https://huggingface.co/CoRal-project/roest-whisper-large) | 1540M | Read-aloud | **4.3% ± 0.2%** | **10.4% ± 0.3%** |
|
174 |
+
| [CoRal-project/roest-wav2vec2-315M-v1](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1) | 315M | Read-aloud | 6.6% ± 0.2% | 17.0% ± 0.4% |
|
175 |
| [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2) | 1540M | Read-aloud | 4.7% ± 0.2% | 11.8% ± 0.3% |
|
176 |
| [openai/whisper-large-v3](https://hf.co/openai/whisper-large-v3) | 1540M | - | 11.4% ± 0.3% | 28.3% ± 0.6% |
|
177 |
|
|
|
231 |
| :-------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
|
232 |
| [CoRal-project/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2) | 315M | Read-aloud and conversation | Yes | **6.5% ± 0.2%** | **16.3% ± 0.4%** |
|
233 |
| [CoRal-project/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2) | 315M | Read-aloud and conversation | No | 8.2% ± 0.2% | 25.1% ± 0.4% |
|
234 |
+
| [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1) | 315M | Read-aloud | Yes | 6.6% ± 0.2% | 17.0% ± 0.4% |
|
235 |
+
| [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v1) | 315M | Read-aloud | No | 8.6% ± 0.2% | 26.3% ± 0.5% |
|
236 |
|
237 |
### Detailed Roest-wav2vec2-315M with and without language model on different dialects
|
238 |
Here are the results of the model on different danish dialects in the test set:
|