Commit
·
04936be
1
Parent(s):
33b409c
Updated paths to coral dataset
Browse files
README.md
CHANGED
@@ -137,7 +137,7 @@ Explore the following audio samples along with their transcriptions and accuracy
|
|
137 |
|
138 |
## Model Details
|
139 |
|
140 |
-
Wav2Vec2 is a state-of-the-art model architecture for speech recognition, leveraging self-supervised learning from raw audio data. The pre-trained [Wav2Vec2-XLS-R-300M](facebook/wav2vec2-xls-r-300m) has been fine-tuned for automatic speech recognition with the [CoRal-v2 dataset](https://huggingface.co/datasets/CoRal-project/coral-v2/tree/main) dataset to enhance its performance in recognizing Danish speech with consideration to different dialects. The model was trained for 30K steps using the training setup in the [CoRaL repository](https://github.com/alexandrainst/coral/tree) by running:
|
141 |
```
|
142 |
python src/scripts/finetune_asr_model.py model=wav2vec2-small max_steps=30000 datasets.coral_conversation_internal.id=CoRal-project/coral-v2 datasets.coral_readaloud_internal.id=CoRal-project/coral-v2
|
143 |
```
|
@@ -164,10 +164,10 @@ The model was evaluated using the following metrics:
|
|
164 |
- **Word Error Rate (WER)**: The percentage of words incorrectly transcribed.
|
165 |
- **Character Error Rate (CER)**: The percentage of characters incorrectly transcribed.
|
166 |
|
167 |
-
**OBS!** It should be noted that the [CoRal test dataset](https://huggingface.co/datasets/
|
168 |
|
169 |
|
170 |
-
|Model | Number of parameters | Finetuned on data of type | [CoRal](https://huggingface.co/datasets/
|
171 |
| :----------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
|
172 |
| [CoRal-project/roest-wav2vec2-1B-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-1B-v2) | 1B | Read-aloud and conversation | 6.5% ± 0.2% | 16.4% ± 0.4% |
|
173 |
| [CoRal-project/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2) | 315M | Read-aloud and conversation | 6.5% ± 0.2% | 16.3% ± 0.4% |
|
|
|
137 |
|
138 |
## Model Details
|
139 |
|
140 |
+
Wav2Vec2 is a state-of-the-art model architecture for speech recognition, leveraging self-supervised learning from raw audio data. The pre-trained [Wav2Vec2-XLS-R-300M](https://huggingface.co/facebook/wav2vec2-xls-r-300m) has been fine-tuned for automatic speech recognition with the [CoRal-v2 dataset](https://huggingface.co/datasets/CoRal-project/coral-v2/tree/main) dataset to enhance its performance in recognizing Danish speech with consideration to different dialects. The model was trained for 30K steps using the training setup in the [CoRaL repository](https://github.com/alexandrainst/coral/tree) by running:
|
141 |
```
|
142 |
python src/scripts/finetune_asr_model.py model=wav2vec2-small max_steps=30000 datasets.coral_conversation_internal.id=CoRal-project/coral-v2 datasets.coral_readaloud_internal.id=CoRal-project/coral-v2
|
143 |
```
|
|
|
164 |
- **Word Error Rate (WER)**: The percentage of words incorrectly transcribed.
|
165 |
- **Character Error Rate (CER)**: The percentage of characters incorrectly transcribed.
|
166 |
|
167 |
+
**OBS!** It should be noted that the [CoRal test dataset](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) does not contain any conversation data, whereas the model is trained for read-aloud and conversation, but is only tested on read-aloud in the [CoRal test dataset](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test).
|
168 |
|
169 |
|
170 |
+
| Model | Number of parameters | Finetuned on data of type | [CoRal](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) WER |
|
171 |
| :----------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
|
172 |
| [CoRal-project/roest-wav2vec2-1B-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-1B-v2) | 1B | Read-aloud and conversation | 6.5% ± 0.2% | 16.4% ± 0.4% |
|
173 |
| [CoRal-project/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2) | 315M | Read-aloud and conversation | 6.5% ± 0.2% | 16.3% ± 0.4% |
|