CoRal-project/roest-wav2vec2-315m-v2

Automatic Speech Recognition · Safetensors · Danish · wav2vec2 · Eval Results


MarieAlvenir committed on Apr 29

Commit 04936be · Parent: 33b409c

Updated paths to coral dataset


Files changed (1)

README.md +3 -3

README.md CHANGED

@@ -137,7 +137,7 @@ Explore the following audio samples along with their transcriptions and accuracy
 ## Model Details
-Wav2Vec2 is a state-of-the-art model architecture for speech recognition, leveraging self-supervised learning from raw audio data. The pre-trained [Wav2Vec2-XLS-R-300M](facebook/wav2vec2-xls-r-300m) has been fine-tuned for automatic speech recognition with the [CoRal-v2 dataset](https://huggingface.co/datasets/CoRal-project/coral-v2/tree/main) dataset to enhance its performance in recognizing Danish speech with consideration to different dialects. The model was trained for 30K steps using the training setup in the [CoRaL repository](https://github.com/alexandrainst/coral/tree) by running:
+Wav2Vec2 is a state-of-the-art model architecture for speech recognition, leveraging self-supervised learning from raw audio data. The pre-trained [Wav2Vec2-XLS-R-300M](https://huggingface.co/facebook/wav2vec2-xls-r-300m) has been fine-tuned for automatic speech recognition with the [CoRal-v2 dataset](https://huggingface.co/datasets/CoRal-project/coral-v2/tree/main) dataset to enhance its performance in recognizing Danish speech with consideration to different dialects. The model was trained for 30K steps using the training setup in the [CoRaL repository](https://github.com/alexandrainst/coral/tree) by running:
 ```
 python src/scripts/finetune_asr_model.py model=wav2vec2-small max_steps=30000 datasets.coral_conversation_internal.id=CoRal-project/coral-v2 datasets.coral_readaloud_internal.id=CoRal-project/coral-v2
 ```
@@ -164,10 +164,10 @@ The model was evaluated using the following metrics:
 - **Word Error Rate (WER)**: The percentage of words incorrectly transcribed.
 - **Character Error Rate (CER)**: The percentage of characters incorrectly transcribed.
-**OBS!** It should be noted that the [CoRal test dataset](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) does not contain any conversation data, whereas the model is trained for read-aloud and conversation, but is only tested on read-aloud in the [CoRal test dataset](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test).
-|Model                                                                                            | Number of parameters |   Finetuned on data of type | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) WER |
+**OBS!** It should be noted that the [CoRal test dataset](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) does not contain any conversation data, whereas the model is trained for read-aloud and conversation, but is only tested on read-aloud in the [CoRal test dataset](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test).
+| Model                                                                                            | Number of parameters |   Finetuned on data of type | [CoRal](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) WER |
 | :----------------------------------------------------------------------------------------------- | -------------------: | --------------------------: | --------------------------------------------------------------------------------------: | --------------------------------------------------------------------------------------: |
 | [CoRal-project/roest-wav2vec2-1B-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-1B-v2) |                 1B | Read-aloud and conversation |                                                                             6.5% ± 0.2% |                                                                            16.4% ± 0.4% |
 | [CoRal-project/roest-wav2vec2-315M-v2](https://huggingface.co/CoRal-project/roest-wav2vec2-315m-v2) |                 315M | Read-aloud and conversation |                                                                             6.5% ± 0.2% |                                                                            16.3% ± 0.4% |
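The README in this diff defines WER and CER informally as "the percentage of words/characters incorrectly transcribed". The usual definition is the edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the reference length, computed over words for WER and characters for CER. The following is a minimal Python sketch under that standard definition. It applies no text normalization, whereas the CoRal evaluation code may lowercase or strip punctuation first. It also counts spaces as characters for CER, which is an assumption. It is not meant to reproduce the scores in the table. The helper names `edit_distance`, `wer` and `cer` are illustrative only.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum number of substitutions, deletions and
    insertions (each costing 1) needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hyp takes i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if tokens match)
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length.
    Spaces are counted as characters here (an assumption; conventions vary)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "det er en god dag"
    hyp = "det var en dag"
    # One substitution (er -> var) and one deletion (god) over 5 reference words.
    print(f"WER: {100 * wer(ref, hyp):.1f}%")  # WER: 40.0%
    print(f"CER: {100 * cer(ref, hyp):.1f}%")
```

Multiplying by 100 gives the percentage form used in the table. Insertions count as errors but do not increase the reference length, so WER can exceed 100%. "Percentage of words incorrectly transcribed" is therefore only an approximate reading of the metric.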