Commit a5509a5 · 1 Parent(s): ef8b96e
Transcription section formatting
README.md CHANGED
@@ -52,73 +52,89 @@ Next you can use the model using the `transformers` Python package as follows:
>>> transcriber(audio)
{'text': 'your transcription'}
```

-

-### Example 1
<audio controls>
<source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example1.wav" type="audio/wav">
Your browser does not support the audio tag.
</audio>

-**

-**Transcription:**

-**

-

-

-
<audio controls>
<source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example2.wav" type="audio/wav">
Your browser does not support the audio tag.
</audio>

-**

-**Transcription:**

-**

-

-

-
<audio controls>
<source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example3.wav" type="audio/wav">
Your browser does not support the audio tag.
</audio>

-**

-**Transcription:**

-**

-

-

-
<audio controls>
<source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example4.wav" type="audio/wav">
Your browser does not support the audio tag.
</audio>

-**
-
-**Transcription:** det er produceret af thomas helme og indspillede i easy sound recording studio i københavn
-
-**Target transcription:** det er produceret af thomas helmig og indspillet i easy sound recording studio i københavn

-**

-**


## Model Details

@@ -127,6 +143,9 @@ Wav2Vec2 is a state-of-the-art model architecture for speech recognition, levera
python src/scripts/finetune_asr_model.py model=wav2vec2-small max_steps=30000 datasets.coral_conversation_internal.id=CoRal-dataset/coral-v2 datasets.coral_readaloud_internal.id=CoRal-dataset/coral-v2
```
The model is evaluated using a Language Model (LM) as post-processing. The utilized LM is the one trained and used by [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/alexandrainst/roest-315m).
## Dataset

### [CoRal-v2](https://huggingface.co/datasets/CoRal-dataset/coral-v2/tree/main)
@@ -138,6 +157,8 @@ The model is evaluated using a Language Model (LM) as post-processing. The utili
### License
Note that the dataset used is licensed under a custom license, adapted from OpenRAIL-M, which allows commercial use with a few restrictions (speech synthesis and biometric identification). See [license](https://huggingface.co/Alvenir/coral-1-whisper-large/blob/main/LICENSE).

## Evaluation

The model was evaluated using the following metrics:
@@ -246,9 +267,13 @@ The model was also tested against other datasets to evaluate generalizability:
| [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) | 27.3 | 7.9 | **26.4** | **7.7** |
| [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) Normed | 16.6 | 6.3 | **15.6** | **6.1** |

## Training curves
<img src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/images/training_plots.png">

## Creators and Funders
This model has been trained and the model card written by Marie Juhl Jørgensen and Søren Vejlgaard Holm at [Alvenir](https://www.alvenir.ai/).

@@ -52,73 +52,89 @@
>>> transcriber(audio)
{'text': 'your transcription'}
```
+Certainly! Here’s a refined version of the transcription examples section, organized for better readability and presentation:

+---
+
+## Transcription Examples
+
+Explore the following audio samples along with their transcriptions and accuracy metrics. Each example showcases the model's performance with different Danish dialects.

+### Example 1 - Vestjysk Dialect
+
+**Audio Sample:**
<audio controls>
<source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example1.wav" type="audio/wav">
Your browser does not support the audio tag.
</audio>

+**Model Transcription:**
+*det blev til yderlig ti mål i den første sæson på trods af en position som back*

+**Target Transcription:**
+*det blev til yderligere ti mål i den første sæson på trods af en position som back*

+- **Character Error Rate (CER):** 3.7%
+- **Word Error Rate (WER):** 5.9%

+---

+### Example 2 - Sønderjysk Dialect

+**Audio Sample:**
<audio controls>
<source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example2.wav" type="audio/wav">
Your browser does not support the audio tag.
</audio>

+**Model Transcription:**
+*en arkitektoniske udformning af pladser forslagene iver benzen*

+**Target Transcription:**
+*den arkitektoniske udformning af pladsen er forestået af ivar bentsen*

+- **Character Error Rate (CER):** 20.3%
+- **Word Error Rate (WER):** 60.0%

+---

+### Example 3 - Nordsjællandsk Dialect

+**Audio Sample:**
<audio controls>
<source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example3.wav" type="audio/wav">
Your browser does not support the audio tag.
</audio>

+**Model Transcription:**
+*østrig og ungarn samarbejder om søen gennem den østrigske og ungarske vandkommission*

+**Target Transcription:**
+*østrig og ungarn samarbejder om søen gennem den østrigske og ungarske vandkommission*

+- **Character Error Rate (CER):** 0.0%
+- **Word Error Rate (WER):** 0.0%

+---

+### Example 4 - Lollandsk Dialect

+**Audio Sample:**
<audio controls>
<source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example4.wav" type="audio/wav">
Your browser does not support the audio tag.
</audio>

+**Model Transcription:**
+*det er produceret af thomas helme og indspillede i easy sound recording studio i københavn*

+**Target Transcription:**
+*det er produceret af thomas helmig og indspillet i easy sound recording studio i københavn*

+- **Character Error Rate (CER):** 4.4%
+- **Word Error Rate (WER):** 13.3%

+---
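The CER and WER figures listed with each example appear consistent with a plain Levenshtein edit distance divided by the length of the target transcription (in characters and in whitespace-separated words, respectively). The following is a minimal sketch under that assumption. The helper names `edit_distance`, `wer` and `cer` are illustrative and are not taken from the model's evaluation code, and no text normalization is applied beyond whitespace splitting.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(target: str, prediction: str) -> float:
    """Word error rate: word-level edit distance over the number of target words."""
    ref_words = target.split()
    return edit_distance(ref_words, prediction.split()) / len(ref_words)


def cer(target: str, prediction: str) -> float:
    """Character error rate: character-level edit distance over target length."""
    return edit_distance(target, prediction) / len(target)


# Strings copied from Example 1 above.
target = "det blev til yderligere ti mål i den første sæson på trods af en position som back"
prediction = "det blev til yderlig ti mål i den første sæson på trods af en position som back"

print(f"WER: {wer(target, prediction):.1%}")  # WER: 5.9%
print(f"CER: {cer(target, prediction):.1%}")  # CER: 3.7%
```

For Example 1 this gives one substituted word out of 17 and three deleted characters out of 82, which matches the 5.9% and 3.7% shown in the diff. Whether the commit author computed the figures this way is not stated.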

## Model Details

@@ -127,6 +143,9 @@
python src/scripts/finetune_asr_model.py model=wav2vec2-small max_steps=30000 datasets.coral_conversation_internal.id=CoRal-dataset/coral-v2 datasets.coral_readaloud_internal.id=CoRal-dataset/coral-v2
```
The model is evaluated using a Language Model (LM) as post-processing. The utilized LM is the one trained and used by [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/alexandrainst/roest-315m).
+
+---
+
## Dataset

### [CoRal-v2](https://huggingface.co/datasets/CoRal-dataset/coral-v2/tree/main)
@@ -138,6 +157,8 @@
### License
Note that the dataset used is licensed under a custom license, adapted from OpenRAIL-M, which allows commercial use with a few restrictions (speech synthesis and biometric identification). See [license](https://huggingface.co/Alvenir/coral-1-whisper-large/blob/main/LICENSE).

+---
+
## Evaluation

The model was evaluated using the following metrics:
@@ -246,9 +267,13 @@
| [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) | 27.3 | 7.9 | **26.4** | **7.7** |
| [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) Normed | 16.6 | 6.3 | **15.6** | **6.1** |

+---
+
## Training curves
<img src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/images/training_plots.png">

+---
+
## Creators and Funders
This model has been trained and the model card written by Marie Juhl Jørgensen and Søren Vejlgaard Holm at [Alvenir](https://www.alvenir.ai/).
