MarieAlvenir committed on
Commit
a5509a5
·
1 Parent(s): ef8b96e

Transcription section formatting

  1. README.md +52 -27
README.md CHANGED
@@ -52,73 +52,89 @@ Next you can use the model using the `transformers` Python package as follows:
  >>> transcriber(audio)
  {'text': 'your transcription'}
  ```

- ## Transcription examples

- ### Example 1
  <audio controls>
  <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example1.wav" type="audio/wav">
  Your browser does not support the audio tag.
  </audio>

- **Dialect:** Vestjysk

- **Transcription:** det blev til yderlig ti mål i den første sæson på trods af en position som back

- **Target transcription:** det blev til yderligere ti mål i den første sæson på trods af en position som back

- **CER:** 3.7%

- **WER:** 5.9%

- ### Example 2
  <audio controls>
  <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example2.wav" type="audio/wav">
  Your browser does not support the audio tag.
  </audio>

- **Dialect:** Sønderjysk

- **Transcription:** en arkitektoniske udformning af pladser forslagene iver benzen

- **Target transcription:** den arkitektoniske udformning af pladsen er forestået af ivar bentsen

- **CER:** 20.3%

- **WER:** 60.0%

- ### Example 3
  <audio controls>
  <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example3.wav" type="audio/wav">
  Your browser does not support the audio tag.
  </audio>

- **Dialect:** Nordsjællandsk

- **Transcription:** østrig og ungarn samarbejder om søen gennem den østrigske og ungarske vandkommission

- **Target transcription:** østrig og ungarn samarbejder om søen gennem den østrigske og ungarske vandkommission

- **CER:** 0.0%

- **WER:** 0.0%

- ### Example 4
  <audio controls>
  <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example4.wav" type="audio/wav">
  Your browser does not support the audio tag.
  </audio>

- **Dialect:** Lollandsk
-
- **Transcription:** det er produceret af thomas helme og indspillede i easy sound recording studio i københavn
-
- **Target transcription:** det er produceret af thomas helmig og indspillet i easy sound recording studio i københavn

- **CER:** 4.4%

- **WER:** 13.3%


  ## Model Details
 
@@ -127,6 +143,9 @@ Wav2Vec2 is a state-of-the-art model architecture for speech recognition, levera
  python src/scripts/finetune_asr_model.py model=wav2vec2-small max_steps=30000 datasets.coral_conversation_internal.id=CoRal-dataset/coral-v2 datasets.coral_readaloud_internal.id=CoRal-dataset/coral-v2
  ```
  The model is evaluated using a Language Model (LM) as post-processing. The utilized LM is the one trained and used by [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/alexandrainst/roest-315m).
  ## Dataset

  ### [CoRal-v2](https://huggingface.co/datasets/CoRal-dataset/coral-v2/tree/main)
@@ -138,6 +157,8 @@ The model is evaluated using a Language Model (LM) as post-processing. The utili
  ### License
  Note that the dataset used is licensed under a custom license, adapted from OpenRAIL-M, which allows commercial use with a few restrictions (speech synthesis and biometric identification). See [license](https://huggingface.co/Alvenir/coral-1-whisper-large/blob/main/LICENSE).

  ## Evaluation

  The model was evaluated using the following metrics:
@@ -246,9 +267,13 @@ The model was also tested against other datasets to evaluate generalizability:
  | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) | 27.3 | 7.9 | **26.4** | **7.7** |
  | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) Normed | 16.6 | 6.3 | **15.6** | **6.1** |

  ## Training curves
  <img src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/images/training_plots.png">

  ## Creators and Funders
  This model has been trained and the model card written by Marie Juhl Jørgensen and Søren Vejlgaard Holm at [Alvenir](https://www.alvenir.ai/).
 
 
  >>> transcriber(audio)
  {'text': 'your transcription'}
  ```
+ Certainly! Here’s a refined version of the transcription examples section, organized for better readability and presentation:

+ ---
+
+ ## Transcription Examples
+
+ Explore the following audio samples along with their transcriptions and accuracy metrics. Each example showcases the model's performance with different Danish dialects.

+ ### Example 1 - Vestjysk Dialect
+
+ **Audio Sample:**
  <audio controls>
  <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example1.wav" type="audio/wav">
  Your browser does not support the audio tag.
  </audio>

+ **Model Transcription:**
+ *det blev til yderlig ti mål i den første sæson på trods af en position som back*

+ **Target Transcription:**
+ *det blev til yderligere ti mål i den første sæson på trods af en position som back*

+ - **Character Error Rate (CER):** 3.7%
+ - **Word Error Rate (WER):** 5.9%

+ ---

+ ### Example 2 - Sønderjysk Dialect

+ **Audio Sample:**
  <audio controls>
  <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example2.wav" type="audio/wav">
  Your browser does not support the audio tag.
  </audio>

+ **Model Transcription:**
+ *en arkitektoniske udformning af pladser forslagene iver benzen*

+ **Target Transcription:**
+ *den arkitektoniske udformning af pladsen er forestået af ivar bentsen*

+ - **Character Error Rate (CER):** 20.3%
+ - **Word Error Rate (WER):** 60.0%

+ ---

+ ### Example 3 - Nordsjællandsk Dialect

+ **Audio Sample:**
  <audio controls>
  <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example3.wav" type="audio/wav">
  Your browser does not support the audio tag.
  </audio>

+ **Model Transcription:**
+ *østrig og ungarn samarbejder om søen gennem den østrigske og ungarske vandkommission*

+ **Target Transcription:**
+ *østrig og ungarn samarbejder om søen gennem den østrigske og ungarske vandkommission*

+ - **Character Error Rate (CER):** 0.0%
+ - **Word Error Rate (WER):** 0.0%

+ ---

+ ### Example 4 - Lollandsk Dialect

+ **Audio Sample:**
  <audio controls>
  <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example4.wav" type="audio/wav">
  Your browser does not support the audio tag.
  </audio>

+ **Model Transcription:**
+ *det er produceret af thomas helme og indspillede i easy sound recording studio i københavn*

+ **Target Transcription:**
+ *det er produceret af thomas helmig og indspillet i easy sound recording studio i københavn*

+ - **Character Error Rate (CER):** 4.4%
+ - **Word Error Rate (WER):** 13.3%

+ ---

  ## Model Details
 
 
  python src/scripts/finetune_asr_model.py model=wav2vec2-small max_steps=30000 datasets.coral_conversation_internal.id=CoRal-dataset/coral-v2 datasets.coral_readaloud_internal.id=CoRal-dataset/coral-v2
  ```
  The model is evaluated using a Language Model (LM) as post-processing. The utilized LM is the one trained and used by [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/alexandrainst/roest-315m).
+
+ ---
+
  ## Dataset

  ### [CoRal-v2](https://huggingface.co/datasets/CoRal-dataset/coral-v2/tree/main)
 
  ### License
  Note that the dataset used is licensed under a custom license, adapted from OpenRAIL-M, which allows commercial use with a few restrictions (speech synthesis and biometric identification). See [license](https://huggingface.co/Alvenir/coral-1-whisper-large/blob/main/LICENSE).

+ ---
+
  ## Evaluation

  The model was evaluated using the following metrics:
 
  | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) | 27.3 | 7.9 | **26.4** | **7.7** |
  | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) Normed | 16.6 | 6.3 | **15.6** | **6.1** |

+ ---
+
  ## Training curves
  <img src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/images/training_plots.png">

+ ---
+
  ## Creators and Funders
  This model has been trained and the model card written by Marie Juhl Jørgensen and Søren Vejlgaard Holm at [Alvenir](https://www.alvenir.ai/).
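
The CER and WER figures quoted for the transcription examples are consistent with the usual edit-distance definitions: the Levenshtein distance between the model output and the target, divided by the target length in characters or words. The sketch below is a minimal illustration, assuming plain whitespace tokenization and no text normalization. The helper names `edit_distance`, `wer`, and `cer` are hypothetical and are not taken from the repository's evaluation code. It reproduces the Example 1 figures:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Target and model transcription from Example 1 (Vestjysk).
target = ("det blev til yderligere ti mål i den første sæson "
          "på trods af en position som back")
output = ("det blev til yderlig ti mål i den første sæson "
          "på trods af en position som back")

print(f"WER: {wer(target, output):.1%}  CER: {cer(target, output):.1%}")
# prints: WER: 5.9%  CER: 3.7%
```

Here one of the 17 target words is wrong (1/17 ≈ 5.9%) and three of the 82 target characters are missing (3/82 ≈ 3.7%). Under the same assumptions, the Example 4 pair gives 13.3% WER and 4.4% CER.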