---
language: is
datasets:
- language-and-voice-lab/samromur_milljon
tags:
- audio
- automatic-speech-recognition
- icelandic
- xlrs-53-icelandic
- iceland
- reykjavik
- samromur
license: cc-by-4.0
widget:
model-index:
- name: wav2vec2-large-xlsr-53-icelandic-ep30-967h
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Samrómur (Test)
      type: language-and-voice-lab/samromur_asr
      split: test
      args:
        language: is
    metrics:
    - name: WER
      type: wer
      value: 7.698
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Samrómur (Dev)
      type: language-and-voice-lab/samromur_asr
      split: validation
      args:
        language: is
    metrics:
    - name: WER
      type: wer
      value: 6.786
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Samrómur Children (Test)
      type: language-and-voice-lab/samromur_children
      split: test
      args:
        language: is
    metrics:
    - name: WER
      type: wer
      value: 6.467
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Samrómur Children (Dev)
      type: language-and-voice-lab/samromur_children
      split: validation
      args:
        language: is
    metrics:
    - name: WER
      type: wer
      value: 4.234
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Malrómur (Test)
      type: language-and-voice-lab/malromur_asr
      split: test
      args:
        language: is
    metrics:
    - name: WER
      type: wer
      value: 6.631
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Malrómur (Dev)
      type: language-and-voice-lab/malromur_asr
      split: validation
      args:
        language: is
    metrics:
    - name: WER
      type: wer
      value: 5.836
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Althingi (Test)
      type: language-and-voice-lab/althingi_asr
      split: test
      args:
        language: is
    metrics:
    - name: WER
      type: wer
      value: 17.904
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Althingi (Dev)
      type: language-and-voice-lab/althingi_asr
      split: validation
      args:
        language: is
    metrics:
    - name: WER
      type: wer
      value: 17.931
---
# wav2vec2-large-xlsr-53-icelandic-ep30-967h

The "wav2vec2-large-xlsr-53-icelandic-ep30-967h" is an acoustic model suitable for automatic speech recognition in Icelandic. It is the result of fine-tuning the model [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) for 30 epochs on 967 hours of Icelandic data collected by the [Language and Voice Laboratory](https://huggingface.co/language-and-voice-lab) through the [Samrómur](https://samromur.is/) platform.

The specific data used to fine-tune the model is the corpus [Samrómur Milljón](https://huggingface.co/datasets/language-and-voice-lab/samromur_milljon), which is the result of the automatic verification of one million recordings coming from the corpus ["Samromur Unverified 22.07"](http://hdl.handle.net/20.500.12537/265). It has to be pointed out that this model was trained on different data than our previous model [wav2vec2-large-xlsr-53-icelandic-ep10-1000h](https://huggingface.co/carlosdanielhernandezmena/wav2vec2-large-xlsr-53-icelandic-ep10-1000h).

The fine-tuning process was performed in July 2023 on the servers of the [Language and Voice Laboratory](https://lvl.ru.is/) at Reykjavík University (Iceland) by [Carlos Daniel Hernández Mena](https://huggingface.co/carlosdanielhernandezmena).
# Evaluation
```python
import torch
from transformers import Wav2Vec2Processor
from transformers import Wav2Vec2ForCTC

# Load the processor and model.
MODEL_NAME = "language-and-voice-lab/wav2vec2-large-xlsr-53-icelandic-ep30-967h"
processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)

# Load the dataset.
from datasets import load_dataset, load_metric, Audio
ds = load_dataset("language-and-voice-lab/samromur_children", split="test")

# Downsample to 16kHz.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# Process the dataset.
def prepare_dataset(batch):
    audio = batch["audio"]
    # Batched output is "un-batched" to ensure mapping is correct.
    batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
    with processor.as_target_processor():
        batch["labels"] = processor(batch["normalized_text"]).input_ids
    return batch

ds = ds.map(prepare_dataset, remove_columns=ds.column_names, num_proc=1)

# Define the evaluation metric.
import numpy as np
wer_metric = load_metric("wer")

def compute_metrics(pred):
    pred_logits = pred.predictions
    pred_ids = np.argmax(pred_logits, axis=-1)
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.batch_decode(pred_ids)
    # We do not want to group tokens when computing the metrics.
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)
    wer = wer_metric.compute(predictions=pred_str, references=label_str)
    return {"wer": wer}

# Do the evaluation (with batch_size=1).
model = model.to(torch.device("cuda"))

def map_to_result(batch):
    with torch.no_grad():
        input_values = torch.tensor(batch["input_values"], device="cuda").unsqueeze(0)
        logits = model(input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_str"] = processor.batch_decode(pred_ids)[0]
    batch["sentence"] = processor.decode(batch["labels"], group_tokens=False)
    return batch

results = ds.map(map_to_result, remove_columns=ds.column_names)

# Compute the overall WER now.
print("Test WER: {:.3f}".format(wer_metric.compute(predictions=results["pred_str"], references=results["sentence"])))
```
**Test Result**: 0.076

# BibTeX entry and citation info
*When publishing results based on these models please refer to:*
```bibtex
@misc{mena2023xlrs53icelandic30ep967h,
      title={Acoustic Model in Icelandic: wav2vec2-large-xlsr-53-icelandic-ep30-967h.},
      author={Hernandez Mena, Carlos Daniel},
      year={2023},
      url={https://huggingface.co/language-and-voice-lab/wav2vec2-large-xlsr-53-icelandic-ep30-967h},
}
```
# Acknowledgements

Thanks to Jón Guðnason, head of the Language and Voice Lab, for providing the computational power that made this model possible.

We also want to thank the "Language Technology Programme for Icelandic 2019-2023", which is managed and coordinated by Almannarómur and funded by the Icelandic Ministry of Education, Science and Culture. This model is an unexpected result of all the resources gathered by the Programme.

Special thanks to Björn Ingi Stefánsson for setting up the configuration of the server where this model was trained.