Update README.md
README.md (changed)
@@ -252,7 +252,7 @@ model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")
 
 #Load the dataset
 from datasets import load_dataset, load_metric, Audio
-ds=load_dataset("projecte-aina/
+ds=load_dataset("projecte-aina/parlament_parla",split='test')
 
 #Downsample to 16kHz
 ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
@@ -289,7 +289,9 @@ print(WER)
 
 ### Training data
 
-The specific datasets used to create the model are
+The specific datasets used to create the model are:
+- [Common Voice 17.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0)
+- ["3CatParla"](https://huggingface.co/datasets/projecte-aina/3catparla_asr). (soon to be published)
 
 ### Training procedure
 
@@ -341,4 +343,4 @@ Copyright(c) 2025 by Language Technologies Laboratory, Barcelona Supercomputing
 ### Funding
 This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).
 
-The training of the model was possible thanks to the
+The training of the model was possible thanks to the computing time provided by [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.
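The evaluation snippet in the first hunk loads the `parlament_parla` test split, and the context line of the second hunk shows that it ends with `print(WER)`. The README imports `load_metric`, presumably to compute that number through the `datasets` metric API. As a dependency-free illustration of what word error rate counts, the sketch below computes it with a word-level edit distance. It is not the library's implementation, it performs no text normalization (casing, punctuation), and `word_error_rate` is a hypothetical name.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper for illustration only; uses a word-level
    Levenshtein (edit-distance) dynamic program on whitespace-split words.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words (initially zero reference words).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("el gat és negre", "el gos és negre")` returns `0.25`: one substitution over four reference words. Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.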
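The `Audio(sampling_rate=16_000)` cast in the first hunk makes `datasets` resample each clip to 16 kHz when it is decoded, which is the input rate Whisper models expect. To make the sample-count arithmetic concrete, here is a deliberately naive sketch using linear interpolation. It omits the anti-aliasing low-pass filter a real resampler applies, so it is not what the library does internally, and `resample_linear` is a hypothetical name.

```python
from typing import List


def resample_linear(samples: List[float], orig_sr: int, target_sr: int) -> List[float]:
    """Naively resample a mono signal by linear interpolation.

    Illustration only: no anti-aliasing filter is applied, so downsampling
    can alias content above target_sr / 2.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []
    n_out = len(samples) * target_sr // orig_sr  # output length for the same duration
    step = orig_sr / target_sr  # input samples advanced per output sample
    out: List[float] = []
    for k in range(n_out):
        pos = k * step
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]  # clamp at the last sample
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out
```

Called on one second of 48 kHz audio (48,000 samples), it returns 16,000 samples; going from 48 kHz to 16 kHz keeps every third input sample, since the step is exactly 3.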