Specify Dialect in Multilingual Sets
#42 by Nvidia-Riva - opened
Hey guys, the dialect mismatches in the datasets you posted are going to cause some issues down the line. It may make life easier to specify the datasets by dialect so the WER numbers are more interpretable.
Portuguese
-> Fleurs is Brazilian dialect.
-> MLS is a mix of Brazilian and European. The texts are going to be biased towards the European dialect, but the pronunciation will be heavily Brazilian.
Spanish
-> Fleurs is Latin American (LA).
-> MLS will be biased towards the European dialect in writing but the LA dialect in pronunciation (Don Quixote read by someone from Mexico, for instance).
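
To illustrate the suggestion, here is a minimal sketch of what per-dialect WER reporting could look like. The dialect tags (`pt-BR`, `pt-PT`), the helper names `word_errors` and `wer_by_dialect`, and the toy samples are all assumptions made up for illustration; none of them come from the leaderboard code or the datasets themselves.

```python
from collections import defaultdict


def word_errors(ref, hyp):
    """Return (word-level edit distance, number of reference words)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, start=1):
        curr = [i]
        for j, hw in enumerate(h, start=1):
            cost = 0 if rw == hw else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(r)


def wer_by_dialect(samples):
    """Aggregate WER per dialect tag.

    `samples` is an iterable of (dialect, reference, hypothesis) tuples.
    The dialect tag is a hypothetical field that would need to be added
    to each dataset entry.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for dialect, ref, hyp in samples:
        e, n = word_errors(ref, hyp)
        errors[dialect] += e
        words[dialect] += n
    return {d: errors[d] / words[d] for d in errors if words[d] > 0}


# Toy, made-up samples: the same model output scored against
# Brazilian and European reference texts.
samples = [
    ("pt-BR", "o ônibus chegou cedo", "o ônibus chegou cedo"),
    ("pt-PT", "o autocarro chegou cedo", "o ônibus chegou cedo"),
]
print(wer_by_dialect(samples))
```

With a split like this, a lexical mismatch between the reference text and the speaker's dialect shows up as a gap between the per-dialect numbers instead of being averaged into a single score.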