---
library_name: transformers
datasets:
- FBK-MT/Speech-MASSIVE
language:
- pl
metrics:
- wer
- bleu
base_model:
- openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
---

# Model Card

## Model Details

### Model Description

This model is a fine-tuned version of OpenAI's Whisper-Tiny ASR model, optimized for transcribing Polish voice commands. It was fine-tuned on the Speech-MASSIVE dataset to improve performance on Polish utterances. Whisper-Tiny is a transformer-based encoder-decoder model pre-trained on 680,000 hours of labeled speech data.

- **Developed by:** gs224
- **Language(s) (NLP):** Polish
- **Finetuned from model:** openai/whisper-tiny

Link to the training code: https://github.com/gs224/Fine-tuning-Whisper-for-Polish-voice-commands

## Uses

The model transcribes Polish speech to text, with a focus on voice command recognition.
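
As an illustration of the intended use, the following is a minimal sketch of running transcription through the `transformers` ASR pipeline. The repository ID in `MODEL_ID` is a hypothetical placeholder (this card does not state the Hub ID), and the helper names `build_generate_kwargs` and `transcribe` are assumptions made for this example, not part of the training code.

```python
# Hypothetical placeholder: replace with the actual Hub repository ID of this model.
MODEL_ID = "your-username/your-finetuned-whisper-tiny-pl"


def build_generate_kwargs(language: str = "polish") -> dict:
    """Decoding options that pin Whisper to transcription in a single language."""
    return {"language": language, "task": "transcribe"}


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so the helpers above can be used without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, generate_kwargs=build_generate_kwargs())
    return result["text"]


# Example call (requires transformers, a real model ID, and a local audio file):
# print(transcribe("command.wav"))
```

Passing a file path to the pipeline typically requires `ffmpeg` to be available for audio decoding. Fixing the language and task avoids relying on Whisper's automatic language detection for short commands.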

### Out-of-Scope Use

The model may perform poorly on languages or domains it was not fine-tuned for, and it is not suitable for sensitive applications that require very high accuracy.

## Bias, Risks, and Limitations

The model was fine-tuned on a relatively small subset of Polish voice data for a small number of epochs, so it may underperform on some dialects and accents. Capital letters and punctuation in the ground-truth transcriptions may also affect the Word Error Rate (WER) score, since a word that differs from the reference only in case or punctuation is counted as an error unless both sides are normalized first.
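
One way to reduce the effect of case and punctuation is to normalize both the reference and the model output before scoring. The sketch below is an assumption about how that could be done, not the normalization used in the training code; `normalize_text` is a hypothetical helper name.

```python
import re


def normalize_text(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace before scoring."""
    lowered = text.lower()
    # \w is Unicode-aware for str patterns, so Polish letters such as "ś" are kept.
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(no_punct.split())


print(normalize_text("Graj ale jazz, autorki Sanah."))
# graj ale jazz autorki sanah
```

Applying the same function to both sides keeps the comparison symmetric. Note that it also rewrites abbreviations such as `f. m.` to `f m`.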

### Recommendations

Future improvements could include training on larger datasets with more diverse utterances, and normalizing case and punctuation in the ground-truth labels.

## Training Details

### Training Data

Polish utterances from the Speech-MASSIVE dataset: https://huggingface.co/datasets/FBK-MT/Speech-MASSIVE-test
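
## Evaluation

The model is evaluated with Word Error Rate (WER).

### Testing Data, Factors & Metrics

#### Metrics

WER is a standard metric for ASR: the number of word substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of words in the reference. Lower is better.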
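
To make the metric concrete, here is a minimal pure-Python sketch of sentence-level WER based on word-level edit distance. It is an illustration only and is not the evaluation code behind the reported scores; `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Two of four words differ, so the sentence-level WER is 0.5.
print(word_error_rate("graj plejlistę boba dylana", "graj playlistę boba delana"))
# 0.5
```

Corpus-level WER is usually computed as total edits divided by total reference words across all utterances, which is not the same as averaging per-sentence scores.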

### Results

Word Error Rate on the test set:

| Base model | Fine-tuned model |
|------------|------------------|
| 0.8435     | 0.3176           |
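
For reference, the improvement implied by the two scores above can be worked out as follows. This is plain arithmetic on the reported numbers; the variable names are chosen for this example.

```python
base_wer = 0.8435
finetuned_wer = 0.3176

absolute_reduction = base_wer - finetuned_wer
relative_reduction = absolute_reduction / base_wer

print(f"absolute WER reduction: {absolute_reduction:.4f}")
print(f"relative WER reduction: {relative_reduction:.1%}")
# absolute WER reduction: 0.5259
# relative WER reduction: 62.3%
```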

Example sentences:

| Reference | Base model | Fine-tuned model |
|-----------|------------|------------------|
| wyślij maila do mojego brata i przypomnij o rocznicy ślubu | wysli myę latą mojego biata i przypamni o nici ślubu | wyślij maila do mojego bryata i przypomnij mi o lepszy ślubu |
| przypomnij mi o jutrzejszym spotkaniu godzinę wcześniej | przypomnij mi o jutrzejszym spotkaniu godzinę wcześniej | przypomnij mi o jutrzejszym spotkaniu godzina wcześniej |
| graj plejlistę boba dylana | gra i play listę boba dylana | graj playlistę boba delana |
| graj ale jazz autorki sanah | grei, al het rust autoorkisana | graj ale jazz autorki sanah |
| olly posłuchajmy sto jeden i trzy f. m. | oli posłuchajmy sto jeden i trzefam | olly posłuchaj we z to jeden i trzy f. m. |