---
license: apache-2.0
datasets:
- ARTPARK-IISc/Vaani
language:
- kn
base_model:
- openai/whisper-small
pipeline_tag: automatic-speech-recognition
---

# Whisper-small-vaani-kannada

This is a fine-tuned version of [OpenAI's Whisper-Small](https://huggingface.co/openai/whisper-small), trained on Kannada speech from multiple datasets.

# Usage

The model can be used with the `pipeline` function from the Transformers library.

```python
import torch
from transformers import pipeline

audio = "path to the audio file to be transcribed"
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model_id = "ARTPARK-IISc/whisper-small-vaani-kannada"
transcribe = pipeline(task="automatic-speech-recognition", model=model_id, chunk_length_s=30, device=device)

# Force Kannada transcription. Whisper's language code for Kannada is "kn" ("ka" selects Georgian).
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="kn", task="transcribe")

print("Transcription:", transcribe(audio)["text"])
```
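
In recent Transformers releases, the language and task can also be passed at call time through `generate_kwargs` instead of setting `forced_decoder_ids` on the model config. The following is a minimal sketch of that alternative, not the authors' documented method. The helper names `whisper_generate_kwargs` and `transcribe_file` are hypothetical, and the sketch assumes a Transformers version whose Whisper `generate` accepts `language` and `task`.

```python
from typing import Dict, Optional

MODEL_ID = "ARTPARK-IISc/whisper-small-vaani-kannada"


def whisper_generate_kwargs(language: str = "kn", task: str = "transcribe") -> Dict[str, str]:
    """Build the decoding options forwarded to Whisper's generate() call."""
    return {"language": language, "task": task}


def transcribe_file(audio_path: str, device: Optional[str] = None) -> str:
    """Transcribe one audio file as Kannada (hypothetical helper)."""
    # Imported lazily so the small helper above works without loading the model stack.
    import torch
    from transformers import pipeline

    if device is None:
        device = "cuda:0" if torch.cuda.is_available() else "cpu"
    asr = pipeline(
        task="automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,
        device=device,
    )
    # Assumption: this Transformers version accepts language/task via generate_kwargs.
    result = asr(audio_path, generate_kwargs=whisper_generate_kwargs())
    return result["text"]
```

Calling `transcribe_file("sample.wav")` would download the model on first use; `sample.wav` is a placeholder path.
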

# Training and Evaluation

The model was fine-tuned on the following datasets: [Vaani](https://huggingface.co/datasets/ARTPARK-IISc/Vaani), [Fleurs](https://huggingface.co/datasets/google/fleurs), and [IndicTTS](https://huggingface.co/datasets/SPRINGLab/IndicTTS-Hindi).

The model was evaluated on multiple datasets. The word error rate (WER) on each is reported below.

| Dataset | WER |
| :---: | :---: |
| Fleurs | 29.16 |
| IndicTTS | 15.27 |
| Kathbath | 33.94 |
| Kathbath Noisy | 38.46 |
| Vaani | 69.78 |
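
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below illustrates that definition only. This card does not state which tool or text normalization produced the numbers above, so the function name `word_error_rate`, the whitespace tokenization, and the percentage scaling are all assumptions.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using whitespace tokenization (assumption)."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution ("cat" -> "hat") and one deletion ("down") over 4 reference words.
print(word_error_rate("the cat sat down", "the hat sat"))  # 50.0
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.
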