---
license: apache-2.0
datasets:
- ARTPARK-IISc/Vaani
language:
- kn
base_model:
- openai/whisper-small
pipeline_tag: automatic-speech-recognition
---


# Whisper-small-vaani-kannada

This is a fine-tuned version of [OpenAI's Whisper-Small](https://huggingface.co/openai/whisper-small), trained on Kannada speech from multiple datasets.

# Usage
The model can be used with the `pipeline` function from the Transformers library.
```python
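import torch
from transformers import pipeline

# Path to the audio file to be transcribed
audio = "path to the audio file to be transcribed"
# Use the first GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_id = "ARTPARK-IISc/whisper-small-vaani-kannada"

# chunk_length_s=30 lets the pipeline split long recordings into 30-second chunks
transcribe = pipeline(task="automatic-speech-recognition", model=model_id, chunk_length_s=30, device=device)
# Force Kannada transcription ("kn" is Whisper's language code for Kannada)
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="kn", task="transcribe")

print("Transcription:", transcribe(audio)["text"])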
```
# Training and Evaluation

The model was fine-tuned on the following datasets: [Vaani](https://huggingface.co/datasets/ARTPARK-IISc/Vaani), [Fleurs](https://huggingface.co/datasets/google/fleurs), and [IndicTTS](https://huggingface.co/datasets/SPRINGLab/IndicTTS-Hindi).


The model was evaluated on multiple datasets. The word error rate (WER) on each is given below.

| Dataset | WER (%) |
| :---: | :---: |
| Fleurs | 29.16 |
| IndicTTS | 15.27 |
| Kathbath | 33.94 |
| Kathbath Noisy | 38.46 |
| Vaani | 69.78 |
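
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate can be computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words and expressed as a percentage. The function name `word_error_rate` and the sample strings are illustrative assumptions. This card does not state which tool or text normalization produced the numbers above, so this sketch is not expected to reproduce them exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent between a reference and a hypothesis transcript.

    Illustrative helper (not part of this model's evaluation code): words are
    split on whitespace and no text normalization is applied.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref_words)


# Placeholder strings: one substitution and one deletion over four reference words.
print(word_error_rate("a b c d", "a x c"))  # 50.0
```

Because insertions count as errors, the WER can exceed 100% when the hypothesis is much longer than the reference.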