SujithPulikodan committed (verified)
Commit 65a803d · 1 Parent(s): 035caf8

Update README.md

Files changed (1)
  1. README.md +25 -40
README.md CHANGED
@@ -2,60 +2,45 @@
  license: apache-2.0
  datasets:
  - ARTPARK-IISc/Vaani
- - google/fleurs
  language:
  - kn
  base_model:
  - openai/whisper-small
  pipeline_tag: automatic-speech-recognition
  ---
 
- ```python
- import torch
- from transformers import WhisperForConditionalGeneration, WhisperProcessor, WhisperTokenizer, WhisperFeatureExtractor
- import soundfile as sf
- 
- model_id = "ARTPARK-IISc/whisper-small-vaani-kannada"
- 
- # Load the feature extractor and tokenizer individually
- feature_extractor = WhisperFeatureExtractor.from_pretrained(model_id)
- tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small", language="Kannada", task="transcribe")
- 
- # Create the processor manually
- processor = WhisperProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
- 
- # Path to the audio file to transcribe
- audio_file_path = "Sample_Audio.wav"  # replace with your audio file path
- 
- device = "cuda" if torch.cuda.is_available() else "cpu"
- 
- # Load the model
- model = WhisperForConditionalGeneration.from_pretrained(model_id).to(device)
- 
- # Load the audio
- audio_data, sample_rate = sf.read(audio_file_path)
- # Whisper expects 16 kHz audio; resample if necessary
- if sample_rate != 16000:
-     import torchaudio
-     resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
-     audio_data = resampler(torch.tensor(audio_data, dtype=torch.float32).unsqueeze(0)).squeeze().numpy()
- 
- # Use the processor to prepare the input features
- input_features = processor(audio_data, sampling_rate=16000, return_tensors="pt").input_features.to(device)
- 
- # Generate the transcription (disable gradient calculation during inference)
- with torch.no_grad():
-     predicted_ids = model.generate(input_features)
- 
- # Decode the generated IDs into human-readable text
- transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
- 
- print(transcription)
- ```
+ # Whisper-small-vaani-kannada
+ 
+ This is a fine-tuned version of [OpenAI's Whisper-Small](https://huggingface.co/openai/whisper-small), trained on Kannada speech from multiple datasets.
+ 
+ # Usage
+ 
+ The model can be used with the `pipeline` function from the Transformers library.
+ 
+ ```python
+ import torch
+ from transformers import pipeline
+ 
+ audio = "path to the audio file to be transcribed"
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ model_id = "ARTPARK-IISc/whisper-small-vaani-kannada"
+ 
+ transcribe = pipeline(task="automatic-speech-recognition", model=model_id, chunk_length_s=30, device=device)
+ transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="kn", task="transcribe")
+ 
+ print("Transcription:", transcribe(audio)["text"])
+ ```
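+ 
+ On recent versions of Transformers, where setting `forced_decoder_ids` directly is deprecated for Whisper, an equivalent approach is to pass the language and task at call time through `generate_kwargs`. This is a minimal sketch, not part of the original card:
+ 
+ ```python
+ # Sketch: select Kannada transcription per call instead of mutating the
+ # model config; "kannada" resolves to the same decoder prompt as "kn".
+ result = transcribe(audio, generate_kwargs={"language": "kannada", "task": "transcribe"})
+ print("Transcription:", result["text"])
+ ```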
+ # Training and Evaluation
+ 
+ The model was fine-tuned on the following datasets: [Vaani](https://huggingface.co/datasets/ARTPARK-IISc/Vaani), [Fleurs](https://huggingface.co/datasets/google/fleurs), and [IndicTTS](https://huggingface.co/datasets/SPRINGLab/IndicTTS-Hindi).
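+ 
+ For orientation, individual splits can be inspected with the `datasets` library. A minimal sketch; the FLEURS config name `kn_in` follows FLEURS's usual language-code scheme and is an assumption, since the exact training setup is not documented in this card:
+ 
+ ```python
+ from datasets import load_dataset
+ 
+ # Assumed config name for the Kannada split of FLEURS ("kn_in")
+ fleurs_kn = load_dataset("google/fleurs", "kn_in", split="train")
+ print(fleurs_kn[0]["transcription"])
+ ```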
 
 
+ 
+ The performance of the model was evaluated on multiple datasets; the results are given below.
+ 
+ | Dataset | WER |
+ | :---: | :---: |
+ | Fleurs | 29.16 |
+ | IndicTTS | 15.27 |
+ | Kathbath | 33.94 |
+ | Kathbath Noisy | 38.46 |
+ | Vaani | 69.78 |
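+ 
+ For reference, WER figures like those above can be computed with the `evaluate` library. A minimal sketch with placeholder strings; the actual evaluation transcripts are not included in this card:
+ 
+ ```python
+ import evaluate
+ 
+ wer_metric = evaluate.load("wer")
+ 
+ # Placeholder reference transcripts and model outputs
+ references = ["reference transcript one", "reference transcript two"]
+ predictions = ["predicted transcript one", "predicted transcript two"]
+ 
+ # Expressed as a percentage, matching the table above
+ wer = 100 * wer_metric.compute(references=references, predictions=predictions)
+ print(f"WER: {wer:.2f}")
+ ```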