Marathi ASR Model

This is a Wav2Vec2-BERT model fine-tuned for automatic speech recognition (ASR) in Marathi.

Model Details

  • Model Type: Wav2Vec2-BERT for CTC
  • Language: Marathi
  • Training Dataset: OpenSLR Marathi Dataset
  • Last Updated: April 16, 2025

Usage

from transformers import Wav2Vec2BertProcessor, Wav2Vec2BertForCTC
import torchaudio
import torch

# Load model and processor
processor = Wav2Vec2BertProcessor.from_pretrained("hriteshMaikap/marathi-asr-model")
model = Wav2Vec2BertForCTC.from_pretrained("hriteshMaikap/marathi-asr-model")

# Load audio
waveform, sample_rate = torchaudio.load("audio.wav")
# Resample if needed
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)
    sample_rate = 16000
# Convert to mono if needed
if waveform.shape[0] > 1:
    waveform = torch.mean(waveform, dim=0, keepdim=True)
# Convert to numpy
speech_array = waveform.squeeze().numpy()

# Transcribe
inputs = processor(speech_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_features).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.decode(predicted_ids[0])

print(transcription)
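
The argmax-then-decode step above is greedy CTC decoding: the model emits one token ID per audio frame, and decoding collapses consecutive repeats and then drops the CTC blank token. The following is a minimal sketch of that collapse rule in plain Python, not the library's actual implementation. The function name `ctc_greedy_collapse`, the sample IDs, and `blank_id=0` are illustrative assumptions; for this model the blank ID would normally be read from the tokenizer (for example `processor.tokenizer.pad_token_id`).

```python
from itertools import groupby


def ctc_greedy_collapse(ids, blank_id=0):
    """Collapse consecutive repeated IDs, then remove blank tokens.

    `blank_id=0` is an assumption for illustration only.
    """
    # Step 1: merge runs of identical frame-level predictions.
    collapsed = [token for token, _ in groupby(ids)]
    # Step 2: drop the blank symbol that separates genuine repeats.
    return [token for token in collapsed if token != blank_id]


# Hypothetical frame-level argmax output (0 stands in for the blank).
frame_ids = [0, 5, 5, 0, 5, 7, 7, 0, 0, 3]
print(ctc_greedy_collapse(frame_ids))  # [5, 5, 7, 3]
```

The blank between the two runs of 5 is what lets the same token be emitted twice in a row; without it the two runs would merge into one.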

Step  Training Loss  Validation Loss  WER
300   0.211100       0.220232         0.183333
600   0.086900       0.172057         0.113889
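
The WER column in the table above is a fraction, so 0.113889 corresponds to roughly 11.4% of reference words in error. The card does not state how WER was computed; the sketch below uses the standard definition (word-level edit distance divided by the number of reference words) and is offered only as an illustration. The function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four.
print(word_error_rate("मी आज शाळेत जातो", "मी आज शाळेत जाते"))  # 0.25
```

Splitting on whitespace is a simplification; a real evaluation would usually normalize punctuation and Unicode forms before comparing.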

Model size: 606M params (Safetensors, tensor type F32)

Evaluation results

  • Word Error Rate on Marathi OpenSLR Dataset (self-reported): value not filled in