---
datasets:
- mozilla-foundation/common_voice_17_0
- openslr/openslr
language:
- bn
metrics:
- wer
- cer
base_model:
- facebook/w2v-bert-2.0
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- asr
- bangla
- bangla-asr
- wav2vec-bert
- wav2vec-bert-bangla
license: cc-by-sa-4.0
---

# Model Card for Shrutimala Bangla ASR

## Model Details

### Model Description

This model is a fine-tuned version of `facebook/w2v-bert-2.0` for automatic speech recognition (ASR) in Bangla. It was trained on a large Bangla corpus, sourced primarily from Mozilla Common Voice 17.0, Common Voice 20.0, and OpenSLR, and achieves a Word Error Rate (WER) of about 11%.

- **Developed by:** Sazzadul Islam
- **Model type:** Wav2Vec-BERT-based Bangla ASR model
- **Language(s):** Bangla (bn)
- **License:** CC-BY-SA-4.0
- **Fine-tuned from:** `facebook/w2v-bert-2.0`

## Uses

### Direct Use

This model can be used for automatic speech recognition (ASR) in Bangla, with applications in transcription, voice assistants, and accessibility tools.

### Downstream Use

It can be further fine-tuned for domain-specific ASR tasks, such as medical or legal transcription in Bangla.

### Out-of-Scope Use

- Not suitable for real-time ASR on low-power devices without optimization.
- May not perform well in noisy environments or on heavily accented regional dialects outside the training data.

## Bias, Risks, and Limitations

- The model may struggle with low-resource dialects and uncommon speech patterns.
- Biases may exist due to dataset imbalances in gender, age, or socio-economic background.
- Ethical considerations apply when using the model for surveillance or other sensitive applications.

## How to Get Started with the Model

Use the following snippet to load the model and transcribe an audio file. Note that `w2v-bert-2.0` checkpoints are loaded with `Wav2Vec2BertForCTC` (not `Wav2Vec2ForCTC`), and the model expects 16 kHz mono audio:

```python
from transformers import AutoProcessor, Wav2Vec2BertForCTC
import librosa
import torch

processor = AutoProcessor.from_pretrained("your_model_id")
model = Wav2Vec2BertForCTC.from_pretrained("your_model_id")

# Load an audio file and resample it to the 16 kHz rate the model expects
audio_input, _ = librosa.load("path/to/audio.wav", sr=16000)
inputs = processor(audio_input, sampling_rate=16000, return_tensors="pt")

# Perform ASR with greedy CTC decoding
with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription)
```

## Training Details

### Training Data

The model was trained on the Bangla subsets of Mozilla Common Voice 17.0, Common Voice 20.0, and OpenSLR.

### Training Procedure

#### Preprocessing

- Audio was resampled through a 16 kHz → 8 kHz → 16 kHz chain.
- Transcripts were normalized to improve ASR performance.

#### Training Hyperparameters

- **Batch Size:** 16
- **Learning Rate:** 1e-5
- **Training Steps:** 25,000
- **Mixed Precision:** FP16

#### Training Time and Compute

- **Hardware Used:** RTX 4090
- **Training Time:** 37 hours
- **Dataset Size:** 143k samples

## Evaluation

### Testing Data & Metrics

#### Metrics

- **WER:** 11.26%
- **CER:** 2.39%
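The reported scores can be computed with Hugging Face's `evaluate` library, whose `wer` and `cer` metrics wrap the `jiwer` package; exact numbers will depend on applying the same text normalization used during training. A minimal sketch with placeholder sentences (not drawn from the actual evaluation set):

```python
import evaluate

# Load the word error rate and character error rate metrics
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Placeholder pairs: in practice, predictions come from the model and
# references are the ground-truth transcripts of the test set
predictions = ["আমি ভাত খাই", "সে স্কুলে যায়"]
references = ["আমি ভাত খাই", "সে স্কুলে যাই"]

wer = wer_metric.compute(predictions=predictions, references=references)
cer = cer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2%}, CER: {cer:.2%}")
```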
#### Factors

The model was evaluated on:

- Standard Bangla speech
- Various speaker demographics

### Results

- Performs well on clear, standard Bangla speech.
- Struggles with strong regional accents and noisy environments.

## Technical Specifications

### Model Architecture

The model is based on `facebook/w2v-bert-2.0`, a hybrid Wav2Vec2-BERT model for ASR.

### Citation

This model is based on the research presented in the following paper. If you use this model, please cite the original authors:

```
@misc{ridoy2025adaptabilityasrmodelslowresource,
      title={Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla},
      author={Md Sazzadul Islam Ridoy and Sumi Akter and Md. Aminur Rahman},
      year={2025},
      eprint={2507.01931},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.01931},
}
```

## Contact

For any issues or inquiries, please contact isazzadul23@gmail.com.