Model Card: Enhanced Whisper Medium for Quranic Recitation (ASR)
Model Overview
This repository contains an Automatic Speech Recognition (ASR) model based on openai/whisper-medium. It has been enhanced specifically for transcribing Quranic Arabic recitation by combining two fine-tuned models.
The primary goal of this model is to serve as the backbone for the Tilawah AI project, providing highly accurate, real-time transcription of spoken verses to power recitation analysis and feedback.
Model Details
Attribute | Details |
---|---|
Base Model | openai/whisper-medium |
Architecture | Encoder-Decoder Transformer |
Task | Automatic Speech Recognition |
Language | Arabic (ar), with a focus on Classical & Quranic Arabic |
Training Data | The model's capabilities are built upon checkpoints trained on Common Voice (AR) and a custom dataset of Quranic recitations. |
Model Creation and Rationale
Standard ASR models often struggle with the unique melodic and phonetic nature (Tajweed) of Quranic recitation. To overcome this, this model was developed using a specialized approach:
- Model A (General Arabic): A whisper-medium model fine-tuned on a broad dataset of Modern Standard Arabic (e.g., the Arabic portion of Common Voice). This provides robustness in understanding general language structure.
- Model B (Quranic Recitation): A whisper-medium model fine-tuned exclusively on a custom, high-quality dataset of various Quranic recitation styles (Qira'at). This provides expertise in recognizing the specific vocabulary, rhythm, and pronunciation of Tilawah.
The final artifact combines the strengths of these two models, resulting in an enhanced model that inherits general linguistic understanding from Model A and the specialized domain expertise of Model B. This leads to superior performance on its target task.
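This card does not state how the two checkpoints were combined. One common way to merge models that share an architecture is linear interpolation of their weights, sketched below purely as an illustration. The function name `interpolate_weights`, the mixing factor `alpha`, and the toy parameter names are assumptions, not details taken from this repository.

```python
def interpolate_weights(state_a, state_b, alpha=0.5):
    """Blend two parameter mappings key by key: (1 - alpha) * A + alpha * B.

    Hypothetical helper: the actual merge procedure for this model is not documented.
    Values can be floats (as in the demo) or tensors that support arithmetic.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if state_a.keys() != state_b.keys():
        raise ValueError("both checkpoints must expose the same parameter names")
    return {
        name: (1.0 - alpha) * state_a[name] + alpha * state_b[name]
        for name in state_a
    }


# Toy demonstration: plain floats stand in for weight tensors.
general = {"encoder.weight": 1.0, "decoder.weight": 3.0}   # stands in for Model A
quranic = {"encoder.weight": 3.0, "decoder.weight": 5.0}   # stands in for Model B
merged = interpolate_weights(general, quranic, alpha=0.5)
print(merged)  # {'encoder.weight': 2.0, 'decoder.weight': 4.0}
```

With real checkpoints, the same function could be applied to the `state_dict()` of two fine-tuned Whisper models, and the result loaded back with `load_state_dict`. Choosing `alpha` closer to 1 would weight the Quranic model more heavily.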
Performance
The model's performance is measured by Word Error Rate (WER), where a lower value is better.
Dataset | Metric (WER) | Notes |
---|---|---|
Common Voice 11 (Arabic) | 14.2% | Demonstrates strong general Arabic transcription capability. |
Custom Quran Recitation Set | 5.8% | Shows high accuracy on the target domain. Outperforms the base Whisper model by over 10 points. |
Noisy Environment Test Set | 11.5% | Exhibits good robustness to background noise. |
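For readers unfamiliar with the metric, WER is the word-level edit distance between the reference and the model output (substitutions + deletions + insertions), divided by the number of reference words. The sketch below is a minimal illustration of that definition. It is not the evaluation script used for the numbers above, and it assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


# One of four reference words is missing from the hypothesis.
reference = "بسم الله الرحمن الرحيم"
hypothesis = "بسم الله الرحيم"
print(word_error_rate(reference, hypothesis))  # 0.25
```

Because the comparison is exact per word, differences in diacritics or spelling conventions between reference and output count as errors unless both texts are normalized the same way first.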
How to Use
This model is ready for inference with the transformers library's pipeline API.
from transformers import pipeline
import torch

# Ensure you have the necessary libraries:
# pip install transformers torch torchaudio

# Set device (GPU is highly recommended)
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the ASR pipeline from this repository
# Replace 'your-username/your-repo-name' with the actual HF repo
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model="your-username/your-repo-name",
    device=device,
)

# Transcribe an audio file of a recitation
# The 'chunk_length_s' and 'stride_length_s' arguments are useful for long audio files
result = asr_pipeline(
    "path/to/recitation.wav",
    chunk_length_s=30,
    stride_length_s=[5, 5],
    generate_kwargs={"language": "arabic"},  # Specify language for best results
)

print(result['text'])
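Since the transcription is meant to feed recitation analysis, a downstream application might compare `result['text']` against the expected verse text. The sketch below shows one possible approach using the standard library's `difflib`. The helper name `flag_differences` is hypothetical, and the sketch assumes both strings use the same orthography (for example, both with or both without diacritics).

```python
import difflib


def flag_differences(expected: str, transcribed: str):
    """List word-level differences as (operation, expected_words, transcribed_words).

    Hypothetical helper for illustration; not part of this repository.
    """
    expected_words = expected.split()
    transcribed_words = transcribed.split()
    matcher = difflib.SequenceMatcher(
        a=expected_words, b=transcribed_words, autojunk=False
    )
    return [
        (tag, expected_words[i1:i2], transcribed_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Example: the reciter skipped the third word of the expected text.
expected_verse = "بسم الله الرحمن الرحيم"
transcription = "بسم الله الرحيم"  # in practice, this would be result['text']
for operation, expected_part, heard_part in flag_differences(expected_verse, transcription):
    print(operation, expected_part, heard_part)
```

An empty list means the transcription matched the expected text word for word. Any flagged difference may reflect either a recitation mistake or a transcription error, so it should be treated as a hint rather than a verdict.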
Ethical Considerations and Limitations
- Bias: The model has been trained primarily on male reciters. Performance may vary for female or children's voices. We are actively working to diversify our training data to mitigate this bias.
- Recitation Styles (Qira'at): Although the model was trained on multiple recitation styles, it may perform better on the more common ones, such as Hafs 'an 'Asim.
- Not a Religious Authority: The transcription is for educational and assistance purposes only. It is not a substitute for learning from a qualified teacher and should not be used for definitive religious rulings.
- Data Privacy: The model does not store any audio sent for inference. However, users of any downstream application should be clearly informed that their voice data is being processed.
Contact
For questions, issues, or collaboration inquiries regarding this model, please open an issue in the repository or contact [[email protected]].
Model tree for Habib-HF/tarbiyah-ai-whisper-medium-merged
Base model
openai/whisper-medium