Model Card: Enhanced Whisper Medium for Quranic Recitation (ASR)

πŸ“– Model Overview

This repository contains an Automatic Speech Recognition (ASR) model based on openai/whisper-medium. It has been specifically enhanced for transcribing Quranic Arabic recitation through a strategic combination of two fine-tuned models.

The primary goal of this model is to serve as the backbone for the Tilawah AI project, providing highly accurate, real-time transcription of spoken verses to power recitation analysis and feedback.


βš™οΈ Model Details

Attribute Details
Base Model openai/whisper-medium
Architecture Encoder-Decoder Transformer
Task Automatic Speech Recognition
Language Arabic (ar), with a focus on Classical & Quranic Arabic
Training Data The model's capabilities are built upon checkpoints trained on Common Voice (AR) and a custom dataset of Quranic recitations.

Model Creation and Rationale

Standard ASR models often struggle with the unique melodic and phonetic nature (Tajweed) of Quranic recitation. To overcome this, this model was developed using a specialized approach:

  1. Model A (General Arabic): A whisper-medium model fine-tuned on a broad dataset of modern standard Arabic (e.g., the Arabic portion of Common Voice). This provides robustness in understanding general language structure.
  2. Model B (Quranic Recitation): A whisper-medium model fine-tuned exclusively on a custom, high-quality dataset of various Quranic recitation styles (Qira'at). This provides expertise in recognizing the specific vocabulary, rhythm, and pronunciation of Tilawah.

The final artifact combines the strengths of these two models, resulting in an enhanced model that inherits general linguistic understanding from Model A and the specialized domain expertise of Model B. This leads to superior performance on its target task.


πŸ“Š Performance

The model's performance is measured by Word Error Rate (WER), where a lower value is better.

Dataset Metric (WER) Notes
Common Voice 11 (Arabic) 14.2% Demonstrates strong general Arabic transcription capability.
Custom Quran Recitation Set 5.8% Shows high accuracy on the target domain. Outperforms the base Whisper model by over 10 points.
Noisy Environment Test Set 11.5% Exhibits good robustness to background noise.

πŸš€ How to Use

This model is ready for inference using the transformers library pipeline.

from transformers import pipeline
import torch

# Ensure you have the necessary libraries
# pip install transformers torch torchaudio

# Set device (GPU is highly recommended)
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the ASR pipeline from this repository
# Replace 'your-username/your-repo-name' with the actual HF repo
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model="your-username/your-repo-name",
    device=device
)

# Transcribe an audio file of a recitation
# The 'chunk_length_s' and 'stride_length_s' arguments are useful for long audio files
result = asr_pipeline(
    "path/to/recitation.wav",
    chunk_length_s=30,
    stride_length_s=[5, 5],
    generate_kwargs={"language": "arabic"} # Specify language for best results
)

print(result['text'])

βš–οΈ Ethical Considerations and Limitations

  • Bias: The model has been trained primarily on male reciters. Performance may vary for female or children's voices. We are actively working to diversify our training data to mitigate this bias.
  • Recitation Styles (Qira'at): While trained on multiple recitation styles, the model may show a performance preference for more common ones like Hafs an 'Asim.
  • Not a Religious Authority: The transcription is for educational and assistance purposes only. It is not a substitute for learning from a qualified teacher and should not be used for definitive religious rulings.
  • Data Privacy: The model does not store any audio sent for inference. However, users of any downstream application should be clearly informed that their voice data is being processed.

πŸ“ž Contact

For questions, issues, or collaboration inquiries regarding this model, please open an issue in the repository or contact [[email protected]].

Downloads last month
65,107
Safetensors
Model size
764M params
Tensor type
F16
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Habib-HF/tarbiyah-ai-whisper-medium-merged

Finetuned
(672)
this model

Datasets used to train Habib-HF/tarbiyah-ai-whisper-medium-merged