# Model Card for tarbiyah-ai-whisper-medium-peft-tarteel
This repository contains a PEFT (Parameter-Efficient Fine-Tuning) adapter for the `openai/whisper-medium` model. This model has been fine-tuned for Automatic Speech Recognition (ASR) of Quranic Arabic, specifically the clear, recited Arabic found in the Tarteel v11 dataset.
## Model Details

### Model Description
This is a PEFT-adapter model, which acts as a lightweight "patch" on top of the powerful `openai/whisper-medium` base model. It has been trained specifically to improve recognition and transcription of Quranic recitation. The fine-tuning was performed using modern, memory-efficient techniques, including 8-bit quantization and LoRA (Low-Rank Adaptation); a sketch of a comparable setup follows the details list below.
- Developed by: Habib-HF
- Model type: `whisper-medium` with LoRA adapter
- Language(s) (NLP): Arabic (ar)
- License: MIT
- Finetuned from model: `openai/whisper-medium`
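The exact training configuration is not published in this card. The snippet below is a minimal sketch of how such an adapter is typically set up with `peft` on an 8-bit base model; the rank, alpha, dropout, and target modules shown are illustrative assumptions, not the values used to train this adapter.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import WhisperForConditionalGeneration

# Load the base model in 8-bit, then prepare it for k-bit training.
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-medium",
    load_in_8bit=True,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Hypothetical LoRA hyperparameters -- common defaults, not the
# values used for this adapter.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],  # attention projections in Whisper
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```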
## Uses

### Direct Use
This model is intended for transcribing audio of Quranic recitation and can be used directly to generate text from audio files. Given its training data, it performs best on clearly spoken, recited Arabic.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import torch
from datasets import Audio
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Define model names
base_model_name = "openai/whisper-medium"
adapter_model_name = "Habib-HF/tarbiyah-whisper-medium-peft-tarteel"

# Load the processor
processor = WhisperProcessor.from_pretrained(base_model_name, language="arabic", task="transcribe")

# Load the base model in 8-bit
base_model = WhisperForConditionalGeneration.from_pretrained(
    base_model_name,
    load_in_8bit=True,
    device_map="auto",
)

# Apply the PEFT adapter on top of the quantized base model
peft_model = PeftModel.from_pretrained(base_model, adapter_model_name)
peft_model.eval()

# Load an audio file (replace the path with your own recording)
audio_data = Audio(sampling_rate=16000).decode_example(
    {"path": "path/to/your/audio.mp3", "bytes": None}
)
speech_array = audio_data["array"]
sampling_rate = audio_data["sampling_rate"]

# Process the audio into log-mel input features
input_features = processor(
    speech_array, sampling_rate=sampling_rate, return_tensors="pt"
).input_features
input_features = input_features.to(peft_model.device).half()

# Force Arabic transcription and generate
forced_decoder_ids = processor.get_decoder_prompt_ids(language="arabic", task="transcribe")
with torch.no_grad():
    predicted_ids = peft_model.generate(
        input_features=input_features,
        forced_decoder_ids=forced_decoder_ids,
        max_new_tokens=225,
    )

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
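If you prefer a standalone checkpoint (for example, to avoid loading the adapter separately at inference time), the LoRA weights can be folded into a full-precision copy of the base model with `merge_and_unload`. A minimal sketch, assuming the base model fits in fp16 memory; the output path is hypothetical:

```python
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration

# Merging requires a non-quantized base model, so load it in fp16 here.
base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-medium", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "Habib-HF/tarbiyah-whisper-medium-peft-tarteel")
merged = merged.merge_and_unload()  # folds the LoRA weights into the base weights

# Save a standalone model (hypothetical local path)
merged.save_pretrained("whisper-medium-tarteel-merged")
```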