# Model Card for tarbiyah-ai-whisper-medium-peft-tarteel
This repository contains a PEFT (Parameter-Efficient Fine-Tuning) adapter for the `openai/whisper-medium` model. This model has been fine-tuned for Automatic Speech Recognition (ASR) of Quranic Arabic, specifically the clear, recited Arabic found in the Tarteel v11 dataset.
## Model Details

### Model Description
This is a PEFT-adapter model, which acts as a lightweight "patch" on top of the powerful `openai/whisper-medium` base model. It has been trained specifically to improve recognition and transcription of Quranic recitation. The fine-tuning was performed using modern, memory-efficient techniques, including 8-bit quantization and LoRA (Low-Rank Adaptation); a sketch of a comparable setup follows the details list below.
- Developed by: Habib-HF
- Model type: `whisper-medium` with LoRA adapter
- Language(s) (NLP): Arabic (ar)
- License: MIT
- Finetuned from model: `openai/whisper-medium`
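The exact training configuration is not published in this card. The snippet below is a minimal sketch of how such an adapter is typically set up with `peft` on an 8-bit base model; the rank, alpha, dropout, and target modules shown are illustrative assumptions, not the values used to train this adapter.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import WhisperForConditionalGeneration

# Load the base model in 8-bit, then prepare it for k-bit training.
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-medium",
    load_in_8bit=True,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Hypothetical LoRA hyperparameters -- common defaults, not the
# values used for this adapter.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],  # attention projections in Whisper
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```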
## Uses

### Direct Use
This model is intended for transcribing audio of Quranic recitation and can be used directly to generate text from audio files. Given its training data, it performs best on clearly spoken, recited Arabic.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import torch
from datasets import Audio
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Define model names
base_model_name = "openai/whisper-medium"
adapter_model_name = "Habib-HF/tarbiyah-whisper-medium-peft-tarteel"

# Load the processor
processor = WhisperProcessor.from_pretrained(base_model_name, language="arabic", task="transcribe")

# Load the base model in 8-bit
base_model = WhisperForConditionalGeneration.from_pretrained(
    base_model_name,
    load_in_8bit=True,
    device_map="auto",
)

# Apply the PEFT adapter on top of the quantized base model
peft_model = PeftModel.from_pretrained(base_model, adapter_model_name)
peft_model.eval()

# Load an audio file (replace the path with your own recording)
audio_data = Audio(sampling_rate=16000).decode_example(
    {"path": "path/to/your/audio.mp3", "bytes": None}
)
speech_array = audio_data["array"]
sampling_rate = audio_data["sampling_rate"]

# Process the audio into log-mel input features
input_features = processor(
    speech_array, sampling_rate=sampling_rate, return_tensors="pt"
).input_features
input_features = input_features.to(peft_model.device).half()

# Force Arabic transcription and generate
forced_decoder_ids = processor.get_decoder_prompt_ids(language="arabic", task="transcribe")
with torch.no_grad():
    predicted_ids = peft_model.generate(
        input_features=input_features,
        forced_decoder_ids=forced_decoder_ids,
        max_new_tokens=225,
    )

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
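If you prefer a standalone checkpoint (for example, to avoid loading the adapter separately at inference time), the LoRA weights can be folded into a full-precision copy of the base model with `merge_and_unload`. A minimal sketch, assuming the base model fits in fp16 memory; the output path is hypothetical:

```python
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration

# Merging requires a non-quantized base model, so load it in fp16 here.
base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-medium", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "Habib-HF/tarbiyah-whisper-medium-peft-tarteel")
merged = merged.merge_and_unload()  # folds the LoRA weights into the base weights

# Save a standalone model (hypothetical local path)
merged.save_pretrained("whisper-medium-tarteel-merged")
```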