---
language:
- ar
metrics:
- wer
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
tags:
- whisper
- arabic
- pytorch
license: apache-2.0
---

# WhisperLevantineArabic

**Fine-tuned Whisper model for Levantine Arabic (Israeli dialect)**

Thanks to [ivrit.ai](https://github.com/ivrit-ai/ivrit.ai/tree/master) for providing the fine-tuning scripts!

## Model Description

This model is a fine-tuned version of [Whisper Large V3](https://github.com/openai/whisper), tailored specifically for transcribing Levantine Arabic with a focus on the Israeli dialect. It is designed to improve automatic speech recognition (ASR) performance for this variant of Arabic.

- **Base Model**: Whisper Large V3
- **Fine-tuned for**: Levantine Arabic (Israeli dialect)
- **WER on test set**: 33% (a sketch of how WER is computed follows this list)
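
WER (word error rate) is the Levenshtein distance between the reference and hypothesis word sequences, divided by the number of reference words. Below is a minimal sketch of the computation using the `jiwer` package; this is for illustration only and is not necessarily the tooling used to produce the 33% figure:

```python
# pip install jiwer
import jiwer

# Illustrative reference/hypothesis pair ("hello, how are you today")
reference = "مرحبا كيف حالك اليوم"
hypothesis = "مرحبا كيف حالكم اليوم"

# WER = (substitutions + deletions + insertions) / reference word count
print(jiwer.wer(reference, hypothesis))  # 0.25: one substitution over four words
```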

## Training Data

The dataset used to fine-tune this model consists of approximately 1,200 hours of transcribed audio, primarily Israeli Levantine Arabic along with some general Levantine Arabic content. The data sources include:

1. **Self-maintained collection**: 1,200 hours of audio curated by the team, covering a wide range of Israeli Levantine Arabic speech.

- **Total dataset size**: ~1,200 hours
- **Sampling rate**: 8 kHz, upsampled to 16 kHz (see the resampling sketch after this list)
- **Annotation**: Human-transcribed and annotated for high accuracy.
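
Since the source audio was recorded at 8 kHz, it was upsampled to the 16 kHz rate Whisper expects. Here is a minimal sketch of that preprocessing step using `librosa` and `soundfile`; the file names are illustrative:

```python
# pip install librosa soundfile
import librosa
import soundfile as sf

# Load 8 kHz audio at its native sample rate
audio, sr = librosa.load("recording_8khz.wav", sr=None)

# Upsample to the 16 kHz rate the model expects
audio_16k = librosa.resample(audio, orig_sr=sr, target_sr=16000)
sf.write("recording_16khz.wav", audio_16k, 16000)
```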

## How to Use

The fine-tuned model was converted with the [faster-whisper](https://github.com/SYSTRAN/faster-whisper) package, enabling inference up to 4× faster than the reference OpenAI Whisper implementation.
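
For reference, a fine-tuned Transformers checkpoint can be converted to the CTranslate2 format that faster-whisper loads. A minimal sketch using CTranslate2's Python converter; the paths and quantization choice are assumptions, not the exact settings used for this model:

```python
# pip install ctranslate2 transformers
import ctranslate2

# Convert a Hugging Face Whisper checkpoint to CTranslate2 format (paths are placeholders)
converter = ctranslate2.converters.TransformersConverter("path/to/finetuned-whisper")
converter.convert("path/to/model", quantization="float16")
```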

The model expects 16 kHz audio input; make sure your files use this sample rate for best results. You can run the model as follows.

The following command saves a `.vtt` file with transcriptions and timestamps in `audio_dir`:

```bash
python transcriber.py --model_path path/to/model --audio_dir path/to/audio --word_timestamps True --vad_filter True
```

Alternatively, to print word-level timestamps and the transcription to the console:

```python
# pip install faster-whisper librosa
import faster_whisper
import librosa

model = faster_whisper.WhisperModel("path/to/model")

audio_file = "your_audio_file.wav"

# Load the audio and resample it to the 16 kHz rate the model expects
audio_data, sample_rate = librosa.load(audio_file)
audio_data = librosa.resample(audio_data, orig_sr=sample_rate, target_sr=16000)

# word_timestamps=True is required for segment.words to be populated;
# materialize the generator so the segments can be iterated more than once
segments, _ = model.transcribe(audio_data, language="ar", word_timestamps=True)
segments = list(segments)

for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))

transcript = " ".join(s.text for s in segments)
```
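
To write a `.vtt` file yourself instead of going through `transcriber.py`, here is a minimal sketch that continues from the snippet above; the `to_vtt_timestamp` helper is illustrative and not part of this repository:

```python
def to_vtt_timestamp(seconds: float) -> str:
    # WebVTT timestamps use the HH:MM:SS.mmm format
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return "%02d:%02d:%06.3f" % (h, m, s)

# Write one WebVTT cue per transcribed segment
with open("transcript.vtt", "w", encoding="utf-8") as f:
    f.write("WEBVTT\n\n")
    for segment in segments:
        f.write("%s --> %s\n%s\n\n" % (
            to_vtt_timestamp(segment.start),
            to_vtt_timestamp(segment.end),
            segment.text.strip(),
        ))
```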