---
license: apache-2.0
language:
- ar
base_model:
- tarteel-ai/whisper-base-ar-quran
- openai/whisper-base
tags:
- generated_from_trainer
- islam
- quran
- arabic
- asr
- audio
- whisper
- pytorch
- ctranslate2
- automatic-speech-recognition
metrics:
- wer
model-index:
- name: faster-whisper-base-ar-quran
  results: []
pipeline_tag: automatic-speech-recognition
library_name: transformers
---

***TOC:***

- [faster-whisper-base-ar-quran](#faster-whisper-base-ar-quran)
- [Usage Example](#usage-example)
- [Quantization Side Note](#quantization-side-note)
- [CTranslate2 Installation](#ctranslate2-installation)

---

# faster-whisper-base-ar-quran

[This model](https://huggingface.co/OdyAsh/faster-whisper-base-ar-quran) is a CTranslate2 version of [tarteel-ai/whisper-base-ar-quran](https://huggingface.co/tarteel-ai/whisper-base-ar-quran). The conversion was performed using the following command:

```bash
# Don't pass vocab.json and config.json to --copy_files; they are created automatically during conversion
ct2-transformers-converter --model tarteel-ai/whisper-base-ar-quran --force --output_dir "path/to/output/dir/faster-whisper-base-ar-quran" --quantization float16 --copy_files added_tokens.json normalizer.json preprocessor_config.json special_tokens_map.json tokenizer_config.json
```
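
The same conversion can also be run from Python via CTranslate2's converter API. A minimal sketch equivalent to the CLI command above, assuming `ctranslate2` and `transformers[torch]` are installed:

```python
import ctranslate2

# Load the Transformers checkpoint and prepare the conversion
converter = ctranslate2.converters.TransformersConverter(
    "tarteel-ai/whisper-base-ar-quran",
    copy_files=[
        "added_tokens.json", "normalizer.json", "preprocessor_config.json",
        "special_tokens_map.json", "tokenizer_config.json",
    ],
)

# Write the CTranslate2 model (model.bin + config files) to the output directory
converter.convert(
    "path/to/output/dir/faster-whisper-base-ar-quran",
    quantization="float16",
    force=True,  # overwrite the output directory if it already exists
)
```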

To use the `ct2-transformers-converter` command, you'll need the `ctranslate2` package (which provides the converter script) along with the Transformers dependencies:

```bash
pip install ctranslate2 transformers[torch]
```

For more information about the converter, see the [CTranslate2 documentation](https://opennmt.net/CTranslate2/guides/transformers.html) or the "CTranslate2 Installation" section below.

This conversion transforms the model from OpenAI's [vanilla Whisper family](https://huggingface.co/openai?search_models=whisper) to the [faster-whisper family](https://huggingface.co/Systran), making it compatible with [WhisperX](https://github.com/m-bain/whisperX), which uses faster-whisper models under the hood for improved performance.

* For reference: "WhisperX enables significantly faster transcription speeds - up to 70x realtime with large-v2 models while requiring less than 8GB GPU memory."
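
Since the converted model is a standard CTranslate2 checkpoint, it can also be loaded directly with the [faster-whisper](https://github.com/SYSTRAN/faster-whisper) package, without WhisperX. A minimal sketch, assuming `faster-whisper` is installed and a local `audio.mp3` file exists:

```python
from faster_whisper import WhisperModel

# Download the model from the Hugging Face Hub and load it with CTranslate2
model = WhisperModel("OdyAsh/faster-whisper-base-ar-quran", device="cuda", compute_type="float16")

# Transcribe; segments is a generator, so transcription starts on iteration
segments, info = model.transcribe("audio.mp3", language="ar")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```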

## Usage Example

Follow the `Python usage 🐍` section in WhisperX's README page [here](https://github.com/m-bain/whisperX/tree/main?tab=readme-ov-file#python-usage-), but change this line:

```python
model = whisperx.load_model("large-v2", device, compute_type=compute_type)
```

to this line:

```python
model = whisperx.load_model("OdyAsh/faster-whisper-base-ar-quran", device, compute_type=compute_type)
```
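
Putting it together, here's a condensed sketch adapted from WhisperX's README, assuming `whisperx` is installed and an `audio.mp3` file exists:

```python
import whisperx

device = "cuda"           # or "cpu"
compute_type = "float16"  # or "int8" / "float32" (see the quantization side note below)

# Load the CTranslate2 model from the Hugging Face Hub
model = whisperx.load_model("OdyAsh/faster-whisper-base-ar-quran", device, compute_type=compute_type)

# Transcribe with batched inference
audio = whisperx.load_audio("audio.mp3")
result = model.transcribe(audio, batch_size=16)
print(result["segments"])
```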

For another usage example, see these specific parts of the [surah-splitter](https://github.com/OdyAsh/surah-splitter) repo: [here](https://github.com/OdyAsh/surah-splitter/blob/8c2889aecb1eb00a856225d680ca08ec8fded8af/src/surah_splitter/app/main_cli.py#L236) and [here](https://github.com/OdyAsh/surah-splitter/blob/8c2889aecb1eb00a856225d680ca08ec8fded8af/src/surah_splitter/quran_toolkit/surah_processor.py#L70).

## Quantization Side Note

You'll notice that the command above uses `--quantization float16`, even though the original `tarteel-ai/whisper-base-ar-quran` model is in float32 precision ([source](https://huggingface.co/tarteel-ai/whisper-base-ar-quran/blob/main/config.json#L37)). The float16 conversion was kept for the following reasons:

* Reduced size: 141 MB instead of [290 MB](https://huggingface.co/tarteel-ai/whisper-base-ar-quran/blob/main/pytorch_model.bin).

* Negligible performance impact for the use case in mind: when this model was tested on sample Quran audio files, the few transcription errors that appeared didn't affect the overall performance of [OdyAsh's whisperx solution](https://github.com/OdyAsh/surah-splitter), which uses this model, since the subsequent steps in that solution's pipeline (e.g., alignment, the reference <-> input matching DP algorithm, etc.) still yielded accurate results.

However, if the transcription results are not satisfactory for your use case, you can always get float32 precision by:

* Changing the `--quantization` argument to `float32` in the command above, at the cost of a larger model size (around 290 MB).

* Or, at inference time, setting the `compute_type` argument of [whisperx.load_model()](https://github.com/m-bain/whisperX/tree/main?tab=readme-ov-file#python-usage-) to `"float32"`, as in the sketch below this list.

* Read [this](https://github.com/OpenNMT/CTranslate2/blob/master/docs/quantization.md#:~:text=bfloat16-,For%20example,-%2C) and [this](https://www.perplexity.ai/search/search-about-the-ct2-transform-GdKee50NTp.iIOnoHIEuBw) for confirmation that the model can be recomputed to float32 precision at runtime.
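
For example, a minimal sketch of the runtime option (CTranslate2 dequantizes the stored float16 weights back to float32 while loading):

```python
import whisperx

# compute_type="float32" makes CTranslate2 recompute the float16 weights as float32 at load time
model = whisperx.load_model("OdyAsh/faster-whisper-base-ar-quran", "cuda", compute_type="float32")
```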

## CTranslate2 Installation

In the [OdyAsh/faster-whisper-base-ar-quran](https://github.com/OdyAsh/faster-whisper-base-ar-quran) GitHub repo, you'll see `pyproject.toml` and `uv.lock` files, which means you can use `uv` instead of pip (if you want) to install the required packages for the `ct2-transformers-converter` command.

Steps:

1. Install `uv` if it isn't already installed by following [this](https://docs.astral.sh/uv/getting-started/installation/#standalone-installer) section of its docs.

2. In your terminal, navigate to the local directory into which you cloned the [OdyAsh/faster-whisper-base-ar-quran](https://github.com/OdyAsh/faster-whisper-base-ar-quran) GitHub repo.

3. Install the required packages right away (since the `uv.lock` file is already present in that directory):

```bash
uv sync
```

4. Verify the installation (prefix with `uv run` so the command executes inside the project's virtual environment):

```bash
uv run ct2-transformers-converter --help
```