---
license: apache-2.0
language:
- ar
base_model:
- tarteel-ai/whisper-base-ar-quran
- openai/whisper-base
tags:
- generated_from_trainer
- islam
- quran
- arabic
- asr
- audio
- whisper
- pytorch
- ctranslate2
- automatic-speech-recognition
metrics:
- wer
model-index:
- name: faster-whisper-base-ar-quran
  results: []
pipeline_tag: automatic-speech-recognition
library_name: transformers
---

***TOC:***

- [faster-whisper-base-ar-quran](#faster-whisper-base-ar-quran)
- [Usage Example](#usage-example)
- [Quantization Side Note](#quantization-side-note)
- [CTranslate2 Installation](#ctranslate2-installation)

---

# faster-whisper-base-ar-quran

[This model](https://huggingface.co/OdyAsh/faster-whisper-base-ar-quran) is a CTranslate2 version of [tarteel-ai/whisper-base-ar-quran](https://huggingface.co/tarteel-ai/whisper-base-ar-quran). The conversion was performed using the following command:

```bash
# Don't pass vocab.json and config.json to --copy_files; they are created automatically during conversion
ct2-transformers-converter --model tarteel-ai/whisper-base-ar-quran --force --output_dir "path/to/output/dir/faster-whisper-base-ar-quran" --quantization float16 --copy_files added_tokens.json normalizer.json preprocessor_config.json special_tokens_map.json tokenizer_config.json
```
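
The same conversion can also be run from Python via CTranslate2's converter API. A minimal sketch equivalent to the CLI command above, assuming `ctranslate2` and `transformers[torch]` are installed:

```python
import ctranslate2

# Load the Transformers checkpoint and prepare the conversion
converter = ctranslate2.converters.TransformersConverter(
    "tarteel-ai/whisper-base-ar-quran",
    copy_files=[
        "added_tokens.json", "normalizer.json", "preprocessor_config.json",
        "special_tokens_map.json", "tokenizer_config.json",
    ],
)

# Write the CTranslate2 model (model.bin + config files) to the output directory
converter.convert(
    "path/to/output/dir/faster-whisper-base-ar-quran",
    quantization="float16",
    force=True,  # overwrite the output directory if it already exists
)
```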

To use the `ct2-transformers-converter` command, you'll need the `ctranslate2` package (which provides the converter script) along with the Transformers dependencies:

```bash
pip install ctranslate2 transformers[torch]
```

For more information about the converter, see the [CTranslate2 documentation](https://opennmt.net/CTranslate2/guides/transformers.html) or the "CTranslate2 Installation" section below.

This conversion transforms the model from OpenAI's [vanilla Whisper family](https://huggingface.co/openai?search_models=whisper) to the [faster-whisper family](https://huggingface.co/Systran), making it compatible with [WhisperX](https://github.com/m-bain/whisperX), which uses faster-whisper models under the hood for improved performance.

* For reference: "WhisperX enables significantly faster transcription speeds - up to 70x realtime with large-v2 models while requiring less than 8GB GPU memory."
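
Since the converted model is a standard CTranslate2 checkpoint, it can also be loaded directly with the [faster-whisper](https://github.com/SYSTRAN/faster-whisper) package, without WhisperX. A minimal sketch, assuming `faster-whisper` is installed and a local `audio.mp3` file exists:

```python
from faster_whisper import WhisperModel

# Download the model from the Hugging Face Hub and load it with CTranslate2
model = WhisperModel("OdyAsh/faster-whisper-base-ar-quran", device="cuda", compute_type="float16")

# Transcribe; segments is a generator, so transcription starts on iteration
segments, info = model.transcribe("audio.mp3", language="ar")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```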

## Usage Example

Follow the `Python usage 🐍` section in WhisperX's README page [here](https://github.com/m-bain/whisperX/tree/main?tab=readme-ov-file#python-usage-), but change this line:

```python
model = whisperx.load_model("large-v2", device, compute_type=compute_type)
```

to this line:

```python
model = whisperx.load_model("OdyAsh/faster-whisper-base-ar-quran", device, compute_type=compute_type)
```
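
Putting it together, here's a condensed sketch adapted from WhisperX's README, assuming `whisperx` is installed and an `audio.mp3` file exists:

```python
import whisperx

device = "cuda"           # or "cpu"
compute_type = "float16"  # or "int8" / "float32" (see the quantization side note below)

# Load the CTranslate2 model from the Hugging Face Hub
model = whisperx.load_model("OdyAsh/faster-whisper-base-ar-quran", device, compute_type=compute_type)

# Transcribe with batched inference
audio = whisperx.load_audio("audio.mp3")
result = model.transcribe(audio, batch_size=16)
print(result["segments"])
```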

For another usage example, see these specific parts of the [surah-splitter](https://github.com/OdyAsh/surah-splitter) repo: [here](https://github.com/OdyAsh/surah-splitter/blob/8c2889aecb1eb00a856225d680ca08ec8fded8af/src/surah_splitter/app/main_cli.py#L236) and [here](https://github.com/OdyAsh/surah-splitter/blob/8c2889aecb1eb00a856225d680ca08ec8fded8af/src/surah_splitter/quran_toolkit/surah_processor.py#L70).

## Quantization Side Note

You'll notice that the command above uses `--quantization float16`, even though the original `tarteel-ai/whisper-base-ar-quran` model is in float32 precision ([source](https://huggingface.co/tarteel-ai/whisper-base-ar-quran/blob/main/config.json#L37)). The float16 conversion was kept for the following reasons:

* Reduced size: 141 MB instead of [290 MB](https://huggingface.co/tarteel-ai/whisper-base-ar-quran/blob/main/pytorch_model.bin).

* Negligible performance impact for the use case in mind: when this model was tested on sample Quran audio files, the few transcription errors that appeared didn't affect the overall performance of [OdyAsh's whisperx solution](https://github.com/OdyAsh/surah-splitter), which uses this model, since the subsequent steps in that solution's pipeline (e.g., alignment, the reference <-> input matching DP algorithm, etc.) still yielded accurate results.

However, if the transcription results are not satisfactory for your use case, you can always get float32 precision by:

* Changing the `--quantization` argument to `float32` in the command above, at the cost of a larger model size (around 290 MB).

* Or, at inference time, setting the `compute_type` argument of [whisperx.load_model()](https://github.com/m-bain/whisperX/tree/main?tab=readme-ov-file#python-usage-) to `"float32"`, as in the sketch below this list.

* Read [this](https://github.com/OpenNMT/CTranslate2/blob/master/docs/quantization.md#:~:text=bfloat16-,For%20example,-%2C) and [this](https://www.perplexity.ai/search/search-about-the-ct2-transform-GdKee50NTp.iIOnoHIEuBw) for confirmation that the model can be recomputed to float32 precision at runtime.
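
For example, a minimal sketch of the runtime option (CTranslate2 dequantizes the stored float16 weights back to float32 while loading):

```python
import whisperx

# compute_type="float32" makes CTranslate2 recompute the float16 weights as float32 at load time
model = whisperx.load_model("OdyAsh/faster-whisper-base-ar-quran", "cuda", compute_type="float32")
```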

## CTranslate2 Installation

In the [OdyAsh/faster-whisper-base-ar-quran](https://github.com/OdyAsh/faster-whisper-base-ar-quran) GitHub repo, you'll see `pyproject.toml` and `uv.lock` files, which means you can use `uv` instead of pip (if you want) to install the required packages for the `ct2-transformers-converter` command.

Steps:

1. Install `uv` if it isn't already installed by following [this](https://docs.astral.sh/uv/getting-started/installation/#standalone-installer) section of its docs.

2. In your terminal, navigate to the local directory into which you cloned the [OdyAsh/faster-whisper-base-ar-quran](https://github.com/OdyAsh/faster-whisper-base-ar-quran) GitHub repo.

3. Install the required packages right away (since the `uv.lock` file is already present in that directory):

```bash
uv sync
```

4. Verify the installation (prefix with `uv run` so the command executes inside the project's virtual environment):

```bash
uv run ct2-transformers-converter --help
```