|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
- multilingual |
|
pipeline_tag: automatic-speech-recognition |
|
library_name: pruna |
|
tags: |
|
- pruna |
|
- whisper |
|
- speech-recognition |
|
base_model: |
|
- unsloth/whisper-large-v3-turbo |
|
--- |
|
|
|
# Whisper Large V3 Turbo - Pruna Smashed |
|
|
|
**Pruna-optimized version of Whisper Large V3 Turbo.** |
|
Compressed with the `c_whisper` compiler for faster inference and lower VRAM usage, while maintaining the same transcription quality.
|
|
|
--- |
|
|
|
## Usage
|
|
|
**Best performance (Pruna runtime):** |
|
```python |
|
from pruna import PrunaModel |
|
|
|
model = PrunaModel.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed") |
|
result = model("audio.wav") |
|
```
|
|
|
**Standard Transformers:** |
|
|
|
```python |
|
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor |
|
|
|
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed") |
|
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed") |
|
``` |
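Once loaded, the checkpoint can also be wired into the standard Transformers ASR pipeline for end-to-end transcription. The snippet below is a minimal sketch; the `audio.wav` path and the device selection are placeholders for your own setup:

```python
import torch
from transformers import pipeline

# Placeholder device selection -- adjust to your hardware
device = 0 if torch.cuda.is_available() else -1

asr = pipeline(
    "automatic-speech-recognition",
    model="manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed",
    device=device,
)

result = asr("audio.wav")  # any 16 kHz mono audio file
print(result["text"])
```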
|
|
|
Tested on a Google Colab T4 GPU.
|
|
|
--- |
|
|
|
## Evaluation Results
|
|
|
**Dataset:** `librispeech_asr` test-clean (15% subset)
|
**Device:** T4 GPU |
|
|
|
### Accuracy |
|
|
|
* **WER:** 3.49% |
|
* **CER:** 1.32% |
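For reference, WER is the word-level Levenshtein (edit) distance between the model's transcript and the ground truth, divided by the number of reference words; CER is the same measure computed over characters. A small self-contained sketch of the word-level version (illustrative only, not the evaluation harness used for the numbers above):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution out of four reference words -> 25% WER
print(word_error_rate("the quick brown fox", "the quick brown box"))  # 0.25
```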
|
|
|
### Performance |
|
|
|
* **Avg inference time:** 0.688s |
|
* **P95 inference time:** 1.057s |
|
* **Throughput:** 1.38 samples/sec |
|
|
|
### Resource Usage |
|
|
|
* **Peak GPU memory:** 2.48 GB |
|
* **Final GPU utilization:** 15% |
|
* **Final RAM usage:** 49.4% |
|
|
|
--- |
|
|
|
## Scalability Test
|
|
|
Successfully transcribed **2 hours of audio**
([sam\_altman\_lex\_podcast\_367.flac](https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/sam_altman_lex_podcast_367.flac))
in **under 3 minutes** with minimal GPU memory.
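Those figures imply a real-time factor of at least 40x, i.e., each minute of audio is transcribed in under 1.5 seconds:

```python
# Real-time factor implied by the scalability test above
audio_seconds = 2 * 60 * 60   # 2 hours of audio
wall_seconds = 3 * 60         # transcribed in under 3 minutes
rtf = audio_seconds / wall_seconds
print(f"Real-time factor: >= {rtf:.0f}x")  # >= 40x
```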
|
|
|
--- |
|
|
|
## Notes
|
|
|
* Use the **Pruna runtime** for maximum efficiency. |
|
* Works with both `transformers` and `pruna` APIs. |
|
* Optimized for **low VRAM environments** with no loss of accuracy.