---
license: apache-2.0
language:
- en
- multilingual
pipeline_tag: automatic-speech-recognition
library_name: pruna
tags:
- pruna
- whisper
- speech-recognition
base_model:
- unsloth/whisper-large-v3-turbo
---
# Whisper Large V3 Turbo - Pruna Smashed
**Pruna-optimized version of Whisper Large V3 Turbo.**
Compressed with the `c_whisper` compiler for faster inference and lower VRAM usage, with no loss in transcription quality.
---
## πŸ“Œ Usage
**Best performance (Pruna runtime):**
```python
from pruna import PrunaModel
model = PrunaModel.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
result = model("audio.wav")
```
**Standard Transformers:**
```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
```
βœ… Tested on Google Colab T4 GPU
---
## πŸ“Š Evaluation Results
**Dataset:** `librispeech_asr` test-clean (15% subset)
**Device:** T4 GPU
### Accuracy
* **WER:** 3.49%
* **CER:** 1.32%
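For reference, WER is the word-level Levenshtein (edit) distance between the reference and hypothesis transcripts divided by the number of reference words, and CER is the same metric at the character level. A minimal pure-Python sketch of the computation (illustrative helpers, not part of this repo or the actual benchmark harness):

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over two sequences,
    # kept to a single row of the DP table for O(len(hyp)) memory.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # dp[j] is the previous row (deletion), dp[j-1] the current row
            # (insertion), prev the diagonal (match/substitution).
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Production evaluations typically also normalize text (lowercasing, punctuation stripping) before scoring, which affects the reported numbers.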
### Performance
* **Avg inference time:** 0.688s
* **P95 inference time:** 1.057s
* **Throughput:** 1.38 samples/sec
### Resource Usage
* **Peak GPU memory:** 2.48 GB
* **Final GPU utilization:** 15%
* **Final RAM usage:** 49.4%
---
## πŸš€ Scalability Test
Successfully transcribed **2 hours of audio**
([sam\_altman\_lex\_podcast\_367.flac](https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/sam_altman_lex_podcast_367.flac))
in **under 3 minutes** with minimal GPU memory usage.
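Whisper operates on 30-second windows, so multi-hour files are typically split into fixed-length chunks (often with a small overlap so words are not cut at boundaries) and transcribed sequentially. A pure-Python sketch of the chunk-boundary arithmetic, assuming 16 kHz mono audio (function name and parameters are hypothetical, not this repo's API):

```python
def chunk_bounds(num_samples, sample_rate=16_000, chunk_s=30.0, overlap_s=1.0):
    """Yield (start, end) sample indices covering the audio in overlapping chunks."""
    chunk = int(chunk_s * sample_rate)          # samples per chunk
    step = chunk - int(overlap_s * sample_rate)  # stride between chunk starts
    bounds = []
    start = 0
    while start < num_samples:
        bounds.append((start, min(start + chunk, num_samples)))
        start += step
    return bounds

# Two hours of 16 kHz audio splits into ~249 overlapping 30 s chunks.
bounds = chunk_bounds(2 * 60 * 60 * 16_000)
```

Each chunk is then transcribed independently and the texts are stitched together, deduplicating the overlap region.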
---
## πŸ”§ Notes
* Use the **Pruna runtime** for maximum efficiency.
* Works with both `transformers` and `pruna` APIs.
* Optimized for **low VRAM environments** without loss in accuracy.