---
license: apache-2.0
language:
- en
- multilingual
pipeline_tag: automatic-speech-recognition
library_name: pruna
tags:
- pruna
- whisper
- speech-recognition
base_model:
- unsloth/whisper-large-v3-turbo
---

# Whisper Large V3 Turbo - Pruna Smashed

**Pruna-optimized version of Whisper Large V3 Turbo.** Compressed with the `c_whisper` compiler for faster inference and lower VRAM usage, while maintaining the same transcription quality.

---

## 📌 Usage

**Best performance (Pruna runtime):**

```python
from pruna import PrunaModel

model = PrunaModel.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
result = model("audio.wav")
```

**Standard Transformers:**

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
```

✅ Tested on a Google Colab T4 GPU

---

## 📊 Evaluation Results

**Dataset:** `librispeech_asr` test-clean (15% subset)
**Device:** T4 GPU

### Accuracy

* **WER:** 3.49%
* **CER:** 1.32%

### Performance

* **Avg inference time:** 0.688 s
* **P95 inference time:** 1.057 s
* **Throughput:** 1.38 samples/sec

### Resource Usage

* **Peak GPU memory:** 2.48 GB
* **Final GPU utilization:** 15%
* **Final RAM usage:** 49.4%

---

## 🚀 Scalability Test

Successfully transcribed **2 hours of audio** ([sam_altman_lex_podcast_367.flac](https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/sam_altman_lex_podcast_367.flac)) in **under 3 minutes** with minimal GPU utilization.

---

## 🔧 Notes

* Use the **Pruna runtime** for maximum efficiency.
* Works with both the `transformers` and `pruna` APIs.
* Optimized for **low VRAM environments** without loss of accuracy.
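As background for the scalability numbers above: Whisper models process audio in fixed 30-second windows, so a long recording is transcribed window by window. The sketch below is a hypothetical helper (not part of the Pruna or Transformers APIs) that illustrates how many windows the 2-hour podcast decomposes into:

```python
import math


def num_windows(duration_s: float, window_s: float = 30.0) -> int:
    """Number of fixed-size windows needed to cover `duration_s` seconds of audio."""
    return math.ceil(duration_s / window_s)


# A 2-hour recording (7200 s) spans 240 windows of 30 s each,
# so "under 3 minutes" total implies well under a second per window on average.
print(num_windows(2 * 60 * 60))  # → 240
```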