---
license: apache-2.0
language:
- en
- multilingual
pipeline_tag: automatic-speech-recognition
library_name: pruna
tags:
- pruna
- whisper
- speech-recognition
base_model:
- unsloth/whisper-large-v3-turbo
---

# Whisper Large V3 Turbo - Pruna Smashed

**Pruna-optimized version of Whisper Large V3 Turbo.** Compressed with the `c_whisper` compiler for faster inference and lower VRAM usage, while maintaining the same transcription quality.

---

## 📌 Usage

**Best performance (Pruna runtime):**

```python
from pruna import PrunaModel

model = PrunaModel.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
result = model("audio.wav")
```

**Standard Transformers:**

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
```

✅ Tested on a Google Colab T4 GPU

---

## 📊 Evaluation Results

**Dataset:** `librispeech_asr` test-clean (15% subset)
**Device:** T4 GPU

### Accuracy

* **WER:** 3.49%
* **CER:** 1.32%

### Performance

* **Avg inference time:** 0.688 s
* **P95 inference time:** 1.057 s
* **Throughput:** 1.38 samples/sec

### Resource Usage

* **Peak GPU memory:** 2.48 GB
* **Final GPU utilization:** 15%
* **Final RAM usage:** 49.4%

---

## 🚀 Scalability Test

Successfully transcribed **2 hours of audio** ([sam_altman_lex_podcast_367.flac](https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/sam_altman_lex_podcast_367.flac)) in **under 3 minutes** with minimal GPU utilization.

---

## 🔧 Notes

* Use the **Pruna runtime** for maximum efficiency.
* Works with both the `transformers` and `pruna` APIs.
* Optimized for **low VRAM environments** without loss of accuracy.
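As background for the scalability numbers above: Whisper models process audio in fixed 30-second windows, so a long recording is transcribed window by window. The sketch below is a hypothetical helper (not part of the Pruna or Transformers APIs) that illustrates how many windows the 2-hour podcast decomposes into:

```python
import math


def num_windows(duration_s: float, window_s: float = 30.0) -> int:
    """Number of fixed-size windows needed to cover `duration_s` seconds of audio."""
    return math.ceil(duration_s / window_s)


# A 2-hour recording (7200 s) spans 240 windows of 30 s each,
# so "under 3 minutes" total implies well under a second per window on average.
print(num_windows(2 * 60 * 60))  # → 240
```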