---
license: apache-2.0
language:
- en
- multilingual
pipeline_tag: automatic-speech-recognition
library_name: pruna
tags:
- pruna
- whisper
- speech-recognition
base_model:
- unsloth/whisper-large-v3-turbo
---
# Whisper Large V3 Turbo - Pruna Smashed

Pruna-optimized version of Whisper Large V3 Turbo, compressed with the `c_whisper` compiler for faster inference and lower VRAM usage while maintaining the same transcription quality.
## Usage
Best performance (Pruna runtime):

```python
from pruna import PrunaModel

model = PrunaModel.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
result = model("audio.wav")
```
Standard Transformers:

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")  # audio: mono 16 kHz float array
text = processor.batch_decode(model.generate(inputs.input_features), skip_special_tokens=True)[0]
```
Tested on a Google Colab T4 GPU.
## Evaluation Results

- Dataset: librispeech_asr, test-clean split (15%)
- Device: T4 GPU
**Accuracy**
- WER: 3.49%
- CER: 1.32%

**Performance**
- Avg inference time: 0.688 s
- P95 inference time: 1.057 s
- Throughput: 1.38 samples/sec

**Resource Usage**
- Peak GPU memory: 2.48 GB
- Final GPU utilization: 15%
- Final RAM usage: 49.4%
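For reference, WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a minimal pure-Python illustration of the metric, not the evaluation harness used to produce the numbers above:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling 1-D dynamic-programming row: d[j] = edits to turn ref[:i] into hyp[:j]
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = min(d[j] + 1,                              # deletion
                      d[j - 1] + 1,                          # insertion
                      prev + (ref[i - 1] != hyp[j - 1]))     # substitution
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)
```

CER is computed the same way, but over characters instead of words.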
## Scalability Test

Successfully transcribed 2 hours of audio (`sam_altman_lex_podcast_367.flac`) in under 3 minutes with minimal GPU usage.
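As a back-of-envelope check (assuming Whisper's native 30-second windows; the exact chunking used here is not specified), 2 hours of audio is about 240 windows, so finishing in under 3 minutes leaves roughly 0.75 s per window — consistent with the ~0.688 s average inference time reported above:

```python
# Hypothetical back-of-envelope check for the 2-hour transcription claim.
AUDIO_SECONDS = 2 * 60 * 60          # 2 hours of input audio
CHUNK_SECONDS = 30                   # Whisper's native window size (assumption)
WALL_CLOCK_BUDGET = 3 * 60           # "under 3 minutes"

chunks = AUDIO_SECONDS // CHUNK_SECONDS               # number of 30 s windows
per_chunk_budget = WALL_CLOCK_BUDGET / chunks         # seconds available per window
realtime_factor = AUDIO_SECONDS / WALL_CLOCK_BUDGET   # speedup vs. real time

print(chunks, per_chunk_budget, realtime_factor)      # 240 windows, 0.75 s each, 40x realtime
```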
## Notes

- Use the Pruna runtime for maximum efficiency.
- Works with both the `transformers` and `pruna` APIs.
- Optimized for low-VRAM environments without loss of accuracy.