---
license: apache-2.0
language:
  - en
  - multilingual
pipeline_tag: automatic-speech-recognition
library_name: pruna
tags:
  - pruna
  - whisper
  - speech-recognition
base_model:
  - unsloth/whisper-large-v3-turbo
---

# Whisper Large V3 Turbo - Pruna Smashed

A Pruna-optimized version of Whisper Large V3 Turbo, compressed with the `c_whisper` compiler for faster inference and lower VRAM usage while maintaining the same transcription quality.


## 📌 Usage

**Best performance (Pruna runtime):**

```python
from pruna import PrunaModel

model = PrunaModel.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
result = model("audio.wav")
```

**Standard Transformers:**

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
```

✅ Tested on a Google Colab T4 GPU


## 📊 Evaluation Results

- **Dataset:** `librispeech_asr` test-clean (15% subset)
- **Device:** T4 GPU

### Accuracy

- WER: 3.49%
- CER: 1.32%
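For reference, WER and CER are the word- and character-level edit distances between the model output and the reference transcript, normalized by reference length. A minimal self-contained sketch of the metric (illustrative only; this is not the evaluation harness used for the numbers above):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    n = len(hyp)
    dp = list(range(n + 1))  # row for the empty-reference prefix
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # deletion
                dp[j - 1] + 1,                      # insertion
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution / match
            )
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

In practice a library such as `jiwer` (with its standard text normalization) is the usual choice; the sketch above just shows what the percentages measure.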

### Performance

- Avg inference time: 0.688 s
- P95 inference time: 1.057 s
- Throughput: 1.38 samples/sec
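Latency numbers like these can be collected with a simple timing harness along the following lines (an illustrative sketch; `fn` stands in for a call to the loaded model, which is not part of this snippet):

```python
import statistics
import time

def benchmark(fn, inputs):
    """Time fn over each input; report mean, p95 latency, and throughput."""
    times = []
    for x in inputs:
        t0 = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - t0)
    times.sort()
    return {
        "avg_s": statistics.mean(times),
        "p95_s": times[int(0.95 * (len(times) - 1))],
        "throughput": 1.0 / statistics.mean(times),  # samples/sec at batch size 1
    }
```

Real measurements should include a warm-up pass and, on GPU, a synchronization point before stopping the clock.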

### Resource Usage

- Peak GPU memory: 2.48 GB
- Final GPU utilization: 15%
- Final RAM usage: 49.4%

## 🚀 Scalability Test

Successfully transcribed roughly 2 hours of audio (`sam_altman_lex_podcast_367.flac`) in under 3 minutes with minimal GPU usage.
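Long recordings are typically handled by splitting the waveform into overlapping ~30-second windows (Whisper's native input length) and transcribing chunk by chunk. A hypothetical chunking helper to illustrate the idea (`chunk_audio` is not part of the `pruna` or `transformers` API):

```python
def chunk_audio(samples, sr=16000, chunk_s=30, overlap_s=2):
    """Split a 1-D sample sequence into overlapping fixed-length windows.

    A small overlap helps stitch transcripts cleanly at chunk
    boundaries. Purely illustrative; transformers' ASR pipeline can
    do this internally via its chunk_length_s argument.
    """
    step = (chunk_s - overlap_s) * sr   # hop between window starts
    size = chunk_s * sr                 # samples per window
    chunks = []
    for start in range(0, len(samples), step):
        chunks.append(samples[start:start + size])
        if start + size >= len(samples):
            break  # last window already covers the tail
    return chunks
```

At the measured throughput above, a 2-hour file split into 30-second windows (~240 chunks) finishing in a few minutes is consistent with the reported result.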


## 🔧 Notes

- Use the Pruna runtime for maximum efficiency.
- Works with both the `transformers` and `pruna` APIs.
- Optimized for low-VRAM environments without loss of accuracy.