---
license: apache-2.0
language:
  - en
  - multilingual
pipeline_tag: automatic-speech-recognition
library_name: pruna
tags:
  - pruna
  - whisper
  - speech-recognition
base_model:
  - unsloth/whisper-large-v3-turbo
---

# Whisper Large V3 Turbo - Pruna Smashed

A Pruna-optimized version of Whisper Large V3 Turbo, compressed with the `c_whisper` compiler for faster inference and lower VRAM usage while maintaining the same transcription quality.


## 📌 Usage

**Best performance (Pruna runtime):**

```python
from pruna import PrunaModel

model = PrunaModel.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
result = model("audio.wav")
```

**Standard Transformers:**

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
```

✅ Tested on a Google Colab T4 GPU


## 📊 Evaluation Results

- **Dataset:** `librispeech_asr` test-clean (15% subset)
- **Device:** T4 GPU

### Accuracy

- WER: 3.49%
- CER: 1.32%
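For reference, WER and CER are the word- and character-level edit distances between the model output and the reference transcript, normalized by reference length. A minimal self-contained sketch of the metric (illustrative only; this is not the evaluation harness used for the numbers above):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    n = len(hyp)
    dp = list(range(n + 1))  # row for the empty-reference prefix
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # deletion
                dp[j - 1] + 1,                      # insertion
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution / match
            )
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

In practice a library such as `jiwer` (with its standard text normalization) is the usual choice; the sketch above just shows what the percentages measure.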

### Performance

- Avg inference time: 0.688 s
- P95 inference time: 1.057 s
- Throughput: 1.38 samples/sec
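Latency numbers like these can be collected with a simple timing harness along the following lines (an illustrative sketch; `fn` stands in for a call to the loaded model, which is not part of this snippet):

```python
import statistics
import time

def benchmark(fn, inputs):
    """Time fn over each input; report mean, p95 latency, and throughput."""
    times = []
    for x in inputs:
        t0 = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - t0)
    times.sort()
    return {
        "avg_s": statistics.mean(times),
        "p95_s": times[int(0.95 * (len(times) - 1))],
        "throughput": 1.0 / statistics.mean(times),  # samples/sec at batch size 1
    }
```

Real measurements should include a warm-up pass and, on GPU, a synchronization point before stopping the clock.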

### Resource Usage

- Peak GPU memory: 2.48 GB
- Final GPU utilization: 15%
- Final RAM usage: 49.4%

## 🚀 Scalability Test

Successfully transcribed roughly 2 hours of audio (`sam_altman_lex_podcast_367.flac`) in under 3 minutes with minimal GPU usage.
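Long recordings are typically handled by splitting the waveform into overlapping ~30-second windows (Whisper's native input length) and transcribing chunk by chunk. A hypothetical chunking helper to illustrate the idea (`chunk_audio` is not part of the `pruna` or `transformers` API):

```python
def chunk_audio(samples, sr=16000, chunk_s=30, overlap_s=2):
    """Split a 1-D sample sequence into overlapping fixed-length windows.

    A small overlap helps stitch transcripts cleanly at chunk
    boundaries. Purely illustrative; transformers' ASR pipeline can
    do this internally via its chunk_length_s argument.
    """
    step = (chunk_s - overlap_s) * sr   # hop between window starts
    size = chunk_s * sr                 # samples per window
    chunks = []
    for start in range(0, len(samples), step):
        chunks.append(samples[start:start + size])
        if start + size >= len(samples):
            break  # last window already covers the tail
    return chunks
```

At the measured throughput above, a 2-hour file split into 30-second windows (~240 chunks) finishing in a few minutes is consistent with the reported result.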


## 🔧 Notes

- Use the Pruna runtime for maximum efficiency.
- Works with both the `transformers` and `pruna` APIs.
- Optimized for low-VRAM environments without loss of accuracy.