|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
- multilingual |
|
pipeline_tag: automatic-speech-recognition |
|
library_name: pruna |
|
tags: |
|
- pruna |
|
- whisper |
|
- speech-recognition |
|
base_model: |
|
- unsloth/whisper-large-v3-turbo |
|
--- |
|
|
|
# Whisper Large V3 Turbo - Pruna Smashed |
|
|
|
**Pruna-optimized version of Whisper Large V3 Turbo.** |
|
Compressed with the `c_whisper` compiler for faster inference and lower VRAM usage, while maintaining the same transcription quality.
|
|
|
--- |
|
|
|
## Usage
|
|
|
**Best performance (Pruna runtime):** |
|
```python |
|
from pruna import PrunaModel |
|
|
|
model = PrunaModel.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed") |
|
result = model("audio.wav") |
|
```
|
|
|
**Standard Transformers:** |
|
|
|
```python |
|
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor |
|
|
|
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed") |
|
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed") |
|
``` |
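Once loaded, the checkpoint can also be wired into the standard Transformers ASR pipeline for end-to-end transcription. The snippet below is a minimal sketch; the `audio.wav` path and the device selection are placeholders for your own setup:

```python
import torch
from transformers import pipeline

# Placeholder device selection -- adjust to your hardware
device = 0 if torch.cuda.is_available() else -1

asr = pipeline(
    "automatic-speech-recognition",
    model="manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed",
    device=device,
)

result = asr("audio.wav")  # any 16 kHz mono audio file
print(result["text"])
```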
|
|
|
Tested on a Google Colab T4 GPU.
|
|
|
--- |
|
|
|
## Evaluation Results
|
|
|
**Dataset:** `librispeech_asr` test-clean (15% subset)
|
**Device:** T4 GPU |
|
|
|
### Accuracy |
|
|
|
* **WER:** 3.49% |
|
* **CER:** 1.32% |
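For reference, WER is the word-level Levenshtein (edit) distance between the model's transcript and the ground truth, divided by the number of reference words; CER is the same measure computed over characters. A small self-contained sketch of the word-level version (illustrative only, not the evaluation harness used for the numbers above):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution out of four reference words -> 25% WER
print(word_error_rate("the quick brown fox", "the quick brown box"))  # 0.25
```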
|
|
|
### Performance |
|
|
|
* **Avg inference time:** 0.688s |
|
* **P95 inference time:** 1.057s |
|
* **Throughput:** 1.38 samples/sec |
|
|
|
### Resource Usage |
|
|
|
* **Peak GPU memory:** 2.48 GB |
|
* **Final GPU utilization:** 15% |
|
* **Final RAM usage:** 49.4% |
|
|
|
--- |
|
|
|
## Scalability Test
|
|
|
Successfully transcribed **2 hours of audio**
([sam\_altman\_lex\_podcast\_367.flac](https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/sam_altman_lex_podcast_367.flac))
in **under 3 minutes** with minimal GPU memory.
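Those figures imply a real-time factor of at least 40x, i.e., each minute of audio is transcribed in under 1.5 seconds:

```python
# Real-time factor implied by the scalability test above
audio_seconds = 2 * 60 * 60   # 2 hours of audio
wall_seconds = 3 * 60         # transcribed in under 3 minutes
rtf = audio_seconds / wall_seconds
print(f"Real-time factor: >= {rtf:.0f}x")  # >= 40x
```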
|
|
|
--- |
|
|
|
## Notes
|
|
|
* Use the **Pruna runtime** for maximum efficiency. |
|
* Works with both `transformers` and `pruna` APIs. |
|
* Optimized for **low VRAM environments** with no loss of accuracy.