Cool-Whisper
Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data
Liang-Hsuan Tseng, Zih-Ching Chen, Wei-Shun Chang, Cheng-Kuang Lee, Tsung-Ren Huang, Hung-yi Lee
⚠️ Due to privacy and security concerns, this model will be temporarily taken offline. We are sorry for the inconvenience.
Introduction
- Cool-Whisper is a distilled version of Whisper that focuses mainly on Mandarin-English code-switching ASR for people in Taiwan.
- We use 60,000 hours of unlabeled audio to train the model.
- In practice, we draw on knowledge not only from the large model (Whisper-large-v2) but also from the small model (Whisper-base).
Basic Usage
- This model repository is provided in CTranslate2 format and is compatible with faster-whisper.
- Using faster-whisper can give roughly 3 to 5 times faster generation than the original OpenAI implementation.
- If you prefer to use the model through Hugging Face transformers, please visit https://huggingface.co/andybi7676/cool-whisper-hf
from faster_whisper import WhisperModel
import soundfile as sf

model_card = "andybi7676/cool-whisper"
audio_fpath = "/your/path/to/audio.wav"

# Inspect the audio file (sample rate, channels, duration) for debugging.
audio_info = sf.info(audio_fpath)
print(audio_info)

# Load the CTranslate2 model on the GPU in half precision.
model = WhisperModel(model_card, device="cuda", compute_type="float16")

# Use language="zh" for zh-en code-switching in cool-whisper.
segments, info = model.transcribe(audio_fpath, beam_size=5, language="zh", condition_on_previous_text=True)

# segments is a generator; decoding runs as it is iterated.
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
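The loop above prints each segment's start time, end time, and text. If subtitle output is more convenient, the same three fields can be turned into SRT text. The sketch below is only an illustration and is not part of this repository: `format_srt_timestamp` and `segments_to_srt` are hypothetical helper names, and the only assumption is that each segment exposes `start`, `end` (in seconds), and `text`, as the faster-whisper segments used above do.

```python
def format_srt_timestamp(seconds):
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from objects that have .start, .end and .text.

    Hypothetical helper for illustration; it accepts any iterable,
    including the generator returned by model.transcribe().
    """
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment.start)
        end = format_srt_timestamp(segment.end)
        # Each SRT cue is: running index, time range, text, blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{segment.text.strip()}\n")
    return "\n".join(blocks)
```

Because `segments` is a generator, it can be consumed only once: pass it to `segments_to_srt(segments)` instead of the printing loop, or collect it with `list(segments)` first if both outputs are needed.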