---
language:
- zh
license: cc-by-nc-sa-4.0
library_name: transformers
tags:
- audio
- automatic-speech-recognition
widget:
- example_title: Model Introduction
  src: https://huggingface.co/andybi7676/cool-whisper-hf/resolve/main/sample1.weba
pipeline_tag: automatic-speech-recognition
---

# Cool-Whisper

### Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data

<span style="font-size: 0.95em;">Liang-Hsuan Tseng, Zih-Ching Chen, Wei-Shun Chang, Cheng-Kuang Lee, Tsung-Ren Huang, Hung-yi Lee</span>

[Paper](https://arxiv.org/abs/2407.10603) [Colab Demo](https://colab.research.google.com/drive/1ZikUWKch78Jv3Yw7LtUKUn4wMrFCx6lD?usp=sharing)

> ⚠️ Due to privacy and security concerns, this model will be temporarily taken offline. We are sorry for the inconvenience.

## Introduction

* Cool-Whisper is a distilled version of Whisper, mainly focused on **Mandarin-English** code-switching ASR for people in Taiwan.
* We use 60,000 hours of **unlabeled** audio to train the model.
* In practice, we utilize *knowledge* not only from the large model (Whisper-large-v2) but also from the small model (Whisper-base).

## Basic Usage

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Use a GPU with half precision when available; otherwise fall back to CPU/fp32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "andybi7676/cool-whisper-hf"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, use_safetensors=True
)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=256,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# Load a sample from our evaluation set...
dataset = load_dataset("andybi7676/ntuml2021_long", "default", split="test")
sample = dataset[0]["audio"]
# ...or point to your own audio file instead:
# sample = "/your/path/to/audio.wav"

result = pipe(sample)
print("Basic Result: ")
print(result["text"])

# Result with timestamps, one chunk per decoded segment
print("\nResult with timestamps: ")
for chunk in result["chunks"]:
    print(chunk)
```
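
For long recordings, the same pipeline can also run chunked, batched inference, which is usually faster than decoding one window at a time. The sketch below reuses the `model`, `processor`, `torch_dtype`, `device`, and `sample` objects from above; the `chunk_length_s=30` and `batch_size=8` values are illustrative assumptions, not settings tuned or published for this model.

```python
# Chunked, batched long-form inference (illustrative settings, not tuned):
batched_pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=256,
    chunk_length_s=30,  # split the audio into 30-second windows
    batch_size=8,       # decode several windows in parallel
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

result = batched_pipe(sample)
print(result["text"])
```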

## Faster-Whisper Support

[Faster-Whisper](https://github.com/SYSTRAN/faster-whisper) is a widely used reimplementation of Whisper built on [CTranslate2](https://github.com/OpenNMT/CTranslate2/) that significantly speeds up transcription.
We also release our model in CTranslate2 format so that it can be used with faster-whisper.
Please visit [cool-whisper](https://huggingface.co/andybi7676/cool-whisper) for more details.
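
A minimal sketch of how that CTranslate2 checkpoint could be loaded through faster-whisper (assuming `pip install faster-whisper`, a CUDA-capable GPU, and that the `andybi7676/cool-whisper` repository linked above is reachable; the audio path is a placeholder):

```python
from faster_whisper import WhisperModel

# Load the CTranslate2 checkpoint (repo id taken from the link above).
model = WhisperModel("andybi7676/cool-whisper", device="cuda", compute_type="float16")

# Transcribe a local file; segments is a generator of timestamped results.
segments, info = model.transcribe("/your/path/to/audio.wav", beam_size=5)

print(f"Detected language: {info.language} (probability {info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```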
|