A version with noise detection has been trained based on this model to reduce hallucination during streaming:
Name: JackyHoCL/whisper-large-v3-turbo-cantonese-noise-detection
https://huggingface.co/JackyHoCL/whisper-large-v3-turbo-cantonese-noise-detection
Requires transformers >= 4.49.0 (see FAQ below).
For Cantonese + English, set the language to `'yue'`; for Cantonese + Mandarin + English, use `'zh'` (see the sketch below).
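A minimal inference sketch using the standard transformers ASR pipeline. The repo id is assumed to be this model's, and `sample.wav` is a placeholder:

```python
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="JackyHoCL/whisper-large-v3-turbo-cantonese",  # assumed repo id for this card
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    device=device,
)

# Cantonese + English -> language="yue"; Cantonese + Mandarin + English -> language="zh"
result = asr("sample.wav", generate_kwargs={"language": "yue", "task": "transcribe"})
print(result["text"])
```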
TODO:
1. Improve zh-CN performance.
2. Improve overall performance (yue + zh + en) with background noise (please suggest or provide suitable datasets if possible, thanks).
2025-07-21: CER

| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 8.05 |
| mozilla-foundation/common_voice_17_0 | yue | test | 0.64 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.3 |
| mozilla-foundation/common_voice_17_0 | en | test (2k samples) | 5.22 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 11.89 |
2025-07-19: CER

| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 8.94 |
| mozilla-foundation/common_voice_17_0 | yue | test | 1.29 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.00 |
| mozilla-foundation/common_voice_17_0 | en | test | 6.8 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 50.9 |
2025-07-06: CER

| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 8.92 |
| mozilla-foundation/common_voice_17_0 | yue | test | 8.86 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 7.96 |
| mozilla-foundation/common_voice_17_0 | en | test | 6.84 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 43.0 |

Train args: `per_device_train_batch_size=32`, `learning_rate=1e-7`
2025-07-03: CER

| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 9.705 |
| mozilla-foundation/common_voice_17_0 | yue | test | 9.31 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.37 |

Train args: `per_device_train_batch_size=32`, `learning_rate=1e-5`
CER: 13.7%
Train Args (see the `Seq2SeqTrainingArguments` sketch below):
- `per_device_train_batch_size=16`
- `gradient_accumulation_steps=1`
- `learning_rate=1e-5`
- `gradient_checkpointing=True`
- `per_device_eval_batch_size=16`
- `generation_max_length=225`
Hardware:
- NVIDIA Tesla V100 16GB × 4
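For reference, a sketch of how the hyperparameters listed above could map onto transformers' `Seq2SeqTrainingArguments`; `output_dir`, `fp16`, and `predict_with_generate` are assumptions not stated in this card:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-cantonese-finetune",  # placeholder output directory
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-5,
    gradient_checkpointing=True,
    per_device_eval_batch_size=16,
    generation_max_length=225,
    fp16=True,                    # assumption: mixed precision on the V100s
    predict_with_generate=True,   # assumption: generate during eval so CER can be computed
)
```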
A real-time streaming application example built on this model:
https://github.com/JackyHoCL/whisper-realtime.git
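The linked repository handles the actual real-time streaming. As a rough, non-streaming approximation, the transformers pipeline can transcribe long recordings in overlapping chunks; the repo id and file name below are placeholders, not the repo's implementation:

```python
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="JackyHoCL/whisper-large-v3-turbo-cantonese",  # assumed repo id
    torch_dtype=torch.float16,
    device="cuda:0",
    chunk_length_s=30,  # split long audio into overlapping 30 s windows
)

print(asr("long_recording.wav", generate_kwargs={"language": "yue"})["text"])
```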
FAQ:
- If you run into a tokenizer issue during inference, upgrade transformers to >= 4.49.0: `pip install --upgrade transformers`