
A version with noise detection has been trained on top of this model to reduce hallucinations during streaming:

Name: JackyHoCL/whisper-large-v3-turbo-cantonese-noise-detection
https://huggingface.co/JackyHoCL/whisper-large-v3-turbo-cantonese-noise-detection

Transformers version: 4.49.0
Language setting: for Cantonese + English, use 'yue'; for Cantonese + Mandarin + English, use 'zh'.
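
As a minimal sketch of how the language code might be passed at inference time, the snippet below assumes the model is loaded through the Transformers `pipeline` API under the repo ID `JackyHoCL/whisper-large-v3-turbo-cantonese-yue-english`. The helper names `pick_language` and `transcribe` are illustrative only and are not part of this model card.

```python
MODEL_ID = "JackyHoCL/whisper-large-v3-turbo-cantonese-yue-english"


def pick_language(has_mandarin: bool) -> str:
    """Return the language code recommended above for the expected audio mix."""
    # Cantonese + English -> 'yue'; Cantonese + Mandarin + English -> 'zh'
    return "zh" if has_mandarin else "yue"


def transcribe(audio_path: str, has_mandarin: bool = False) -> str:
    """Transcribe one audio file (hypothetical helper; downloads the model on first use)."""
    # Imported lazily so pick_language can be used without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(
        audio_path,
        generate_kwargs={
            "language": pick_language(has_mandarin),
            "task": "transcribe",
        },
    )
    return result["text"]
```

Calling `transcribe("sample.wav")` (a placeholder file name) would decode with 'yue'; passing `has_mandarin=True` switches to 'zh'. Longer recordings may need chunking, for example through the pipeline's `chunk_length_s` argument.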

TODO:
1. Improve zh-CN performance
2. Improve overall performance (yue + zh + en) with background noise (dataset suggestions or contributions are welcome, thx)

2025-07-21: CER:

| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 8.05 |
| mozilla-foundation/common_voice_17_0 | yue | test | 0.64 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.3 |
| mozilla-foundation/common_voice_17_0 | en | test (2k samples) | 5.22 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 11.89 |
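
For readers unfamiliar with the metric, CER (character error rate) is the character-level edit distance between reference and hypothesis, divided by the number of reference characters. The sketch below is a plain-Python illustration of that definition. It is not the evaluation script behind the figures in this card, and any text normalisation (punctuation, spacing, casing) applied before scoring is unknown, so none is applied here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference char missing in hypothesis)
                curr[j - 1] + 1,     # insertion (extra char in hypothesis)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer_percent(references, hypotheses) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return 100.0 * edits / chars
```

For example, scoring the hypothesis "今日天氣好" against the reference "今日天氣好好" gives one deletion over six reference characters, so `cer_percent` returns about 16.67.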

2025-07-19: CER:

| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 8.94 |
| mozilla-foundation/common_voice_17_0 | yue | test | 1.29 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.00 |
| mozilla-foundation/common_voice_17_0 | en | test | 6.8 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 50.9 |

2025-07-06: CER:

| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 8.92 |
| mozilla-foundation/common_voice_17_0 | yue | test | 8.86 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 7.96 |
| mozilla-foundation/common_voice_17_0 | en | test | 6.84 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 43.0 |

per_device_train_batch_size=32,
learning_rate=1e-7,


2025-07-03: CER:

| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 9.705 |
| mozilla-foundation/common_voice_17_0 | yue | test | 9.31 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.37 |

per_device_train_batch_size=32,
learning_rate=1e-5,


CER: 13.7%

Train Args:
per_device_train_batch_size=16,
gradient_accumulation_steps=1,
learning_rate=1e-5,
gradient_checkpointing=True,
per_device_eval_batch_size=16,
generation_max_length=225,

Hardware:
NVIDIA Tesla V100 16GB * 4
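
The arguments listed under Train Args match parameter names of `Seq2SeqTrainingArguments` in Transformers, so one plausible way to reproduce the configuration is sketched below. Two things here are assumptions rather than facts from this card: that all four GPUs were used for data-parallel training, and the `output_dir` value, which is a placeholder. The helper names are illustrative.

```python
# Values copied from the "Train Args" list above.
TRAIN_ARGS = {
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-5,
    "gradient_checkpointing": True,
    "per_device_eval_batch_size": 16,
    "generation_max_length": 225,
}

# From the hardware line; assumes every GPU takes part in data-parallel training.
NUM_GPUS = 4


def effective_batch_size(args: dict, num_gpus: int) -> int:
    """Samples consumed per optimizer step across all devices."""
    return (
        args["per_device_train_batch_size"]
        * args["gradient_accumulation_steps"]
        * num_gpus
    )


def build_training_arguments(output_dir: str):
    """Build the arguments object (hypothetical helper; output_dir is a placeholder)."""
    # Imported lazily so the batch-size arithmetic works without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAIN_ARGS)
```

Under the data-parallel assumption, `effective_batch_size(TRAIN_ARGS, NUM_GPUS)` is 16 x 1 x 4 = 64 samples per optimizer step.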

A real-time streaming application example built on this model:
https://github.com/JackyHoCL/whisper-realtime.git

FAQ:

  1. If you hit a tokenizer issue during inference, update transformers to version >= 4.49.0:
pip install --upgrade transformers
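
To check the installed version before loading the model, a small standard-library sketch like the following could be used. The function names are illustrative, and the parser only reads the leading numeric parts, so a pre-release such as `4.49.0.dev0` is treated as 4.49.0.

```python
import re
from importlib import metadata

MIN_VERSION = (4, 49, 0)  # minimum stated in the FAQ above


def version_tuple(version: str) -> tuple:
    """Parse the leading 'major.minor[.patch]' numbers, ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def transformers_is_recent_enough() -> bool:
    """True if the installed transformers meets the minimum; False if older or missing."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= MIN_VERSION
```

If `transformers_is_recent_enough()` returns False, running the upgrade command above should resolve it.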
Model size: 809M params (Safetensors, tensor type F32)