---
library_name: transformers
license: mit
datasets:
- mozilla-foundation/common_voice_17_0
- mozilla-foundation/common_voice_16_1
- JackyHoCL/cleaned_mixed_cantonese_and_english_speech
metrics:
- cer
base_model:
- openai/whisper-large-v3-turbo
---

---------------------------------------------------------------

## A version with noise detection has been trained based on this model, to reduce hallucination during streaming:
**Name: JackyHoCL/whisper-large-v3-turbo-cantonese-noise-detection**

https://huggingface.co/JackyHoCL/whisper-large-v3-turbo-cantonese-noise-detection

Requires transformers >= 4.49.0.

For Cantonese + English, use `'yue'`; for Cantonese + Mandarin + English, use `'zh'`.
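As a quick reference, here is a minimal inference sketch (not taken from the repo) showing where that language hint goes when using the Transformers ASR pipeline; the base-model id and `sample.wav` are placeholders for this model's Hub id and your own audio.

```python
# Minimal sketch, assuming the standard Transformers ASR pipeline.
import torch
from transformers import pipeline

model_id = "openai/whisper-large-v3-turbo"  # placeholder: substitute this fine-tuned model's Hub id
device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device=device,
)

# "yue" for Cantonese + English; switch to "zh" for Cantonese + Mandarin + English.
result = asr("sample.wav", generate_kwargs={"language": "yue", "task": "transcribe"})
print(result["text"])
```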
---------------------------------------------------------------
TODO:
1. Improve zh-CN performance
2. Improve overall performance (yue + zh + en) with background noise **(please suggest/provide suitable datasets if possible, thanks)**
2025-07-21:

CER:

| Dataset | Lang | Split | CER (in %) |
| -------- | ------- | ------- | ------- |
| Training | yue | validation | 8.05 |
| mozilla-foundation/common_voice_17_0 | yue | test | **0.64** |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.3 |
| mozilla-foundation/common_voice_17_0 | en | test (2k samples) | 5.22 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 11.89 |

2025-07-19:

CER:

| Dataset | Lang | Split | CER (in %) |
| -------- | ------- | ------- | ------- |
| Training | yue | validation | 8.94 |
| mozilla-foundation/common_voice_17_0 | yue | test | 1.29 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.00 |
| mozilla-foundation/common_voice_17_0 | en | test | 6.8 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 50.9 |

2025-07-06:

CER:

| Dataset | Lang | Split | CER (in %) |
| -------- | ------- | ------- | ------- |
| Training | yue | validation | 8.92 |
| mozilla-foundation/common_voice_17_0 | yue | test | 8.86 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 7.96 |
| mozilla-foundation/common_voice_17_0 | en | test | 6.84 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 43.0 |

per_device_train_batch_size=32,
learning_rate=1e-7,
---------------------------------------------------------------
2025-07-03:

CER:

| Dataset | Lang | Split | CER (in %) |
| -------- | ------- | ------- | ------- |
| Training | yue | validation | 9.705 |
| mozilla-foundation/common_voice_17_0 | yue | test | 9.31 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.37 |

per_device_train_batch_size=32,
learning_rate=1e-5,
---------------------------------------------------------------
CER: 13.7%
Train Args:
per_device_train_batch_size=16,
gradient_accumulation_steps=1,
learning_rate=1e-5,
gradient_checkpointing=True,
per_device_eval_batch_size=16,
generation_max_length=225,
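For context, a minimal sketch (not the author's exact training script) of how the arguments listed above map onto `Seq2SeqTrainingArguments` in Transformers; `output_dir`, `predict_with_generate`, and `fp16` are added assumptions to make the snippet self-contained.

```python
# Minimal sketch: the training arguments listed above expressed as Seq2SeqTrainingArguments.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-turbo-cantonese",  # placeholder path, not from the card
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-5,
    gradient_checkpointing=True,
    per_device_eval_batch_size=16,
    generation_max_length=225,
    predict_with_generate=True,  # assumption; needed so generation_max_length applies during eval
    fp16=True,                   # assumption; common choice on V100 GPUs like those listed below
)
```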
Hardware:
NVIDIA Tesla V100 16GB * 4
A real-time streaming application example built on this model:
https://github.com/JackyHoCL/whisper-realtime.git
FAQ:
1. If you encounter a tokenizer issue during inference, please update your transformers version to >= 4.49.0:
```bash
pip install --upgrade transformers
```