bert-large-cantonese
Description
This model was trained from scratch on Cantonese text. It is a BERT model with the large architecture (24 layers, 1024 hidden units, 16 attention heads, 326M parameters).
Training ran in two stages. The first stage pre-trained the model on sequences of length 128 with a batch size of 512 for 1 epoch. The second stage continued pre-training on sequences of length 512 with a batch size of 512 for one more epoch.
How to use
You can use this model directly with a pipeline for masked language modeling:
from transformers import pipeline
mask_filler = pipeline(
"fill-mask",
model="hon9kon9ize/bert-large-cantonese"
)
mask_filler("雞蛋六隻,糖呢就兩茶匙,仲有[MASK]橙皮添。")
# [{'score': 0.08160534501075745,
#   'token': 943,
#   'token_str': '個',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 個 橙 皮 添 。'},
#  {'score': 0.06182105466723442,
#   'token': 1576,
#   'token_str': '啲',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 啲 橙 皮 添 。'},
#  {'score': 0.04600336775183678,
#   'token': 1646,
#   'token_str': '嘅',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 嘅 橙 皮 添 。'},
#  {'score': 0.03743772581219673,
#   'token': 3581,
#   'token_str': '橙',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 橙 橙 皮 添 。'},
#  {'score': 0.031560592353343964,
#   'token': 5148,
#   'token_str': '紅',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 紅 橙 皮 添 。'}]
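The `sequence` field in the output above separates every token with a space, which is inconvenient for Cantonese text. The following is a minimal sketch of one way to post-process the result; the helper names `detokenize` and `best_prediction` are hypothetical and not part of `transformers`, and the sample data is copied from the first two predictions shown above. Removing all spaces is only a reasonable assumption for text that contains no meaningful spaces, such as this example.

```python
# Two of the predictions shown above, copied verbatim so the sketch
# runs without downloading the model.
predictions = [
    {
        "score": 0.08160534501075745,
        "token": 943,
        "token_str": "個",
        "sequence": "雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 個 橙 皮 添 。",
    },
    {
        "score": 0.06182105466723442,
        "token": 1576,
        "token_str": "啲",
        "sequence": "雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 啲 橙 皮 添 。",
    },
]


def detokenize(sequence: str) -> str:
    """Drop the spaces inserted between tokens (hypothetical helper).

    Assumes the original text had no meaningful spaces; mixed
    Cantonese/English input would need a more careful rule.
    """
    return sequence.replace(" ", "")


def best_prediction(results: list) -> dict:
    """Return the candidate with the highest score (hypothetical helper)."""
    return max(results, key=lambda r: r["score"])


best = best_prediction(predictions)
print(best["token_str"], round(best["score"], 4))
print(detokenize(best["sequence"]))
```

With real pipeline output, `predictions` would simply be the list returned by `mask_filler(...)`.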
Training hyperparameters
The following hyperparameters were used during the first training stage (sequence length 128):
- Batch size: 512
- Learning rate: 1e-4
- Learning rate scheduler: linear decay
- Epochs: 1
- Warmup ratio: 0.1
Loss plot on W&B
The following hyperparameters were used during the second training stage (sequence length 512):
- Batch size: 512
- Learning rate: 5e-5
- Learning rate scheduler: linear decay
- Epochs: 1
- Warmup ratio: 0.1
Loss plot on W&B
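To illustrate how a warmup ratio of 0.1 combines with linear decay, here is a minimal sketch of the learning rate over the course of one stage. It assumes the schedule ramps linearly from 0 to the peak learning rate during warmup and then decays linearly to 0, which is how `get_linear_schedule_with_warmup` in `transformers` behaves; the model card does not confirm that this exact scheduler was used. The function name `linear_warmup_decay_lr` and the value of `TOTAL_STEPS` are hypothetical, since the number of training steps is not stated.

```python
def linear_warmup_decay_lr(step, total_steps, peak_lr, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    Assumption: warmup covers int(total_steps * warmup_ratio) steps.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up from 0 to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Decay from peak_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical step count; the real value depends on the corpus size.
TOTAL_STEPS = 10_000

# Peak learning rates taken from the two hyperparameter lists above.
for stage, peak_lr in (("stage 1", 1e-4), ("stage 2", 5e-5)):
    for step in (0, 500, 1_000, 5_500, 10_000):
        lr = linear_warmup_decay_lr(step, TOTAL_STEPS, peak_lr)
        print(f"{stage} step {step:>6}: lr = {lr:.2e}")
```

Under these assumptions, the learning rate reaches its peak after the first 10% of steps and is back to zero at the end of each stage.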