---
language:
- ms
- en
- zh
- ta
---

# Malaysian gemma-3-1b-it

Continued finetuning of https://huggingface.co/google/gemma-3-1b-it on a highly curated 1.5B-token Malaysian instruction dataset.

## Improvement

1. Supports responding in Mandarin, Tamil, Jawi, Manglish, Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
2. Able to code in Mandarin, Tamil, Jawi, Manglish, Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
3. Handles multi-turn Malaysian context, such as Malaysian legislation, politics, religions and languages.

## Training session

Finetuned on [mesolitica/Malaysian-SFT](https://huggingface.co/datasets/mesolitica/Malaysian-SFT) so the model understands Malaysian context.

## How we train

1. LoRA on `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]`.
2. Rank 128 with alpha 256, i.e. a scaling factor (alpha/rank) of 2.0.
3. Multipacking at 8192 context length with proper SDPA causal masking to prevent cross-document contamination, and with position ids reset per document.
4. Chunked CCE loss for LoRA.
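The multipacking in step 3 can be sketched as follows. This is a minimal illustration, not the actual training code: position ids restart at 0 for every packed document, and the causal mask is block-diagonal so tokens never attend across document boundaries.

```python
# Minimal multipacking sketch (illustration only, not the training code):
# several documents share one packed row; position ids restart per document,
# and the attention mask is block-diagonal causal.

def pack_position_ids(doc_lengths):
    """Position ids restart at 0 at each document boundary."""
    position_ids = []
    for length in doc_lengths:
        position_ids.extend(range(length))
    return position_ids

def pack_attention_mask(doc_lengths):
    """Boolean [total, total] mask: True where query token i may attend
    to key token j (same document and j <= i, i.e. causal per document)."""
    doc_id = []
    for d, length in enumerate(doc_lengths):
        doc_id.extend([d] * length)
    total = len(doc_id)
    return [
        [doc_id[i] == doc_id[j] and j <= i for j in range(total)]
        for i in range(total)
    ]

doc_lengths = [3, 2]  # two documents packed into one row of 5 tokens
print(pack_position_ids(doc_lengths))  # [0, 1, 2, 0, 1]
mask = pack_attention_mask(doc_lengths)
print(mask[3])  # [False, False, False, True, False]: doc 2 never sees doc 1
```

A boolean mask of this block-diagonal causal shape is exactly what SDPA-style attention expects to keep packed documents isolated from one another.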
5. WandB at https://wandb.ai/huseinzol05/lora-embedding-128-gemma3-1b-malaysian-8k?nw=nwuserhuseinzol05

Source code at https://github.com/mesolitica/malaya/tree/master/session/gemma3

## Benchmark

### MalayMMLU

Based on 0-shot first-token accuracy:

```
                     Model   Accuracy   shot  by_letter        category
0  Malaysian-gemma-3-1b-it  48.096603  0shot       True            STEM
1  Malaysian-gemma-3-1b-it  47.423664  0shot       True        Language
2  Malaysian-gemma-3-1b-it  47.210176  0shot       True  Social science
3  Malaysian-gemma-3-1b-it  47.709283  0shot       True          Others
4  Malaysian-gemma-3-1b-it  51.786121  0shot       True      Humanities

{'Social science': 6918, 'Language': 6288, 'Humanities': 4395, 'Others': 4169, 'STEM': 2443}

Model : Malaysian-gemma-3-1b-it
Metric : first
Shot : 0shot
average accuracy 48.27158964192789
accuracy for STEM 48.09660253786328
accuracy for Language 47.4236641221374
accuracy for Social science 47.21017635154669
accuracy for Others 47.70928280163108
accuracy for Humanities 51.786120591581344
```

## Acknowledgement

Special thanks to https://www.sns.com.my and Nvidia for the 8x H100 node!
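As a sanity check on the MalayMMLU numbers above (a sketch, not the evaluation harness), the reported average accuracy is the question-count-weighted mean of the per-category accuracies:

```python
# Verify the reported average accuracy is the sample-weighted mean of the
# per-category accuracies, using the question counts from the benchmark output.
counts = {'Social science': 6918, 'Language': 6288, 'Humanities': 4395,
          'Others': 4169, 'STEM': 2443}
accuracy = {'STEM': 48.09660253786328, 'Language': 47.4236641221374,
            'Social science': 47.21017635154669, 'Others': 47.70928280163108,
            'Humanities': 51.786120591581344}
total = sum(counts.values())  # 24213 questions
average = sum(accuracy[c] * counts[c] for c in counts) / total
print(round(average, 4))  # 48.2716, matching the reported 48.27158964192789
```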