Malaysian Qwen2.5 1.5B-Instruct

Continued finetuning of Qwen/Qwen2.5-1.5B-Instruct on 1.2B tokens of highly curated Malaysian instruction data.

Improvement

  1. 128k context length.
  2. Responds in Mandarin, Tamil, Jawi, Manglish, and the dialects of Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
  3. Able to code when prompted in Mandarin, Tamil, Jawi, Manglish, and the dialects of Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
  4. Multi-turn conversation in Malaysian contexts such as Malaysian legislation, politics, religions and languages.
  5. Standard RAG (see the prompt sketch after this list).
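
For RAG, retrieved passages can simply be placed in the conversation and the question asked as the final user turn. A minimal sketch, assuming the standard Qwen2.5 chat template; the passage text and prompt wording are illustrative, not taken from the training data:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('malaysia-ai/Malaysian-Qwen2.5-1.5B-Instruct')

# Hypothetical retrieved passage; in practice this comes from your retriever.
context = "Akta Suruhanjaya Pilihan Raya 1957 memperuntukkan penubuhan SPR."

messages = [
    {"role": "system", "content": "Jawab soalan berdasarkan konteks yang diberikan."},
    {"role": "user", "content": f"Konteks:\n{context}\n\nSoalan: Apakah fungsi SPR?"},
]

# apply_chat_template renders the Qwen2.5 chat format and appends the assistant prefix.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)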

MalayMMLU

Model : Malaysian-Qwen2.5-1.5B-Instruct
Metric : first
Shot : 0-shot

Category          Accuracy (%)   Questions
STEM                     53.34        2443
Language                 62.02        6288
Social science           51.50        6918
Others                   49.65        4169
Humanities               55.61        4395
Average                  54.85       24213

Training session

Finetuned on mesolitica/Malaysian-SFT to make the model understand Malaysian contexts.

How we train

  1. LoRA on ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"] (see the configuration sketch after this list).
  2. Rank 256 with alpha 512, i.e. a LoRA scaling factor (alpha / rank) of 2.0.
  3. Multipacking with proper SDPA causal masking to prevent cross-document contamination, together with correct per-document position IDs (see the position-ID sketch below).
  4. Forked CCE (Cut Cross-Entropy) loss for the LoRA lm_head to reduce memory consumption.
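
A minimal sketch of the corresponding PEFT configuration, assuming the Hugging Face peft library; values other than the rank, alpha and target modules listed above are illustrative assumptions:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    'Qwen/Qwen2.5-1.5B-Instruct', torch_dtype=torch.bfloat16
)

lora_config = LoraConfig(
    r=256,             # rank 256
    lora_alpha=512,    # alpha 512 -> scaling factor alpha / r = 2.0
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    lora_dropout=0.0,  # assumption; not stated in the recipe
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()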

Source code at https://github.com/malaysia-ai/cooking/tree/main/qwen/sft
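
The multipacking in point 3 concatenates several documents into one sequence while keeping attention and positions per-document. Below is a minimal sketch of per-document position IDs and the matching block-diagonal causal mask; it illustrates the idea only and is not the training code:

import torch

def packed_position_ids(doc_lengths):
    """Position IDs that restart at 0 for each packed document,
    so a later document never continues the positions of an earlier one."""
    return torch.cat([torch.arange(n) for n in doc_lengths])

def packed_causal_mask(doc_lengths):
    """Block-diagonal causal mask: token i may attend to token j only if
    j <= i and both tokens belong to the same document."""
    doc_ids = torch.cat([torch.full((n,), i) for i, n in enumerate(doc_lengths)])
    total = sum(doc_lengths)
    causal = torch.tril(torch.ones(total, total, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc

print(packed_position_ids([3, 2]))        # tensor([0, 1, 2, 0, 1])
print(packed_causal_mask([3, 2]).int())   # two causal blocks, no cross-document attention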

Example

Load the model,

from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
import torch

tokenizer = AutoTokenizer.from_pretrained('malaysia-ai/Malaysian-Qwen2.5-1.5B-Instruct')
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    'malaysia-ai/Malaysian-Qwen2.5-1.5B-Instruct', torch_dtype = torch.bfloat16
).cuda()
  • All examples use stochastic sampling, so the same results might not be reproducible on different machines.
  • Some examples might have been truncated because they are too long for this README.
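
A minimal generation sketch to go with the loading code above, assuming the standard Qwen2.5 chat template; the prompt and sampling settings below are illustrative, and sampled outputs will vary between runs and machines:

messages = [
    {"role": "user", "content": "Terangkan apa itu nasi kerabu dalam loghat Kelantan."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors='pt'
).to(model.device)

# Stochastic sampling, so outputs differ from run to run.
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    streamer=streamer,
)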