Malaysian Qwen2.5 1.5B-Instruct
Continued finetuning of Qwen/Qwen2.5-1.5B-Instruct on 1.2B tokens of highly curated Malaysian instruction data.
Improvements
- 128k context length.
- Supports responding in Mandarin, Tamil, Jawi, Manglish, and the Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan, and Terengganu dialects.
- Able to code in Mandarin, Tamil, Jawi, Manglish, and the same dialects listed above.
- Multi-turn conversations in Malaysian contexts, such as Malaysian legislation, politics, religion, and languages.
- Standard RAG.
MalayMMLU
Malaysian-Qwen2.5-1.5B-Instruct, evaluated 0-shot with first-token matching (`by_letter=True`):

| Category           | Accuracy (%) | Questions |
|--------------------|--------------|-----------|
| STEM               | 53.34        | 2443      |
| Language           | 62.02        | 6288      |
| Social science     | 51.50        | 6918      |
| Others             | 49.65        | 4169      |
| Humanities         | 55.61        | 4395      |
| Average (weighted) | 54.85        | 24213     |
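The reported average is consistent with a mean weighted by the number of questions in each category; a quick sanity check using the numbers from the table above:

```python
# Sanity check: the overall accuracy equals the per-category accuracies
# weighted by each category's question count (numbers from the table above).
counts = {'STEM': 2443, 'Language': 6288, 'Social science': 6918,
          'Others': 4169, 'Humanities': 4395}
acc = {'STEM': 53.336062, 'Language': 62.022901, 'Social science': 51.503325,
       'Others': 49.652195, 'Humanities': 55.608646}
weighted = sum(acc[k] * counts[k] for k in counts) / sum(counts.values())
print(round(weighted, 4))  # 54.8466, matching the reported average accuracy
```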
Training session
Finetuned on mesolitica/Malaysian-SFT to make the model understand Malaysian context.
How we train
- LoRA on `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]` (see the PEFT sketch below).
- Rank 256 with alpha 512, i.e. a LoRA scaling factor (alpha / rank) of 2.0.
- Multipacking with proper SDPA causal masking to prevent cross-document contamination, with position IDs reset per packed document (see the position-ID sketch below).
- Forked CCE (Cut Cross-Entropy) loss for the LoRA `lm_head` to reduce memory consumption.
Source code at https://github.com/malaysia-ai/cooking/tree/main/qwen/sft
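For reference, a minimal PEFT sketch of the LoRA setup described above; the exact training configuration is in the source repo linked above, so treat this as illustrative only:

```python
from peft import LoraConfig

# Illustrative LoRA config matching the bullets above: rank 256, alpha 512,
# scaling factor alpha / rank = 2.0. Not the exact training script.
lora_config = LoraConfig(
    r=256,
    lora_alpha=512,
    target_modules=[
        'q_proj', 'k_proj', 'v_proj', 'o_proj',
        'gate_proj', 'up_proj', 'down_proj',
        'embed_tokens', 'lm_head',
    ],
    task_type='CAUSAL_LM',
)
```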
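And a toy illustration of why position IDs matter when multipacking: each packed document should restart at position 0, as if it began its own sequence. The helper below is hypothetical; the real implementation is in the source repo linked above.

```python
import torch

def packed_position_ids(doc_lengths):
    # Restart positions at 0 for each packed document so every document
    # is encoded as if it started its own sequence.
    return torch.cat([torch.arange(n) for n in doc_lengths])

print(packed_position_ids([3, 4]))  # tensor([0, 1, 2, 0, 1, 2, 3])
```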
Example
Load the model,
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
import torch

tokenizer = AutoTokenizer.from_pretrained('malaysia-ai/Malaysian-Qwen2.5-1.5B-Instruct')
# Stream decoded tokens to stdout as they are generated
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    'malaysia-ai/Malaysian-Qwen2.5-1.5B-Instruct', torch_dtype=torch.bfloat16
).cuda()
```
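Then generate. The prompt below is an arbitrary example and the sampling parameters are illustrative, not recommendations:

```python
# Build a chat prompt with the standard Qwen2.5 chat template and stream
# the sampled output; the question itself is just an assumed example.
messages = [
    {'role': 'user', 'content': 'Terangkan apa itu RAG dalam loghat Kelantan.'},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors='pt'
).to(model.device)
_ = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,       # stochastic sampling, as noted below
    temperature=0.7,
    streamer=streamer,
)
```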
- All examples use stochastic sampling, so the exact outputs may not be reproducible across machines.
- Some examples may have been truncated because they were too long for this README.