Update README.md
README.md
CHANGED
@@ -28,8 +28,6 @@ Finetune on [mesolitica/Malaysian-Reasoning](https://huggingface.co/datasets/mes
 
 Source code at https://github.com/mesolitica/malaya/tree/master/session/qwen2.5
 
-## Benchmark
-
 ### Dialect Translation
 
 All the benchmarks generate using vLLM, evaluation based on sacrebleu CHRF max@5.
@@ -39,15 +37,44 @@ Source code for evaluation at https://github.com/mesolitica/malaya/tree/master/s
 Dialect to standard Malay,
 
 ```
+
 ```
 
 Standard Malay to dialect,
 
 ```
+
 ```
 
 ### MalayMMLU
 
+Accuracy@5,
+
+```
+
+```
+
+While the original model,
+
+```
+                   Model   Accuracy   shot  by_letter        category
+0  Qwen2.5-1.5B-Instruct  57.306590  0shot       True            STEM
+1  Qwen2.5-1.5B-Instruct  52.862595  0shot       True        Language
+2  Qwen2.5-1.5B-Instruct  51.633420  0shot       True  Social science
+3  Qwen2.5-1.5B-Instruct  52.554569  0shot       True          Others
+4  Qwen2.5-1.5B-Instruct  57.224118  0shot       True      Humanities
+{'Social science': 6918, 'Language': 6288, 'Humanities': 4395, 'Others': 4169, 'STEM': 2443}
+Model : Qwen2.5-1.5B-Instruct
+Metric : first
+Shot : 0shot
+average accuracy 53.69842646512204
+accuracy for STEM 57.306590257879655
+accuracy for Language 52.862595419847324
+accuracy for Social science 51.633420063602195
+accuracy for Others 52.554569441112974
+accuracy for Humanities 57.22411831626849
+```
+
 ## Special thanks
 
 Special thanks to https://www.sns.com.my and Nvidia for 8x H100 node!
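A note on reading the MalayMMLU output for Qwen2.5-1.5B-Instruct in this diff: the reported `average accuracy` of 53.698 is not the plain mean of the five category accuracies. It matches the mean weighted by the per-category question counts printed in the dictionary line. The sketch below reproduces that arithmetic from the numbers in the output. The function names `micro_average` and `macro_average` are illustrative and are not taken from the evaluation code.

```python
# Per-category accuracy (percent) and question counts, copied from the
# Qwen2.5-1.5B-Instruct output shown in the diff.
accuracy = {
    "STEM": 57.306590257879655,
    "Language": 52.862595419847324,
    "Social science": 51.633420063602195,
    "Others": 52.554569441112974,
    "Humanities": 57.22411831626849,
}
counts = {
    "Social science": 6918,
    "Language": 6288,
    "Humanities": 4395,
    "Others": 4169,
    "STEM": 2443,
}


def micro_average(accuracy, counts):
    """Mean accuracy weighted by the number of questions in each category."""
    total = sum(counts.values())
    return sum(accuracy[name] * counts[name] for name in counts) / total


def macro_average(accuracy):
    """Plain mean over categories, ignoring how many questions each has."""
    return sum(accuracy.values()) / len(accuracy)


micro = micro_average(accuracy, counts)
macro = macro_average(accuracy)
print(f"count-weighted average: {micro:.4f}")  # 53.6984, matches the report
print(f"unweighted average:     {macro:.4f}")  # 54.3163
```

The count-weighted figure is lower because the two largest categories (Social science and Language) are also the two with the lowest accuracy.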