Update README.md
README.md
Need to install lm-eval from source: https://github.com/EleutherAI/lm-evaluation-harness

## baseline

```Shell
lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks hellaswag --device cuda:0 --batch_size 8
```
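
The same baseline run can also be driven from Python through the harness's `simple_evaluate` entry point. The snippet below is a minimal sketch, assuming an lm-eval version (installed from source, as noted above) that exposes `lm_eval.simple_evaluate`; it is not part of the published recipe.

```python
# Minimal sketch: run the hellaswag baseline via lm-eval's Python API
# instead of the CLI. Assumes lm-eval (lm-evaluation-harness) is installed
# from source and exposes `simple_evaluate`.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=microsoft/Phi-4-mini-instruct",
    tasks=["hellaswag"],
    device="cuda:0",
    batch_size=8,
)

# Per-task metrics (accuracy, normalized accuracy, stderr, ...)
print(results["results"]["hellaswag"])
```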

## int8 dynamic activation and int4 weight quantization (8da4w)

```Shell
lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-8da4w --tasks hellaswag --device cuda:0 --batch_size 8
```
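
For reference, 8da4w-style quantization (int8 dynamic activations, int4 weights) can be applied to the baseline checkpoint with torchao. The sketch below assumes a torchao version that provides `quantize_` and `int8_dynamic_activation_int4_weight`; the `group_size` value is an illustrative choice, and this is not necessarily the exact recipe behind the published `pytorch/Phi-4-mini-instruct-8da4w` checkpoint.

```python
# Minimal sketch: apply int8 dynamic activation + int4 weight (8da4w)
# quantization to the baseline model with torchao. The exact recipe used for
# the published pytorch/Phi-4-mini-instruct-8da4w checkpoint may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight

model_id = "microsoft/Phi-4-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantize the linear layers in place: int8 dynamic activations, int4 weights.
# group_size=32 is an illustrative setting (assumption, not the published config).
quantize_(model, int8_dynamic_activation_int4_weight(group_size=32))

# Quick smoke test on the quantized model.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```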

| Benchmark                        | Phi-4 mini-Ins | phi4-mini-8da4w |
|----------------------------------|----------------|-----------------|
| **Popular aggregated benchmark** |                |                 |
| mmlu (0-shot)                    | 66.73          | 60.75           |
| mmlu_pro (5-shot)                | 46.43          | 11.75           |
| **Reasoning**                    |                |                 |
| arc_challenge                    | 56.91          | 48.46           |
| gpqa_main_zeroshot               | 30.13          | 30.80           |
| hellaswag                        | 54.57          | 50.35           |
| openbookqa                       | 33.00          | 30.40           |
| piqa (0-shot)                    | 77.64          | 74.43           |
| siqa                             | 49.59          | 44.98           |
| truthfulqa_mc2 (0-shot)          | 48.39          | 51.35           |
| winogrande (0-shot)              | 71.11          | 70.32           |
| **Multilingual**                 |                |                 |
| mgsm_en_cot_en                   | 60.80          | 57.60           |
| **Math**                         |                |                 |
| gsm8k (5-shot)                   | 81.88          | 61.71           |
| Mathqa (0-shot)                  | 42.31          | 36.95           |
| **Overall**                      | 55.35          | 48.45           |

# Exporting to ExecuTorch