Update README.md
README.md
Need to install lm-eval from source: https://github.com/EleutherAI/lm-evaluation-harness

## baseline

```Shell
lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks hellaswag --device cuda:0 --batch_size 8
```
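
The same baseline run can also be driven from Python through the harness's `simple_evaluate` entry point. The snippet below is a minimal sketch, assuming an lm-eval version (installed from source, as noted above) that exposes `lm_eval.simple_evaluate`; it is not part of the published recipe.

```python
# Minimal sketch: run the hellaswag baseline via lm-eval's Python API
# instead of the CLI. Assumes lm-eval (lm-evaluation-harness) is installed
# from source and exposes `simple_evaluate`.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=microsoft/Phi-4-mini-instruct",
    tasks=["hellaswag"],
    device="cuda:0",
    batch_size=8,
)

# Per-task metrics (accuracy, normalized accuracy, stderr, ...)
print(results["results"]["hellaswag"])
```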

## int8 dynamic activation and int4 weight quantization (8da4w)

```Shell
lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-8da4w --tasks hellaswag --device cuda:0 --batch_size 8
```
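
For reference, 8da4w-style quantization (int8 dynamic activations, int4 weights) can be applied to the baseline checkpoint with torchao. The sketch below assumes a torchao version that provides `quantize_` and `int8_dynamic_activation_int4_weight`; the `group_size` value is an illustrative choice, and this is not necessarily the exact recipe behind the published `pytorch/Phi-4-mini-instruct-8da4w` checkpoint.

```python
# Minimal sketch: apply int8 dynamic activation + int4 weight (8da4w)
# quantization to the baseline model with torchao. The exact recipe used for
# the published pytorch/Phi-4-mini-instruct-8da4w checkpoint may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight

model_id = "microsoft/Phi-4-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantize the linear layers in place: int8 dynamic activations, int4 weights.
# group_size=32 is an illustrative setting (assumption, not the published config).
quantize_(model, int8_dynamic_activation_int4_weight(group_size=32))

# Quick smoke test on the quantized model.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```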

| Benchmark                        | Phi-4 mini-Ins | phi4-mini-8da4w |
|----------------------------------|----------------|-----------------|
| **Popular aggregated benchmark** |                |                 |
| mmlu (0-shot)                    | 66.73          | 60.75           |
| mmlu_pro (5-shot)                | 46.43          | 11.75           |
| **Reasoning**                    |                |                 |
| arc_challenge                    | 56.91          | 48.46           |
| gpqa_main_zeroshot               | 30.13          | 30.80           |
| hellaswag                        | 54.57          | 50.35           |
| openbookqa                       | 33.00          | 30.40           |
| piqa (0-shot)                    | 77.64          | 74.43           |
| siqa                             | 49.59          | 44.98           |
| truthfulqa_mc2 (0-shot)          | 48.39          | 51.35           |
| winogrande (0-shot)              | 71.11          | 70.32           |
| **Multilingual**                 |                |                 |
| mgsm_en_cot_en                   | 60.80          | 57.60           |
| **Math**                         |                |                 |
| gsm8k (5-shot)                   | 81.88          | 61.71           |
| Mathqa (0-shot)                  | 42.31          | 36.95           |
| **Overall**                      | 55.35          | 48.45           |

# Exporting to ExecuTorch