Update README.md
README.md
CHANGED
@@ -119,16 +119,29 @@ lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-float8dq
 `TODO: more complete eval results`
 
 
-| Benchmark |
-|
-| | Phi-4 mini-Ins | phi4-mini-
-| **Popular aggregated benchmark** |
-|
-|
-| **
-|
-|
+| Benchmark | | |
+|----------------------------------|----------------|---------------------|
+| | Phi-4 mini-Ins | phi4-mini-int4wo |
+| **Popular aggregated benchmark** | | |
+| mmlu (0-shot) | | x |
+| mmlu_pro (5-shot) | | x |
+| **Reasoning** | | |
+| arc_challenge (0-shot) | | x |
+| gpqa_main_zeroshot | | x |
+| HellaSwag | 54.57 | 54.55 |
+| openbookqa | | x |
+| piqa (0-shot) | | x |
+| social_iqa | | x |
+| truthfulqa_mc2 (0-shot) | | x |
+| winogrande (0-shot) | | x |
+| **Multilingual** | | |
+| mgsm_en_cot_en | | x |
+| **Math** | | |
+| gsm8k (5-shot) | | x |
+| mathqa (0-shot) | | x |
+| **Overall** | **TODO** | **TODO** |
 
+
 # Model Performance
 
 ## Results (H100 machine)