MaziyarPanahi committed
Commit 1f8989d • 1 Parent(s): e288c94
clean up the evals (#10)
README.md
CHANGED
@@ -147,6 +147,20 @@ GGUF (2/3/4/5/6/8 bits): [MaziyarPanahi/phi-2-logical-sft-GGUF](https://huggingf
 ### Response:
 ```
 
+## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__phi-2-logical-sft)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |61.50|
+|AI2 Reasoning Challenge (25-Shot)|61.35|
+|HellaSwag (10-Shot)              |75.14|
+|MMLU (5-Shot)                    |57.40|
+|TruthfulQA (0-shot)              |44.39|
+|Winogrande (5-shot)              |74.90|
+|GSM8k (5-shot)                   |55.80|
+
+
 ## Examples
 
 ```
@@ -222,19 +236,6 @@ Now, let's eliminate the first possibility, because it contradicts the premise t
 ---
 
 
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
 ## Training procedure
 
 ### Training hyperparameters
@@ -359,17 +360,6 @@ special_tokens:
 pad_token: "<|endoftext|>"
 ```
 
-</details
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__phi-2-logical-sft)
+</details>
 
-| Metric                          |Value|
-|---------------------------------|----:|
-|Avg.                             |61.50|
-|AI2 Reasoning Challenge (25-Shot)|61.35|
-|HellaSwag (10-Shot)              |75.14|
-|MMLU (5-Shot)                    |57.40|
-|TruthfulQA (0-shot)              |44.39|
-|Winogrande (5-shot)              |74.90|
-|GSM8k (5-shot)                   |55.80|
 
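As a quick sanity check on the leaderboard table in this commit, the reported Avg. should be the unweighted mean of the six benchmark scores, rounded to two decimals. A minimal standalone sketch (score values copied from the diff; the averaging rule is an assumption about how the leaderboard computes it):

```python
# Benchmark scores from the leaderboard table in this commit's diff.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 61.35,
    "HellaSwag (10-Shot)": 75.14,
    "MMLU (5-Shot)": 57.40,
    "TruthfulQA (0-shot)": 44.39,
    "Winogrande (5-shot)": 74.90,
    "GSM8k (5-shot)": 55.80,
}

# Unweighted mean, rounded to two decimals.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 61.5, matching the table's Avg. of 61.50
```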