Commit ff5d63f (verified), committed by siddartha-abacus
1 Parent(s): e847d9a

Update README.md

Files changed (1)
  1. README.md +11 -1
README.md CHANGED
@@ -27,4 +27,14 @@ parameters:
  dtype: float16
  ```
 
- Models chose to achieve a mix of performance on reasoning datasets like GSM8k and conversational tasks.
+ Models chosen to achieve a mix of performance on reasoning datasets like GSM8k and conversational tasks.
+
+ Evaluation results:
+
+ | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
+ | --- | --- | --- | --- | --- | --- | --- |
+ | 73.1 | 69.62 | 87.09 | 64.81 | 62.82 | 81.45 | 72.78 |
+
+ The model did achieve an improvement in TruthfulQA over `cookinai/CatMacaroni-Slerp` and GSM8K over `mncai/mistral-7b-dpo-v5`,
+ which was the goal of the merge, leading to an average score that was better than both. It is unclear why the TruthfulQA metric
+ is still meaningfully lower than the base `mncai/mistral-7b-dpo-v5`.
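
As a quick sanity check on the table added in this commit, the sketch below recomputes the Average column from the six benchmark scores. It assumes the average is the unweighted mean of those six scores, which the README does not state; the variable names are illustrative only.

```python
from statistics import mean

# Scores copied from the evaluation table in the README diff.
scores = {
    "ARC": 69.62,
    "HellaSwag": 87.09,
    "MMLU": 64.81,
    "TruthfulQA": 62.82,
    "Winogrande": 81.45,
    "GSM8K": 72.78,
}

# Assumption: "Average" is the plain arithmetic mean of the six benchmarks.
average = mean(scores.values())

# Compare against the reported value, rounded to one decimal place.
reported_average = 73.1
print(f"computed mean: {average:.3f}")
print("matches reported average:", round(average, 1) == reported_average)
```

Under that assumption the computed mean agrees with the reported 73.1 once rounded to one decimal place.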