DavidGF committed on
Commit 5b927d6 · 1 Parent(s): 722eddd

Update README.md

Files changed (1)
  1. README.md +32 -38
README.md CHANGED
@@ -75,44 +75,38 @@ Please tell me about how merged models can benefit from existent top-models.<|im
 
 **Language Model evaluation Harness**
 ```
-| Task |Version| Metric | Value | |Stderr|
-|--------------------|------:|--------|------:|---|-----:|
-|arc_challenge | 0|acc | 0.5785|± |0.0144|
-| | |acc_norm| 0.5939|± |0.0144|
-|arc_easy | 0|acc | 0.8392|± |0.0075|
-| | |acc_norm| 0.8207|± |0.0079|
-|boolq | 1|acc | 0.8734|± |0.0058|
-|copa | 0|acc | 0.9000|± |0.0302|
-|hellaswag | 0|acc | 0.6429|± |0.0048|
-| | |acc_norm| 0.8259|± |0.0038|
-|lambada_openai_mt_de| 0|ppl |65.8632|± |4.5390|
-| | |acc | 0.3910|± |0.0068|
-|lambada_standard | 0|ppl | 3.6267|± |0.0831|
-| | |acc | 0.6831|± |0.0065|
-|multirc | 1|acc | 0.1448|± |0.0114|
-|openbookqa | 0|acc | 0.3500|± |0.0214|
-| | |acc_norm| 0.4640|± |0.0223|
-|piqa | 0|acc | 0.8183|± |0.0090|
-| | |acc_norm| 0.8346|± |0.0087|
-|race | 1|acc | 0.4641|± |0.0154|
-|rte | 0|acc | 0.7148|± |0.0272|
-|truthfulqa_mc | 1|mc1 | 0.3672|± |0.0169|
-| | |mc2 | 0.5347|± |0.0154|
-|webqs | 0|acc | 0.1422|± |0.0078|
-|wic | 0|acc | 0.6646|± |0.0187|
-|winogrande | 0|acc | 0.7419|± |0.0123|
-|wsc | 0|acc | 0.6058|± |0.0482|
-|drop | 1|em | 0.0430|± |0.0021|
-| | |f1 | 0.1860|± |0.0028|
-|triviaqa | 3|em | 0.5687|± |0.0037|
-|wmt16-de-en | 0|bleu |40.2285|± |0.3789|
-| | |chrf | 0.6432|± |0.0027|
-| | |ter | 0.4709|± |0.0037|
-|wmt16-en-de | 0|bleu |24.9193|± |0.3536|
-| | |chrf | 0.5267|± |0.0033|
-| | |ter | 0.6506|± |0.0041|
-|xnli_de | 0|acc | 0.4451|± |0.0070|
-|xnli_en | 0|acc | 0.5581|± |0.0070|
+|arc_challenge | 0|acc | 0.5555|± |0.0145|
+| | |acc_norm| 0.5956|± |0.0143|
+|arc_easy | 0|acc | 0.8388|± |0.0075|
+| | |acc_norm| 0.8262|± |0.0078|
+|boolq | 1|acc | 0.8725|± |0.0058|
+|copa | 0|acc | 0.9100|± |0.0288|
+|hellaswag | 0|acc | 0.6285|± |0.0048|
+| | |acc_norm| 0.8125|± |0.0039|
+|lambada_openai_mt_de| 0|ppl |45.7314|± |2.8280|
+| | |acc | 0.4141|± |0.0069|
+|lambada_standard | 0|ppl | 3.5467|± |0.0779|
+| | |acc | 0.6922|± |0.0064|
+|multirc | 1|acc | 0.1459|± |0.0114|
+|openbookqa | 0|acc | 0.3640|± |0.0215|
+| | |acc_norm| 0.4600|± |0.0223|
+|piqa | 0|acc | 0.8123|± |0.0091|
+| | |acc_norm| 0.8281|± |0.0088|
+|race | 1|acc | 0.4507|± |0.0154|
+|rte | 0|acc | 0.7040|± |0.0275|
+|truthfulqa_mc | 1|mc1 | 0.3329|± |0.0165|
+| | |mc2 | 0.4915|± |0.0150|
+|webqs | 0|acc | 0.1924|± |0.0087|
+|wic | 0|acc | 0.5752|± |0.0196|
+|winogrande | 0|acc | 0.7301|± |0.0125|
+|wsc | 0|acc | 0.6154|± |0.0479|
+|drop | 1|em | 0.2140|± |0.0042|
+| | |f1 | 0.4011|± |0.0041|
+|triviaqa | 3|em | 0.6259|± |0.0036|
+|wmt16-de-en | 0|bleu |39.2043|± |0.3982|
+|wmt16-en-de | 0|bleu |25.5745|± |0.3492|
+|xnli_de | 0|acc | 0.4547|± |0.0070|
+|xnli_en | 0|acc | 0.5595|± |0.0070|
 ```
 
 ## Disclaimer
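
For context on how a table like the one in this diff is produced: below is a minimal sketch using EleutherAI's lm-evaluation-harness, assuming its v0.3.x Python API (which matches the task names used here, e.g. `lambada_openai_mt_de` and `wmt16-de-en`). The model identifier and task subset are placeholders for illustration, not taken from this commit.

```python
# Hedged sketch: regenerating a results table like the one above with
# EleutherAI's lm-evaluation-harness (v0.3.x API assumed).
# The pretrained model ID is a placeholder, NOT the model from this commit.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                        # Hugging Face causal-LM backend
    model_args="pretrained=<your-model-id>",  # placeholder checkpoint
    tasks=["arc_challenge", "arc_easy", "boolq",
           "lambada_openai_mt_de", "wmt16-de-en"],  # subset of the tasks above
    batch_size=8,
)

# make_table renders the pipe-delimited Task/Version/Metric/Value/Stderr
# layout shown in the diff.
print(evaluator.make_table(results))
```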