abideen committed · Commit c31b9a8 · verified · 1 Parent(s): e49c21c

Update README.md

Files changed (1)
  1. README.md +92 -0
README.md CHANGED
@@ -198,4 +198,96 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 
 [More Information Needed]
 
+ ### AGIEval
+
+ | Task | Version | Metric | Value | | StdErr |
+ |-------------------------------------------|---------|--------|-------|---|---------|
+ | agieval\_aqua\_rat | 0 | acc | 24.02 | ± | 2.69 |
+ | agieval\_aqua\_rat | 0 | acc\_norm | 24.02 | ± | 2.69 |
+ | agieval\_logiqa\_en | 0 | acc | 23.20 | ± | 1.66 |
+ | agieval\_logiqa\_en | 0 | acc\_norm | 24.42 | ± | 1.69 |
+ | agieval\_lsat\_ar | 0 | acc | 18.26 | ± | 2.55 |
+ | agieval\_lsat\_ar | 0 | acc\_norm | 18.70 | ± | 2.58 |
+ | agieval\_lsat\_lr | 0 | acc | 22.35 | ± | 1.85 |
+ | agieval\_lsat\_lr | 0 | acc\_norm | 23.53 | ± | 1.88 |
+ | agieval\_lsat\_rc | 0 | acc | 20.82 | ± | 2.48 |
+ | agieval\_lsat\_rc | 0 | acc\_norm | 20.07 | ± | 2.45 |
+ | agieval\_sat\_en | 0 | acc | 32.52 | ± | 3.27 |
+ | agieval\_sat\_en | 0 | acc\_norm | 32.52 | ± | 3.27 |
+ | agieval\_sat\_en\_without\_passage | 0 | acc | 25.73 | ± | 3.05 |
+ | agieval\_sat\_en\_without\_passage | 0 | acc\_norm | 24.27 | ± | 2.99 |
+ | agieval\_sat\_math | 0 | acc | 25.00 | ± | 2.93 |
+ | agieval\_sat\_math | 0 | acc\_norm | 20.91 | ± | 2.75 |
+
+ Average: 24.11
+
+ ### GPT4All
+
+ | Task | Version | Metric | Value | | StdErr |
+ |----------------------|---------|--------|-------|---|---------|
+ | arc\_challenge | 0 | acc | 21.77 | ± | 1.21 |
+ | arc\_challenge | 0 | acc\_norm | 24.15 | ± | 1.25 |
+ | arc\_easy | 0 | acc | 37.37 | ± | 0.99 |
+ | arc\_easy | 0 | acc\_norm | 36.95 | ± | 0.99 |
+ | boolq | 1 | acc | 65.60 | ± | 0.83 |
+ | hellaswag | 0 | acc | 34.54 | ± | 0.47 |
+ | hellaswag | 0 | acc\_norm | 40.54 | ± | 0.49 |
+ | openbookqa | 0 | acc | 15.00 | ± | 1.59 |
+ | openbookqa | 0 | acc\_norm | 27.40 | ± | 2.00 |
+ | piqa | 0 | acc | 60.88 | ± | 1.14 |
+ | piqa | 0 | acc\_norm | 60.55 | ± | 1.14 |
+ | winogrande | 0 | acc | 50.91 | ± | 1.41 |
+
+ Average: 40.01
+
+ ### BigBench
+
+ | Task | Version | Metric | Value | StdErr |
+ |-----------------------------------|---------|--------|--------|---------|
+ | bigbench\_causal\_judgement | 0 | MCG | 50 | 2.26 |
+ | bigbench\_date\_understanding | 0 | MCG | 49.14 | 2.18 |
+ | bigbench\_disambiguation\_qa | 0 | MCG | 49.31 | 2.74 |
+ | bigbench\_geometric\_shapes | 0 | MCG | 14.18 | 1.37 |
+ | bigbench\_logical\_deduction\_5objs | 0 | MCG | 49.41 | 2.73 |
+ | bigbench\_logical\_deduction\_7objs | 0 | MCG | 41.48 | 2.46 |
+ | bigbench\_logical\_deduction\_3objs | 0 | MCG | 69.33 | 2.75 |
+ | bigbench\_movie\_recommendation | 0 | MCG | 51.71 | 2.25 |
+ | bigbench\_navigate | 0 | MCG | 50 | 1.58 |
+ | bigbench\_reasoning\_colored\_obj | 0 | MCG | 51.92 | 0.99 |
+ | bigbench\_ruin\_names | 0 | MCG | 48.14 | 2.01 |
+ | bigbench\_salient\_trans\_err\_detec | 0 | MCG | 39.92 | 1.2 |
+ | bigbench\_snarks | 0 | MCG | 64.14 | 3.71 |
+ | bigbench\_sports\_understanding | 0 | MCG | 55.31 | 1.59 |
+ | bigbench\_temporal\_sequences | 0 | MCG | 46.92 | 1.4 |
+ | bigbench\_tsk\_shuff\_objs\_5 | 0 | MCG | 25.04 | 1.01 |
+ | bigbench\_tsk\_shuff\_objs\_7 | 0 | MCG | 15.04 | 0.72 |
+ | bigbench\_tsk\_shuff\_objs\_3 | 0 | MCG | 55.33 | 2.75 |
+
+ Average: 44.75
+
+ ### TruthfulQA
+
+ | Task | Version | Metric | Value | StdErr |
+ |----------------------------------|---------|--------|--------|----------|
+ | truthfulqa\_mc | 1 | mc1 | 30.11 | 1.61 |
+ | truthfulqa\_mc | 1 | mc2 | 47.69 | 1.61 |
+
+ Average: 38.90
+
+ # OpenLLM Benchmark
+
+ | Task |Version| Metric |Value| |Stderr|
+ |-------------|------:|--------|----:|---|-----:|
+ |arc_challenge| 0|acc |40.44|± | 1.43|
+ | | |acc_norm|43.81|± | 1.34|
+ |hellaswag | 0|acc |48.1 |± | 0.45|
+ | | |acc_norm|62.73|± | 0.32|
+ |gsm8k | 0|acc |5.6 |± | 0.6 |
+ |winogrande | 0|acc |60.91|± | 1.3 |
+ |mmlu | 0|acc |37.62|± | 0.6 |
+
+ Average: 73.5%
+
+ ### TruthfulQA
+
+ | Task |Version|Metric|Value| |Stderr|
+ |-------------|------:|------|----:|---|-----:|
+ |truthfulqa_mc| 1|mc1 |29.00|± | 1.58|
+ | | |mc2 |45.83|± | 1.59|
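
The tables above use the Task / Version / Metric / Value ± StdErr layout produced by EleutherAI's lm-evaluation-harness. As a rough pointer only, here is a minimal sketch of how results in this format are typically generated with the harness's Python API; the harness version, the task selection, the few-shot setting, and the `<model-id>` placeholder are assumptions and are not recorded in this commit.

```python
# Minimal sketch, assuming EleutherAI lm-evaluation-harness v0.4+ (`pip install lm-eval`)
# and a causal LM hosted on the Hugging Face Hub. <model-id> is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                          # Hugging Face causal-LM backend
    model_args="pretrained=<model-id>",  # substitute the actual model repository id
    tasks=["arc_challenge", "hellaswag", "winogrande", "gsm8k", "mmlu"],
    num_fewshot=0,                       # few-shot setting is an assumption, not taken from the tables
    batch_size=8,
)

# Per-task metrics (acc, acc_norm, their stderr, ...) keyed by task name.
for task, metrics in results["results"].items():
    print(task, metrics)
```

The harness reports metrics as fractions in [0, 1]; the percentages in the tables above correspond to those values multiplied by 100.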