ParaLlama-p-small / README.md
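| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 25 | acc | 0.1741 | ± 0.0111 |
| | | none | 25 | acc_norm | 0.2014 | ± 0.0117 |
| truthfulqa_mc2 | 2 | none | 0 | acc | 0.4726 | ± 0.0156 |
| winogrande | 1 | none | 5 | acc | 0.4964 | ± 0.0141 |
| hellaswag | 1 | none | 10 | acc | 0.2725 | ± 0.0044 |
| | | none | 10 | acc_norm | 0.2794 | ± 0.0045 |
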
Unweighted mean accuracy over the 57 MMLU subjects in the table below (5-shot): 0.2563298245614035


| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| abstract_algebra | 0 | none | 5 | acc | 0.2900 | ± 0.0456 |
| anatomy | 0 | none | 5 | acc | 0.2815 | ± 0.0389 |
| astronomy | 0 | none | 5 | acc | 0.1776 | ± 0.0311 |
| business_ethics | 0 | none | 5 | acc | 0.1700 | ± 0.0378 |
| clinical_knowledge | 0 | none | 5 | acc | 0.2113 | ± 0.0251 |
| college_biology | 0 | none | 5 | acc | 0.2292 | ± 0.0351 |
| college_chemistry | 0 | none | 5 | acc | 0.1800 | ± 0.0386 |
| college_computer_science | 0 | none | 5 | acc | 0.3000 | ± 0.0461 |
| college_mathematics | 0 | none | 5 | acc | 0.2800 | ± 0.0451 |
| college_medicine | 0 | none | 5 | acc | 0.2023 | ± 0.0306 |
| college_physics | 0 | none | 5 | acc | 0.2451 | ± 0.0428 |
| computer_security | 0 | none | 5 | acc | 0.2800 | ± 0.0451 |
| conceptual_physics | 0 | none | 5 | acc | 0.2723 | ± 0.0291 |
| econometrics | 0 | none | 5 | acc | 0.2456 | ± 0.0405 |
| electrical_engineering | 0 | none | 5 | acc | 0.2207 | ± 0.0346 |
| elementary_mathematics | 0 | none | 5 | acc | 0.2540 | ± 0.0224 |
| formal_logic | 0 | none | 5 | acc | 0.1587 | ± 0.0327 |
| global_facts | 0 | none | 5 | acc | 0.2200 | ± 0.0416 |
| high_school_biology | 0 | none | 5 | acc | 0.2161 | ± 0.0234 |
| high_school_chemistry | 0 | none | 5 | acc | 0.2808 | ± 0.0316 |
| high_school_computer_science | 0 | none | 5 | acc | 0.2000 | ± 0.0402 |
| high_school_european_history | 0 | none | 5 | acc | 0.2303 | ± 0.0329 |
| high_school_geography | 0 | none | 5 | acc | 0.2828 | ± 0.0321 |
| high_school_government_and_politics | 0 | none | 5 | acc | 0.2539 | ± 0.0314 |
| high_school_macroeconomics | 0 | none | 5 | acc | 0.3282 | ± 0.0238 |
| high_school_mathematics | 0 | none | 5 | acc | 0.2630 | ± 0.0268 |
| high_school_microeconomics | 0 | none | 5 | acc | 0.3487 | ± 0.0310 |
| high_school_physics | 0 | none | 5 | acc | 0.3245 | ± 0.0382 |
| high_school_psychology | 0 | none | 5 | acc | 0.2514 | ± 0.0186 |
| high_school_statistics | 0 | none | 5 | acc | 0.4722 | ± 0.0340 |
| high_school_us_history | 0 | none | 5 | acc | 0.3039 | ± 0.0323 |
| high_school_world_history | 0 | none | 5 | acc | 0.2616 | ± 0.0286 |
| human_aging | 0 | none | 5 | acc | 0.2377 | ± 0.0286 |
| human_sexuality | 0 | none | 5 | acc | 0.2672 | ± 0.0388 |
| international_law | 0 | none | 5 | acc | 0.2397 | ± 0.0390 |
| jurisprudence | 0 | none | 5 | acc | 0.2593 | ± 0.0424 |
| logical_fallacies | 0 | none | 5 | acc | 0.2331 | ± 0.0332 |
| machine_learning | 0 | none | 5 | acc | 0.3214 | ± 0.0443 |
| management | 0 | none | 5 | acc | 0.1748 | ± 0.0376 |
| marketing | 0 | none | 5 | acc | 0.2265 | ± 0.0274 |
| medical_genetics | 0 | none | 5 | acc | 0.3000 | ± 0.0461 |
| miscellaneous | 0 | none | 5 | acc | 0.2401 | ± 0.0153 |
| moral_disputes | 0 | none | 5 | acc | 0.2399 | ± 0.0230 |
| moral_scenarios | 0 | none | 5 | acc | 0.2358 | ± 0.0142 |
| nutrition | 0 | none | 5 | acc | 0.2320 | ± 0.0242 |
| philosophy | 0 | none | 5 | acc | 0.2026 | ± 0.0228 |
| prehistory | 0 | none | 5 | acc | 0.2284 | ± 0.0234 |
| professional_accounting | 0 | none | 5 | acc | 0.2376 | ± 0.0254 |
| professional_law | 0 | none | 5 | acc | 0.2379 | ± 0.0109 |
| professional_medicine | 0 | none | 5 | acc | 0.4412 | ± 0.0302 |
| professional_psychology | 0 | none | 5 | acc | 0.2500 | ± 0.0175 |
| public_relations | 0 | none | 5 | acc | 0.1818 | ± 0.0369 |
| security_studies | 0 | none | 5 | acc | 0.3429 | ± 0.0304 |
| sociology | 0 | none | 5 | acc | 0.2438 | ± 0.0304 |
| us_foreign_policy | 0 | none | 5 | acc | 0.2900 | ± 0.0456 |
| virology | 0 | none | 5 | acc | 0.2892 | ± 0.0353 |
| world_religions | 0 | none | 5 | acc | 0.2222 | ± 0.0319 |
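
The standalone figure 0.2563298245614035 that precedes the per-subject table appears to be the unweighted mean of the 57 accuracies listed in it. The sketch below checks that arithmetic. It assumes the values are averaged with equal weight per subject, not per question; the names `accuracies` and `unweighted_mean` are illustrative and do not come from the original evaluation script.

```python
# Per-subject accuracies copied from the table above, in row order
# (abstract_algebra first, world_religions last).
accuracies = [
    0.2900, 0.2815, 0.1776, 0.1700, 0.2113, 0.2292, 0.1800, 0.3000,
    0.2800, 0.2023, 0.2451, 0.2800, 0.2723, 0.2456, 0.2207, 0.2540,
    0.1587, 0.2200, 0.2161, 0.2808, 0.2000, 0.2303, 0.2828, 0.2539,
    0.3282, 0.2630, 0.3487, 0.3245, 0.2514, 0.4722, 0.3039, 0.2616,
    0.2377, 0.2672, 0.2397, 0.2593, 0.2331, 0.3214, 0.1748, 0.2265,
    0.3000, 0.2401, 0.2399, 0.2358, 0.2320, 0.2026, 0.2284, 0.2376,
    0.2379, 0.4412, 0.2500, 0.1818, 0.3429, 0.2438, 0.2900, 0.2892,
    0.2222,
]


def unweighted_mean(values):
    """Average that gives every subject the same weight (assumed method)."""
    return sum(values) / len(values)


mean_acc = unweighted_mean(accuracies)
print(len(accuracies), round(mean_acc, 4))  # prints: 57 0.2563
```

Because each subject counts equally here, a question-weighted average over the same results could differ, since the subjects do not all contain the same number of questions.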