Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
arc_challenge | 1 | none | 25 | acc | 0.1775 | ± | 0.0112 |
| | | none | 25 | acc_norm | 0.2167 | ± | 0.0120 |
truthfulqa_mc2 | 2 | none | 0 | acc | 0.4689 | ± | 0.0156 |
winogrande | 1 | none | 5 | acc | 0.5122 | ± | 0.0140 |
hellaswag | 1 | none | 10 | acc | 0.2697 | ± | 0.0044 |
| | | none | 10 | acc_norm | 0.2827 | ± | 0.0045 |

Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
abstract_algebra | 0 | none | 5 | acc | 0.2200 | ± | 0.0416 |
anatomy | 0 | none | 5 | acc | 0.3333 | ± | 0.0407 |
astronomy | 0 | none | 5 | acc | 0.1776 | ± | 0.0311 |
business_ethics | 0 | none | 5 | acc | 0.2100 | ± | 0.0409 |
clinical_knowledge | 0 | none | 5 | acc | 0.2528 | ± | 0.0267 |
college_biology | 0 | none | 5 | acc | 0.2778 | ± | 0.0375 |
college_chemistry | 0 | none | 5 | acc | 0.1800 | ± | 0.0386 |
college_computer_science | 0 | none | 5 | acc | 0.1900 | ± | 0.0394 |
college_mathematics | 0 | none | 5 | acc | 0.2200 | ± | 0.0416 |
college_medicine | 0 | none | 5 | acc | 0.1965 | ± | 0.0303 |
college_physics | 0 | none | 5 | acc | 0.2451 | ± | 0.0428 |
computer_security | 0 | none | 5 | acc | 0.2000 | ± | 0.0402 |
conceptual_physics | 0 | none | 5 | acc | 0.2511 | ± | 0.0283 |
econometrics | 0 | none | 5 | acc | 0.2719 | ± | 0.0419 |
electrical_engineering | 0 | none | 5 | acc | 0.2138 | ± | 0.0342 |
elementary_mathematics | 0 | none | 5 | acc | 0.2460 | ± | 0.0222 |
formal_logic | 0 | none | 5 | acc | 0.1905 | ± | 0.0351 |
global_facts | 0 | none | 5 | acc | 0.1500 | ± | 0.0359 |
high_school_biology | 0 | none | 5 | acc | 0.3000 | ± | 0.0261 |
high_school_chemistry | 0 | none | 5 | acc | 0.2562 | ± | 0.0307 |
high_school_computer_science | 0 | none | 5 | acc | 0.2800 | ± | 0.0451 |
high_school_european_history | 0 | none | 5 | acc | 0.2788 | ± | 0.0350 |
high_school_geography | 0 | none | 5 | acc | 0.3232 | ± | 0.0333 |
high_school_government_and_politics | 0 | none | 5 | acc | 0.3212 | ± | 0.0337 |
high_school_macroeconomics | 0 | none | 5 | acc | 0.3308 | ± | 0.0239 |
high_school_mathematics | 0 | none | 5 | acc | 0.2593 | ± | 0.0267 |
high_school_microeconomics | 0 | none | 5 | acc | 0.2815 | ± | 0.0292 |
high_school_physics | 0 | none | 5 | acc | 0.2384 | ± | 0.0348 |
high_school_psychology | 0 | none | 5 | acc | 0.2716 | ± | 0.0191 |
high_school_statistics | 0 | none | 5 | acc | 0.4769 | ± | 0.0341 |
high_school_us_history | 0 | none | 5 | acc | 0.2598 | ± | 0.0308 |
high_school_world_history | 0 | none | 5 | acc | 0.2194 | ± | 0.0269 |
human_aging | 0 | none | 5 | acc | 0.2197 | ± | 0.0278 |
human_sexuality | 0 | none | 5 | acc | 0.2748 | ± | 0.0392 |
international_law | 0 | none | 5 | acc | 0.3306 | ± | 0.0429 |
jurisprudence | 0 | none | 5 | acc | 0.2130 | ± | 0.0396 |
logical_fallacies | 0 | none | 5 | acc | 0.2331 | ± | 0.0332 |
machine_learning | 0 | none | 5 | acc | 0.2232 | ± | 0.0395 |
management | 0 | none | 5 | acc | 0.2039 | ± | 0.0399 |
marketing | 0 | none | 5 | acc | 0.1966 | ± | 0.0260 |
medical_genetics | 0 | none | 5 | acc | 0.3000 | ± | 0.0461 |
miscellaneous | 0 | none | 5 | acc | 0.2580 | ± | 0.0156 |
moral_disputes | 0 | none | 5 | acc | 0.1850 | ± | 0.0209 |
moral_scenarios | 0 | none | 5 | acc | 0.2380 | ± | 0.0142 |
nutrition | 0 | none | 5 | acc | 0.3039 | ± | 0.0263 |
philosophy | 0 | none | 5 | acc | 0.1929 | ± | 0.0224 |
prehistory | 0 | none | 5 | acc | 0.2160 | ± | 0.0229 |
professional_accounting | 0 | none | 5 | acc | 0.2518 | ± | 0.0259 |
professional_law | 0 | none | 5 | acc | 0.2419 | ± | 0.0109 |
professional_medicine | 0 | none | 5 | acc | 0.4375 | ± | 0.0301 |
professional_psychology | 0 | none | 5 | acc | 0.2190 | ± | 0.0167 |
public_relations | 0 | none | 5 | acc | 0.2273 | ± | 0.0401 |
security_studies | 0 | none | 5 | acc | 0.3633 | ± | 0.0308 |
sociology | 0 | none | 5 | acc | 0.2338 | ± | 0.0299 |
us_foreign_policy | 0 | none | 5 | acc | 0.2900 | ± | 0.0456 |
virology | 0 | none | 5 | acc | 0.2169 | ± | 0.0321 |
world_religions | 0 | none | 5 | acc | 0.1930 | ± | 0.0303 |
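
The Stderr column can be read as an approximate 95% interval (value ± 1.96 × stderr), which helps judge whether a per-subject score is distinguishable from random guessing. The following is a minimal sketch, not part of the original evaluation: the helper names `parse_rows`, `ci95`, and `differs_from_chance` are hypothetical, and the 0.25 chance level assumes the tasks are four-option multiple choice. The sample rows are copied from the table above.

```python
# Sketch: parse pipe-delimited result rows and compare each score to chance.
# Assumptions (not from the original): helper names, z = 1.96 for a normal
# approximation, and a 0.25 chance level for four-option multiple choice.

SAMPLE = """
abstract_algebra | 0 | none | 5 | acc | 0.2200 | ± | 0.0416 |
global_facts | 0 | none | 5 | acc | 0.1500 | ± | 0.0359 |
high_school_statistics | 0 | none | 5 | acc | 0.4769 | ± | 0.0341 |
"""


def parse_rows(table_text):
    """Return (task, metric, value, stderr) tuples from pipe-delimited rows."""
    rows = []
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if "±" not in cells:
            continue  # header, separator, or blank line
        i = cells.index("±")
        # The task name sits six cells before the "±" cell when present.
        task = cells[i - 6] if i >= 6 else ""
        rows.append((task, cells[i - 2], float(cells[i - 1]), float(cells[i + 1])))
    return rows


def ci95(value, stderr, z=1.96):
    """Approximate 95% interval under a normal approximation."""
    return value - z * stderr, value + z * stderr


def differs_from_chance(value, stderr, chance=0.25):
    """True when the chance level lies outside the approximate interval."""
    low, high = ci95(value, stderr)
    return not (low <= chance <= high)


for task, metric, value, stderr in parse_rows(SAMPLE):
    low, high = ci95(value, stderr)
    flag = differs_from_chance(value, stderr)
    print(f"{task}: {metric}={value:.4f}, interval=({low:.4f}, {high:.4f}), differs={flag}")
```

With 57 subjects tested at once, a few intervals will exclude the chance level by coincidence, so individual flags from a check like this are best treated as rough indicators.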