Mechanistic Interpretability Benchmark

university

https://mib-bench.github.io

AI & ML interests

Principled evaluation of mechanistic interpretability methods.

Recent Activity

hij authored a paper 5 days ago

Blackbox Model Provenance via Palimpsestic Membership Inference

amueller updated a Space about 1 month ago

mib-bench/leaderboard

hij authored a paper 3 months ago

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders


mib-bench's datasets (7)

mib-bench/ravel

Viewer • Updated May 31 • 117k • 42

mib-bench/arithmetic_subtraction

Viewer • Updated May 31 • 20.9k • 42

mib-bench/arithmetic_addition

Viewer • Updated May 31 • 40.4k • 69

mib-bench/ioi

Viewer • Updated May 29 • 21k • 1.52k

mib-bench/arc_easy

Viewer • Updated Jan 25 • 4.01k • 99

mib-bench/arc_challenge

Viewer • Updated Jan 25 • 2k • 25

mib-bench/copycolors_mcqa

Viewer • Updated Jan 16 • 1.89k • 286