Enhancing Automated Interpretability with Output-Centric Feature Descriptions Paper • 2501.08319 • Published 23 days ago • 10
CoverBench: A Challenging Benchmark for Complex Claim Verification Paper • 2408.03325 • Published Aug 6, 2024 • 15
Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings Paper • 2407.20581 • Published Jul 30, 2024 • 24
Evaluating the Ripple Effects of Knowledge Editing in Language Models Paper • 2307.12976 • Published Jul 24, 2023 • 12
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty Paper • 2407.06071 • Published Jul 8, 2024 • 7
🔍 Interpretability & Analysis of LMs Collection • Outstanding research in LM interpretability and evaluation, summarized • 101 items • Updated 2 days ago • 97
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces Paper • 2406.11614 • Published Jun 17, 2024 • 5
Estimating Knowledge in Large Language Models Without Generating a Single Token Paper • 2406.12673 • Published Jun 18, 2024 • 7 • 1
Do Large Language Models Latently Perform Multi-Hop Reasoning? Paper • 2402.16837 • Published Feb 26, 2024 • 25
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models Paper • 2401.06102 • Published Jan 11, 2024 • 21
Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers Paper • 2401.04695 • Published Jan 9, 2024 • 12