Outstanding research in LM interpretability and evaluation, summarized
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
Paper • 2505.23183
Improved Representation Steering for Language Models
Paper • 2505.20809
SAEs Are Good for Steering -- If You Select the Right Features
Paper • 2505.20063