Gabriele Sarti

gsarti · https://gsarti.com

AI & ML interests

Interpretability for generative language models

Recent Activity

upvoted a paper 4 days ago

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

updated a collection 4 days ago

🔍 Interpretability & Analysis of LMs

liked a Space 4 days ago

MonetLLM/monet-vd-1.4B-100BT-hf-viewer

upvoted 2 papers 4 days ago

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

Paper • 2507.16795 • Published 7 days ago • 2

Monet: Mixture of Monosemantic Experts for Transformers

Paper • 2412.04139 • Published Dec 5, 2024 • 14

upvoted a collection 15 days ago

🥨 Bavarian NLP Papers

Awesome papers about Bavarian NLP • 9 items • Updated 15 days ago • 2

upvoted 2 papers 19 days ago

Can Interpretation Predict Behavior on Unseen Data?

Paper • 2507.06445 • Published 21 days ago • 1

Thought Anchors: Which LLM Reasoning Steps Matter?

Paper • 2506.19143 • Published Jun 23 • 11

upvoted an article 27 days ago

Bringing Fusion Down to Earth: ML for Stellarator Optimization

Article • 27 days ago • 70

upvoted a paper 28 days ago

TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs

Paper • 2506.23423 • Published 30 days ago • 1

upvoted a paper 29 days ago

Stochastic Parameter Decomposition

Paper • 2506.20790 • Published Jun 25 • 1

upvoted a collection about 1 month ago

ELI-Why

🧠 ELI-Why: Evaluating the Pedagogical Utility of Language Model Explanations (ACL Findings 2025) • 4 items • Updated Jun 11 • 3

upvoted an article about 1 month ago

Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub

Article • By … and 6 others • Jun 12 • 116

upvoted 2 papers about 2 months ago

Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

Paper • 2506.10920 • Published Jun 12 • 6

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Paper • 2506.03093 • Published Jun 3 • 2

upvoted a collection about 2 months ago

🧠 Reasoning datasets

Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 162

upvoted a paper about 2 months ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 176

upvoted 2 articles about 2 months ago

The Transformers Library: standardizing model definitions

Article • By … and 3 others • May 15 • 116

Context Is Gold to Find the Gold Passage: Evaluating and Training Contextual Document Embeddings

Article • By … and 1 other • Jun 2 • 24

upvoted a collection 2 months ago

FAMA

The First Large-Scale Open-Science Speech Foundation Model for English and Italian • 5 items • Updated May 30 • 10

upvoted 3 papers 2 months ago

Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement

Paper • 2505.23183 • Published May 29 • 2

SAEs Are Good for Steering -- If You Select the Right Features

Paper • 2505.20063 • Published May 26 • 1

Mechanistic evaluation of Transformers and state space models

Paper • 2505.15105 • Published May 21 • 1