arxiv:2506.11930

Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback

Published on Jun 13
· Submitted by Dongwei on Jun 16
#1 Paper of the day
Abstract

LLMs show resistance to feedback, termed feedback friction, even under ideal conditions, and sampling-based strategies only partially mitigate this issue.

AI-generated summary

Recent studies have shown LLMs possess some ability to improve their responses when given external feedback. However, it remains unclear how effectively and thoroughly these models can incorporate extrinsic feedback. In an ideal scenario, if LLMs receive near-perfect and complete feedback, we would expect them to fully integrate the feedback and change their incorrect answers to correct ones. In this paper, we systematically investigate LLMs' ability to incorporate feedback by designing a controlled experimental environment. For each problem, a solver model attempts a solution, then a feedback generator with access to near-complete ground-truth answers produces targeted feedback, after which the solver tries again. We evaluate this pipeline across a diverse range of tasks, including math reasoning, knowledge reasoning, scientific reasoning, and general multi-domain evaluations with state-of-the-art language models including Claude 3.7 (with and without extended thinking). Surprisingly, even under these near-ideal conditions, solver models consistently show resistance to feedback, a limitation that we term FEEDBACK FRICTION. To mitigate this limitation, we experiment with sampling-based strategies like progressive temperature increases and explicit rejection of previously attempted incorrect answers, which yield improvements but still fail to help models achieve target performance. We also perform a rigorous exploration of potential causes of FEEDBACK FRICTION, ruling out factors such as model overconfidence and data familiarity. We hope that highlighting this issue in LLMs and ruling out several apparent causes will help future research in self-improvement.
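The solver/feedback pipeline and the sampling-based mitigations described above can be pictured as a simple loop. Below is a minimal Python sketch of that idea, assuming hypothetical helpers `query_solver`, `generate_feedback`, and `is_correct` that wrap the solver model, the ground-truth-aware feedback generator, and the answer check; these names and parameters are illustrative placeholders, not the paper's actual code or API.

```python
# Sketch of an iterative solve -> feedback -> retry loop with progressive
# temperature increases and rejection of previously attempted wrong answers.
# query_solver, generate_feedback, and is_correct are hypothetical stubs.

def iterative_feedback_loop(problem, ground_truth, max_rounds=5,
                            base_temperature=0.0, temperature_step=0.2):
    """Ask the solver, give it targeted feedback, and retry up to max_rounds."""
    rejected_answers = []          # incorrect answers the solver must not repeat
    feedback = None
    temperature = base_temperature

    for round_idx in range(max_rounds):
        # Solver attempts the problem, conditioned on the latest feedback
        # and an explicit list of banned (previously wrong) answers.
        answer = query_solver(
            problem,
            feedback=feedback,
            rejected_answers=rejected_answers,
            temperature=temperature,
        )

        if is_correct(answer, ground_truth):
            return answer, round_idx + 1   # solved after this many rounds

        # Feedback generator sees the near-complete ground truth and
        # produces targeted feedback on why the attempt is wrong.
        feedback = generate_feedback(problem, answer, ground_truth)

        # Sampling-based mitigations: raise temperature and reject the
        # incorrect answer in subsequent rounds.
        rejected_answers.append(answer)
        temperature = min(1.0, temperature + temperature_step)

    return None, max_rounds   # feedback never fully incorporated
```

In this framing, "feedback friction" is the gap that remains when the loop exits without a correct answer even though every round received high-quality, ground-truth-backed feedback.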

Community

Paper submitter

Recent studies show LLMs can self-improve their responses when given external feedback. But how effectively can they incorporate it?

We tested this systematically—and found they can't fully integrate feedback, even when the feedback is high-quality and backed by ground-truth!


Please check: Feedback Friction: Evaluating How Effectively Large Language Models Incorporate External Feedback https://medium.com/ai-artistry/feedback-friction-evaluating-how-effectively-large-language-models-incorporate-external-feedback-484bf27a96a3
