arxiv:2509.13683

Improving Context Fidelity via Native Retrieval-Augmented Reasoning

Published on Sep 17 · Submitted by Suyuchen Wang on Sep 18

AI-generated summary

CARE, a retrieval-augmented reasoning framework, enhances LLMs by integrating in-context evidence, improving retrieval accuracy and answer generation performance.

Abstract

Large language models (LLMs) often struggle with context fidelity, producing inconsistent answers when responding to questions based on provided information. Existing approaches either rely on expensive supervised fine-tuning to generate evidence after answering or train models to perform web searches without necessarily improving use of the given context. We propose CARE, a novel native retrieval-augmented reasoning framework that teaches LLMs to explicitly integrate in-context evidence into their reasoning process using the model's own retrieval capabilities. Our method requires limited labeled evidence data while significantly enhancing both retrieval accuracy and answer generation performance through strategically retrieved in-context tokens in the reasoning chain. Extensive experiments on multiple real-world and counterfactual QA benchmarks demonstrate that our approach substantially outperforms supervised fine-tuning, traditional retrieval-augmented generation methods, and external retrieval solutions. This work represents a fundamental advancement in making LLMs more accurate, reliable, and efficient for knowledge-intensive tasks.

Community

Paper author · Paper submitter

We introduce CARE, a framework that solves the persistent context hallucination problem by making LLMs perform retrieval and reasoning as a unified process. Instead of relying on external retrievers or post-hoc evidence generation, our approach trains models to dynamically identify relevant context spans and weave them directly into their reasoning chains using special <retrieval> tokens. The paper has been accepted as a main conference paper at EMNLP 2025.
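
To make the mechanism concrete, here is a minimal sketch of how a CARE-style reasoning trace could be parsed and checked for grounding. It assumes evidence spans are wrapped in <retrieval>...</retrieval> tags; the post names the <retrieval> token, but the closing-tag format, helper names, and example trace below are illustrative assumptions, not the paper's implementation:

```python
import re

# Hypothetical helpers for a CARE-style trace. Assumes evidence spans are
# wrapped in <retrieval>...</retrieval> tags; the tag format and function
# names are illustrative, not taken from the paper's code.

def extract_retrieved_spans(reasoning: str) -> list[str]:
    """Return the evidence spans the model quoted inside its reasoning chain."""
    return re.findall(r"<retrieval>(.*?)</retrieval>", reasoning, re.DOTALL)

def spans_grounded_in_context(reasoning: str, context: str) -> bool:
    """True if the trace cites evidence and every span is a verbatim quote."""
    spans = extract_retrieved_spans(reasoning)
    return bool(spans) and all(span.strip() in context for span in spans)

# Toy example: the model quotes the context span it reasons over.
context = "The Eiffel Tower was completed in 1889 for the World's Fair."
reasoning = (
    "The question asks when the tower was finished. The context states "
    "<retrieval>The Eiffel Tower was completed in 1889</retrieval>, "
    "so the answer is 1889."
)
print(spans_grounded_in_context(reasoning, context))  # True
```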

Why separate retrieval from reasoning when LLMs already understand language well enough to do both simultaneously? CARE uses a two-phase training approach: supervised fine-tuning on reasoning chains enriched with golden evidence, followed by reinforcement learning with verifiable rewards (RLVR) using retrieval-aware rewards and curriculum learning.
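
The post doesn't spell out the reward used in the RLVR phase, but a retrieval-aware reward of the kind it names could combine a verifiable answer check with how well the quoted spans are grounded in the context. A minimal sketch under that assumption (the weighting and exact-match scoring are illustrative, not the paper's):

```python
import re

# Minimal sketch of a retrieval-aware reward for the RLVR phase. The paper's
# exact reward is not given in this post; this assumes a weighted mix of
# answer correctness and evidence grounding, with an assumed <retrieval> tag
# format. All weights and scoring choices here are illustrative.

def retrieval_aware_reward(
    reasoning: str,
    predicted_answer: str,
    gold_answer: str,
    context: str,
    span_weight: float = 0.5,  # assumed weighting, not from the paper
) -> float:
    """Score a rollout on answer correctness plus grounding of quoted spans."""
    # Exact-match answer reward (a verifiable signal, as in RLVR setups).
    answer_score = 1.0 if predicted_answer.strip() == gold_answer.strip() else 0.0

    # Fraction of quoted spans that are verbatim substrings of the context.
    spans = re.findall(r"<retrieval>(.*?)</retrieval>", reasoning, re.DOTALL)
    if spans:
        grounded = sum(span.strip() in context for span in spans)
        span_score = grounded / len(spans)
    else:
        span_score = 0.0  # the rollout cited no evidence at all

    return answer_score + span_weight * span_score
```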

Results show substantial improvements over traditional RAG and online search methods across multiple QA benchmarks, with particularly strong gains in counterfactual scenarios. The approach eliminates external retrieval API overhead while requiring only minimal labeled evidence data for training.

Models citing this paper 3

Datasets citing this paper 3

Spaces citing this paper 0

Collections including this paper 1