Improving Context Fidelity via Native Retrieval-Augmented Reasoning
Abstract
CARE, a retrieval-augmented reasoning framework, enhances LLMs by integrating in-context evidence, improving retrieval accuracy and answer generation performance.
Large language models (LLMs) often struggle with context fidelity, producing inconsistent answers when responding to questions based on provided information. Existing approaches either rely on expensive supervised fine-tuning to generate evidence after the answer or train models to perform web searches without necessarily improving their use of the given context. We propose CARE, a novel native retrieval-augmented reasoning framework that teaches LLMs to explicitly integrate in-context evidence within their reasoning process using the model's own retrieval capabilities. Our method requires limited labeled evidence data while significantly enhancing both retrieval accuracy and answer generation performance through strategically retrieved in-context tokens in the reasoning chain. Extensive experiments on multiple real-world and counterfactual QA benchmarks demonstrate that our approach substantially outperforms supervised fine-tuning, traditional retrieval-augmented generation methods, and external retrieval solutions. This work represents a fundamental advancement in making LLMs more accurate, reliable, and efficient for knowledge-intensive tasks.
Community
We introduce CARE, a framework that addresses the persistent context hallucination problem by making LLMs perform retrieval and reasoning as a unified process. Instead of relying on external retrievers or post-hoc evidence generation, our approach trains models to dynamically identify relevant context spans and weave them directly into their reasoning chains using special <retrieval> tokens. The paper has been accepted as a main conference paper at EMNLP 2025.
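As a rough illustration of the format (the closing tag, field names, and the example itself are assumptions made here for illustration; only the <retrieval> marker is mentioned above), a training instance with evidence woven into the reasoning chain might look like:

```python
# Hypothetical CARE-style training example: the reasoning chain quotes a span
# copied verbatim from the provided context inside <retrieval> markers.
example = {
    "context": (
        "Marie Curie was born in Warsaw in 1867 and later moved to Paris, "
        "where she carried out her research on radioactivity."
    ),
    "question": "Where was Marie Curie born?",
    "reasoning": (
        "The question asks for a birthplace. "
        "<retrieval>Marie Curie was born in Warsaw in 1867</retrieval> "
        "The retrieved span states the birthplace directly, so the answer is Warsaw."
    ),
    "answer": "Warsaw",
}
```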
Why separate retrieval from reasoning when LLMs already understand language well enough to do both simultaneously? CARE uses a two-phase training approach: supervised fine-tuning on reasoning chains enriched with golden evidence, followed by reinforcement learning with verifiable rewards (RLVR) using retrieval-aware rewards and curriculum learning.
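For the RLVR phase, here is a minimal sketch of what a retrieval-aware reward could look like, assuming retrieved spans are delimited by <retrieval>...</retrieval> tags, gold evidence spans are available, and evidence faithfulness is combined with answer exact match. The delimiters, answer format, and equal weighting are illustrative assumptions, not the paper's actual reward:

```python
import re

def retrieval_aware_reward(response: str, gold_answer: str,
                           gold_evidence: list[str], alpha: float = 0.5) -> float:
    """Toy reward combining answer correctness with evidence faithfulness.

    Assumptions (not from the paper): retrieved spans are wrapped in
    <retrieval>...</retrieval>, the final answer follows an "Answer:" prefix,
    and the two terms are weighted equally by default.
    """
    # Collect every span the model claims to have retrieved from the context.
    retrieved = re.findall(r"<retrieval>(.*?)</retrieval>", response, flags=re.DOTALL)

    # Evidence term: fraction of gold evidence spans found verbatim in a retrieved span.
    if gold_evidence:
        hits = sum(any(g.strip() in r for r in retrieved) for g in gold_evidence)
        evidence_score = hits / len(gold_evidence)
    else:
        evidence_score = 0.0

    # Answer term: exact match on the text following "Answer:".
    match = re.search(r"Answer:\s*(.+)", response)
    answer_score = float(match is not None
                         and match.group(1).strip().lower() == gold_answer.strip().lower())

    return alpha * answer_score + (1 - alpha) * evidence_score
```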
Results show substantial improvements over traditional RAG and online search methods across multiple QA benchmarks, with particularly strong gains in counterfactual scenarios. The approach eliminates external API overhead while requiring only a small amount of labeled evidence data for training.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Injecting External Knowledge into the Reasoning Process Enhances Retrieval-Augmented Generation (2025)
- From Sufficiency to Reflection: Reinforcement-Guided Thinking Quality in Retrieval-Augmented Reasoning for LLMs (2025)
- EviNote-RAG: Enhancing RAG Models via Answer-Supportive Evidence Notes (2025)
- UR2: Unify RAG and Reasoning through Reinforcement Learning (2025)
- GRAIL: Learning to Interact with Large Knowledge Graphs for Retrieval Augmented Reasoning (2025)
- Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation (2025)
- Careful Queries, Credible Results: Teaching RAG Models Advanced Web Search Tools with Reinforcement Learning (2025)