Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
Abstract
Following natural instructions is crucial for the effective application of Retrieval-Augmented Generation (RAG) systems. Despite recent advancements in Large Language Models (LLMs), research on assessing and improving instruction-following (IF) alignment within the RAG domain remains limited. To address this issue, we propose VIF-RAG, the first automated, scalable, and verifiable synthetic pipeline for instruction-following alignment in RAG systems. We start by manually crafting a minimal set of atomic instructions (<100) and developing combination rules to synthesize and verify complex instructions for a seed set. We then use supervised models for instruction rewriting while simultaneously generating code to automatically verify instruction quality via a Python executor. Finally, we integrate these instructions with extensive RAG and general data samples, scaling up to a high-quality VIF-RAG-QA dataset (>100K) through automated processes. To further bridge the gap in automated instruction-following evaluation for RAG systems, we introduce the FollowRAG Benchmark, which includes approximately 3K test samples covering 22 categories of general instruction constraints and four knowledge-intensive QA datasets. Thanks to its robust pipeline design, FollowRAG can seamlessly integrate with different RAG benchmarks. Using FollowRAG and eight widely used IF and foundational-ability benchmarks for LLMs, we demonstrate that VIF-RAG markedly enhances LLM performance across a broad range of general instruction constraints while effectively leveraging its capabilities in RAG scenarios. Further analysis offers practical insights for achieving IF alignment in RAG systems. Our code and datasets are released at https://FollowRAG.github.io.
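To make the executor-based verification step concrete, here is a minimal Python sketch of the idea: atomic constraint checkers are composed into a complex instruction, and a synthesized sample is kept only if every check passes when run by the executor. The names below (`check_word_limit`, `ATOMIC_CHECKERS`, `verify_sample`) are illustrative assumptions, not the paper's actual pipeline, in which the checker code is itself generated by a model.

```python
# Minimal sketch of executor-based constraint verification.
# All names here are illustrative assumptions; in VIF-RAG the checker
# code is generated by a model and executed to validate each sample.

def check_word_limit(response: str, max_words: int = 50) -> bool:
    """Atomic constraint: the response must stay within a word budget."""
    return len(response.split()) <= max_words

def check_keyword(response: str, keyword: str = "retrieval") -> bool:
    """Atomic constraint: the response must mention a required keyword."""
    return keyword.lower() in response.lower()

# A "complex" instruction is a composition of atomic constraints.
ATOMIC_CHECKERS = [check_word_limit, check_keyword]

def verify_sample(response: str) -> bool:
    """Keep a synthesized sample only if every constraint check passes."""
    return all(check(response) for check in ATOMIC_CHECKERS)

candidates = [
    "Retrieval-augmented generation grounds answers in fetched documents.",
    "This answer ignores the required keyword entirely.",
]
verified = [r for r in candidates if verify_sample(r)]
print(verified)  # only the first candidate passes both checks
```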
Community
TLDR: We present VIF-RAG, an automated, scalable, and verifiable framework that significantly enhances instruction-following alignment in RAG systems, backed by the FollowRAG Benchmark for thorough evaluation and practical insights.
VIF-RAG, the first automated, scalable, and verifiable synthetic data framework. VIF-RAG uniquely combines augmented rewriting with diverse validation processes to synthesize high-quality instruction-following alignment data almost from scratch (<100 atomic instructions), scaling up to over 100K samples.
FollowRAG, the first benchmark designed to comprehensively evaluate LLMs' complex instruction-following abilities in RAG tasks. FollowRAG includes nearly 3K test samples, spanning four knowledge-intensive QA benchmarks and 22 types of constraints. Its design ensures seamless integration with various RAG benchmarks, providing strong scalability.
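For intuition on how a FollowRAG-style sample might be scored, the sketch below pairs a knowledge-intensive question with two instruction constraints and reports the fraction satisfied. The field names and the per-constraint scoring rule are assumptions for illustration, not the benchmark's actual format.

```python
# Hypothetical sketch of scoring a FollowRAG-style sample; field names
# and the scoring rule are assumptions, not the benchmark's real schema.

sample = {
    "question": "Who wrote 'On the Origin of Species'?",
    "passages": ["Charles Darwin published On the Origin of Species in 1859."],
    "constraints": [
        ("max_words", 20),           # length constraint
        ("must_include", "Darwin"),  # keyword constraint
    ],
}

def satisfied(response: str, kind: str, arg) -> bool:
    """Check a single instruction constraint against the response."""
    if kind == "max_words":
        return len(response.split()) <= arg
    if kind == "must_include":
        return str(arg).lower() in response.lower()
    raise ValueError(f"unknown constraint type: {kind}")

response = "Charles Darwin wrote it."
# Instruction-following rate: fraction of constraints the response meets.
rate = sum(satisfied(response, k, a) for k, a in sample["constraints"]) / len(sample["constraints"])
print(f"IF rate: {rate:.2f}")  # 1.00 for this response
```

Averaging such a per-sample rate over the benchmark's roughly 3K samples would yield one overall instruction-following score; the paper's exact metrics may differ.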
Our code and data are released at:
Project: https://FollowRAG.github.io
Arxiv: https://arxiv.org/pdf/2410.09584
Code: https://github.com/dongguanting/FollowRAG
VIF-RAG Dataset: https://huggingface.co/datasets/dongguanting/VIF-RAG-QA-110K
Please check out and follow!
The following similar papers were recommended by the Semantic Scholar API:
- Retrieving, Rethinking and Revising: The Chain-of-Verification Can Improve Retrieval Augmented Generation (2024)
- Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation (2024)
- Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models (2024)
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering (2024)
- Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation (2024)