GAIR/DeepResearcher-7b

Introduction

DeepResearcher is the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers.
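
To make the agentic loop concrete, here is a hypothetical sketch of the observe-think-act cycle such an agent runs. Everything in it is illustrative: web_search, browse, and next_action are placeholder stubs standing in for the real web-search tooling and the policy LLM, not DeepResearcher's actual interfaces.

def web_search(query: str) -> str:
    # Stub for the real web search API the agent calls during research.
    return f"(stub) search results for: {query}"

def browse(url: str) -> str:
    # Stub for fetching and reading a web page.
    return f"(stub) page content of: {url}"

def next_action(history: list) -> dict:
    # In the real system the policy LLM emits this decision from the full
    # interaction history; this stub answers immediately so the sketch
    # runs end to end.
    return {"type": "answer", "answer": "(stub) final answer"}

def research(question: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        action = next_action(history)
        if action["type"] == "search":
            history.append({"role": "tool", "content": web_search(action["query"])})
        elif action["type"] == "browse":
            history.append({"role": "tool", "content": browse(action["url"])})
        else:
            # The agent either commits to an answer or honestly reports
            # that it could not find a definitive one.
            return action["answer"]
    return "No definitive answer was found."

print(research("example question"))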

Model Details

  • License: Apache 2.0
  • Model type: LLM-based deep research agent trained end-to-end with reinforcement learning
  • Language(s): English
  • Finetuned from model: Qwen2.5-7B-Instruct
  • Parameters: 7.62B (BF16, Safetensors)

Model Sources

  • Repository: https://github.com/GAIR-NLP/DeepResearcher
  • Paper: https://arxiv.org/abs/2504.03160

How to Get Started with the Model

To get started, visit the DeepResearcher repository on GitHub, where the model's code and setup instructions are provided.
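
As a quick illustration, the snippet below loads the checkpoint with Hugging Face Transformers and runs a single chat turn. It is a minimal sketch, assuming the standard Qwen2.5 chat template shipped with the tokenizer; the full deep-research behavior (live web search and tool use) requires the scaffolding from the repository.

# Minimal sketch: run DeepResearcher-7b as a plain chat LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GAIR/DeepResearcher-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Who discovered penicillin?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))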

Training Details

Training Data

The model was trained on open-domain question-answering datasets, including:

  • NaturalQuestions (NQ)
  • TriviaQA (TQ)
  • HotpotQA
  • 2WikiMultiHopQA

Training Procedure

DeepResearcher was trained with reinforcement learning using the Group Relative Policy Optimization (GRPO) algorithm, and was evaluated in both in-domain (NQ, TQ, HotpotQA) and out-of-domain (Musique, Bamboogle, PopQA) settings.
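
For intuition about GRPO: it samples a group of rollouts per question and scores each rollout's reward against the group's mean and standard deviation, so no learned value model is needed. Below is a minimal sketch of that group-relative advantage, following the standard GRPO formulation rather than the repository's actual code.

from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-6):
    # Standard GRPO: for G rollouts of the same question, standardize each
    # reward against the group, A_i = (r_i - mean(r)) / (std(r) + eps).
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four rollouts of one question, reward 1.0 when the final
# answer is judged correct.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # ≈ [0.87, -0.87, -0.87, 0.87]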

Evaluation

Testing Data

The model was evaluated on several datasets, including:

  • NQ (Natural Questions)
  • TQ (TriviaQA)
  • HotpotQA
  • 2Wiki
  • Musique
  • Bamboogle
  • PopQA

Results

DeepResearcher outperforms all baselines, with substantial improvements across the evaluation datasets and the largest gains in the out-of-domain settings (Musique, Bamboogle, PopQA).

Citation

@misc{zheng2025deepresearcherscalingdeepresearch,
      title={DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments}, 
      author={Yuxiang Zheng and Dayuan Fu and Xiangkun Hu and Xiaojie Cai and Lyumanshan Ye and Pengrui Lu and Pengfei Liu},
      year={2025},
      eprint={2504.03160},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2504.03160}, 
}