Questions: Question Solver Agent, Judge Agent, SFT & RL data

#13

by XinglinZhao - opened Sep 23

After reading the WebResearcher paper https://arxiv.org/pdf/2509.13309, I have a few questions:

First question: which model do the Question Solver Agent and the Judge Agent use?

I assume DeepResearch-30B-A3B is trained on data generated by WebFrontier. So I wonder whether the Question Solver Agent / Judge Agent is a bigger and better model, in which case DeepResearch-30B-A3B would effectively be learning from these models.

Second question: do SFT and RL use the same dataset? This is unclear in the paper.
