Questions about the Question Solver Agent, Judge Agent, and SFT & RL data
#13, opened by XinglinZhao
After reading the WebResearcher paper https://arxiv.org/pdf/2509.13309, I have a few questions:
- What model do the Question Solver Agent and the Judge Agent use?
I assume DeepResearch-30B-A3B is trained on data generated by WebFrontier. If so, I wonder whether the Question Solver Agent / Judge Agent is a bigger and better model, so that DeepResearch-30B-A3B effectively learns from these models.
- Do SFT and RL use the same dataset? The paper does not make this clear.