ScaleAI/PRBench
ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents
Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training