AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories Paper • 2504.08942 • Published 10 days ago • 27