Boxi Yu
Bertsekas
AI & ML interests
Coding Agent, Automated Operator
Recent Activity
authored a paper 3 days ago
How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs
authored a paper 3 days ago
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench
upvoted a paper 4 days ago
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench
Organizations
None yet