Hugging Face
Boxi Yu
Bertsekas
0 followers · 1 following
https://boxiyu.github.io/
BoshCavendish
BoxiYu
AI & ML interests
Coding Agent, Automated Operator
Recent Activity
Authored a paper, 3 days ago: How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs
Authored a paper, 3 days ago: UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench
Upvoted a paper, 4 days ago: UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench
Organizations
None yet
Papers: 2
arXiv:2506.09289
arXiv:2501.10711
Models: 0 (none public yet)
Datasets: 2
Bertsekas/SWE-Bench_Lite_UTBoost • Viewer • Updated 6 days ago • 300
Bertsekas/SWE-Bench_Verified_UTBoost • Viewer • Updated 6 days ago • 500