- DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
  Paper • 2505.19253 • Published • 25
- SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
  Paper • 2505.20411 • Published • 84
- Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
  Paper • 2505.21497 • Published • 91
Penghui Qi
QPHutu
AI & ML interests
None yet
Recent Activity
updated a collection about 20 hours ago: LLM Agent
updated a collection 3 days ago: LLM Agent
upvoted a paper 8 days ago: Fostering Video Reasoning via Next-Event Prediction
Organizations
Collections: 3
models: 0 (none public yet)
datasets: 0 (none public yet)