The test and validation sets of the BuildBench paper
ZEHUA ZHANG
STEVENZHANG904
AI & ML interests
AI for Science, Multimodal ML, AI for Info Sec
Recent Activity
authored a paper 7 days ago
When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers
authored a paper 7 days ago
BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software
upvoted a paper 7 days ago
BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software