ThinMQM: model and data collection for automated translation evaluation with MQM (Multidimensional Quality Metrics).
Runzhe Zhan
rzzhan
AI & ML interests
None yet
Recent Activity
upvoted a paper about 3 hours ago
ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics
upvoted a paper 20 days ago
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows
updated a model 22 days ago
Simplified-Reasoning/SU-01