Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation Paper • 2505.00612 • Published May 1 • 9 • 2