2 new incorrect (100%) scores at the top of the leaderboard in Validation Results (Master Agent & Master Agent 2025)

#60
by Rocky125 - opened

I'd like to report that there are 2 new entries at the top of the Validation Results leaderboard with suspicious 100% scores: Master Agent and Master Agent 2025. These appear to be incorrect entries similar to the previous case with the tt4 agent (https://huggingface.co/spaces/gaia-benchmark/leaderboard/discussions/27), where incomplete submissions were mistakenly counted as correct answers.

Could the GAIA benchmark team please review these entries and remove them if they're found to be invalid? This would help maintain the integrity of the leaderboard. Thank you for your attention to this matter.

Hi! Thanks for the report!
People have indeed been cheating a lot on the validation set, so we decided to remove it. Real models were already scoring very high on it, and it was no longer indicative.

clefourrier changed discussion status to closed
