Submitted by YongdingTao
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
Peking University