Easy2Hard-Bench • Collection • Easy2Hard-Bench offers six datasets with continuous difficulty ratings, enabling profiling of LLM performance and generalization across difficulty levels. • 7 items • Updated Jul 3, 2024
Correct-DPO Evaluations • Collection • Evaluations of Correct-DPO experiments • 143 items • Updated May 21, 2024