Learning an Efficient Multi-Turn Dialogue Evaluator from Multiple Judges Paper • 2508.00454 • Published 23 days ago • 9
FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios Paper • 2307.13528 • Published Jul 25, 2023 • 1
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models Paper • 2406.09098 • Published Jun 13, 2024 • 1
InstructBioMol: Advancing Biomolecule Understanding and Design Following Human Instructions Paper • 2410.07919 • Published Oct 10, 2024 • 1
ERPO: Advancing Safety Alignment via Ex-Ante Reasoning Preference Optimization Paper • 2504.02725 • Published Apr 3, 2025 • 1
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition Paper • 2404.08008 • Published Apr 10, 2024 • 1