EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models Paper • 2312.06281 • Published Dec 11, 2023
Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy Paper • 2508.07485 • Published Aug 10, 2025
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models Paper • 2510.15061 • Published Oct 16, 2025