GenAI Arena: An Open Evaluation Platform for Generative Models Paper • 2406.04485 • Published Jun 6, 2024 • 23
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3, 2024 • 46
Long-context LLMs Struggle with Long In-context Learning Paper • 2404.02060 • Published Apr 2, 2024 • 37