FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios Paper • 2307.13528 • Published Jul 25, 2023 • 1
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25, 2024 • 47
Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate Paper • 2401.16788 • Published Jan 30, 2024 • 1
Align on the Fly: Adapting Chatbot Behavior to Established Norms Paper • 2312.15907 • Published Dec 26, 2023 • 1
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper • 2407.06135 • Published Jul 8, 2024 • 23
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI Paper • 2406.12753 • Published Jun 18, 2024 • 16