PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation Paper • 2409.06820 • Published Sep 10 • 62
Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature Paper • 2408.15836 • Published Aug 28 • 11
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 115
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27 • 138
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time Paper • 2408.13233 • Published Aug 23 • 20
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models Paper • 2408.03837 • Published Aug 7 • 17