Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published Jul 2025 • 67
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published Mar 3, 2025 • 27
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21, 2025 • 86
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024 • 74
StarCoder 2 and The Stack v2: The Next Generation Paper • 2402.19173 • Published Feb 29, 2024 • 147
ChatCell: Facilitating Single-Cell Analysis with Natural Language Paper • 2402.08303 • Published Feb 13, 2024 • 14
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science Paper • 2402.04247 • Published Feb 6, 2024 • 2
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents Paper • 2311.11797 • Published Nov 20, 2023 • 2
QTSumm: A New Benchmark for Query-Focused Table Summarization Paper • 2305.14303 • Published May 23, 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity Paper • 2310.07521 • Published Oct 11, 2023
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? Paper • 2309.08963 • Published Sep 16, 2023 • 11
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Paper • 2307.16789 • Published Jul 31, 2023 • 100
Large Language Models are Effective Table-to-Text Generators, Evaluators, and Feedback Providers Paper • 2305.14987 • Published May 24, 2023 • 1
RWKV: Reinventing RNNs for the Transformer Era Paper • 2305.13048 • Published May 22, 2023 • 19
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts Paper • 2202.01279 • Published Feb 2, 2022
Crosslingual Generalization through Multitask Finetuning Paper • 2211.01786 • Published Nov 3, 2022 • 2
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 32
DART: Open-Domain Structured Data Record to Text Generation Paper • 2007.02871 • Published Jul 6, 2020
Weaver: Foundation Models for Creative Writing Paper • 2401.17268 • Published Jan 30, 2024 • 46
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks Paper • 2311.09835 • Published Nov 16, 2023 • 11