Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published 10 days ago • 67
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published 10 days ago • 67
LocAgent: Graph-Guided LLM Agents for Code Localization Paper • 2503.09089 • Published Mar 12 • 13
LocAgent: Graph-Guided LLM Agents for Code Localization Paper • 2503.09089 • Published Mar 12 • 13 • 2
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published Mar 10 • 16
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published Mar 10 • 16 • 3
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published Mar 3 • 27
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21 • 86
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024 • 74
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024 • 74
StarCoder 2 and The Stack v2: The Next Generation Paper • 2402.19173 • Published Feb 29, 2024 • 147
ChatCell: Facilitating Single-Cell Analysis with Natural Language Paper • 2402.08303 • Published Feb 13, 2024 • 14
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science Paper • 2402.04247 • Published Feb 6, 2024 • 2
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents Paper • 2311.11797 • Published Nov 20, 2023 • 2
QTSumm: A New Benchmark for Query-Focused Table Summarization Paper • 2305.14303 • Published May 23, 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity Paper • 2310.07521 • Published Oct 11, 2023
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? Paper • 2309.08963 • Published Sep 16, 2023 • 11
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Paper • 2307.16789 • Published Jul 31, 2023 • 100
Large Language Models are Effective Table-to-Text Generators, Evaluators, and Feedback Providers Paper • 2305.14987 • Published May 24, 2023 • 1
RWKV: Reinventing RNNs for the Transformer Era Paper • 2305.13048 • Published May 22, 2023 • 19