ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition Paper • 2503.21248 • Published Mar 27 • 20
Mirror: A Universal Framework for Various Information Extraction Tasks Paper • 2311.05419 • Published Nov 9, 2023
Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark Paper • 2405.08355 • Published May 14, 2024
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models Paper • 2410.11805 • Published Oct 15, 2024 • 14
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models Paper • 2503.16779 • Published Mar 21 • 1
LLM4SR: A Survey on Large Language Models for Scientific Research Paper • 2501.04306 • Published Jan 8 • 37
Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM Paper • 2408.07246 • Published Aug 14, 2024 • 22
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Paper • 2408.03361 • Published Aug 6, 2024 • 87
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs Paper • 2407.10058 • Published Jul 14, 2024 • 32