ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition Paper • 2503.21248 • Published 25 days ago • 20
Mirror: A Universal Framework for Various Information Extraction Tasks Paper • 2311.05419 • Published Nov 9, 2023
Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark Paper • 2405.08355 • Published May 14, 2024
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models Paper • 2410.11805 • Published Oct 15, 2024 • 14
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models Paper • 2503.16779 • Published about 1 month ago • 1
view post Post 2806 Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning (2411.18203)Critic-V has been accepted by CVPR2025!Bonus! VRI-160K uploaded now! di-zhang-fdu/R1-Vision-Reasoning-Instructions See translation 🔥 4 4 + Reply
GameFactory: Creating New Games with Generative Interactive Videos Paper • 2501.08325 • Published Jan 14 • 66
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning Paper • 2501.04698 • Published Jan 8 • 15