DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research Paper • 2505.19253 • Published May 25 • 28
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning Paper • 2505.16410 • Published May 22 • 57
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Paper • 2412.13018 • Published Dec 17, 2024 • 42