Evaluation & Benchmark Methodology - a nodz Collection

Evaluation & Benchmark Methodology

updated Aug 13, 2024

ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

Paper • 2408.04682 • Published Aug 8, 2024