Running 539 Scaling test-time compute 📈 Enhance math problem solving by scaling test-time compute
Running 92 Nexus Function Calling Leaderboard 🐠 Visualize model performance on function calling tasks