benchmark - a MisakiWang Collection

MisakiWang's Collections

benchmark

updated Oct 17, 2024

OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

Paper • 2402.17553 • Published Feb 27, 2024 • 26

MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs

Paper • 2410.04698 • Published Oct 7, 2024 • 13