MisakiWang's Collections
benchmark

updated Oct 17, 2024
  • OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

    Paper • 2402.17553 • Published Feb 27, 2024 • 26

  • MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs

    Paper • 2410.04698 • Published Oct 7, 2024 • 13