JoshFourie's Collections

Benchmarks

updated Mar 21, 2024

  • Rethinking FID: Towards a Better Evaluation Metric for Image Generation

    Paper • 2401.09603 • Published Nov 30, 2023 • 18

  • LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models

    Paper • 2402.10524 • Published Feb 16, 2024 • 24

  • Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming

    Paper • 2402.14261 • Published Feb 22, 2024 • 11

  • RewardBench: Evaluating Reward Models for Language Modeling

    Paper • 2403.13787 • Published Mar 20, 2024 • 23

  • Evaluating Frontier Models for Dangerous Capabilities

    Paper • 2403.13793 • Published Mar 20, 2024 • 7