Community Evals Feedback

#1
by burtenshaw - opened

The Hub provides a decentralized system for tracking model evaluation results. Benchmark datasets host leaderboards, and model repos store evaluation scores that automatically appear on both the model page and the benchmark’s leaderboard.


🔊 Let us know what you think of this feature:

Looks great! Could you provide instructions on how to run evaluations locally with Inspect AI on these 3 benchmarks?

OpenEvals org

Hey @djstrong you can run something like:

inspect eval hf/cais/hle --model hf/openai-community/gpt2

to run a local Transformers model. The provider docs are here: https://inspect.aisi.org.uk/providers.html
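A minimal local run might look like the following sketch. It assumes the `inspect-ai` package is installed and uses the `--limit` flag (a standard Inspect AI option) to cap the number of samples for a quick smoke test; the task and model identifiers are the ones from the command above:

```shell
# Install Inspect AI
pip install inspect-ai

# Run the HLE benchmark against a local Transformers model;
# --limit caps the number of evaluated samples for a quick check
inspect eval hf/cais/hle --model hf/openai-community/gpt2 --limit 10
```

Results land in a local `logs/` directory by default, which you can browse with `inspect view`.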

@burtenshaw
Is it possible to configure a subset of the data to be closed and private? I think that would be super valuable.
