clefourrier's Collections
LLM evaluation datasets
Updated Nov 28, 2024
facebook/anli • Viewer • Updated Dec 21, 2023 • 169k • 5.65k • 43
codeparrot/apps • Viewer • Updated Oct 20, 2022 • 20k • 4.68k • 174
allenai/ai2_arc • Viewer • Updated Dec 21, 2023 • 7.79k • 219k • 202
EleutherAI/asdiv • Updated Nov 2, 2023 • 946 • 4
facebook/babi_qa • Viewer • Updated Jan 25, 2023 • 10.4k • 467 • 7
heegyu/bbq • Viewer • Updated Jul 14, 2023 • 58.5k • 3.16k • 15
nyu-mll/blimp • Viewer • Updated Jan 23, 2024 • 67k • 5.29k • 37
AmazonScience/bold • Viewer • Updated Oct 6, 2022 • 7.2k • 992 • 15
google/boolq • Viewer • Updated Jan 22, 2024 • 12.7k • 18.6k • 78
cam-cst/cbt • Viewer • Updated Jan 16, 2024 • 687k • 1.33k • 16
aps/super_glue • Viewer • Updated 19 days ago • 196k • 129k • 168
nyu-mll/glue • Viewer • Updated Jan 30, 2024 • 1.49M • 252k • 419
google/civil_comments • Viewer • Updated Jan 25, 2024 • 2M • 1.44k • 21
abisee/cnn_dailymail • Viewer • Updated Jan 18, 2024 • 936k • 69.1k • 268
tuetschek/e2e_nlg_cleaned • Updated Jan 18, 2024 • 161 • 3
tau/commonsense_qa • Viewer • Updated Jan 4, 2024 • 12.1k • 88.5k • 101
stanfordnlp/coqa • Viewer • Updated Jan 4, 2024 • 7.7k • 3.4k • 74
ucinlp/drop • Viewer • Updated Jan 17, 2024 • 86.9k • 2.96k • 54
lighteval/DyckLanguage • Viewer • Updated May 12, 2023 • 1.51k • 33
openai/gsm8k • Viewer • Updated Jan 4, 2024 • 17.6k • 540k • 749
Rowan/hellaswag • Viewer • Updated Sep 28, 2023 • 60k • 186k • 124
openai/openai_humaneval • Viewer • Updated Jan 4, 2024 • 164 • 64.5k • 321
stanfordnlp/imdb • Viewer • Updated Jan 4, 2024 • 100k • 95.5k • 319
cimec/lambada • Viewer • Updated Jan 4, 2024 • 12.7k • 17.5k • 59
lighteval/LegalSupport • Viewer • Updated May 10, 2023 • 20k • 36 • 1
lighteval/lsat_qa • Updated May 16, 2023 • 29 • 4
deepmind/math_dataset • Updated Jan 18, 2024 • 2.56k • 127
deepmind/aqua_rat • Viewer • Updated Jan 9, 2024 • 196k • 3.9k • 63
google-research-datasets/mbpp • Viewer • Updated Jan 4, 2024 • 1.4k • 39.2k • 175
cais/mmlu • Viewer • Updated Mar 8, 2024 • 231k • 140k • 476
microsoft/ms_marco • Viewer • Updated Jan 4, 2024 • 1.11M • 5.59k • 164
CogComp/trec • Updated Jan 18, 2024 • 10k • 44
Open ASR Leaderboard 🏆 • Request evaluation for new speech models • Running on CPU Upgrade • 850