Each dataset is divided into easy, medium, and difficult splits using the familiarity metric. Please see our paper for details.
Jonas Golde
whoisjones
AI & ML interests
Data-efficient transfer learning
Recent Activity
Authored a paper about 1 month ago: MastermindEval: A Simple But Scalable Reasoning Benchmark
New activity about 1 month ago: flair/mastermind_35_mcq_close: Librarian Bot: Add language metadata for dataset
New activity about 1 month ago: flair/mastermind_24_prompt: Librarian Bot: Add language metadata for dataset