MastermindEval
Prompting and multiple-choice (MCQ) benchmarks for evaluating the reasoning capabilities of LLMs using Mastermind.
flair/mastermind_24_mcq_random • Updated Mar 12 • 30.4k • 897
flair/mastermind_35_mcq_random • Updated Mar 12 • 37.1k • 311
flair/mastermind_46_mcq_random • Updated Mar 12 • 36.1k • 1.09k
flair/mastermind_24_mcq_close • Updated Mar 12 • 30.4k • 322
Familiarity
Easy, medium, and difficult training splits for training general NER models using the familiarity metric. Please see our paper for details.
flair/pilener_entropy_splits • Updated Feb 4 • 78.8k • 7
flair/pilener_max_splits • Updated Feb 4 • 43.7k • 13
flair/nuner_entropy_splits • Updated Feb 4 • 618k • 10
flair/nuner_max_splits • Updated Feb 4 • 546k • 7