MastermindEval Collection Prompting and multiple-choice (MCQ) benchmarks to evaluate reasoning capabilities of LLMs using Mastermind. • 10 items • Updated May 29
whoisjones/gemma-fineweb-edu-scorer-mdeberta-binary-lr5e-05-20250411_140230 Text Classification • 0.3B • Updated Apr 12 • 5
whoisjones/gemma-fineweb-edu-scorer-mdeberta-multilabel-lr5e-05-20250411_135828 Text Classification • 0.3B • Updated Apr 12 • 15
whoisjones/openai-fineweb-edu-scorer-mdeberta-multilabel-lr5e-05-20250411_133317 Text Classification • 0.3B • Updated Apr 11 • 7
whoisjones/openai-fineweb-edu-scorer-mdeberta-binary-lr5e-05-20250411_132948 Text Classification • 0.3B • Updated Apr 11 • 23