MMLU Results of Llama-3.2-1B-Instruct cannot be reproduced

#27 opened by ou7791

My results:
micro average: 46.31
macro average across categories: 46.67
macro average across sub-categories: 48.2

I used the evaluation code from https://github.com/meta-llama/llama3.

Could the discrepancy be because this code does not use FlashAttention, leading to a difference from the evaluation results reported on Hugging Face?
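One way to test this hypothesis might be to compare logits from the same checkpoint under different attention backends. Below is a minimal sketch, assuming the Hugging Face transformers library, bf16, a single GPU, and an arbitrary test prompt (not any official evaluation setup):

```python
# Sketch: compare outputs of Llama-3.2-1B-Instruct with eager attention vs.
# FlashAttention-2 to see whether the attention backend alone shifts the logits.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")

logits = {}
for impl in ("eager", "flash_attention_2"):  # flash_attention_2 requires flash-attn to be installed
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        attn_implementation=impl,
    ).to("cuda").eval()
    with torch.no_grad():
        logits[impl] = model(**inputs).logits.float().cpu()
    del model
    torch.cuda.empty_cache()

# If the backends agree to within numerical noise, FlashAttention alone is
# unlikely to explain a multi-point MMLU gap.
print((logits["eager"] - logits["flash_attention_2"]).abs().max())
```

If the maximum difference is tiny, the gap is more likely due to the evaluation setup itself (prompt format, few-shot examples, answer extraction) than to the attention kernel.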
