MMLU Results of Llama-3.2-1B-Instruct cannot be reproduced
#27, opened by ou7791
My results:
- micro average: 46.31
- macro average across categories: 46.67
- macro average across sub-categories: 48.2
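For clarity, this is roughly what I mean by the three averages. This is only an illustrative sketch with hypothetical per-subject counts and a hypothetical subject-to-category mapping, not my actual eval script; the real mapping follows the MMLU repo's category definitions.

```python
# Sketch of the three aggregation schemes over hypothetical MMLU-style results.
from collections import defaultdict

# subject: (num_correct, num_total) -- hypothetical placeholder values
results = {
    "abstract_algebra": (30, 100),
    "anatomy": (60, 135),
    "philosophy": (150, 311),
}
# subject -> broad category -- hypothetical placeholder mapping
subject_to_category = {
    "abstract_algebra": "STEM",
    "anatomy": "STEM",
    "philosophy": "humanities",
}

# Micro average: pool every question across all subjects, then divide once.
micro = sum(c for c, _ in results.values()) / sum(n for _, n in results.values())

# Macro average across sub-categories: unweighted mean of per-subject accuracies.
macro_sub = sum(c / n for c, n in results.values()) / len(results)

# Macro average across categories: average subjects within each category,
# then average the category scores.
per_category = defaultdict(list)
for subject, (c, n) in results.items():
    per_category[subject_to_category[subject]].append(c / n)
macro_cat = sum(sum(a) / len(a) for a in per_category.values()) / len(per_category)

print(f"micro={micro:.2%}  macro(sub-categories)={macro_sub:.2%}  macro(categories)={macro_cat:.2%}")
```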
I used the code from https://github.com/meta-llama/llama3 to run the evaluation.
Could the difference be because this code does not use FlashAttention, which would explain why my numbers differ from the evaluation results reported on Hugging Face?
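If the attention implementation is the suspect, one way to check would be to rerun the Hugging Face model with FlashAttention 2 enabled and compare. A minimal sketch, assuming the `transformers` and `flash-attn` packages are installed and a supported GPU is available:

```python
# Load Llama-3.2-1B-Instruct with FlashAttention 2 enabled for comparison.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # switch to "eager" or "sdpa" to compare backends
    device_map="auto",
)

# Quick sanity check that generation works with this attention backend.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the scores move noticeably between backends, that would point to the attention implementation; if not, the gap is more likely in the prompt format or the aggregation scheme.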