MMLU Results of Llama-3.2-1B-Instruct cannot be reproduced

#27 opened by ou7791

My results:
micro average: 46.31
macro average across categories: 46.67
macro average across sub-categories: 48.2

I used the evaluation code from https://github.com/meta-llama/llama3.

Could the discrepancy be because this code does not use FlashAttention, leading to a difference from the evaluation results reported on Hugging Face?
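One way to test this hypothesis might be to compare logits from the same checkpoint under different attention backends. Below is a minimal sketch, assuming the Hugging Face transformers library, bf16, a single GPU, and an arbitrary test prompt (not any official evaluation setup):

```python
# Sketch: compare outputs of Llama-3.2-1B-Instruct with eager attention vs.
# FlashAttention-2 to see whether the attention backend alone shifts the logits.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")

logits = {}
for impl in ("eager", "flash_attention_2"):  # flash_attention_2 requires flash-attn to be installed
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        attn_implementation=impl,
    ).to("cuda").eval()
    with torch.no_grad():
        logits[impl] = model(**inputs).logits.float().cpu()
    del model
    torch.cuda.empty_cache()

# If the backends agree to within numerical noise, FlashAttention alone is
# unlikely to explain a multi-point MMLU gap.
print((logits["eager"] - logits["flash_attention_2"]).abs().max())
```

If the maximum difference is tiny, the gap is more likely due to the evaluation setup itself (prompt format, few-shot examples, answer extraction) than to the attention kernel.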
