Question about expert selection

#186
by waynebian - opened

Why is topk set to 2 when computing the group scores in the MoEGate function? Since this step chooses between groups, I would expect all of the scores in each group to be considered, not just the two highest.
Code: https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py#441
group_scores = (
    scores_for_choice.view(bsz * seq_len, self.n_group, -1).topk(2, dim=-1)[0].sum(dim=-1)
)  # [n, n_group]
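To make the question concrete, here is a minimal pure-Python sketch of what the quoted line computes for a single token, next to the "sum every score in the group" alternative I had in mind. The function names and the example scores are made up for illustration, and this works on plain lists rather than tensors, so it is not the actual DeepSeek code.

```python
import heapq


def split_into_groups(scores, n_group):
    """Split a flat list of per-expert scores into n_group equal chunks."""
    if len(scores) % n_group != 0:
        raise ValueError("number of experts must be divisible by n_group")
    per_group = len(scores) // n_group
    return [scores[g * per_group:(g + 1) * per_group] for g in range(n_group)]


def group_scores_topk(scores, n_group, k=2):
    """Score each group by the sum of its k largest expert scores.

    List analogue of: view(..., n_group, -1).topk(k, dim=-1)[0].sum(dim=-1)
    """
    return [sum(heapq.nlargest(k, grp)) for grp in split_into_groups(scores, n_group)]


def group_scores_all(scores, n_group):
    """Alternative from the question: score each group by the sum of all its scores."""
    return [sum(grp) for grp in split_into_groups(scores, n_group)]


# Hypothetical scores for one token: 8 experts in 2 groups of 4.
# Group 0 has two strong experts, group 1 has four moderate ones.
token_scores = [0.75, 0.75, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5]

top2 = group_scores_topk(token_scores, n_group=2)   # [1.5, 1.0]
total = group_scores_all(token_scores, n_group=2)   # [1.5, 2.0]

print("top-2 sum per group:", top2)
print("full sum per group: ", total)
```

With these made-up numbers the two scorings rank the groups differently: the top-2 sum prefers group 0, while the full sum prefers group 1. That difference is what I am asking about.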
