Does Llama 4 have chunked attention in the generation phase?

#64
by vanshils - opened

Same as title.
I know the chunked attention mask is applied in the context (prefill) phase. But does Llama 4 apply the chunked attention mask in the generation (decode) phase too?

Meta Llama org

yes
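
For illustration, here is a minimal sketch of how a chunked causal mask behaves in both phases. This is not Llama 4's actual implementation; the function name, the chunk size of 4, and the positions are only example assumptions (the real chunk size comes from the model config, e.g. `attention_chunk_size`).

```python
import torch

def chunked_causal_mask(q_positions: torch.Tensor,
                        kv_positions: torch.Tensor,
                        chunk_size: int) -> torch.Tensor:
    """True where a query position may attend to a key position:
    causal (key <= query) and within the same chunk."""
    q = q_positions[:, None]
    k = kv_positions[None, :]
    causal = k <= q
    same_chunk = (q // chunk_size) == (k // chunk_size)
    return causal & same_chunk

chunk_size = 4  # illustrative only

# Context (prefill) phase: all prompt positions are queries at once.
prefill_positions = torch.arange(8)
print(chunked_causal_mask(prefill_positions, prefill_positions, chunk_size).int())

# Generation (decode) phase: a single new query at position 9 against the
# cached keys 0..9 -- with chunk size 4 it only attends to keys 8 and 9,
# i.e. the ones inside its own chunk.
decode_query = torch.tensor([9])
cached_keys = torch.arange(10)
print(chunked_causal_mask(decode_query, cached_keys, chunk_size).int())
```

The decode-step output shows the point of the question: even when generating one token at a time, the local-attention layers only look back within the current chunk rather than over the whole cache.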
