Is 2x13B or 3x13B possible?

#5
by itsnottme - opened

Great model. I'm wondering, though, if a similar model with fewer experts is possible, something like 2x13B or 3x13B, so it would fit on smaller computers.

itsnottme changed discussion status to closed

Hello, sorry I didn't see your message earlier. It was not possible back then because, for GGUF, llama.cpp only accepted models with a power-of-two number of experts, and not 2 itself, so the only options were 4, 8, or 16 (or 32, 64...).
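To make that constraint concrete, here is a minimal sketch of which expert counts the old GGUF conversion would accept, assuming the power-of-two-but-not-2 rule described above (the helper name is just for illustration):

```python
def allowed_expert_counts(max_experts: int = 64) -> list[int]:
    """Expert counts accepted under the old llama.cpp GGUF rule
    described above: a power of two, but not 2 itself."""
    # Starting the range at 3 excludes 2; n & (n - 1) == 0 keeps powers of two.
    return [n for n in range(3, max_experts + 1) if n & (n - 1) == 0]

print(allowed_expert_counts())  # [4, 8, 16, 32, 64]
```

So a 2x13B or 3x13B build simply could not be converted at the time.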

It could be possible today, but Llama 2 models below 70B don't have GQA (grouped-query attention), which makes them very heavy to load and use. So I don't think I will make another Mixtral-like MoE model out of Llama 2, like this one.
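To show roughly why the missing GQA matters, here is a back-of-the-envelope sketch of the fp16 KV-cache size for Llama 2 13B (40 layers, 40 attention heads, head dim 128, no GQA) versus a hypothetical GQA variant with 8 KV heads, like Llama 2 70B or Mistral use. The numbers are rough estimates, not measurements:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    # One K and one V vector are cached per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem

ctx = 4096  # Llama 2 context length

# Llama 2 13B: no GQA, so every one of the 40 heads has its own K/V cache.
mha = kv_cache_bytes(n_layers=40, n_kv_heads=40, head_dim=128, n_tokens=ctx)

# Hypothetical GQA variant with 8 KV heads (the count used by Llama 2 70B / Mistral).
gqa = kv_cache_bytes(n_layers=40, n_kv_heads=8, head_dim=128, n_tokens=ctx)

print(f"MHA: {mha / 2**30:.2f} GiB, GQA: {gqa / 2**30:.2f} GiB")
# Roughly 3.1 GiB vs 0.6 GiB at full context in fp16 -- and in a Mixtral-style
# MoE the attention (and thus the KV cache) is shared across experts, so this
# cost applies to the whole merged model, on top of the expert weights.
```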

Hope that answers the question.
