Is 2x13B or 3x13B possible?
#5
by itsnottme - opened
Great model. I am wondering, though, if a similar model with fewer experts is possible, something like 2x13B or 3x13B, so it would fit on smaller computers.
itsnottme changed discussion status to closed
Hello, sorry I didn't see your message earlier. It wasn't possible back then because, for GGUF, llama.cpp only accepted models with a power-of-two number of experts, and 2 itself was not accepted, so the only options were 4, 8, or 16 (or 32, 64...).
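As a quick illustration of that constraint (a sketch only; the actual check inside llama.cpp may be written differently), the accepted expert counts were the powers of two greater than 2:

```python
def is_valid_expert_count(n: int) -> bool:
    """Expert counts accepted at the time: powers of two, excluding 2 itself."""
    is_power_of_two = n > 0 and (n & (n - 1)) == 0
    return is_power_of_two and n > 2

# 2x13B and 3x13B would have been rejected; 4x13B, 8x13B, ... were fine.
for n in (2, 3, 4, 8, 16):
    print(n, is_valid_expert_count(n))
```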
It could be possible today, but Llama 2 models below 70B don't have GQA, which makes them very heavy to load and use. So I don't think I will make another Mixtral-like MoE model out of Llama 2, like this one.
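To give a rough idea of why the missing GQA matters, here is a back-of-the-envelope sketch of the KV-cache size per sequence, using the published Llama 2 13B shape (40 layers, 40 attention heads, head dim 128, fp16 cache) versus a hypothetical GQA variant with 8 KV heads (the grouping Llama 2 70B uses). In a Mixtral-style MoE only the FFN is expert-routed, so the attention cost stays that of the base dense model; the numbers are approximate.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size for one sequence: keys + values across all layers."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

seq_len = 4096

# Llama 2 13B: no GQA, so every attention head keeps its own K/V (40 KV heads).
mha = kv_cache_bytes(n_layers=40, n_kv_heads=40, head_dim=128, seq_len=seq_len)

# Hypothetical GQA variant with 8 KV heads.
gqa = kv_cache_bytes(n_layers=40, n_kv_heads=8, head_dim=128, seq_len=seq_len)

print(f"MHA (no GQA):     {mha / 2**30:.2f} GiB per 4k-token sequence")
print(f"GQA (8 KV heads): {gqa / 2**30:.2f} GiB per 4k-token sequence")
```

The roughly 5x larger cache per sequence is what makes the non-GQA models noticeably heavier at inference with long contexts.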
Hope that answers the question.