Why does Deepseek 67B have a bigger file size than 70B models?
#1 opened by OrangeApples
@LoneStriker I'm comparing your 2.65bpw quants of Deepseek 67B and Euryale 1.4 L2 70B, and Deepseek's is about 700 MB bigger. However, when I checked TheBloke's GGUF quants of both, the opposite was true, which is what I expected. Not a big deal, but I'm curious what caused this.
I'm not sure, actually; that's something we would need to ask the author of Exllamav2 about. I have anecdotally noticed, however, that Deepseek also seems to take more VRAM than L2 70B under exl2, so the larger file size is reflected in VRAM usage. As to why, my guess would be the tokenizer: a ~100K-token vocabulary vs. Llama 2's 32K.