VRAM Usage
It would be so fantastic if you could include a note about typical VRAM requirements in the model cards for the miqu models (and other models in general). A few months ago it was possible to get a rough sense of how much VRAM a model would use, but now that we have MoEs etc., I'm finding it a lot harder.
If there is a reliable way to determine this without being spoon-fed the info, please let me know :) and kind thanks for all the wonderful quants!
Lots of variables, unfortunately. The only reliable way to know is to load the model at the context length you want to use. Different front ends (ooba vs. exui vs. TabbyAPI vs. bare exllamav2) may report different VRAM usage, and there's also overhead associated with splitting a model across GPUs versus running it entirely on a single GPU. We could take some measurements and record them, but newer models like this one, which supports up to 32K context, will need their own measurements. I'll keep the feature in mind for future work.
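For a rough upper bound before loading anything, you can at least estimate weights + KV cache by hand. The sketch below is only a back-of-the-envelope approximation under common assumptions (standard transformer with GQA, FP16 cache); the layer/head/dim values are illustrative placeholders and should be taken from the model's config.json, and real usage will still vary by backend, cache quantization, and multi-GPU overhead.

```python
# Rough VRAM estimate: quantized weight size + KV cache + overhead.
# All numeric values below are illustrative assumptions, not measurements.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache = 2 (K and V) * layers * kv_heads * head_dim * context * dtype size."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

weights_gb = 35.0   # e.g. a ~4-bit quant of a 70B model (check the actual file sizes)
cache_gb = kv_cache_bytes(
    n_layers=80, n_kv_heads=8, head_dim=128, context_len=32768  # read from config.json
) / 1024**3
overhead_gb = 1.5   # CUDA context, activations, fragmentation -- varies by front end

total = weights_gb + cache_gb + overhead_gb
print(f"~{total:.1f} GB total ({weights_gb:.1f} weights "
      f"+ {cache_gb:.1f} KV cache + {overhead_gb:.1f} overhead)")
```

With those placeholder numbers the FP16 cache alone comes to roughly 10 GB at 32K context, which is why measurements taken at shorter contexts don't transfer directly to models like this one.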