Quantized Llama-3 koboldcpp/mmproj?
#7 · opened by Lewdiculous
https://huggingface.co/koboldcpp/mmproj/blob/main/LLaMA3-8B_mmproj-Q4_1.gguf
Thoughts on this versus the unquantized ChaoticNeutrals/Llava_1.5_Llama3_mmproj?
There's not much point in running it quantized; it just adds extra de-quantization time at inference, and it's already small enough not to take up much space or VRAM. I'd say it depends on the user's hardware.
400 MB of VRAM can be extra context for the constrained folk, KEK.
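Rough back-of-the-envelope on what that buys you, assuming an fp16 KV cache and Llama-3-8B's GQA layout (32 layers, 8 KV heads, head dim 128); the ~400 MB figure is just the approximate saving from the Q4_1 mmproj, so adjust for your own setup:

```python
# Back-of-the-envelope: how much extra context ~400 MB of freed VRAM buys,
# assuming an fp16 KV cache and Llama-3-8B's GQA layout.
layers = 32         # transformer layers in Llama-3-8B
kv_heads = 8        # grouped-query key/value heads
head_dim = 128      # dimension per head
bytes_per_elem = 2  # fp16

# K and V caches per token, summed across all layers
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
freed_vram = 400 * 1024**2  # ~400 MB saved by using the quantized mmproj

extra_tokens = freed_vram // kv_bytes_per_token
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"Extra context from ~400 MB: ~{extra_tokens} tokens")
```

That works out to roughly 3K extra tokens of context on an 8B with GQA, and more if the backend also quantizes the KV cache.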
Valid point about inference time.
Lewdiculous changed discussion status to closed.