Quantized Llama-3 koboldcpp/mmproj?
#7 · opened by Lewdiculous
https://huggingface.co/koboldcpp/mmproj/blob/main/LLaMA3-8B_mmproj-Q4_1.gguf
Thoughts on this versus the unquantized ChaoticNeutrals/Llava_1.5_Llama3_mmproj?
There's not much point in running it quantized; it just adds extra de-quantization time at inference, and it's already small enough not to take up much space or VRAM. I'd say it depends on the user's hardware.
400 MB of VRAM can be extra context for the constrained folk, KEK.
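Rough back-of-the-envelope on what that buys you, assuming an fp16 KV cache and Llama-3-8B's GQA layout (32 layers, 8 KV heads, head dim 128); the ~400 MB figure is just the approximate saving from the Q4_1 mmproj, so adjust for your own setup:

```python
# Back-of-the-envelope: how much extra context ~400 MB of freed VRAM buys,
# assuming an fp16 KV cache and Llama-3-8B's GQA layout.
layers = 32         # transformer layers in Llama-3-8B
kv_heads = 8        # grouped-query key/value heads
head_dim = 128      # dimension per head
bytes_per_elem = 2  # fp16

# K and V caches per token, summed across all layers
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
freed_vram = 400 * 1024**2  # ~400 MB saved by using the quantized mmproj

extra_tokens = freed_vram // kv_bytes_per_token
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"Extra context from ~400 MB: ~{extra_tokens} tokens")
```

That works out to roughly 3K extra tokens of context on an 8B with GQA, and more if the backend also quantizes the KV cache.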
Valid point about inference time.
Lewdiculous changed discussion status to closed.