Q4 quants

#1
by DrRos - opened

Hi guys, could you please also upload Q4 quants? It seems the model cannot be converted to GGUF using the latest llama.cpp: I'm getting a `NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()` error while running `./convert_hf_to_gguf.py --outfile ../mellum-non-quant.gguf --verbose ../Mellum-4b-sft-python/`.

Use `llama-quantize`:

```
llama-quantize --allow-requantize Mellum-4B-SFT-Python.Q8_0.gguf Mellum-4B-SFT-Python.Q4_0.gguf Q4_0
```
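As an aside, once the Q8_0 GGUF exists, the same tool can also produce other low-bit variants; Q4_K_M is a common quality/size trade-off. This is a sketch only, reusing the same file names as above (the output name is illustrative), and assumes `llama-quantize` from a local llama.cpp build is on your path:

```shell
# Produce a Q4_K_M variant from the same Q8_0 source file.
# --allow-requantize is needed because the input is itself already
# quantized (Q8_0) rather than an fp16/fp32 conversion.
llama-quantize --allow-requantize \
    Mellum-4B-SFT-Python.Q8_0.gguf \
    Mellum-4B-SFT-Python.Q4_K_M.gguf \
    Q4_K_M
```

Requantizing from Q8_0 loses slightly more precision than quantizing directly from the fp16 original, but it avoids the pre-tokenizer error above since no conversion step is involved.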
