How to make Q4_K_M with your GGUF converter?


I load the full 24 GB Flux safetensors model into GGUF Convertor (Alpha) using ComfyUI_GGUF_windows_portable 0.0.10 (latest). At the output I get a bf16.gguf that is also 24 GB, and that's all. How do I get the desired quantization, for example Q4_K_M? There are no settings in the node itself; maybe it needs to be set somewhere else?

the convertor merely passes the torch tensors through to form a gguf file; to quantize it further, i.e. q8 down to q2, you need to compile the c/c++ code from llama.cpp and build an executable tool for that task, since these are all customized models, not following the hf standard and not on gg's list recently; we think we can make it a simple step and integrate it into our tool very soon; btw, you should be able to find the new gguf node in comfyui-manager right away; the new version doesn't need the dependency, and you could use the two nodes at the same time
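
For readers who want to see what "passing the tensors through" amounts to, here is a minimal sketch against the gguf C API as declared in older ggml.h trees; the metadata key, tensor name, and shape are illustrative, not taken from the actual convertor:

    // minimal sketch, assuming the pre-refactor gguf API from ggml.h;
    // the file is written at full precision -- no quantization happens here
    #include "ggml.h"

    int main(void) {
        struct gguf_context * gctx = gguf_init_empty();
        gguf_set_val_str(gctx, "general.architecture", "flux"); // illustrative key

        // a ggml context to hold one pass-through tensor (toy shape)
        struct ggml_init_params ip = { /*mem_size*/ 1024*1024, /*mem_buffer*/ NULL, /*no_alloc*/ false };
        struct ggml_context * ctx = ggml_init(ip);

        struct ggml_tensor * t = ggml_new_tensor_2d(ctx, GGML_TYPE_F16, 64, 64);
        ggml_set_name(t, "example.weight"); // hypothetical tensor name
        ggml_set_zero(t);

        // register the tensor; its bytes are copied out unchanged on write
        gguf_add_tensor(gctx, t);
        gguf_write_to_file(gctx, "model-f16.gguf", /*only_meta*/ false);

        ggml_free(ctx);
        gguf_free(gctx);
        return 0;
    }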

ok, done; you can take the cutter here to make your own q4_k_m gguf; enjoy
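
For reference, the upstream llama.cpp equivalent of the cutter is the llama-quantize tool; its core call looks roughly like the sketch below (a minimal sketch, assuming the public llama.h API; the cutter presumably wraps the same entry point with patches for the custom image/video architectures, and the filenames here are illustrative):

    // minimal sketch, assuming llama.cpp's public llama.h quantization API;
    // the stock tool only understands architectures on the official list
    #include "llama.h"

    int main(void) {
        llama_model_quantize_params params = llama_model_quantize_default_params();
        params.ftype = LLAMA_FTYPE_MOSTLY_Q4_K_M; // the q4_k_m mixed-precision recipe

        // reads the full-precision gguf and writes a quantized copy; returns 0 on success
        return (int) llama_model_quantize("model-bf16.gguf", "model-q4_k_m.gguf", &params);
    }

From the command line, the stock tool does the same thing via `llama-quantize model-bf16.gguf model-q4_k_m.gguf Q4_K_M`.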

Thank you for that! I'll try it.

Hey, @calcuis ,

I opened an issue over on gguf-quantizor.

I cannot get the .exe to work on my Windows machine (nothing in the CLI no matter what, and some MinGW errors when I double-click).

I tried applying your patch to the tags found in https://github.com/city96/ComfyUI-GGUF/tree/main/tools but neither of them was correct, either.

Having encountered this before (and before I saw your repos), I went in and manually added the modifications to the files, but when attempting to convert HunyuanVideo I got a "GGML_MAX_DIMS" error and decided I'd open an issue.
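
A note on that error, as a guess from stock ggml's limits rather than anything confirmed in the repo: ggml tensors carry at most GGML_MAX_DIMS axes, so a 5-D tensor such as a video model's 3-D convolution kernel cannot be represented directly and has to be reshaped before conversion:

    // assuming stock ggml: GGML_MAX_DIMS bounds the number of axes a tensor
    // may have, so 5-d weights must be reshaped before they fit in a gguf
    #include "ggml.h"
    _Static_assert(GGML_MAX_DIMS == 4, "ggml tensors are limited to 4 dimensions");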

It would be awesome to have both Linux and Win11 binaries for this (I was planning on doing something similar for many of the tools in your chain over in ComfyUI-MultiGPU). If you could provide me with the llama.cpp tag or commit, I am sure I can either patch it directly or redo my manual edits if all else fails.

Cheers!

seems some .dll(s) (from the c compiler) are missing for that .exe to work; btw, you could convert the safetensors to gguf, right? the tag is on another machine; will get back to you on github later; since llama.cpp has changed a lot in this short period of time, guess the patch might not apply to the most updated version, but you could still try

@calcuis Re: Convert to FP16 GGUF

Yes! Your convertor ZERO node worked flawlessly. Really good work there. I was able to take a HunyuanVideo .safetensors fine-tune I had made earlier, convert it to a bf16.gguf, load the resultant 25 GB .gguf model, and see clear evidence that the LoRA was properly merged into the base model. The final step is just quantizing the resultant bf16 into something a bit more memory-friendly.
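
One quick way to double-check a converted file is to open it without loading tensor data and list what it contains, as in this minimal sketch against the pre-refactor gguf C API (the filename is illustrative):

    // minimal sketch: open a gguf without allocating tensor data and dump
    // its tensor names; a merged lora should leave every expected weight
    // present exactly once, with no stray lora_up/lora_down entries
    #include <stdio.h>
    #include "ggml.h" // gguf_* lived here before the header move discussed below

    int main(void) {
        struct gguf_init_params p = { /*no_alloc*/ true, /*ctx*/ NULL };
        struct gguf_context * g = gguf_init_from_file("model-bf16.gguf", p);
        if (!g) { fprintf(stderr, "not a readable gguf\n"); return 1; }

        for (int i = 0; i < gguf_get_n_tensors(g); ++i) {
            printf("%s\n", gguf_get_tensor_name(g, i));
        }
        gguf_free(g);
        return 0;
    }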

Thanks again for your response. I'll post back here how it goes.

Re: latest version of llama.cpp: There are indeed major issues when attempting to apply this patch; for example, declarations like these all appear to have moved from ggml.h to gguf.h:

     GGML_API void gguf_add_tensor(struct gguf_context * ctx, const struct ggml_tensor * tensor);
     GGML_API void gguf_set_tensor_type(struct gguf_context * ctx, const char * name, enum ggml_type type);
     GGML_API void gguf_set_tensor_data(struct gguf_context * ctx, const char * name, const void * data, size_t size);
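
Under the reorganized tree those same declarations should compile once the include is updated, e.g. (a minimal sketch of the post-refactor layout; the tensor name is illustrative):

    // minimal sketch: the gguf_* declarations moved to their own header,
    // while ggml_type and the tensor structs stay in ggml.h
    #include "gguf.h"
    #include "ggml.h"

    static void mark_q4_k(struct gguf_context * ctx) {
        // request q4_k storage for one (hypothetical) tensor at write time
        gguf_set_tensor_type(ctx, "example.weight", GGML_TYPE_Q4_K);
    }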

Still, I know you got it working based on all your HF model uploads, so I am just trying to follow in your footsteps. :)

As I mentioned, City96's original instructions for llama.cpp were also not 100% accurate, so I ended up just committing a branch of llama.cpp that had the patch working (where I, in a fit of desperation, replaced "FLUX" with "LTX" just to get 0.9.1 files converted). I might just end up doing that for this latest one, too, with all the latest additions you've made to it. I would love to be able to offer both a "compile-from-source" path and targeted binaries for Windows and Linux for the entire community to leverage.

I am curious: is your latest convertor able to handle all of the latest models you've added? Based on your commits, it seems the intent of gguf-quantizor is to have one tool that works for all the current image/video models you support in Comfy, which is awesome.

I would love to get the .exe created from gguf-quantizor functioning, too, on my Win11 machine.

thanks @pollockjj; appreciate it; glad to know it works for you; ltxv 0.9.1 is a bit tricky, need to tune some parameters; direct conversion doesn't work for it (not that the conversion fails; the converted file just doesn't work), but it works for 0.9.0; convertor zero should be ok to convert all types of safetensors, so anyone can convert them by him/herself without coding anything; in that case, no need to wait for others' gguf uploads/feeds; a universal model indeed; sure, will look into the latest llama.cpp
