This allows you to use both the normal and abliterated versions of popular models (Llama, Qwen, etc.) without having to double the amount of VRAM used.
ngxson/gguf_lora_collection
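To make that concrete, here is a minimal sketch of the idea: load a single base GGUF and apply the abliterated variant as a GGUF LoRA adapter at load time, instead of keeping a second full model in memory. The file names are placeholders, and `llama-cli` (from llama.cpp) is assumed to be on your PATH.

```python
# Sketch: one base model in VRAM, the abliterated behaviour applied as a LoRA.
# File names are placeholders; --lora is llama.cpp's flag for GGUF adapters.
import subprocess

subprocess.run([
    "llama-cli",                                                # llama.cpp CLI, assumed on PATH
    "-m", "Qwen2.5-7B-Instruct-Q4_K_M.gguf",                    # base model (placeholder)
    "--lora", "Qwen2.5-7B-Instruct-abliterated-LoRA-F16.gguf",  # GGUF LoRA adapter (placeholder)
    "-p", "Hello, who are you?",
    "-n", "64",
], check=True)
```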
With my llama-cpp-python (0.3.4), the following PR may not have been merged yet, so an error occurs when applying LoRA. I tried it with Qwen 2.5 14B Instruct. Well, it will be updated eventually.
https://github.com/ggerganov/llama.cpp/issues/9114
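For reference, this is the kind of call that fails for me on 0.3.4 (placeholder paths; `lora_path` is, as far as I know, the llama-cpp-python parameter that hands the adapter to the llama.cpp loader):

```python
# Sketch of the failing setup on llama-cpp-python 0.3.4 (placeholder paths).
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-14B-Instruct-Q4_K_M.gguf",   # base GGUF (placeholder)
    lora_path="Qwen2.5-14B-Instruct-LoRA-F16.gguf",  # GGUF LoRA adapter (placeholder)
    n_ctx=4096,
)
```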
This is super cool!!! Would you mind sharing the process behind these GGUF LoRA adapters? Did you convert the LoRA into GGUF, or make the LoRA from the GGUF itself?
Yes, sure!
The first step is to generate a PEFT-compatible LoRA adapter; I used mergekit-extract-lora to do that. Please note that some bigger models (Qwen/Llama 70B) give errors that I don't know how to fix; hopefully that will be resolved soon. You can find more info about mergekit here: https://github.com/arcee-ai/mergekit
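Roughly, the extraction step looks like this. The argument form is an assumption based on the mergekit README at the time, so check `mergekit-extract-lora --help` for the exact flags of your installed version; model names are just examples.

```python
# Step 1 sketch: extract a PEFT-compatible LoRA from a fine-tune.
# The positional arguments and --rank flag are assumed; verify with --help.
import subprocess

subprocess.run([
    "mergekit-extract-lora",
    "huihui-ai/Qwen2.5-7B-Instruct-abliterated",  # fine-tuned model (example)
    "Qwen/Qwen2.5-7B-Instruct",                   # base model it was fine-tuned from
    "qwen2.5-7b-abliterated-lora",                # output directory for the PEFT adapter
    "--rank=32",                                  # target LoRA rank (assumed flag name)
], check=True)
```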
The next step is to convert the PEFT adapter to GGUF; I used this space: https://huggingface.co/spaces/ggml-org/gguf-my-lora
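If you prefer to do this step locally, a rough equivalent is llama.cpp's `convert_lora_to_gguf.py` script (paths and flags below are illustrative, so confirm them with `--help` for your llama.cpp checkout):

```python
# Step 2 sketch: convert the PEFT adapter from step 1 into a GGUF LoRA locally.
import subprocess

subprocess.run([
    "python", "convert_lora_to_gguf.py",      # script from the llama.cpp repository
    "qwen2.5-7b-abliterated-lora",            # PEFT adapter directory from step 1
    "--base", "path/to/Qwen2.5-7B-Instruct",  # local copy of the base model (placeholder path)
    "--outfile", "qwen2.5-7b-abliterated-lora-f16.gguf",
    "--outtype", "f16",
], check=True)
```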
Then it's good to go!
Please note that the space can convert any PEFT LoRA adapter to GGUF, so if you're using something like Unsloth, it's straightforward to convert it into a GGUF LoRA (no need to merge it into the base model).
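In other words, anything that saves a standard PEFT adapter folder (adapter_config.json plus the adapter weights) can be fed to the space. A minimal sketch, with placeholder model names and without the actual training loop:

```python
# Sketch: produce a standard PEFT adapter directory that gguf-my-lora can convert.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")  # placeholder base
lora = get_peft_model(base, LoraConfig(r=16, target_modules=["q_proj", "v_proj"]))
# ... fine-tuning would happen here ...
lora.save_pretrained("my-lora-adapter")  # upload/convert this folder with gguf-my-lora
```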