Help with finetuning
Is it possible to finetune R1 (not using Unsloth, since it's single-GPU only)? I have a few questions:
Can I finetune the GGUF directly, as discussed in https://github.com/ggml-org/llama.cpp/discussions/6680? (The sketch after the code blocks below touches on the usual GGUF route.)
When I'm using QLoRA, does it work by:
Download model -> Quantize to Q4 -> Tune -> Upload adapters
OR
Download model -> Tune -> Quantize Adapters to Q4 -> Upload adapters
Basically, is it better to use QLoRA on a 16-bit or a 4-bit model? In other words, what's the difference between the following two calls (a sketch of the full QLoRA flow follows them):
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "deepseek-ai/DeepSeek-R1-0528-4bit",
    max_seq_length = max_seq_length,
    load_in_4bit = True,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
vs
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "deepseek-ai/DeepSeek-R1-0528",
    max_seq_length = max_seq_length,
    load_in_4bit = True,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
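
For what it's worth: in QLoRA the quantization happens once, at load time, and only touches the frozen base weights; the LoRA adapters are trained in 16-bit and are never quantized. So the flow is your first option, minus the final "quantize adapters" step. And loading a pre-quantized 4-bit checkpoint versus a 16-bit one with load_in_4bit = True generally ends in the same 4-bit weights in memory; the pre-quantized repo mainly saves download time and disk space. Here is a minimal sketch, assuming Unsloth's API, an illustrative distill checkpoint (the full R1-0528 is a 671B-parameter MoE and won't fit on a single GPU), and placeholder hyperparameters:

from unsloth import FastLanguageModel

max_seq_length = 2048

# 1) Download the 16-bit checkpoint and quantize it to 4-bit at load time.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",  # illustrative choice
    max_seq_length = max_seq_length,
    load_in_4bit = True,
)

# 2) Attach LoRA adapters: the frozen base stays 4-bit, while the trainable
#    adapters are 16-bit and are never quantized.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
)

# 3) Train here, e.g. with trl's SFTTrainer (omitted for brevity).

# 4) Save (and upload) only the 16-bit adapters, not the 4-bit base.
model.save_pretrained("lora_adapters")
tokenizer.save_pretrained("lora_adapters")

# Re: finetuning a GGUF directly -- the usual route is to finetune the
# original checkpoint as above and export a GGUF afterwards, e.g.:
# model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method = "q4_k_m")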
Multi-GPU actually works with accelerate, FYI, but we haven't officially announced it yet because we're working on a much better version.
When you use Unsloth, we convert the model from 16-bit to 4-bit on the fly for you, so you don't need a pre-quantized checkpoint.
Also see someone's multi-GPU repo for Unsloth, which can help: https://www.reddit.com/r/unsloth/comments/1l8mxkq/multigpu_support_how_to_make_your_unsloth/
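
In the meantime, here is a minimal, self-contained sketch of the accelerate route, with a toy model standing in for the LLM; this is an illustration of plain data-parallel training with Hugging Face Accelerate, not Unsloth's unreleased multi-GPU path:

import torch
from accelerate import Accelerator

# Run with:  accelerate launch --num_processes <num_gpus> train.py
accelerator = Accelerator()

model = torch.nn.Linear(10, 2)  # toy stand-in for the LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 10),
                                         torch.randint(0, 2, (64,)))
loader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() wraps the model, optimizer, and dataloader for the current
# distributed setup and shards the batches across processes.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for inputs, labels in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # handles gradient sync across GPUs
    optimizer.step()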