New 1.66-bit TQ1_0 quant that is 162GB

#14
by shimmyshimmer

We added a new TQ1_0 quant that is 1.66-bit and 162GB in size, for those who want it to fit exactly on certain setups and to allow some more lenient hardware combinations.

Made for setups with 192GB of RAM and for Ollama.
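If you're on Ollama, recent builds can pull a GGUF quant straight from the Hub by repo and quant tag. A minimal sketch, assuming your Ollama version supports hf.co/ model references and that the machine has the ~192GB of RAM noted above:

# Pull and run the TQ1_0 quant directly from Hugging Face
ollama run hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0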


What’s the difference compared to IQ1_S? What layers are compressed more?

Could you please share how to run it with llama.cpp? Thanks.
Edit: it seems to be working:

./llama.cpp/llama-cli \
    -hf unsloth/DeepSeek-R1-0528-GGUF:TQ1_0 \
    --cache-type-k q4_0 \
    --threads -1 \
    --n-gpu-layers 99 \
    --prio 3 \
    --temp 0.6 \
    --top_p 0.95 \
    --min_p 0.01 \
    --ctx-size 16384 \
    --seed 3407 \
    -ot ".ffn_.*_exps.=CPU"
Unsloth AI org

What’s the difference compared to IQ1_S? What layers are compressed more?

Correct, but it's the correct layers that are compressed more.
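If you want to check which layers are compressed more yourself, the gguf Python package (pip install gguf) ships a gguf-dump tool that lists every tensor with its quantization type, so you can diff TQ1_0 against IQ1_S directly. A sketch, with placeholder filenames standing in for the first shard of each quant:

# Compare per-tensor quant types between the two downloads
# (filenames are placeholders; use the first .gguf shard of each quant)
gguf-dump TQ1_0-first-shard.gguf | grep ffn
gguf-dump IQ1_S-first-shard.gguf | grep ffn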

Is there a comparison of performance (benchmark score) between each quantized version and the original checkpoint? Thanks!
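There isn't an official table in this thread, but llama.cpp's llama-perplexity tool gives a rough do-it-yourself proxy: run the same text file through each quant and compare the scores (lower is closer to the original). A sketch, assuming a local wikitext-2 test file and the same CPU-offload trick as above:

# Rough quality comparison; repeat with each quant's first shard
./llama.cpp/llama-perplexity \
    -m TQ1_0-first-shard.gguf \
    -f wiki.test.raw \
    --ctx-size 512 \
    --n-gpu-layers 99 \
    -ot ".ffn_.*_exps.=CPU"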
