Jinx-Qwen3-4B-gguf-q6_k-q-8 (mixed precision): selected weights (the output tensor, token embeddings, and the attention/FFN layers in the first and last blocks) are quantized to Q8_0, while all remaining tensors use Q6_K, reducing the memory footprint while preserving inference fidelity. A sketch of the resulting tensor-to-type mapping follows.
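For illustration, here is a minimal Python sketch of the mapping this recipe describes. The tensor names (`token_embd.weight`, `output.weight`, `blk.N.attn_*`/`ffn_*`) follow the usual GGUF/llama.cpp conventions; the block count of 36 for Qwen3-4B and the function name `target_quant_type` are assumptions for the example, not part of the released artifact.

```python
import re

# Assumed block count for Qwen3-4B; adjust to the model's actual layer count.
N_BLOCKS = 36

# Attention/FFN weight tensors, named per common GGUF conventions (blk.N.*).
ATTN_FFN = re.compile(
    r"^blk\.(\d+)\.(attn_(q|k|v|output)|ffn_(up|down|gate))\.weight$"
)

def target_quant_type(tensor_name: str, n_blocks: int = N_BLOCKS) -> str:
    """Return the quantization type this recipe assigns to a tensor.

    Q8_0 for the output tensor, the token embeddings, and the
    attention/FFN weights in the first and last blocks; Q6_K otherwise.
    """
    if tensor_name in ("output.weight", "token_embd.weight"):
        return "Q8_0"
    m = ATTN_FFN.match(tensor_name)
    if m and int(m.group(1)) in (0, n_blocks - 1):
        return "Q8_0"
    return "Q6_K"

# Spot checks of the mapping.
assert target_quant_type("token_embd.weight") == "Q8_0"
assert target_quant_type("blk.0.attn_q.weight") == "Q8_0"
assert target_quant_type("blk.35.ffn_down.weight") == "Q8_0"
assert target_quant_type("blk.17.ffn_up.weight") == "Q6_K"
```

In practice, a file with per-tensor type overrides like this can be produced with llama.cpp's `llama-quantize` tool (for example via its `--output-tensor-type` and `--token-embedding-type` options); the exact set of override flags depends on the llama.cpp version.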