---
license: gemma
language:
- en
- zh
- es
base_model:
- google/gemma-3-1b-it
tags:
- Google
- Gemma3
- GGUF
- 1b-it
---
# Google Gemma 3 1B Instruction-Tuned GGUF Quantized Models
This repository contains GGUF quantized versions of [Google's Gemma 3 1B instruction-tuned model](https://huggingface.co/google/gemma-3-1b-it), optimized for efficient deployment across various hardware configurations.
## Quantization Results
| Model | File Size | Size vs. F16 | Size Reduction |
|-------|-----------|--------------|----------------|
| Q8_0 | 1.07 GB | 54% | 46% |
| Q6_K | 1.01 GB | 51% | 49% |
| Q4_K | 0.81 GB | 40% | 60% |
| Q2_K | 0.69 GB | 34% | 66% |
## Quality vs Size Trade-offs
- **Q8_0**: Near-lossless quality, minimal degradation compared to F16
- **Q6_K**: Very good quality, slight degradation in some rare cases
- **Q4_K**: Decent quality, noticeable degradation but still usable for most tasks
- **Q2_K**: Heavily reduced quality, substantial degradation but smallest file size
## Recommendations
- For **maximum quality**: Use Q8_0
- For **balanced performance**: Use Q6_K
- For **minimum size**: Use Q2_K
- For **most use cases**: Q4_K provides a good balance of quality and size
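
To pull down a single quantized file, the Hugging Face CLI works well. A minimal sketch, using the recommended Q4_K build; note the repository id below is a placeholder, so substitute this repo's actual path on the Hub:

```bash
# Fetch one quantized file with the Hugging Face CLI
# (pip install -U "huggingface_hub[cli]").
# NOTE: the repo id below is a placeholder -- replace it with
# this repository's actual path.
huggingface-cli download lex-au/Gemma-3-1b-it-GGUF \
  Google.Gemma-3-1b-it-Q4_K.gguf --local-dir .
```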
## Usage with llama.cpp
These models can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and its various interfaces. Example:
```bash
# Run with llama-gemma3-cli (llama-gemma3-cli.exe on Windows); adjust paths as needed
./llama-gemma3-cli --model Google.Gemma-3-1b-it-Q4_K.gguf \
  --ctx-size 4096 --temp 0.7 \
  --prompt "Write a short story about a robot who discovers it has feelings."
```
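Beyond the interactive CLI, the same file can be served over an OpenAI-compatible HTTP API with `llama-server`, which ships with llama.cpp. A minimal sketch; the port and context size here are arbitrary choices:

```bash
# Serve the model over llama.cpp's OpenAI-compatible HTTP API
./llama-server --model Google.Gemma-3-1b-it-Q4_K.gguf --ctx-size 4096 --port 8080

# Query it from another shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Give me one fun fact about robots."}],"temperature":0.7}'
```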
## License
These models are released under the same [Gemma license](https://ai.google.dev/gemma/terms) as the original model.
## Original Model Information
These quantized models are derived from [Google's Gemma 3 1B instruction-tuned model](https://huggingface.co/google/gemma-3-1b-it).
### Model Specifications
- **Architecture**: Gemma 3
- **Size Label**: 1B
- **Type**: Instruction-tuned
- **Context Length**: 32K tokens
- **Embedding Length**: 2048
- **Languages**: Multilingual (this card is tagged for English, Chinese, and Spanish)
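
These values are recorded in each GGUF file's header and can be double-checked locally. A sketch using the `gguf-dump` script from the `gguf` Python package, assuming `pip install gguf` puts it on your PATH:

```bash
# Inspect the GGUF metadata baked into a quantized file
pip install gguf
gguf-dump Google.Gemma-3-1b-it-Q4_K.gguf | grep -iE "context_length|embedding_length"
```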
## Citation & Attribution
```bibtex
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}

@misc{gemma3_quantization_2025,
  title={Quantized Versions of Google's Gemma 3 1B Model},
  author={Lex-au},
  year={2025},
  month={March},
  note={Quantized models (Q8_0, Q6_K, Q4_K, Q2_K) derived from Google's Gemma 3 1B},
  url={https://huggingface.co/lex-au}
}
```