---
license: gemma
language:
- en
- zh
- es
base_model:
- google/gemma-3-1b-it
tags:
- Google
- Gemma3
- GGUF
- 1b-it
---

# Google Gemma 3 1B Instruction-Tuned GGUF Quantized Models

This repository contains GGUF quantized versions of [Google's Gemma 3 1B instruction-tuned model](https://huggingface.co/google/gemma-3-1b-it), optimized for efficient deployment across various hardware configurations.

## Quantization Results

Percentages are relative to the size of the original F16 model.

| Model | Size    | Size vs. F16 | Size Reduction |
|-------|---------|--------------|----------------|
| Q8_0  | 1.07 GB | 54%          | 46%            |
| Q6_K  | 1.01 GB | 51%          | 49%            |
| Q4_K  | 0.81 GB | 40%          | 60%            |
| Q2_K  | 0.69 GB | 34%          | 66%            |

## Quality vs Size Trade-offs

- **Q8_0**: Near-lossless quality; minimal degradation compared to F16
- **Q6_K**: Very good quality; slight degradation in rare cases
- **Q4_K**: Decent quality; noticeable degradation, but still usable for most tasks
- **Q2_K**: Heavily reduced quality; substantial degradation, but the smallest file size

## Recommendations

- For **maximum quality**: use Q8_0
- For **balanced performance**: use Q6_K
- For **minimum size**: use Q2_K
- For **most use cases**: Q4_K provides a good balance of quality and size

## Usage with llama.cpp

These models can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and its various interfaces. Example:

```bash
# Run with llama-gemma3-cli (llama-gemma3-cli.exe on Windows); adjust paths as needed
./llama-gemma3-cli --model Google.Gemma-3-1b-it-Q4_K.gguf --ctx-size 4096 --temp 0.7 --prompt "Write a short story about a robot who discovers it has feelings."
```
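
If you drive the model through a lower-level interface instead of the CLI, the prompt must follow Gemma's chat turn format. A minimal sketch of building that string (`format_prompt` is an illustrative helper; the turn markers are the standard Gemma ones, and note that llama.cpp frontends can usually apply the chat template for you, typically prepending the `<bos>` token themselves):

```python
# Build a single-turn prompt in the Gemma chat format.
# The <bos> token is omitted here because llama.cpp typically adds it;
# verify this for whichever binding you use.
def format_prompt(user_message: str) -> str:
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(format_prompt("Write a haiku about quantization."))
```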

## License

This model is released under the same [Gemma license](https://ai.google.dev/gemma/terms) as the original model.

## Original Model Information

This quantized set is derived from [Google's Gemma 3 1B instruction-tuned model](https://huggingface.co/google/gemma-3-1b-it).

### Model Specifications

- **Architecture**: Gemma 3
- **Size Label**: 1B
- **Type**: Instruction-tuned
- **Context Length**: 32K tokens
- **Embedding Length**: 2048
- **Languages**: English, Chinese, and Spanish (as listed in this card's metadata)

## Citation & Attribution

```
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}

@misc{gemma3_quantization_2025,
  title={Quantized Versions of Google's Gemma 3 1B Model},
  author={Lex-au},
  year={2025},
  month={March},
  note={Quantized models (Q8_0, Q6_K, Q4_K, Q2_K) derived from Google's Gemma 3 1B},
  url={https://huggingface.co/lex-au}
}
```