---
license: gemma
language:
- en
- zh
- es
base_model:
- google/gemma-3-1b-it
tags:
- Google
- Gemma3
- GGUF
- 1b-it
---
# Google Gemma 3 1B Instruction-Tuned GGUF Quantized Models
This repository contains GGUF quantized versions of [Google's Gemma 3 1B instruction-tuned model](https://huggingface.co/google/gemma-3-1b-it), optimized for efficient deployment across various hardware configurations.
## Quantization Results
| Model | File Size | Size vs. F16 | Size Reduction |
|-------|-----------|--------------|----------------|
| Q8_0 | 1.07 GB | 54% | 46% |
| Q6_K | 1.01 GB | 51% | 49% |
| Q4_K | 0.81 GB | 40% | 60% |
| Q2_K | 0.69 GB | 34% | 66% |
## Quality vs Size Trade-offs
- **Q8_0**: Near-lossless quality, minimal degradation compared to F16
- **Q6_K**: Very good quality, slight degradation in some rare cases
- **Q4_K**: Decent quality, noticeable degradation but still usable for most tasks
- **Q2_K**: Heavily reduced quality, substantial degradation but smallest file size
## Recommendations
- For **maximum quality**: Use Q8_0
- For **balanced performance**: Use Q6_K
- For **minimum size**: Use Q2_K
- For **most use cases**: Q4_K provides a good balance of quality and size
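
To pull down a single quantized file, the Hugging Face CLI works well. A minimal sketch, using the recommended Q4_K build; note the repository id below is a placeholder, so substitute this repo's actual path on the Hub:

```bash
# Fetch one quantized file with the Hugging Face CLI
# (pip install -U "huggingface_hub[cli]").
# NOTE: the repo id below is a placeholder -- replace it with
# this repository's actual path.
huggingface-cli download lex-au/Gemma-3-1b-it-GGUF \
  Google.Gemma-3-1b-it-Q4_K.gguf --local-dir .
```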
## Usage with llama.cpp
These models can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and its various interfaces. Example:
```bash
# Run with llama-gemma3-cli (llama-gemma3-cli.exe on Windows); adjust paths as needed
./llama-gemma3-cli --model Google.Gemma-3-1b-it-Q4_K.gguf \
  --ctx-size 4096 --temp 0.7 \
  --prompt "Write a short story about a robot who discovers it has feelings."
```
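Beyond the interactive CLI, the same file can be served over an OpenAI-compatible HTTP API with `llama-server`, which ships with llama.cpp. A minimal sketch; the port and context size here are arbitrary choices:

```bash
# Serve the model over llama.cpp's OpenAI-compatible HTTP API
./llama-server --model Google.Gemma-3-1b-it-Q4_K.gguf --ctx-size 4096 --port 8080

# Query it from another shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Give me one fun fact about robots."}],"temperature":0.7}'
```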
## License
These models are released under the same [Gemma license](https://ai.google.dev/gemma/terms) as the original model.
## Original Model Information
These quantized models are derived from [Google's Gemma 3 1B instruction-tuned model](https://huggingface.co/google/gemma-3-1b-it).
### Model Specifications
- **Architecture**: Gemma 3
- **Size Label**: 1B
- **Type**: Instruction-tuned
- **Context Length**: 32K tokens
- **Embedding Length**: 2048
- **Languages**: Multilingual (this card is tagged for English, Chinese, and Spanish)
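
These values are recorded in each GGUF file's header and can be double-checked locally. A sketch using the `gguf-dump` script from the `gguf` Python package, assuming `pip install gguf` puts it on your PATH:

```bash
# Inspect the GGUF metadata baked into a quantized file
pip install gguf
gguf-dump Google.Gemma-3-1b-it-Q4_K.gguf | grep -iE "context_length|embedding_length"
```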
## Citation & Attribution
```bibtex
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}

@misc{gemma3_quantization_2025,
  title={Quantized Versions of Google's Gemma 3 1B Model},
  author={Lex-au},
  year={2025},
  month={March},
  note={Quantized models (Q8_0, Q6_K, Q4_K, Q2_K) derived from Google's Gemma 3 1B},
  url={https://huggingface.co/lex-au}
}
```