---
license: gemma
language:
- en
- zh
- es
base_model:
- google/gemma-3-1b-it
tags:
- Google
- Gemma3
- GGUF
- 1b-it
---

# Google Gemma 3 1B Instruction-Tuned GGUF Quantized Models

This repository contains GGUF quantized versions of [Google's Gemma 3 1B instruction-tuned model](https://huggingface.co/google/gemma-3-1b-it), optimized for efficient deployment across various hardware configurations.

## Quantization Results

Percentages are relative to the size of the original F16 model.

| Model | Size    | Size vs. F16 | Size Reduction |
|-------|---------|--------------|----------------|
| Q8_0  | 1.07 GB | 54%          | 46%            |
| Q6_K  | 1.01 GB | 51%          | 49%            |
| Q4_K  | 0.81 GB | 40%          | 60%            |
| Q2_K  | 0.69 GB | 34%          | 66%            |

## Quality vs Size Trade-offs

- **Q8_0**: Near-lossless quality; minimal degradation compared to F16
- **Q6_K**: Very good quality; slight degradation in rare cases
- **Q4_K**: Decent quality; noticeable degradation, but still usable for most tasks
- **Q2_K**: Heavily reduced quality; substantial degradation, but the smallest file size

## Recommendations

- For **maximum quality**: use Q8_0
- For **balanced performance**: use Q6_K
- For **minimum size**: use Q2_K
- For **most use cases**: Q4_K provides a good balance of quality and size

## Usage with llama.cpp

These models can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and its various interfaces. Example:

```bash
# Run with llama-gemma3-cli (llama-gemma3-cli.exe on Windows); adjust paths as needed
./llama-gemma3-cli --model Google.Gemma-3-1b-it-Q4_K.gguf --ctx-size 4096 --temp 0.7 --prompt "Write a short story about a robot who discovers it has feelings."
```
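
If you drive the model through a lower-level interface instead of the CLI, the prompt must follow Gemma's chat turn format. A minimal sketch of building that string (`format_prompt` is an illustrative helper; the turn markers are the standard Gemma ones, and note that llama.cpp frontends can usually apply the chat template for you, typically prepending the `<bos>` token themselves):

```python
# Build a single-turn prompt in the Gemma chat format.
# The <bos> token is omitted here because llama.cpp typically adds it;
# verify this for whichever binding you use.
def format_prompt(user_message: str) -> str:
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(format_prompt("Write a haiku about quantization."))
```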

## License

This model is released under the same [Gemma license](https://ai.google.dev/gemma/terms) as the original model.

## Original Model Information

This quantized set is derived from [Google's Gemma 3 1B instruction-tuned model](https://huggingface.co/google/gemma-3-1b-it).

### Model Specifications

- **Architecture**: Gemma 3
- **Size Label**: 1B
- **Type**: Instruction-tuned
- **Context Length**: 32K tokens
- **Embedding Length**: 2048
- **Languages**: English, Chinese, and Spanish (as listed in this card's metadata)

## Citation & Attribution

```
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}

@misc{gemma3_quantization_2025,
  title={Quantized Versions of Google's Gemma 3 1B Model},
  author={Lex-au},
  year={2025},
  month={March},
  note={Quantized models (Q8_0, Q6_K, Q4_K, Q2_K) derived from Google's Gemma 3 1B},
  url={https://huggingface.co/lex-au}
}
```