---
license: gemma
base_model:
- google/gemma-3-1b-it
pipeline_tag: text-generation
tags:
- Google
- Gemini
- Gemma-3
- LLM
---

# Gemma-3-1b-it Q4_0 Quantized Model

This is a Q4_0 (4-bit) quantized version of the `google/gemma-3-1b-it` model, converted to GGUF format for efficient local inference. It was created using the `llama.cpp` toolchain in Google Colab.

## Model Details

- **Base Model**: [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it)
- **Quantization**: Q4_0 (4-bit)
- **Format**: GGUF
- **Size**: ~1–1.5 GB
- **Converted Using**: `llama.cpp` (commit from April 2025)
- **License**: Inherits the license of `google/gemma-3-1b-it`

## Usage

To run the model with `llama-cli` from `llama.cpp` (`-no-cnv` disables interactive conversation mode for a one-shot completion):

```bash
./llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello, world!" -no-cnv
```

A server-based alternative is sketched at the end of this card.

## How It Was Created

1. Downloaded `google/gemma-3-1b-it` from Hugging Face.
2. Converted it to GGUF using `convert_hf_to_gguf.py`.
3. Quantized it to Q4_0 using `llama-quantize` from `llama.cpp`.
4. Tested in Google Colab with `llama-cli`.

A command-level sketch of these steps is given at the end of this card.

## Limitations

- Q4_0 quantization may reduce output quality compared to the original full-precision model.
- Requires `llama.cpp` or other GGUF-compatible software for inference.

## Acknowledgments

- Quantization approach based on the GGUF conversions published by [bartowski](https://huggingface.co/bartowski).
- Built with `llama.cpp` by Georgi Gerganov and contributors.
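
## Serving with llama-server

For an OpenAI-compatible HTTP API, the same GGUF file can be loaded with `llama-server`, which ships with `llama.cpp`. The sketch below is a minimal, illustrative example and was not part of the original setup: the port number, and the assumption that `llama-server` was built alongside `llama-cli`, are both placeholders.

```bash
# Start an OpenAI-compatible server on port 8080 (port is illustrative).
./llama-server -m gemma-3-1b-it-Q4_0.gguf --port 8080

# From another shell, send a chat request; llama-server applies the
# Gemma chat template stored in the GGUF metadata.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, world!"}]}'
```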
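
## Reproducing the Conversion

For reference, the four steps above correspond roughly to the commands below. This is an untested sketch: the `llama.cpp/` paths, the intermediate `gemma-3-1b-it-f16.gguf` file name, and the `--outtype f16` choice are assumptions, and exact script locations and flags vary between `llama.cpp` commits.

```bash
# 1. Download the base model (requires accepting the Gemma license on Hugging Face).
huggingface-cli download google/gemma-3-1b-it --local-dir gemma-3-1b-it

# 2. Convert the Hugging Face checkpoint to a full-precision GGUF file
#    (intermediate file name and f16 output type are assumptions).
python llama.cpp/convert_hf_to_gguf.py gemma-3-1b-it \
  --outfile gemma-3-1b-it-f16.gguf --outtype f16

# 3. Quantize the f16 GGUF to Q4_0.
./llama.cpp/build/bin/llama-quantize gemma-3-1b-it-f16.gguf gemma-3-1b-it-Q4_0.gguf Q4_0

# 4. Smoke-test the quantized model with a one-shot prompt.
./llama.cpp/build/bin/llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello, world!" -no-cnv
```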