---
license: gemma
base_model:
- google/gemma-3-1b-it
pipeline_tag: text-generation
tags:
- Google
- Gemini
- Gemma-3
- LLM
---

# Gemma-3-1b-it Q4_0 Quantized Model

This is a Q4_0 (4-bit) quantized version of the `google/gemma-3-1b-it` model, converted to GGUF format for efficient local inference. It was created using the `llama.cpp` toolchain in Google Colab.

## Model Details

- **Base Model**: [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it)
- **Quantization**: Q4_0 (4-bit)
- **Format**: GGUF
- **Size**: ~1–1.5 GB
- **Converted Using**: `llama.cpp` (commit from April 2025)
- **License**: Inherits the license of `google/gemma-3-1b-it`

## Usage

To run the model with `llama-cli` from `llama.cpp` (`-no-cnv` disables interactive conversation mode for a one-shot completion):

```bash
./llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello, world!" -no-cnv
```

A server-based alternative is sketched at the end of this card.

## How It Was Created

1. Downloaded `google/gemma-3-1b-it` from Hugging Face.
2. Converted it to GGUF using `convert_hf_to_gguf.py`.
3. Quantized it to Q4_0 using `llama-quantize` from `llama.cpp`.
4. Tested in Google Colab with `llama-cli`.

A command-level sketch of these steps is given at the end of this card.

## Limitations

- Q4_0 quantization may reduce output quality compared to the original full-precision model.
- Requires `llama.cpp` or other GGUF-compatible software for inference.

## Acknowledgments

- Quantization approach based on the GGUF conversions published by [bartowski](https://huggingface.co/bartowski).
- Built with `llama.cpp` by Georgi Gerganov and contributors.
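
## Serving with llama-server

For an OpenAI-compatible HTTP API, the same GGUF file can be loaded with `llama-server`, which ships with `llama.cpp`. The sketch below is a minimal, illustrative example and was not part of the original setup: the port number, and the assumption that `llama-server` was built alongside `llama-cli`, are both placeholders.

```bash
# Start an OpenAI-compatible server on port 8080 (port is illustrative).
./llama-server -m gemma-3-1b-it-Q4_0.gguf --port 8080

# From another shell, send a chat request; llama-server applies the
# Gemma chat template stored in the GGUF metadata.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, world!"}]}'
```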
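
## Reproducing the Conversion

For reference, the four steps above correspond roughly to the commands below. This is an untested sketch: the `llama.cpp/` paths, the intermediate `gemma-3-1b-it-f16.gguf` file name, and the `--outtype f16` choice are assumptions, and exact script locations and flags vary between `llama.cpp` commits.

```bash
# 1. Download the base model (requires accepting the Gemma license on Hugging Face).
huggingface-cli download google/gemma-3-1b-it --local-dir gemma-3-1b-it

# 2. Convert the Hugging Face checkpoint to a full-precision GGUF file
#    (intermediate file name and f16 output type are assumptions).
python llama.cpp/convert_hf_to_gguf.py gemma-3-1b-it \
  --outfile gemma-3-1b-it-f16.gguf --outtype f16

# 3. Quantize the f16 GGUF to Q4_0.
./llama.cpp/build/bin/llama-quantize gemma-3-1b-it-f16.gguf gemma-3-1b-it-Q4_0.gguf Q4_0

# 4. Smoke-test the quantized model with a one-shot prompt.
./llama.cpp/build/bin/llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello, world!" -no-cnv
```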