---
license: mit
base_model:
  - google/gemma-3-1b-it
pipeline_tag: text-generation
tags:
  - Google
  - Gemini
  - Gemma-3
  - LLM
---

# Gemma-3-1b-it Q4_0 Quantized Model

This is a Q4_0 quantized version of the google/gemma-3-1b-it model, converted to GGUF format and optimized for efficient inference. It was created using llama.cpp tools in Google Colab.

## Model Details

- **Base Model:** google/gemma-3-1b-it
- **Quantization:** Q4_0 (4-bit quantization)
- **Format:** GGUF
- **Size:** ~1–1.5 GB
- **Converted Using:** llama.cpp (commit from April 2025)
- **License:** Inherits the license from google/gemma-3-1b-it
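
To fetch the quantized file before running the inference command below, something like this should work (a sketch: the repo id is inferred from this card's location on the Hub, and `huggingface-cli` ships with the `huggingface_hub` package):

```bash
# Download the Q4_0 GGUF file from the Hub (repo id inferred from this card;
# adjust if the file lives elsewhere).
huggingface-cli download tanujrai/gemma-3-1b-it-Q4_0 \
    gemma-3-1b-it-Q4_0.gguf --local-dir .
```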

## Usage

To use this model with llama.cpp:

```bash
./llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello, world!" --no-interactive
```
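
For longer generations, the context size and sampling settings can be set explicitly. A minimal sketch, assuming a recent llama.cpp build where these common flags are available:

```bash
# -n: max tokens to generate, -c: context window size, -t: CPU threads,
# --temp: sampling temperature
./llama-cli -m gemma-3-1b-it-Q4_0.gguf \
    -p "Explain 4-bit quantization in one paragraph." \
    -n 256 -c 4096 -t 4 --temp 0.7
```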

## How It Was Created

  1. Downloaded google/gemma-3-1b-it from Hugging Face.
  2. Converted to GGUF using convert_hf_to_gguf.py.
  3. Quantized to Q4_0 using llama-quantize from llama.cpp (see the command sketch after this list).
  4. Tested in Google Colab with llama-cli.
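
The steps above correspond roughly to the commands below. This is a sketch rather than the exact Colab cells: paths are illustrative, downloading the base model requires accepting the Gemma license on Hugging Face, and llama-quantize must first be built from the llama.cpp sources.

```bash
# Get llama.cpp and the Python dependencies for the conversion script
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt

# 1. Download the base model from Hugging Face
huggingface-cli download google/gemma-3-1b-it --local-dir gemma-3-1b-it

# 2. Convert the HF checkpoint to a full-precision GGUF file
python llama.cpp/convert_hf_to_gguf.py gemma-3-1b-it \
    --outfile gemma-3-1b-it-f16.gguf --outtype f16

# 3. Quantize the f16 GGUF down to Q4_0
./llama.cpp/build/bin/llama-quantize \
    gemma-3-1b-it-f16.gguf gemma-3-1b-it-Q4_0.gguf Q4_0
```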

## Limitations

- Quantization may reduce accuracy compared to the original full-precision model.
- Requires llama.cpp or other GGUF-compatible software for inference.

## Acknowledgments

- GGUF quantization approach based on the work of bartowski.
- Built with llama.cpp by Georgi Gerganov.