abhishekchohan / gemma-3-27b-it-quantized-W4A16

Image-Text-to-Text

text-generation-inference

compressed-tensors


abhishekchohan committed on Mar 17

Commit 56eadf5 (verified) · 1 parent: 4171c69

Create README.md

Files changed (1)

README.md (added) +59 -0

	@@ -0,0 +1,59 @@

+---
+license: gemma
+library_name: transformers
+pipeline_tag: image-text-to-text
+extra_gated_heading: Access Gemma on Hugging Face
+extra_gated_prompt: To access Gemma on Hugging Face, you're required to review and agree to Google's usage license. To do this, please ensure you're logged in to Hugging Face and click below. Requests are processed immediately.
+extra_gated_button_content: Acknowledge license
+base_model: google/gemma-3-27b-it
+---
+# Gemma 3 Quantized Models
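+These repositories contain W4A16 quantized versions of Google's Gemma 3 instruction-tuned models; this repository holds the 27B variant. Storing the weights in 4 bits shrinks their memory footprint, which makes the models easier to deploy on consumer hardware while aiming to keep output quality close to that of the original models.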
+## Models
+- **abhishekchohan/gemma-3-27b-it-quantized-W4A16**
+- **abhishekchohan/gemma-3-12b-it-quantized-W4A16**
+- **abhishekchohan/gemma-3-4b-it-quantized-W4A16**
+## Repository Structure
+```
+gemma-3-{size}-it-quantized-W4A16/
+├── README.md
+├── templates/
+│   └── chat_template.jinja
+├── tools/
+│   └── tool_parser.py
+└── [model files]
+```
+## Quantization Details
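+These models use W4A16 quantization, applied with LLM Compressor:
+- Weights are quantized to 4-bit precision
+- Activations remain at 16-bit precision
+- Weight memory for the quantized layers drops to roughly a quarter of its 16-bit size, plus some overhead for quantization scales and any layers left unquantized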
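+As a rough illustration for the 27B model, assume about 27 billion weights and ignore quantization scales, unquantized layers, activations, and the KV cache. Under those assumptions, weight storage comes to approximately:
+$$
+27\times10^{9}\times 2\ \text{bytes} \approx 54\ \text{GB}\ \text{(16-bit)} \qquad\text{vs.}\qquad 27\times10^{9}\times 0.5\ \text{bytes} \approx 13.5\ \text{GB}\ \text{(4-bit)}
+$$
+These figures are back-of-the-envelope estimates, not measured checkpoint sizes. The real files are somewhat larger because of the overheads ignored here.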
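+## Usage with vLLM
+Replace `{size}` with `27b`, `12b`, or `4b`. The `--chat-template` and `--tool-parser-plugin` flags take local file paths, so run the command from the root of a local copy of the repository. You can create one with `huggingface-cli download abhishekchohan/gemma-3-{size}-it-quantized-W4A16 --local-dir gemma-3-{size}-it-quantized-W4A16`. The plugin in `tools/tool_parser.py` is expected to register the `gemma` parser that `--tool-call-parser` selects.
+```bash
+vllm serve abhishekchohan/gemma-3-{size}-it-quantized-W4A16 \
+  --chat-template templates/chat_template.jinja \
+  --enable-auto-tool-choice \
+  --tool-call-parser gemma \
+  --tool-parser-plugin tools/tool_parser.py
+```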
+## License
+These models are subject to the Gemma license. Users must acknowledge and accept the license terms before using the models.
+## Citation
+```
+@article{gemma_2025,
+    title={Gemma 3},
+    url={https://goo.gle/Gemma3Report},
+    publisher={Kaggle},
+    author={Gemma Team},
+    year={2025}
+}
+```