## Model Details

This is an int4 model with group_size 128 and symmetric quantization of google/gemma-3-12b-it, generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm.

Please follow the license of the original model.
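
For intuition, the snippet below is a minimal, self-contained sketch of what symmetric int4 quantization with group_size 128 does to a single weight matrix. It is an illustration only, with hypothetical names, and uses naive round-to-nearest; AutoRound itself goes further by optimizing the rounding via signed gradient descent.

```python
import torch

def int4_sym_group_quant(weight: torch.Tensor, group_size: int = 128):
    """Toy round-to-nearest symmetric int4 group quantization (illustration only)."""
    out_features, in_features = weight.shape
    w = weight.reshape(out_features, in_features // group_size, group_size)
    # Symmetric scheme: one scale per group maps the group's max |w| to the
    # int4 range [-8, 7]; no zero-point is stored.
    scale = w.abs().amax(dim=-1, keepdim=True).clamp_min(1e-8) / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7)
    dequantized = (q * scale).reshape(out_features, in_features)
    return q.to(torch.int8), scale, dequantized

w = torch.randn(256, 512)
q, scale, w_hat = int4_sym_group_quant(w)
print(f"max reconstruction error: {(w - w_hat).abs().max():.4f}")
```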

## Inference on CPU

We found that the unquantized layers must run in BF16 or FP32, so CUDA inference is not available at the moment.

### Requirements

```bash
pip install auto-round
pip uninstall intel-extension-for-pytorch
pip install intel-extension-for-transformers
```

```python
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from PIL import Image
import requests
import torch
from auto_round import AutoRoundConfig

model_id = "OPEA/gemma-3-12b-it-int4-AutoRound-cpu"

quantization_config = AutoRoundConfig(backend="cpu")
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cpu", quantization_config=quantization_config
).eval()

processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image",
             "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
"""Here's a detailed description of the image:

**Overall Impression:**

The image is a close-up shot of a vibrant garden scene, focusing on a pink cosmos flower with a bumblebee visiting it. The overall feel is natural, colorful, and slightly wild.

**Main Elements:**

*   **Cosmos Flower:** The central focus is a large, pink cosmos flower with broad petals. It's in full bloom and appears healthy.
*   **Bumblebee:**"""
```
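
A text-only prompt works the same way; the short variant below reuses `model` and `processor` from the example above (the prompt text is just an illustration):

```python
messages = [
    {"role": "user", "content": [{"type": "text", "text": "Explain int4 quantization in one sentence."}]}
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=50, do_sample=False)

print(processor.decode(generation[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```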

## Generate the model

Here is a sample command to reproduce the model.

```bash
pip install git+https://github.com/intel/auto-round.git@main

auto-round-mllm \
  --model google/gemma-3-12b-it \
  --device 0 \
  --bits 4 \
  --batch_size 1 \
  --gradient_accumulate_steps 8 \
  --format 'auto_round' \
  --output_dir "./tmp_autoround"
```
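
For reference, roughly the same result can be sketched with auto-round's Python API. The class and argument names below (`AutoRoundMLLM`, `bits`, `group_size`, `sym`) follow the auto-round repository at the time of writing; verify them against your installed version before use.

```python
from transformers import AutoProcessor, AutoTokenizer, Gemma3ForConditionalGeneration
from auto_round import AutoRoundMLLM

model_name = "google/gemma-3-12b-it"
model = Gemma3ForConditionalGeneration.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

# Mirror the CLI flags above: int4, group_size 128, symmetric quantization.
autoround = AutoRoundMLLM(
    model,
    tokenizer,
    processor=processor,
    bits=4,
    group_size=128,
    sym=True,
    batch_size=1,
    gradient_accumulate_steps=8,
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round")
```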