This is an int4 version of google/gemma-3-12b-it, quantized with symmetric quantization and group_size 128 by the intel/auto-round algorithm.

Please follow the license of the original model.

We found that the unquantized layers must run in BF16 or FP32, so CUDA inference is not available at the moment.
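To make the scheme concrete, below is a minimal, illustrative sketch of group-wise symmetric int4 quantization (the group_size 128, symmetric setup described above). This shows only plain round-to-nearest; AutoRound itself additionally *tunes* the rounding decisions with signed gradient descent, so this is not the actual algorithm, just the underlying storage format.

```python
# Illustrative sketch: group-wise symmetric int4 quantization.
# One scale per group of 128 weights; values stored as signed ints in [-8, 7].

def quantize_group(weights, num_bits=4):
    """Symmetrically quantize one group of weights: q = round(w / scale)."""
    qmax = 2 ** (num_bits - 1) - 1              # int4 -> 7
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate weights: w ~= q * scale."""
    return [v * scale for v in q]

# Example: one group of 128 weights (matching group_size=128 in this model).
weights = [(-1) ** i * (i % 13) / 13.0 for i in range(128)]
q, scale = quantize_group(weights)
restored = dequantize_group(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.4f}, max abs error={max_err:.4f}")
```

With round-to-nearest, the reconstruction error per weight is bounded by half the group's scale; AutoRound's tuning aims to lower the resulting model-level loss further.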
## Requirements

```bash
pip install auto-round
pip uninstall intel-extension-for-pytorch
pip install intel-extension-for-transformers
```
```python
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from PIL import Image
import requests
import torch
from auto_round import AutoRoundConfig

model_id = "OPEA/gemma-3-12b-it-int4-AutoRound-cpu"

quantization_config = AutoRoundConfig(backend="cpu")
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cpu",
    quantization_config=quantization_config
).eval()

processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image",
             "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)

"""Here's a detailed description of the image:

**Overall Impression:**

The image is a close-up shot of a vibrant garden scene, focusing on a pink cosmos flower with a bumblebee visiting it. The overall feel is natural, colorful, and slightly wild.

**Main Elements:**

*   **Cosmos Flower:** The central focus is a large, pink cosmos flower with broad petals. It's in full bloom and appears healthy.
*   **Bumblebee:**"""
```
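A note on the `input_len` slicing above: `model.generate` returns the prompt token ids followed by the newly generated ids in one sequence, so the snippet drops the first `input_len` ids before decoding to print only the model's answer. A minimal sketch of that slicing with dummy token ids:

```python
# generate() hands back prompt ids + new ids as one sequence;
# slicing at input_len keeps only the newly generated tokens.
prompt_ids = [101, 2023, 2003, 1037]   # dummy prompt token ids
new_ids = [7953, 2742, 102]            # dummy generated token ids
full_output = prompt_ids + new_ids     # shape of what generate() returns

input_len = len(prompt_ids)
generated_only = full_output[input_len:]
print(generated_only)                  # only the newly generated ids
```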
Here is the sample command to reproduce the model:

```bash
pip install git+https://github.com/intel/auto-round.git@main
auto-round-mllm \
    --model google/gemma-3-12b-it \
    --device 0 \
    --bits 4 \
    --batch_size 1 \
    --gradient_accumulate_steps 8 \
    --format 'auto_round' \
    --output_dir "./tmp_autoround"
```