This is the Qwen/Qwen2.5-VL-3B-Instruct model converted to OpenVINO, with nf4 weights for the language model and int8 weights for the other models. The nf4 weights are compressed with symmetric, channel-wise quantization. The model works on Intel NPU. See below for the model export command and properties.
Download Model
To download the model, first install the Hugging Face CLI:
pip install huggingface-hub[cli]
and then run:
huggingface-cli download helenai/Qwen2.5-VL-3B-Instruct-ov-nf4-npu --local-dir Qwen2.5-VL-3B-Instruct-ov-nf4-npu
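Alternatively, the model can be downloaded from Python with the huggingface_hub API. A minimal sketch; the local_dir matches the directory name used in the inference example below:

from huggingface_hub import snapshot_download

# Download all model files into a local directory
snapshot_download(
    repo_id="helenai/Qwen2.5-VL-3B-Instruct-ov-nf4-npu",
    local_dir="Qwen2.5-VL-3B-Instruct-ov-nf4-npu",
)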
Run inference with OpenVINO GenAI
Use OpenVINO GenAI to run inference on this model. It requires OpenVINO GenAI 2025.3 or later. Make sure to use the latest NPU driver (Windows, Linux).
- Install OpenVINO GenAI and pillow:
pip install --upgrade openvino-genai pillow
- Download a test image:
curl -O "https://storage.openvinotoolkit.org/test_data/images/dog.jpg"
- Run inference:
import numpy as np
import openvino as ov
import openvino_genai
from PIL import Image
# CACHE_DIR caches the model the first time, so subsequent model loading will be faster
pipeline_config = {"CACHE_DIR": "model_cache"}
pipe = openvino_genai.VLMPipeline("Qwen2.5-VL-3B-Instruct-ov-nf4-npu", "NPU", **pipeline_config)
image = Image.open("dog.jpg").convert("RGB")  # ensure three channels
# Optional: resizing to a smaller size (depending on image and prompt) often speeds up inference
image = image.resize((128, 128))
# Build an NHWC uint8 tensor (batch, height, width, channels) from the image
image_data = np.array(image.getdata()).reshape(1, image.size[1], image.size[0], 3).astype(np.uint8)
image_data = ov.Tensor(image_data)
prompt = "Can you describe the image?"
result = pipe.generate(prompt, image=image_data, max_new_tokens=100)
print(result.texts[0])
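For interactive use, generation can also be streamed token by token. This is a minimal sketch that assumes the same pipeline, prompt, and image tensor as above; the streamer callback receives decoded subwords as they are produced:

def streamer(subword):
    # Print each decoded subword as soon as it is generated
    print(subword, end="", flush=True)

result = pipe.generate(prompt, image=image_data, max_new_tokens=100, streamer=streamer)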
Model export properties
Model export command:
optimum-cli export openvino -m Qwen/Qwen2.5-VL-3B-Instruct --weight-format nf4 --group-size -1 --sym Qwen2.5-VL-3B-Instruct-ov-nf4-npu
Framework versions
openvino : 2025.3.0-19807-44526285f24-releases/2025/3
nncf : 2.18.0
optimum_intel : 1.26.0.dev0+bc13ae5
optimum : 1.27.0
pytorch : 2.7.1
transformers : 4.51.3
LLM export properties
all_layers : False
awq : False
backup_mode : int8_asym
compression_format : dequantize
gptq : False
group_size : -1
ignored_scope : []
lora_correction : False
mode : nf4
ratio : 1.0
scale_estimation : False
sensitivity_metric : weight_quantization_error
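These properties map onto NNCF weight-compression arguments. As a hedged sketch (not the exact conversion script), the equivalent nncf.compress_weights call for the language model would look roughly like this, assuming ov_language_model is the exported OpenVINO language model loaded elsewhere:

import nncf

# Per-channel (group_size=-1) symmetric NF4 compression applied to all layers (ratio=1.0),
# with int8_asym as the backup mode for weights that are not compressed to nf4
compressed_model = nncf.compress_weights(
    ov_language_model,  # assumption: the exported OpenVINO language model
    mode=nncf.CompressWeightsMode.NF4,
    ratio=1.0,
    group_size=-1,
    all_layers=False,
    backup_mode=nncf.BackupMode.INT8_ASYM,
)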