Model Details

This is an int4 version of Qwen/QVQ-72B-Preview, quantized symmetrically with group_size 128 and generated by intel/auto-round. Load the model with revision="118625f" to use the AutoGPTQ format.
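As a rough illustration of what "int4, group_size 128, symmetric" means, the sketch below applies plain round-to-nearest symmetric quantization to groups of 128 weights. It is not AutoRound's algorithm, which tunes the rounding via signed gradient descent (see the paper under Cite). The function names and the max-abs / 7 scale rule are assumptions chosen only for illustration.

```python
import random

GROUP_SIZE = 128              # matches the group_size stated above
BITS = 4                      # int4
QMAX = 2 ** (BITS - 1) - 1    # 7
QMIN = -(2 ** (BITS - 1))     # -8


def quantize_group(weights):
    """Symmetric round-to-nearest quantization of one group.

    Returns (integer codes, scale). Symmetric means there is no zero point:
    a weight of 0.0 always maps to the integer 0.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / QMAX if max_abs > 0 else 1.0  # assumed scale rule
    codes = [min(QMAX, max(QMIN, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def quantize_row(row, group_size=GROUP_SIZE):
    """Quantize a flat list of weights, one scale per group of `group_size`."""
    return [
        quantize_group(row[start:start + group_size])
        for start in range(0, len(row), group_size)
    ]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(512)]   # toy weights, not model data
groups = quantize_row(row)
restored = [v for codes, scale in groups for v in dequantize_group(codes, scale)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups: {len(groups)}, max abs error: {max_err:.6f}")
```

Each group stores 128 four-bit codes plus one scale, so the rounding error within a group is bounded by half of that group's scale.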

How To Use

INT4 Inference

from auto_round import AutoRoundConfig  # must be imported to load the auto-round format
import requests
from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

quantized_model_path = "OPEA/QVQ-72B-Preview-int4-sym-inc"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    quantized_model_path,
    torch_dtype="auto",
    device_map="auto",
    # revision="118625f",  # uncomment to load the AutoGPTQ format
)
processor = AutoProcessor.from_pretrained(quantized_model_path)
image_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/QVQ/demo.png"
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."}
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image_url,
            },
            {"type": "text", "text": "What value should be filled in the blank space?"},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs = Image.open(requests.get(image_url, stream=True).raw)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])

##INT4:
# So I've got this puzzle here with emojis representing numbers, and I need to figure out what goes in the blank space. Let's see, there are hearts, bows, and a dog emoji. Each probably stands for a certain number, and I have to solve for them based on the equations given.
# First, there's an equation with four hearts added up to make 24. So, 4 hearts = 24. That seems straightforward. If I divide both sides by 4, I get one heart = 6. Okay, that makes sense.
# Next, there's an equation with one heart minus one bow equals

##BF16:
# So I've got this puzzle here with emojis representing numbers, and I need to figure out what goes in the blank space. Let's see, there are four equations, and the last one has a blank box where the result should be. The emojis used are hearts, bows, and dogs. I need to assign numbers to these emojis based on the equations provided.
# First, let's look at the first equation:
# Heart + Heart + Heart + Heart + Heart = 24
# So, there are five hearts added together equaling 24. Let's call the heart value "h". So:5h = 24



image_url = "http://images.cocodataset.org/train2017/000000411975.jpg"
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."}
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image_url,
            },
            {"type": "text", "text": "图片中的棒球场上有多少人?"},  # "How many people are on the baseball field in the picture?"
        ],
    }
]
##INT4:
# So I've got this image of a baseball field, and there are a few people on it. Let me try to count how many people there are. First, I see three people near the infield grass. One of them is bending over, maybe picking something up, and another is standing next to them, also bent over. Then there's a third person standing nearby, wearing a striped shirt and khaki shorts. He seems to be observing or waiting.
# Wait, actually, looking closer, it seems like two of them are bending over towards the ground, perhaps picking up baseballs or something similar. The person in the striped shirt is


##BF16:
# So I've got this image of a baseball field, and there are a few people on it. Let me try to describe what I see.
# First off, the field itself has a mix of grass and dirt. The infield is dirt, and the outfield is grass, which is pretty standard for a baseball field. There are three main people in the scene.
# Starting from the left, there's a person bending over, picking something up from the ground. They're wearing a light blue shirt and dark blue pants. Next to them, another person is also bent over, but they're wearing a white shirt and light-colored pants. Both of

image_url = "https://intelcorp.scene7.com/is/image/intelcorp/processor-overview-framed-badge:1920-1080?wid=480&hei=270"
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."}
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image_url,
            },
            {"type": "text", "text": "这张图片代表哪家公司?"},  # "Which company does this picture represent?"
        ],
    }
]
##INT4:
# 这张图片代表了Intel公司,是一家美国的半导体公司。
# (English: This picture represents Intel, an American semiconductor company.)

##BF16:
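# 这张图片代表的是英特尔公司(Intel Corporation)。图中的标志是英特尔著名的“intel inside”标识,它由两个单词组成:“intel”和“inside”,其中“intel”是公司名称,“inside”表示英特尔的处理器或芯片被内置于各种电子设备中,尤其是计算机和笔记本电脑中。这个标志通常被贴在使用英特尔处理器的设备上,以表明其内部搭载了英特尔的芯片。英特尔是一家总部位于美国加州圣克拉拉的跨国公司,是全球最大的半导体芯片制造商之一,也是x86架构微处理器的开创者。
# (English: This picture represents Intel Corporation. The logo shown is Intel's well-known "intel inside" mark, made up of two words, "intel" and "inside": "intel" is the company name, and "inside" indicates that Intel processors or chips are built into various electronic devices, especially computers and laptops. The mark is usually placed on devices that use Intel processors to show that they contain Intel chips. Intel is a multinational company headquartered in Santa Clara, California, USA, one of the world's largest semiconductor chip manufacturers, and the creator of the x86 microprocessor architecture.)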
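The three examples above share the same system prompt and message layout and differ only in the image URL and the question. A small helper can build the message list so that each new example is a single call. This is a sketch: `build_messages` and `SYSTEM_PROMPT` are hypothetical names introduced here, not part of transformers or auto-round.

```python
SYSTEM_PROMPT = (
    "You are a helpful and harmless assistant. "
    "You are Qwen developed by Alibaba. You should think step-by-step."
)


def build_messages(image_url, question, system_prompt=SYSTEM_PROMPT):
    """Return a chat message list in the layout used by the examples above.

    Hypothetical helper: it only assembles plain dicts and lists.
    """
    return [
        {
            "role": "system",
            "content": [{"type": "text", "text": system_prompt}],
        },
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": question},
            ],
        },
    ]


# Rebuild the second example (the baseball field image and its question).
image_url = "http://images.cocodataset.org/train2017/000000411975.jpg"
messages = build_messages(image_url, "图片中的棒球场上有多少人?")
print(messages[1]["content"][1]["text"])
```

The resulting `messages` list can then go through the same `processor.apply_chat_template`, `processor(...)`, and `model.generate` steps shown in the first example.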

Generate the model

Here is the sample command to reproduce the model.

pip install auto-round
auto-round-mllm \
--model Qwen/QVQ-72B-Preview \
--device 0 \
--group_size 128 \
--bits 4 \
--iters 1000 \
--nsample 512 \
--low_gpu_mem_usage \
--seqlen 2048 \
--model_dtype "float16" \
--format 'auto_gptq,auto_round' \
--output_dir "./tmp_autoround"
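If the command is launched from a Python script rather than typed into a shell, the same flags can be passed as an argument list, which sidesteps shell line-continuation and quoting issues. The sketch below assumes `auto-round-mllm` is installed and on the PATH. The flags and values are copied from the command above, and the launch call is left commented out because quantizing a 72B model is a long-running job.

```python
import shlex

# Flags and values copied from the sample command above.
args = [
    "auto-round-mllm",
    "--model", "Qwen/QVQ-72B-Preview",
    "--device", "0",
    "--group_size", "128",
    "--bits", "4",
    "--iters", "1000",
    "--nsample", "512",
    "--low_gpu_mem_usage",
    "--seqlen", "2048",
    "--model_dtype", "float16",
    "--format", "auto_gptq,auto_round",
    "--output_dir", "./tmp_autoround",
]

# Print a shell-safe, single-line form of the command for logging or copy-paste.
command_line = shlex.join(args)
print(command_line)

# To actually start quantization (assumes the CLI is installed):
# import subprocess
# subprocess.run(args, check=True)
```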

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

  • Intel Neural Compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

