Qwen-VL-2.5 CPPE-5 Finetuned

This model was fine-tuned on 900 samples from the CPPE-5 dataset.
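If you want a few CPPE-5 images to test with, the dataset is available on the Hugging Face Hub. Here is a minimal sketch, assuming the public cppe-5 dataset id and the datasets library (not included in the install command below):

from datasets import load_dataset

# Load the CPPE-5 object-detection dataset (dataset id assumed to be "cppe-5" on the Hub).
ds = load_dataset("cppe-5", split="test")

# Save one image locally so it can be referenced as file:///content/test.jpg later on.
ds[0]["image"].save("/content/test.jpg")
print(ds[0]["objects"])  # ground-truth boxes and labels, useful as a sanity check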

To use it, first install the dependencies:

!pip install vllm transformers accelerate pillow qwen-vl-utils[decord]

Load the model:

import os
from transformers import AutoProcessor
from vllm import LLM, SamplingParams
from qwen_vl_utils import process_vision_info  # Ensure installed: !pip install qwen-vl-utils[decord]

# Use the AWQ-quantized (or full-precision if needed) multimodal model.

MODEL_PATH = "javiagu/Qwen-2.5-CPPE-5-AWQ"
# Initialize the vLLM LLM for offline inference.
llm = LLM(
    model=MODEL_PATH,
    dtype="half",  # Use half (float16) instead of bfloat16
    max_model_len=4048,
    # 'limit_mm_per_prompt' is omitted here because it raised an error in this setup.
)
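If you later need more than one image in a single prompt, vLLM accepts a limit_mm_per_prompt argument. It was dropped above because it caused an error in the author's setup, so treat the variant below as optional and unverified with this checkpoint:

# Optional variant: allow up to 2 images per prompt (unverified with this model).
llm = LLM(
    model=MODEL_PATH,
    dtype="half",
    max_model_len=4048,
    limit_mm_per_prompt={"image": 2},
)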

Call the model as follows:

# Define sampling parameters.
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.8,
    repetition_penalty=1.05,
    max_tokens=2000
)

# Prepare the multi-modal message.
local_image_path = os.path.abspath("/content/test.jpg")
image_message = {
    "type": "image",
    "image": f"file://{local_image_path}",
    "min_pixels": 224 * 224,         # adjust as needed
    "max_pixels": 1280 * 28 * 28,      # adjust as needed
}
messages = [
    {"role": "system", "content": "Your task will be he predefined main task.\n"},
    {
        "role": "user",
        "content": [
            image_message,
            {"type": "text", "text": "Please, perform the main task:"}
        ]
    }
]

# Load the processor to prepare the prompt.
processor = AutoProcessor.from_pretrained(MODEL_PATH)
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Process visual inputs using qwen_vl_utils.
image_inputs, video_inputs, video_kwargs = process_vision_info(messages, return_video_kwargs=True)
mm_data = {}
if image_inputs is not None:
    mm_data["image"] = image_inputs
if video_inputs is not None:
    mm_data["video"] = video_inputs

# Prepare the input for the LLM.
llm_inputs = {
    "prompt": prompt,
    "multi_modal_data": mm_data,
    "mm_processor_kwargs": video_kwargs,
}

# Run offline inference with vLLM.
outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
generated_text = outputs[0].outputs[0].text

print("Generated Response:")
print(generated_text)
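The output is plain text containing the detections. The exact output schema of this finetune is not documented here; Qwen2.5-VL models usually emit a JSON list like [{"bbox_2d": [x1, y1, x2, y2], "label": "..."}], so the parsing and drawing sketch below assumes that format and may need adapting:

import json
import re
from PIL import Image, ImageDraw

# Pull the first JSON array out of the generated text (output format assumed, see note above).
match = re.search(r"\[.*\]", generated_text, re.DOTALL)
detections = json.loads(match.group(0)) if match else []

# Draw the predicted boxes on the input image.
image = Image.open(local_image_path).convert("RGB")
draw = ImageDraw.Draw(image)
for det in detections:
    x1, y1, x2, y2 = det["bbox_2d"]
    draw.rectangle([x1, y1, x2, y2], outline="red", width=3)
    draw.text((x1, max(0, y1 - 12)), det.get("label", ""), fill="red")
image.save("/content/test_annotated.jpg")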

This model was trained on an AWS p4de cluster. The prompt that triggers the bounding-box prediction should be exactly the one used in the code above (just copy and paste it):

System: Your task will be he predefined main task.\n User: Please, perform the main task:
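For convenience, that fixed system and user prompt can be wrapped in a small helper so you only pass an image path. The helper name and structure are my own; it simply repackages the code above and reuses the llm, processor, and sampling_params objects already created:

def detect(image_path):
    # Build the same fixed system/user prompt as above for a single local image.
    msgs = [
        {"role": "system", "content": "Your task will be he predefined main task.\n"},
        {
            "role": "user",
            "content": [
                {"type": "image", "image": f"file://{os.path.abspath(image_path)}"},
                {"type": "text", "text": "Please, perform the main task:"},
            ],
        },
    ]
    prompt = processor.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs, video_kwargs = process_vision_info(msgs, return_video_kwargs=True)
    mm_data = {}
    if image_inputs is not None:
        mm_data["image"] = image_inputs
    outputs = llm.generate(
        [{"prompt": prompt, "multi_modal_data": mm_data, "mm_processor_kwargs": video_kwargs}],
        sampling_params=sampling_params,
    )
    return outputs[0].outputs[0].text

print(detect("/content/test.jpg"))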

This was the system and user prompt simply because I didn't get creative with the prompting. The model works, so enjoy :)
