Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security

This repository contains Q-MLLM, the model introduced in the paper Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security.

Code: https://github.com/Amadeuszhao/QMLLM

πŸ“– Overview

Q-MLLM is a novel architecture that integrates two-level vector quantization to create a discrete bottleneck against adversarial attacks while preserving multimodal reasoning capabilities. Our approach achieves:

  • 98.4% average Defense Success Rate (DSR) against jailbreak attacks
  • 75.9% DSR against toxic image attacks
  • 100% DSR against gradient-based ImgJP attacks
  • Minimal inference overhead with competitive utility performance

⚠️ Content Warning

The original GitHub repository's images/ folder contains test images depicting harmful content, including:

  • Violence and gore (blood, weapons)
  • Adult content (pornography)
  • Harmful substances (alcohol, cigarettes)
  • Offensive gestures

Please exercise caution when accessing these files. They are included solely for research and evaluation purposes.

πŸš€ Quick Start

Installation

# Clone the repository
git clone https://github.com/Amadeuszhao/Q-MLLM.git
cd Q-MLLM

# Install requirements
pip install -r requirements.txt

Usage

from transformers import LlavaProcessor
from unitok_qllava import LlavaForConditionalGeneration
from utils.process import process_with_unitok
from PIL import Image
import torch

# Load model and processor
model_path = "vincentchao/qmllm_unitok"
model = LlavaForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
).cuda()

processor = LlavaProcessor.from_pretrained(model_path)

# Process an image
image = Image.open("path/to/image.jpg").convert('RGB')

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image"},
        ],
    }, 
]

prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = process_with_unitok(
    processor, images=image, text=prompt, return_tensors="pt"
).to(model.device).to(model.dtype)

# Generate response
input_length = inputs['input_ids'].shape[1]

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100)
    
description = processor.decode(output[0][input_length:], skip_special_tokens=True)
print(description)

Handling Toxic Content Detection

When harmful content is detected, the model raises a ValueError with category information:

# `output` and `input_length` come from the Usage example above; the
# ValueError is raised during processing/generation when harmful content
# is detected. `id2class` maps category IDs to human-readable names --
# define it from the category list used by your checkpoint (not shown here).
try:
    # Process the image and generate as shown above
    generated_text = processor.decode(output[0][input_length:], skip_special_tokens=True)
    print(f"βœ“ Image is safe, generated result:\n{generated_text}")

except ValueError as e:
    # The error message embeds the detected category ID after "category="
    error_msg = str(e)
    category_id = int(error_msg.split("category=")[-1].strip())
    category_name = id2class[category_id]
    print("⚠️ Harmful content detected!")
    print(f"   Category ID: {category_id}")
    print(f"   Category Name: {category_name}")

πŸ“Š Performance Highlights

Defense Method         Jailbreak DSR   Toxic Image DSR   Inference Overhead
LLaVA-1.5              49.5%           1.0%              -
MLLM-Protector         91.7%           53.3%             High
ETA                    92.1%           54.7%             High
Q-MLLM-7B              98.4%           75.9%             Minimal
Q-MLLM v2.0 (UniTok)   98.4%           78.4%             Minimal

πŸ”¬ Technical Details

Two-Level Vector Quantization

  1. Semantic-Level Quantization: Maps global CLS tokens to discrete semantic embeddings (K=128 codebook)
  2. Patch-Level Quantization: Discretizes spatial features for fine-grained robustness (P=16000 codebook)
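
Both levels perform the same core operation: snap a continuous feature vector to its nearest codebook entry and pass only the discrete result downstream. The sketch below is a minimal illustration of that operation, not the paper's training code; the codebook sizes follow the description above, while the embedding dimension, module layout, and the training-time straight-through estimator are assumptions.

import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbor vector quantization: continuous features snap to
    discrete codebook entries (a minimal sketch, not the repo's module)."""

    def __init__(self, codebook_size: int, dim: int):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z, straight_through=False):
        # z: (..., dim) continuous features from the vision encoder
        flat = z.reshape(-1, z.shape[-1])
        # Squared L2 distance from each feature to every codebook entry
        dist = (flat.pow(2).sum(1, keepdim=True)
                - 2 * flat @ self.codebook.weight.t()
                + self.codebook.weight.pow(2).sum(1))
        codes = dist.argmin(dim=1)                    # discrete token IDs
        q = self.codebook(codes).reshape(z.shape)     # quantized features
        if straight_through:
            # Training-time trick: copy gradients past the non-differentiable
            # argmin so the encoder can still learn.
            q = z + (q - z).detach()
        return q, codes.reshape(z.shape[:-1])

# Codebook sizes from the paper; dim=1024 is an assumed feature width.
semantic_vq = VectorQuantizer(codebook_size=128, dim=1024)    # global CLS token
patch_vq = VectorQuantizer(codebook_size=16000, dim=1024)     # spatial patches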

Defense Mechanisms

  • Stop-Gradient Operation: Blocks backpropagation through quantization
  • Discretization Bottleneck: Forces continuous features into finite discrete space
  • Enhanced CLS Detection: Improved semantic alignment for toxic content classification
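
The first two mechanisms can be checked directly on the sketch above: a perturbation too small to push a feature across a codebook boundary leaves the discrete code unchanged, and on the inference path (no straight-through) backpropagation yields no gradient at the input. A toy demonstration with a hand-built 2D codebook (all values hypothetical):

# Tiny codebook with three well-separated entries
vq = VectorQuantizer(codebook_size=3, dim=2)
with torch.no_grad():
    vq.codebook.weight.copy_(torch.tensor([[0., 0.], [10., 0.], [0., 10.]]))

z = torch.tensor([[1.0, 1.0]])           # clean feature
delta = torch.tensor([[0.3, -0.2]])      # small adversarial-style perturbation

_, clean_code = vq(z)
_, attacked_code = vq(z + delta)
print(clean_code.item(), attacked_code.item())   # 0 0 -- same discrete token

# Stop-gradient: without the straight-through path, backprop through the
# argmin is impossible, so no gradient signal reaches the image features.
z_adv = z.clone().requires_grad_(True)
q, _ = vq(z_adv)
q.sum().backward()
print(z_adv.grad)                        # None -- gradient-based attacks starve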

⚠️ Ethical Use Notice ⚠️

This research tool is designed to improve AI safety. The toxic content detection capabilities should only be used for:

  • Research purposes
  • Safety evaluation
  • Model robustness testing

Do not use this tool to generate, distribute, or process harmful content for malicious purposes.
