Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security
This repository contains Q-MLLM, a novel architecture presented in the paper Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security.
Code: https://github.com/Amadeuszhao/QMLLM
Overview
Q-MLLM is a novel architecture that integrates two-level vector quantization to create a discrete bottleneck against adversarial attacks while preserving multimodal reasoning capabilities. Our approach achieves:
- 98.4% average Defense Success Rate (DSR) against jailbreak attacks
- 75.9% DSR against toxic image attacks
- 100% DSR against gradient-based ImgJP attacks
- Minimal inference overhead with competitive utility performance
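The headline figures are Defense Success Rates. As a minimal sketch, assuming DSR is the percentage of attack attempts that the model blocks or refuses, and that the "average" DSR is an unweighted mean over attack types (the paper's exact protocol may differ), the metric can be computed like this. The function names and toy data are hypothetical.

```python
from typing import Dict, List


def defense_success_rate(defended: List[bool]) -> float:
    """DSR in percent: share of attack attempts that were defended (True = attack failed)."""
    if not defended:
        raise ValueError("need at least one attack attempt")
    return 100.0 * sum(defended) / len(defended)


def average_dsr(results_by_attack: Dict[str, List[bool]]) -> float:
    """Unweighted mean of per-attack DSRs (assumed averaging scheme)."""
    rates = [defense_success_rate(outcomes) for outcomes in results_by_attack.values()]
    return sum(rates) / len(rates)


# Toy outcomes for two hypothetical attack types (not data from the paper)
toy_results = {
    "attack_a": [True, True, True, False],  # 75.0% DSR
    "attack_b": [True, True],               # 100.0% DSR
}
print(average_dsr(toy_results))  # 87.5
```

A macro average weights every attack type equally regardless of how many prompts it contains; a micro average over all prompts would weight larger attack sets more heavily.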
Content Warning
The original GitHub repository's images/ folder contains test images depicting harmful content including:
- Violence and gore (blood, weapons)
- Adult content (pornography)
- Harmful substances (alcohol, cigarettes)
- Offensive gestures
Please exercise caution when accessing these files. They are included solely for research and evaluation purposes.
Quick Start
Installation
```bash
# Clone the repository
git clone https://github.com/Amadeuszhao/Q-MLLM.git
cd Q-MLLM
# Install requirements
pip install -r requirements.txt
```
Usage
```python
from transformers import LlavaProcessor
from unitok_qllava import LlavaForConditionalGeneration
from utils.process import process_with_unitok
from PIL import Image
import torch

# Load model and processor
model_path = "vincentchao/qmllm_unitok"
model = LlavaForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
).cuda()
processor = LlavaProcessor.from_pretrained(model_path)

# Process an image
image = Image.open("path/to/image.jpg").convert('RGB')
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
# Preprocess with the repository's UniTok helper, then move inputs to the model's device and dtype
inputs = process_with_unitok(processor, images=image, text=prompt, return_tensors="pt").to(model.device).to(model.dtype)

# Generate response
input_length = inputs['input_ids'].shape[1]
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100)
# Decode only the newly generated tokens, skipping the prompt
description = processor.decode(output[0][input_length:], skip_special_tokens=True)
print(description)
```
Handling Toxic Content Detection
When harmful content is detected, the model raises a `ValueError` whose message carries the category ID as `category=<id>`. Run generation inside a `try` block to catch it:
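```python
try:
    # Run the preprocessing and generation steps from the Usage example inside the try block
    inputs = process_with_unitok(processor, images=image, text=prompt, return_tensors="pt").to(model.device).to(model.dtype)
    input_length = inputs['input_ids'].shape[1]
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=100)
    generated_text = processor.decode(output[0][input_length:], skip_special_tokens=True)
    print(f"Image is safe, generated result:\n{generated_text}")
except ValueError as e:
    # Extract category information from the error message ("... category=<id>")
    error_msg = str(e)
    category_id = int(error_msg.split("category=")[-1].strip())
    # id2class maps category IDs to names; it is not defined in this snippet
    category_name = id2class[category_id]
    print("Harmful content detected!")
    print(f"  Category ID: {category_id}")
    print(f"  Category Name: {category_name}")
```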
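The string parsing in the handler assumes every `ValueError` carries a `category=<id>` suffix; an unrelated `ValueError` would make `int()` raise a second, confusing error inside the handler. A more defensive helper might look like the sketch below. The helper name, the example message, and the example mapping are hypothetical; the real `id2class` mapping is not shown in this README.

```python
import re
from typing import Mapping, Optional, Tuple

# Assumed message format: the text ends with "category=<integer>"
_CATEGORY_RE = re.compile(r"category=(\d+)\s*$")


def parse_harmful_category(
    error: Exception, id2class: Mapping[int, str]
) -> Optional[Tuple[int, str]]:
    """Return (category_id, category_name), or None if the error is not a toxic-content error."""
    match = _CATEGORY_RE.search(str(error))
    if match is None:
        return None  # caller should re-raise: this ValueError has another cause
    category_id = int(match.group(1))
    return category_id, id2class.get(category_id, "unknown")


# Hypothetical mapping and messages, for illustration only
example_id2class = {2: "example-category"}
print(parse_harmful_category(ValueError("harmful content, category=2"), example_id2class))
# (2, 'example-category')
print(parse_harmful_category(ValueError("image could not be decoded"), example_id2class))
# None
```

Inside the `except ValueError as e:` branch, a `None` result signals that the exception should be re-raised with a bare `raise`.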
Performance Highlights
| Method | Jailbreak DSR | Toxic Image DSR | Inference Overhead |
|---|---|---|---|
| LLaVA-1.5 (no defense) | 49.5% | 1.0% | - |
| MLLM-Protector | 91.7% | 53.3% | High |
| ETA | 92.1% | 54.7% | High |
| Q-MLLM-7B | 98.4% | 75.9% | Minimal |
| Q-MLLM v2.0 (UniTok) | 98.4% | 78.4% | Minimal |
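To read the table as gains rather than absolute rates, the percentage-point differences can be computed directly from the reported numbers. This is a small illustrative script; the values are copied from the table and nothing else is assumed.

```python
# Reported DSR values (%) from the table: (jailbreak, toxic image)
REPORTED_DSR = {
    "LLaVA-1.5": (49.5, 1.0),
    "MLLM-Protector": (91.7, 53.3),
    "ETA": (92.1, 54.7),
    "Q-MLLM-7B": (98.4, 75.9),
    "Q-MLLM v2.0 (UniTok)": (98.4, 78.4),
}


def gain_over(method: str, baseline: str) -> tuple:
    """Percentage-point gain of `method` over `baseline` for (jailbreak, toxic image) DSR."""
    m, b = REPORTED_DSR[method], REPORTED_DSR[baseline]
    # Round to one decimal to remove floating-point noise from the subtraction
    return tuple(round(x - y, 1) for x, y in zip(m, b))


for method in ("Q-MLLM-7B", "Q-MLLM v2.0 (UniTok)"):
    print(method, "vs ETA:", gain_over(method, "ETA"))
    print(method, "vs LLaVA-1.5:", gain_over(method, "LLaVA-1.5"))
```

Against the strongest baseline (ETA), Q-MLLM-7B gains 6.3 points on jailbreak DSR and 21.2 points on toxic image DSR; the UniTok variant raises the image gain to 23.7 points.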
Technical Details
Two-Level Vector Quantization
- Semantic-Level Quantization: Maps the global CLS token to a discrete semantic embedding (codebook size K = 128)
- Patch-Level Quantization: Discretizes spatial patch features for fine-grained robustness (codebook size P = 16,000)
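The two quantization levels can be illustrated with a nearest-neighbor codebook lookup. This is a minimal NumPy sketch, assuming L2 nearest-neighbor assignment and randomly initialized codebooks with the sizes listed above (K = 128, P = 16,000). The feature dimension, patch count, and function names are hypothetical, and the model's learned codebooks and vision encoder are not reproduced.

```python
import numpy as np


def quantize(features: np.ndarray, codebook: np.ndarray):
    """Replace each row of `features` (N, D) with its nearest codebook row (K, D) under L2 distance."""
    # Squared distances ||x||^2 - 2 x.c + ||c||^2, shape (N, K), without materializing (N, K, D)
    dists = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    indices = dists.argmin(axis=1)
    return codebook[indices], indices


def two_level_quantize(cls_token, patch_tokens, semantic_codebook, patch_codebook):
    """Quantize the global CLS token (D,) and the patch tokens (N, D) with separate codebooks."""
    q_cls, cls_idx = quantize(cls_token[None, :], semantic_codebook)
    q_patches, patch_idx = quantize(patch_tokens, patch_codebook)
    return q_cls[0], int(cls_idx[0]), q_patches, patch_idx


rng = np.random.default_rng(0)
D, N = 32, 16  # illustrative feature dimension and number of patches
semantic_codebook = rng.standard_normal((128, D))  # K = 128
patch_codebook = rng.standard_normal((16000, D))   # P = 16,000
cls_token = rng.standard_normal(D)
patch_tokens = rng.standard_normal((N, D))

q_cls, cls_id, q_patches, patch_ids = two_level_quantize(
    cls_token, patch_tokens, semantic_codebook, patch_codebook
)
print(cls_id, q_patches.shape)  # one semantic code ID and a (16, 32) array of patch codes
```

Every output row is an exact codebook entry, so downstream layers only ever see one of K + P fixed vectors rather than arbitrary continuous features.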
Defense Mechanisms
- Stop-Gradient Operation: Blocks gradient backpropagation through the quantization step
- Discretization Bottleneck: Forces continuous features into a finite discrete space
- Enhanced CLS Detection: Improved semantic alignment for toxic content classification
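The first two mechanisms can be illustrated in a few lines of PyTorch. This sketch assumes the stop-gradient is applied by detaching the quantized output, and contrasts it with the straight-through estimator commonly used to train VQ models; the function names are hypothetical and the code is not taken from the Q-MLLM repository.

```python
import torch


def nearest_code(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Index of the nearest codebook row (K, D) for each feature row (N, D)."""
    return torch.cdist(features, codebook).argmin(dim=1)


def quantize_stop_gradient(features, codebook):
    # Stop-gradient: the output is a detached codebook lookup, so no gradient reaches `features`
    return codebook[nearest_code(features, codebook)].detach()


def quantize_straight_through(features, codebook):
    # Straight-through estimator (for contrast): the forward value is the code,
    # but gradients are copied back to `features` unchanged
    q = codebook[nearest_code(features, codebook)]
    return features + (q - features).detach()


torch.manual_seed(0)
codebook = torch.randn(128, 16)
x = torch.randn(4, 16, requires_grad=True)

q_sg = quantize_stop_gradient(x, codebook)
print(q_sg.requires_grad)  # False: no gradient path from the output back to x

q_st = quantize_straight_through(x, codebook)
q_st.sum().backward()
print(torch.equal(x.grad, torch.ones_like(x)))  # True: gradients leak to the input

# Discretization bottleneck: a tiny perturbation of a code vector still maps to the same code
noisy = codebook[:4] + 1e-3 * torch.randn(4, 16)
print(nearest_code(noisy, codebook))  # tensor([0, 1, 2, 3])
```

The contrast shows why a gradient-based attack such as ImgJP has no usable signal through a detached quantizer, while the bottleneck absorbs small input perturbations that do not cross a codebook boundary.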
Ethical Use Notice
This research tool is designed to improve AI safety. The toxic content detection capabilities should only be used for:
- Research purposes
- Safety evaluation
- Model robustness testing
Do not use this tool to generate, distribute, or process harmful content for malicious purposes.