Image-Guard-2.0-Post0.1 is a multiclass image safety classification model fine-tuned from google/siglip2-base-patch16-224. It classifies images into five safety-related categories using the SiglipForImageClassification architecture.
Paper: *SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features*, https://arxiv.org/pdf/2502.14786
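For a quick sanity check without writing a full inference loop, the model can be queried through the transformers image-classification pipeline. The snippet below is a minimal sketch: the image path is a placeholder, and label names are read from the checkpoint's config.

```python
from transformers import pipeline

# Minimal sketch: image-classification pipeline over the fine-tuned checkpoint.
# "example.jpg" is a placeholder path, not a file shipped with the model.
pipe = pipeline("image-classification", model="prithivMLmods/Image-Guard-2.0-Post0.1")

# Returns a list of {"label": ..., "score": ...} dicts, highest score first.
print(pipe("example.jpg", top_k=5))
```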
Classification report:
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Anime-SFW | 0.8906 | 0.8766 | 0.8835 | 5600 |
| Hentai | 0.9081 | 0.8892 | 0.8986 | 4180 |
| Normal-SFW | 0.9010 | 0.8784 | 0.8896 | 5503 |
| Pornography | 0.9489 | 0.9448 | 0.9469 | 5600 |
| Enticing or Sensual | 0.8900 | 0.9436 | 0.9160 | 5600 |
| Accuracy | | | 0.9076 | 26483 |
| Macro avg | 0.9077 | 0.9065 | 0.9069 | 26483 |
| Weighted avg | 0.9077 | 0.9076 | 0.9074 | 26483 |
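A report in this format can be reproduced with scikit-learn, assuming ground-truth and predicted class IDs for a labeled evaluation set. The sketch below uses tiny dummy lists so it runs as-is; in practice `y_true` and `y_pred` would come from running the classifier over a held-out set.

```python
from sklearn.metrics import classification_report

label_names = ["Anime-SFW", "Hentai", "Normal-SFW", "Pornography", "Enticing or Sensual"]

# Hypothetical evaluation data; replace with real class IDs (0-4) from an eval run.
y_true = [0, 1, 2, 3, 4, 3]
y_pred = [0, 1, 2, 3, 4, 4]

print(classification_report(y_true, y_pred, target_names=label_names, digits=4))
```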
| Class ID | Label | Description |
|---|---|---|
| 0 | Anime-SFW | Safe-for-work anime-style images. |
| 1 | Hentai | Explicit or adult anime content. |
| 2 | Normal-SFW | Realistic or photographic images that are safe for work. |
| 3 | Pornography | Explicit adult content involving nudity or sexual acts. |
| 4 | Enticing or Sensual | Suggestive imagery that is not explicit but intended to evoke sensuality. |
This model is experimental and should be evaluated carefully before being relied on in production.
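One way to consume the label table above is to fold the predicted probabilities into a coarse moderation decision. The sketch below is hypothetical: the thresholds and the block/review grouping are illustrative choices, not part of the model.

```python
# Hypothetical moderation policy on top of the classifier's output.
# `probs` is a label -> probability dict, as produced by the Gradio demo below.
EXPLICIT = {"Hentai", "Pornography"}        # class IDs 1 and 3
SUGGESTIVE = {"Enticing or Sensual"}        # class ID 4

def moderate(probs, explicit_threshold=0.5, suggestive_threshold=0.5):
    """Return 'block', 'review', or 'allow' for one image (illustrative thresholds)."""
    if sum(probs.get(label, 0.0) for label in EXPLICIT) >= explicit_threshold:
        return "block"
    if sum(probs.get(label, 0.0) for label in SUGGESTIVE) >= suggestive_threshold:
        return "review"
    return "allow"

print(moderate({"Normal-SFW": 0.9, "Enticing or Sensual": 0.1}))  # -> "allow"
```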
```bash
pip install -q transformers torch pillow gradio
```
```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Image-Guard-2.0-Post0.1"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label mapping
id2label = {
    "0": "Anime-SFW",
    "1": "Hentai",
    "2": "Normal-SFW",
    "3": "Pornography",
    "4": "Enticing or Sensual"
}

def classify_image_safety(image):
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_image_safety,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="Image Safety Classification"),
    title="Image-Guard-2.0-Post0.1",
    description="Upload an image to classify it into one of five safety categories: Anime-SFW, Hentai, Normal-SFW, Pornography, or Enticing/Sensual."
)

if __name__ == "__main__":
    iface.launch()
```
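For offline use (for example, scanning a folder of images before upload), a batched variant of the same inference loop can be written without Gradio. This is a sketch under the assumption that images live in a local `images/` directory; label names are taken from the checkpoint's config, and the checkpoint is reloaded so the snippet stands alone.

```python
from pathlib import Path

import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

# Reload the checkpoint (or reuse `model` / `processor` from the demo above).
model_name = "prithivMLmods/Image-Guard-2.0-Post0.1"
model = SiglipForImageClassification.from_pretrained(model_name).eval()
processor = AutoImageProcessor.from_pretrained(model_name)

def classify_batch(paths, batch_size=16):
    """Classify a list of image paths; returns {path: (top_label, probability)}."""
    results = {}
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        images = [Image.open(p).convert("RGB") for p in chunk]
        inputs = processor(images=images, return_tensors="pt")
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1)
        for path, row in zip(chunk, probs):
            top = int(row.argmax())
            results[str(path)] = (model.config.id2label[top], round(float(row[top]), 3))
    return results

# Hypothetical directory of images:
print(classify_batch(sorted(Path("images").glob("*.jpg"))))
```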
Image-Guard-2.0-Post0.1 is designed for:
- Flagging explicit content (Hentai, Pornography) in user-submitted images.
- Separating suggestive material (Enticing or Sensual) from safe-for-work anime and photographic content.
- Content moderation and dataset curation pipelines built on the five categories above.
Base model: google/siglip2-base-patch16-224