Abstract
Addressing the retrieval of unsafe content from vision-language models such as CLIP is an important step towards real-world integration. Current efforts have relied on unlearning techniques that try to erase the model's knowledge of unsafe concepts. While effective in reducing unwanted outputs, unlearning limits the model's capacity to discern between safe and unsafe content. In this work, we introduce a novel approach that shifts from unlearning to an awareness paradigm by leveraging the inherent hierarchical properties of hyperbolic space. We propose to encode safe and unsafe content as an entailment hierarchy, where the two are placed in distinct regions of hyperbolic space. Our model, HySAC (Hyperbolic Safety-Aware CLIP), employs entailment loss functions to model the hierarchical and asymmetric relations between safe and unsafe image-text pairs. This modeling, which is ineffective in standard vision-language models due to their reliance on Euclidean embeddings, endows the model with awareness of unsafe content. As a result, HySAC serves both as a multimodal classifier of unsafe content and as a flexible content retriever, with the option to dynamically redirect unsafe queries toward safer alternatives or retain the original output. Extensive experiments show that our approach not only enhances safety recognition but also establishes a more adaptable and interpretable framework for content moderation in vision-language models. Our source code is available at https://github.com/aimagelab/HySAC.
Community
I am excited to share our work "Hyperbolic Safety-Aware Vision-Language Models", where we introduce HySAC (Hyperbolic Safety-Aware CLIP), a novel approach to handling unsafe content in vision-language models. Instead of unlearning NSFW content, which strips the model of its ability to discern between safe and unsafe content, we propose a safety-awareness paradigm that leverages hyperbolic space to structure safe and unsafe embeddings hierarchically.
🔹 Key Contributions:
Hyperbolic Safety Encoding: We structure safe content closer to the hyperbolic origin while placing unsafe content in more distant regions, allowing for dynamic safe retrieval or controlled NSFW exposure.
Entailment-Based Safety Learning: Our approach introduces entailment constraints that enable safety-aware retrieval, outperforming previous unlearning-based methods (a toy version of the loss is sketched after this list).
Flexible Safety Traversals: By moving embeddings along geodesics in hyperbolic space, we can redirect unsafe queries toward relevant but safe alternatives (see the second sketch below).
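To make the geometry concrete, here is a minimal PyTorch sketch of entailment cones in the Lorentz model of hyperbolic space, in the style of prior hyperbolic vision-language work such as MERU. The function names, the curvature value, and the aperture constant `K` are illustrative assumptions rather than our exact implementation; please refer to the repository for the real code.

```python
import torch

# Sketch: Lorentz-model entailment cones. A "general" embedding (e.g. a safe
# caption) sits closer to the origin, where its cone's half-aperture is wide;
# a "specific" embedding (e.g. its unsafe counterpart) should fall inside
# that cone. `curv` and `K` are illustrative constants, not tuned values.

def lorentz_time(x_space: torch.Tensor, curv: float) -> torch.Tensor:
    # Time coordinate lifting a space-like vector onto the hyperboloid
    # <x, x>_L = -1/curv.
    return torch.sqrt(1.0 / curv + x_space.pow(2).sum(-1))

def half_aperture(x_space: torch.Tensor, curv: float, K: float = 0.1) -> torch.Tensor:
    # Cones widen as a point approaches the origin, so embeddings near the
    # origin (safe, general content) entail larger regions of the space.
    eps = 1e-6
    norm = x_space.norm(dim=-1).clamp_min(eps)
    return torch.asin(torch.clamp(2.0 * K / (curv ** 0.5 * norm), max=1.0 - eps))

def exterior_angle(x_space: torch.Tensor, y_space: torch.Tensor, curv: float) -> torch.Tensor:
    # Angle at x between the geodesic toward y and the axis toward the origin.
    eps = 1e-6
    x_time = lorentz_time(x_space, curv)
    y_time = lorentz_time(y_space, curv)
    # Lorentzian inner product <x, y>_L = -x_t * y_t + <x_s, y_s>.
    c_inner = curv * ((x_space * y_space).sum(-1) - x_time * y_time)
    num = y_time + c_inner * x_time
    den = x_space.norm(dim=-1) * torch.sqrt((c_inner.pow(2) - 1.0).clamp_min(eps))
    return torch.acos(torch.clamp(num / den, -1.0 + eps, 1.0 - eps))

def entailment_loss(general_space: torch.Tensor, specific_space: torch.Tensor,
                    curv: float = 1.0) -> torch.Tensor:
    # Penalize `specific` embeddings lying outside the cone of `general`.
    angle = exterior_angle(general_space, specific_space, curv)
    return (angle - half_aperture(general_space, curv)).clamp_min(0.0).mean()
```

Note how `half_aperture` grows as an embedding approaches the origin: cones rooted near the origin cover more of the space, which is what lets safe (more general) content entail its unsafe (more specific) counterparts.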
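And a toy version of a safety traversal, again only a sketch: the query embedding is interpolated along the geodesic toward the hyperbolic origin, where safe content concentrates. In the actual model the traversal targets a learned safe anchor rather than the raw origin, so treat this purely as an illustration of the mechanism.

```python
import torch

def traverse_to_safe(x_space: torch.Tensor, alpha: float, curv: float = 1.0) -> torch.Tensor:
    # Toy safety traversal: geodesic interpolation toward the hyperbolic
    # origin. Log-map the point to the tangent space at the origin, shrink
    # the tangent vector by (1 - alpha), and exp-map back (space-like part).
    # alpha = 0 returns x unchanged; alpha = 1 lands on the origin.
    sc = curv ** 0.5
    norm = x_space.norm(dim=-1, keepdim=True).clamp_min(1e-6)
    x_time = torch.sqrt(1.0 / curv + norm.pow(2))
    dist = torch.acosh((sc * x_time).clamp_min(1.0 + 1e-6)) / sc  # d(origin, x)
    return torch.sinh(sc * (1.0 - alpha) * dist) / sc * (x_space / norm)
```

At retrieval time, one would apply this to an unsafe query embedding and rank the gallery by hyperbolic distance; increasing `alpha` trades relevance for safety.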
📌 Results:
State-of-the-art performance in safe content retrieval and NSFW moderation.
Clear hierarchical separation of safe and unsafe content within the embedding space.
Improved interpretability and control over content moderation in vision-language models.