Improve model card: Add `library_name`, overview, key features, and usage example
This PR significantly enhances the model card for the `psp-dada/LLaVA-v1.5-13B-SENTINEL` model by:
* Adding the `library_name: transformers` metadata, which enables the "Use in Transformers" button and improves discoverability on the Hub.
* Including a comprehensive overview of the **SENTINEL** framework, derived from the paper's abstract and the project's GitHub README, detailing its purpose, advantages, and key features.
* Adding a runnable Python code snippet demonstrating how to load and use the model with the `transformers` library and `peft`, providing a clear starting point for users.
* Incorporating additional badges for the associated Hugging Face Dataset, Model Collection, and the Hugging Face Paper page (as a "Discussion" link) for better navigation.
* Adding dedicated sections for the **SENTINEL Dataset**, **Model Weights**, **Acknowledgments**, and **Citation** for more complete documentation.
These improvements make the model card much more informative and user-friendly directly on huggingface.co.
---
base_model:
- liuhaotian/llava-v1.5-13b
datasets:
- psp-dada/SENTINEL
language:
- en
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

# Model Card for SENTINEL: Mitigating Object Hallucinations via Sentence-Level Early Intervention

<a href='https://arxiv.org/abs/2507.12455'>
<img src='https://img.shields.io/badge/Paper-Arxiv-purple'></a>
<a href='https://github.com/pspdada/SENTINEL'>
<img src='https://img.shields.io/badge/Github-Repo-Green'></a>
<a href='https://huggingface.co/datasets/psp-dada/SENTINEL'>
<img src='https://img.shields.io/badge/Datasets-HF-Green'></a>
<a href='https://huggingface.co/collections/psp-dada/sentinel-686ea70912079af142015286'>
<img src='https://img.shields.io/badge/Models-HF-orange'></a>
<a href='https://huggingface.co/papers/2507.12455'>
<img src='https://img.shields.io/badge/Discussion-HF-blue'></a>

## Overview

**SENTINEL** (**S**entence-level **E**arly i**N**tervention **T**hrough **IN**-domain pr**E**ference **L**earning) is a framework for preventing and mitigating object hallucinations in multimodal large language models (MLLMs). It builds on the observation that hallucinations predominantly emerge in the early stages of text generation and then propagate through subsequent output.

SENTINEL introduces an automatic, sentence-level early intervention strategy that eliminates the dependency on human annotation. It iteratively samples model outputs, validates object existence by cross-checking with open-vocabulary detectors, and classifies sentences as hallucinated or non-hallucinated. From these, it builds context-aware preference data and trains models with a Context-aware Preference Loss (C-DPO) that emphasizes discriminative learning at the sentence level, where hallucinations first manifest.
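
To make the pipeline above concrete, here is a minimal sketch of the annotation-free data construction loop. It is illustrative only: `sample_next_sentence`, `extract_objects`, and `detect_objects` are hypothetical stand-ins for the model sampler, an object extractor, and an open-vocabulary detector, and the actual implementation in the GitHub repository may differ.

```python
# Illustrative sketch of SENTINEL-style preference-pair construction
# (not the official implementation). The helpers below are hypothetical:
# `sample_next_sentence` draws one candidate sentence from the MLLM,
# `extract_objects` returns the set of object names a sentence mentions,
# and `detect_objects` returns the set of objects a detector finds.

def build_preference_pairs(model, image, prompt, num_samples=8, max_sentences=4):
    pairs = []
    context = prompt
    present = detect_objects(image)  # objects actually confirmed in the image
    for _ in range(max_sentences):
        candidates = [sample_next_sentence(model, image, context)
                      for _ in range(num_samples)]
        # A sentence is hallucinated if it mentions an object
        # that the detector cannot confirm in the image.
        faithful = [s for s in candidates if not (extract_objects(s) - present)]
        hallucinated = [s for s in candidates if extract_objects(s) - present]
        if faithful and hallucinated:
            # Context-aware pair: both sentences share the same prefix.
            pairs.append({"context": context,
                          "chosen": faithful[0],
                          "rejected": hallucinated[0]})
        if not faithful:
            break  # no faithful continuation to extend the context with
        context += " " + faithful[0]  # keep extending with a faithful sentence
    return pairs
```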

## Key Features

* **Annotation-free**: No human labeling is required to construct the training data.
* **Model-agnostic**: Compatible with any MLLM architecture.
* **Efficient**: Achieves robust hallucination mitigation through lightweight LoRA fine-tuning.
* **Early intervention**: Stops hallucinations at their first appearance, before they can propagate through the rest of the output.
* **In-domain contextual preference learning**: Constructs high-quality preference pairs via detector cross-validation, without relying on proprietary LLMs or manual annotation.
* **Context-aware robustness**: Prioritizes context-coherent positive samples to significantly boost generalization.
* **State-of-the-art results**: SENTINEL reduces hallucinations by over 90% compared to the original models and outperforms prior state-of-the-art methods on various hallucination and general-capability benchmarks.
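
The C-DPO objective described in the overview follows, at the sentence level, the familiar DPO form. The sketch below shows that form on a single context-shared pair; it is an assumption-laden illustration, the paper's exact context-aware weighting is not reproduced here, and the log-probability inputs are taken as precomputed sums over sentence tokens.

```python
import torch
import torch.nn.functional as F

def sentence_dpo_loss(policy_chosen_logp, policy_rejected_logp,
                      ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO objective applied to one sentence-level preference
    pair sharing a generation context (illustrative sketch only; the
    paper's C-DPO may weight or structure this differently)."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Maximize the margin between faithful and hallucinated sentences.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Example with dummy log-probabilities:
loss = sentence_dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),
                         torch.tensor([-13.0]), torch.tensor([-14.2]))
print(loss)
```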

## Usage

This model is a LoRA adapter that plugs into its base model, `liuhaotian/llava-v1.5-13b`, for inference or further fine-tuning. You can use it with the `transformers` and `peft` libraries.

First, ensure the necessary libraries are installed:

```bash
pip install transformers peft accelerate bitsandbytes
pip install "flash-attn<2.4" --no-build-isolation
```

### Example Inference

Here is how to load the `LLaVA-v1.5-13B-SENTINEL` LoRA adapter on top of the base `LLaVA-v1.5-13B` model and run inference:

```python
import requests
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoProcessor

# 1. Load the base model and its processor.
# Note: the original liuhaotian checkpoint is not packaged for vanilla
# `transformers`; if loading fails, try the converted
# `llava-hf/llava-1.5-13b-hf` weights or the official LLaVA codebase.
base_model_id = "liuhaotian/llava-v1.5-13b"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)

# 2. Load the SENTINEL LoRA adapter on top of the base model.
lora_model_id = "psp-dada/LLaVA-v1.5-13B-SENTINEL"
model = PeftModel.from_pretrained(model, lora_model_id)

# Optionally, merge the LoRA weights into the base model for simpler inference.
model = model.merge_and_unload()
model.eval()  # set the model to evaluation mode

# 3. Prepare your image and prompt.
# Example image (replace with your own image path or URL).
image_url = "https://huggingface.co/datasets/hf-internal-testing/dummy-images/resolve/main/cat-and-dog.jpg"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

# LLaVA-v1.5 expects the `<image>` placeholder and the USER/ASSISTANT format;
# adjust the template if your base model uses a different one.
prompt = "USER: <image>\nWhat are the animals in the picture? ASSISTANT:"

# 4. Process inputs and generate a response.
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Decode and print the generated text.
generated_text = processor.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```

For more detailed environment setup, data generation, training, and evaluation instructions, please refer to the [official GitHub repository](https://github.com/pspdada/SENTINEL).

## Dataset

The [**SENTINEL Dataset**](https://huggingface.co/datasets/psp-dada/SENTINEL) is an in-domain multimodal preference dataset for mitigating object hallucination, constructed entirely without human annotation. It includes preference pairs for various LLaVA and Qwen2-VL family models, enabling robust and scalable hallucination mitigation.
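
To inspect the preference data directly, a minimal sketch with the `datasets` library follows. It assumes the default configuration and `train` split load without extra arguments, and it makes no claim about the exact field names; check the dataset card for per-model configurations.

```python
from datasets import load_dataset

# Load the SENTINEL preference data (assumes the default config and a
# `train` split; see the dataset card for the actual configurations).
ds = load_dataset("psp-dada/SENTINEL", split="train")

# Inspect one record; the field names depend on the dataset schema.
print(ds[0])
```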

## Model Weights

This repository contains the LoRA adapter weights for `LLaVA-v1.5-13B-SENTINEL`. All models released in the SENTINEL project are trained with LoRA, so the weights can be plugged into their corresponding base models for inference or further fine-tuning. For the full list of released SENTINEL models and their base models, see the [Hugging Face collection](https://huggingface.co/collections/psp-dada/sentinel-686ea70912079af142015286).

## Acknowledgement

* [LLaVA](https://github.com/haotian-liu/LLaVA): LLaVA-v1.5 is an excellent open-source MLLM project.
* [HA-DPO](https://github.com/opendatalab/HA-DPO): Our code for the LLaVA-v1.5 models is based on HA-DPO, an influential work on object hallucination in MLLMs that provided valuable inspiration.
* [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory): A unified and efficient fine-tuning framework for LLMs. Our implementations for LLaVA-v1.6, Qwen2-VL, and Qwen2.5-VL are based on this framework.

## Citation

If you find our model, code, data, or paper helpful, please consider citing our paper 📝 and starring the repo ⭐️!

```bibtex
@article{peng2025mitigating,
  title={Mitigating Object Hallucinations via Sentence-Level Early Intervention},
  author={Peng, Shangpin and Yang, Senqiao and Jiang, Li and Tian, Zhuotao},
  journal={arXiv preprint arXiv:2507.12455},
  year={2025}
}
```