Improve model card: Add `library_name`, overview, and usage example
#1
by nielsr (HF Staff)
README.md CHANGED
@@ -1,12 +1,13 @@
 ---
-license: apache-2.0
+base_model:
+- llava-hf/llava-v1.6-vicuna-13b-hf
 datasets:
 - psp-dada/SENTINEL
 language:
 - en
-base_model:
-- llava-hf/llava-v1.6-vicuna-13b-hf
+license: apache-2.0
 pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
 # Model Card for SENTINEL:<br> Mitigating Object Hallucinations via Sentence-Level Early Intervention <!-- omit in toc -->
@@ -15,5 +16,79 @@ pipeline_tag: image-text-to-text
 <img src='https://img.shields.io/badge/Paper-Arxiv-purple'></a>
 <a href='https://github.com/pspdada/SENTINEL'>
 <img src='https://img.shields.io/badge/Github-Repo-Green'></a>
+<a href='https://huggingface.co/datasets/psp-dada/SENTINEL'>
+<img src='https://img.shields.io/badge/Datasets-HF-Green'></a>
+
+## About SENTINEL
+
+**SENTINEL** (**S**entence-level **E**arly i**N**tervention **T**hrough **IN**-domain pr**E**ference **L**earning) is a framework that prevents and mitigates object hallucinations in multimodal large language models (MLLMs). It introduces an automatic, sentence-level early-intervention strategy built on a key observation: hallucinations predominantly emerge during the early stages of text generation and then propagate through subsequent outputs.
+
+SENTINEL eliminates the dependency on human annotations by bootstrapping high-quality in-domain preference pairs. It iteratively samples model outputs, validates object existence by cross-checking with two open-vocabulary detectors, and classifies each sentence as hallucinated or non-hallucinated. From these samples, it iteratively builds context-aware preference data, pairing context-coherent positive samples with hallucinated negative samples. Models are then trained with a context-aware preference loss (C-DPO) that emphasizes discriminative learning at the sentence level, where hallucinations first manifest. Experiments show that SENTINEL reduces hallucinations by over 90% relative to the original model and outperforms previous state-of-the-art methods on both hallucination benchmarks and general-capability benchmarks, demonstrating its effectiveness and generalization ability.
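+
+For intuition, the bootstrapping step can be sketched as follows (illustrative only: the `Detector` callables and the candidate-sentence format are assumptions made for this sketch, not the repository's actual API; see the GitHub repo for the real pipeline):
+
+```python
+from typing import Callable, Iterable
+
+# Hypothetical detector interface: detector(image, object_name) -> bool.
+Detector = Callable[[object, str], bool]
+
+def is_hallucinated(
+    image: object,
+    mentioned_objects: Iterable[str],
+    detector_a: Detector,
+    detector_b: Detector,
+) -> bool:
+    """A sentence counts as hallucinated if either open-vocabulary
+    detector fails to confirm any object it mentions."""
+    return any(
+        not (detector_a(image, obj) and detector_b(image, obj))
+        for obj in mentioned_objects
+    )
+
+def build_preference_pairs(
+    context: str,
+    candidates: list[tuple[str, list[str]]],  # (sentence, mentioned objects)
+    image: object,
+    detector_a: Detector,
+    detector_b: Detector,
+) -> list[tuple[str, str, str]]:
+    """Pair each non-hallucinated continuation (positive) with each
+    hallucinated one (negative) under the shared prefix `context`."""
+    positives = [s for s, objs in candidates
+                 if not is_hallucinated(image, objs, detector_a, detector_b)]
+    negatives = [s for s, objs in candidates
+                 if is_hallucinated(image, objs, detector_a, detector_b)]
+    return [(context, pos, neg) for pos in positives for neg in negatives]
+```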
+
+## Key Features
+
+* **Annotation-free**: SENTINEL constructs high-quality in-domain preference pairs without requiring human labeling.
+* **Model-agnostic**: The framework is compatible with any MLLM architecture.
+* **Efficient**: It achieves state-of-the-art results with lightweight LoRA fine-tuning.
+* **Early intervention**: Halts hallucination propagation by intervening at the sentence level where hallucinations initially emerge.
+* **Context-aware preference learning**: Emphasizes discriminative learning from context-coherent positive samples to boost generalization and robustness (see the loss sketch after this list).
+* **State-of-the-art results**: Achieves a significant reduction in hallucinations and improved general task performance across various benchmarks.
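+
+As a rough illustration of the context-aware preference objective, a DPO-style loss over two continuations of the same context can be written as below (a sketch of the generic DPO formulation; the exact C-DPO loss is defined in the paper):
+
+```python
+import torch
+import torch.nn.functional as F
+
+def cdpo_loss(
+    logp_pos: torch.Tensor,      # policy log-prob of the positive sentence
+    logp_neg: torch.Tensor,      # policy log-prob of the negative sentence
+    ref_logp_pos: torch.Tensor,  # same quantities under the frozen reference
+    ref_logp_neg: torch.Tensor,
+    beta: float = 0.1,
+) -> torch.Tensor:
+    # Implicit rewards are log-ratios against the reference model.
+    reward_pos = beta * (logp_pos - ref_logp_pos)
+    reward_neg = beta * (logp_neg - ref_logp_neg)
+    # Maximize the margin between the positive and negative continuation.
+    return -F.logsigmoid(reward_pos - reward_neg).mean()
+```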
+
+## Usage
+
+You can use this model with the Hugging Face `transformers` library. This repository hosts a LoRA adapter: with `peft` installed, `from_pretrained` automatically loads the base model (`llava-hf/llava-v1.6-vicuna-13b-hf`) and applies the SENTINEL weights.
+
+```python
+import requests
+import torch
+from PIL import Image
+from transformers import AutoProcessor, LlavaNextForConditionalGeneration
+
+# Load the model and processor; with `peft` installed, the SENTINEL LoRA
+# adapter is applied on top of the base model automatically.
+model_id = "psp-dada/LLaVA-v1.6-Vicuna-13B-SENTINEL"
+processor = AutoProcessor.from_pretrained(model_id)
+model = LlavaNextForConditionalGeneration.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16,
+    low_cpu_mem_usage=True,
+)
+# Move model to GPU if available
+if torch.cuda.is_available():
+    model = model.to("cuda")
+
+# Example inference: describe a cat image
+image_url = "https://llava-vl.github.io/static/images/a_picture_of_a_cat.jpg"  # example image
+image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
+
+# Build the prompt with the chat template so the <image> placeholder
+# lands where LLaVA-v1.6 expects it.
+conversation = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image"},
+            {"type": "text", "text": "What is shown in this image?"},
+        ],
+    }
+]
+prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
+
+# Prepare inputs and move them to the model's device
+inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
+
+# Generate output
+with torch.no_grad():
+    output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)
+
+# Decode and print the result
+generated_text = processor.decode(output_ids[0], skip_special_tokens=True)
+print(f"Response: {generated_text}")
+```
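+
+If automatic adapter resolution is not available in your environment, the adapter can also be attached explicitly with `peft` (a sketch using the same identifiers as above):
+
+```python
+import torch
+from peft import PeftModel
+from transformers import LlavaNextForConditionalGeneration
+
+# Load the frozen base model, then apply the SENTINEL LoRA weights on top.
+base = LlavaNextForConditionalGeneration.from_pretrained(
+    "llava-hf/llava-v1.6-vicuna-13b-hf",
+    torch_dtype=torch.float16,
+    low_cpu_mem_usage=True,
+)
+model = PeftModel.from_pretrained(base, "psp-dada/LLaVA-v1.6-Vicuna-13B-SENTINEL")
+```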
+
+## More Details
+
+For further details on data generation, training, and evaluation, please refer to the [official GitHub repository](https://github.com/pspdada/SENTINEL). You can also find additional model weights in the [SENTINEL models collection](https://huggingface.co/collections/psp-dada/sentinel-686ea70912079af142015286).
+
+## Citation
+
+If you find our model, code, data, or paper helpful, please consider citing our paper 📝 and starring the repo ⭐️!
 
-
+```bibtex
+@article{peng2025mitigating,
+  title={Mitigating Object Hallucinations via Sentence-Level Early Intervention},
+  author={Peng, Shangpin and Yang, Senqiao and Jiang, Li and Tian, Zhuotao},
+  journal={arXiv preprint arXiv:2507.12455},
+  year={2025}
+}
+```