Improve model card: Add `library_name`, overview, and usage example

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +79 -4
README.md CHANGED
@@ -1,12 +1,13 @@
  ---
- license: apache-2.0
+ base_model:
+ - llava-hf/llava-v1.6-vicuna-13b-hf
  datasets:
  - psp-dada/SENTINEL
  language:
  - en
- base_model:
- - llava-hf/llava-v1.6-vicuna-13b-hf
+ license: apache-2.0
  pipeline_tag: image-text-to-text
+ library_name: transformers
  ---

  # Model Card for SENTINEL:<br> Mitigating Object Hallucinations via Sentence-Level Early Intervention <!-- omit in toc -->
@@ -15,5 +16,79 @@ pipeline_tag: image-text-to-text
  <img src='https://img.shields.io/badge/Paper-Arxiv-purple'></a>
  <a href='https://github.com/pspdada/SENTINEL'>
  <img src='https://img.shields.io/badge/Github-Repo-Green'></a>
+ <a href='https://huggingface.co/datasets/psp-dada/SENTINEL'>
+ <img src='https://img.shields.io/badge/Datasets-HF-Green'></a>

- For the details of this model, please refer to the [documentation](https://github.com/pspdada/SENTINEL?tab=readme-ov-file#-model-weights) of the GitHub repo.
+ ## About SENTINEL
+
+ **SENTINEL** (**S**entence-level **E**arly i**N**tervention **T**hrough **IN**-domain pr**E**ference **L**earning) is a framework for preventing and mitigating object hallucinations in multimodal large language models (MLLMs). It builds on the observation that hallucinations predominantly emerge in the early sentences of a generation and then propagate through subsequent outputs, so it intervenes automatically at the sentence level, where the problem first appears.
+
+ SENTINEL removes the need for human annotation by bootstrapping high-quality in-domain preference pairs: it iteratively samples model outputs, validates object existence by cross-checking with two open-vocabulary detectors, and classifies each sentence as hallucinated or non-hallucinated. It then iteratively builds context-aware preference data from context-coherent positive samples and hallucinated negative samples. Finally, models are trained with a context-aware preference loss (C-DPO) that emphasizes discriminative learning at the sentence level, where hallucinations first manifest. Experiments show that SENTINEL reduces hallucinations by over 90% relative to the original model and outperforms previous state-of-the-art methods on both hallucination and general-capability benchmarks, demonstrating its effectiveness and generalization ability.
+
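+ To make the bootstrapping pipeline concrete, here is a minimal, runnable sketch. It is not the official implementation: the detectors, object parser, and sentence sampler below are toy stand-ins (in SENTINEL, two open-vocabulary detectors cross-check each mentioned object), and the exact C-DPO objective is given in the paper.
+
+ ```python
+ import random
+
+ VOCAB = {"cat", "dog", "sofa", "frisbee"}  # toy object vocabulary
+
+ def detector_a(image, obj):  # stand-in for open-vocabulary detector 1
+     return obj in image["objects"]
+
+ def detector_b(image, obj):  # stand-in for open-vocabulary detector 2
+     return obj in image["objects"]
+
+ def mentioned_objects(sentence):
+     # Toy parser: collect vocabulary words mentioned in the sentence.
+     return {w.strip(".,").lower() for w in sentence.split()} & VOCAB
+
+ def is_hallucinated(sentence, image):
+     # An object confirmed by NEITHER detector counts as hallucinated.
+     return any(not detector_a(image, o) and not detector_b(image, o)
+                for o in mentioned_objects(sentence))
+
+ def bootstrap_pairs(sample_sentence, image, context, n=8, max_sents=4):
+     """Pair context-coherent positives with hallucinated negatives that
+     share the same context, one sentence at a time."""
+     pairs = []
+     for _ in range(max_sents):
+         cands = [sample_sentence(context) for _ in range(n)]
+         pos = [s for s in cands if not is_hallucinated(s, image)]
+         neg = [s for s in cands if is_hallucinated(s, image)]
+         if pos and neg:
+             pairs.append({"context": context, "chosen": pos[0], "rejected": neg[0]})
+         if not pos:
+             break
+         context += " " + pos[0]  # continue along a non-hallucinated path
+     return pairs
+
+ # Toy usage: the image contains a cat and a sofa; the sampler sometimes
+ # hallucinates a dog, yielding (context, chosen, rejected) preference data.
+ image = {"objects": {"cat", "sofa"}}
+ sampler = lambda ctx: random.choice(["A cat sits on a sofa.", "A dog runs by."])
+ print(bootstrap_pairs(sampler, image, "Describe the image."))
+ ```
+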
+ ## Key Features
+
+ * **Annotation-free**: constructs high-quality in-domain preference pairs without human labeling.
+ * **Model-agnostic**: compatible with any MLLM architecture.
+ * **Efficient**: achieves state-of-the-art results with lightweight LoRA fine-tuning.
+ * **Early intervention**: halts hallucination propagation by intervening at the sentence level, where hallucinations first emerge.
+ * **Context-aware preference learning**: emphasizes discriminative learning from context-coherent positive samples to boost generalization and robustness.
+ * **State-of-the-art results**: significantly reduces hallucinations while improving general task performance across benchmarks.
+
+ ## Usage
+
+ You can use this model with the Hugging Face `transformers` library. Since this repository contains a LoRA adapter, `from_pretrained` automatically downloads the base model (`llava-hf/llava-v1.6-vicuna-13b-hf`) and applies the SENTINEL weights on top; this requires the `peft` library to be installed.
+
+ ```python
+ from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
+ from PIL import Image
+ import requests
+ import torch
+
+ # Load the model and processor. The adapter repo may not ship processor
+ # files, so the processor is loaded from the base model.
+ model_id = "psp-dada/LLaVA-v1.6-Vicuna-13B-SENTINEL"
+ processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-vicuna-13b-hf")
+ model = LlavaNextForConditionalGeneration.from_pretrained(
+     model_id,
+     torch_dtype=torch.float16,
+     low_cpu_mem_usage=True
+ )
+ # Move model to GPU if available
+ if torch.cuda.is_available():
+     model = model.to("cuda")
+
+ # Example inference: describe a cat image.
+ # LLaVA-v1.6-Vicuna expects the Vicuna chat format with an <image> placeholder.
+ question = "What is shown in this image?"
+ prompt = f"USER: <image>\n{question} ASSISTANT:"
+ image_url = "https://llava-vl.github.io/static/images/a_picture_of_a_cat.jpg"  # Example image
+ image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
+
+ # Prepare inputs
+ inputs = processor(text=prompt, images=image, return_tensors="pt")
+ if torch.cuda.is_available():
+     inputs = {k: v.to("cuda") for k, v in inputs.items()}
+
+ # Generate output
+ with torch.no_grad():
+     output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)
+
+ # Decode and print the result (the decoded text includes the prompt)
+ generated_text = processor.decode(output_ids[0], skip_special_tokens=True)
+ print(f"Prompt: {question}\nResponse: {generated_text}")
+ ```
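+
+ If GPU memory is limited (the 13B base model needs roughly 26 GB in float16), 4-bit quantization is one option. The snippet below is a suggestion rather than part of the original card, and assumes the `bitsandbytes` and `accelerate` packages are installed:
+
+ ```python
+ from transformers import BitsAndBytesConfig, LlavaNextForConditionalGeneration
+ import torch
+
+ # Optional: quantize weights to 4-bit at load time.
+ quant_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.float16,
+ )
+ model = LlavaNextForConditionalGeneration.from_pretrained(
+     "psp-dada/LLaVA-v1.6-Vicuna-13B-SENTINEL",
+     quantization_config=quant_config,
+     device_map="auto",  # let accelerate place layers across devices
+ )
+ ```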
+
+ ## More Details
+
+ For further details on data generation, training, and evaluation, please refer to the [official GitHub repository](https://github.com/pspdada/SENTINEL). You can also find additional model weights in the [SENTINEL models collection](https://huggingface.co/collections/psp-dada/sentinel-686ea70912079af142015286).
+
+ ## Citation
+
+ If you find our model, code, data, or paper helpful, please consider citing our paper 📝 and starring the repo ⭐️!
+
+ ```bibtex
+ @article{peng2025mitigating,
+   title={Mitigating Object Hallucinations via Sentence-Level Early Intervention},
+   author={Peng, Shangpin and Yang, Senqiao and Jiang, Li and Tian, Zhuotao},
+   journal={arXiv preprint arXiv:2507.12455},
+   year={2025}
+ }
+ ```