nielsr (HF Staff) committed
Commit 45ea069 · verified · 1 parent: 30fe126

Improve model card: Add `library_name`, overview, key features, and usage example


This PR significantly enhances the model card for the `psp-dada/LLaVA-v1.5-13B-SENTINEL` model by:

* Adding the `library_name: transformers` metadata, which enables the "Use in Transformers" button and improves discoverability on the Hub.
* Including a comprehensive overview of the **SENTINEL** framework, derived from the paper's abstract and the project's GitHub README, detailing its purpose, advantages, and key features.
* Adding a runnable Python code snippet demonstrating how to load and use the model with the `transformers` library and `peft`, providing a clear starting point for users.
* Incorporating additional badges for the associated Hugging Face Dataset, Model Collection, and the Hugging Face Paper page (as a "Discussion" link) for better navigation.
* Adding dedicated sections for the **SENTINEL Dataset**, **Model Weights**, **Acknowledgments**, and **Citation** for more complete documentation.

These improvements make the model card much more informative and user-friendly directly on huggingface.co.

Files changed (1)
  1. README.md +114 -5
README.md CHANGED
@@ -1,19 +1,128 @@
  ---
- license: apache-2.0
  datasets:
  - psp-dada/SENTINEL
  language:
  - en
- base_model:
- - liuhaotian/llava-v1.5-13b
  pipeline_tag: image-text-to-text
  ---

- # Model Card for SENTINEL:<br> Mitigating Object Hallucinations via Sentence-Level Early Intervention <!-- omit in toc -->

  <a href='https://arxiv.org/abs/2507.12455'>
  <img src='https://img.shields.io/badge/Paper-Arxiv-purple'></a>
  <a href='https://github.com/pspdada/SENTINEL'>
  <img src='https://img.shields.io/badge/Github-Repo-Green'></a>

- For the details of this model, please refer to the [documentation](https://github.com/pspdada/SENTINEL?tab=readme-ov-file#-model-weights) of the GitHub repo.
  ---
+ base_model:
+ - liuhaotian/llava-v1.5-13b
  datasets:
  - psp-dada/SENTINEL
  language:
  - en
+ license: apache-2.0
  pipeline_tag: image-text-to-text
+ library_name: transformers
  ---

+ # Model Card for SENTINEL: Mitigating Object Hallucinations via Sentence-Level Early Intervention

  <a href='https://arxiv.org/abs/2507.12455'>
  <img src='https://img.shields.io/badge/Paper-Arxiv-purple'></a>
  <a href='https://github.com/pspdada/SENTINEL'>
  <img src='https://img.shields.io/badge/Github-Repo-Green'></a>
+ <a href='https://huggingface.co/datasets/psp-dada/SENTINEL'>
+ <img src='https://img.shields.io/badge/Datasets-HF-Green'></a>
+ <a href='https://huggingface.co/collections/psp-dada/sentinel-686ea70912079af142015286'>
+ <img src='https://img.shields.io/badge/Models-HF-orange'></a>
+ <a href='https://huggingface.co/papers/2507.12455'>
+ <img src='https://img.shields.io/badge/Discussion-HF-blue'></a>
+
+ ## Overview
+
+ **SENTINEL** (**S**entence-level **E**arly i**N**tervention **T**hrough **IN**-domain pr**E**ference **L**earning) is a framework for preventing and mitigating object hallucinations in multimodal large language models (MLLMs). It builds on the key insight that hallucinations predominantly emerge at the early stages of text generation and propagate through subsequent outputs.
+
+ SENTINEL introduces an automatic, sentence-level early intervention strategy that eliminates the dependency on human annotations. It iteratively samples model outputs, validates object existence by cross-checking with open-vocabulary detectors, and classifies sentences as hallucinated or non-hallucinated. It then builds context-aware preference data and trains models with a Context-aware Preference Loss (C-DPO) that emphasizes discriminative learning at the sentence level, where hallucinations first manifest.
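+
+ As a rough illustration of this data-construction loop, here is a minimal sketch. All helper names (`sample_next_sentence`, `extract_object_nouns`, `detect_objects`) are hypothetical stand-ins; the actual pipeline is in the GitHub repo:
+
+ ```python
+ # Illustrative sketch only: sentence-level sampling plus detector cross-checking.
+ # Every helper below is a hypothetical stand-in for the real SENTINEL pipeline.
+ def build_preference_pairs(model, image, context, num_samples=8):
+     positives, negatives = [], []
+     for _ in range(num_samples):
+         sentence = sample_next_sentence(model, image, context)
+         mentioned = extract_object_nouns(sentence)   # objects the sentence claims to see
+         detected = detect_objects(image, mentioned)  # cross-check with an open-vocabulary detector
+         if all(obj in detected for obj in mentioned):
+             positives.append(sentence)  # non-hallucinated continuation
+         else:
+             negatives.append(sentence)  # hallucinated continuation
+     # Context-aware preference pairs: both sides share the same context prefix.
+     return [(context, pos, neg) for pos in positives for neg in negatives]
+ ```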
+
+ ## Key Features
+
+ * **Annotation-free**: No human labeling required for training data construction.
+ * **Model-agnostic**: Compatible with any MLLM architecture.
+ * **Efficient**: Achieves robust hallucination mitigation through lightweight LoRA fine-tuning.
+ * **Early intervention**: Stops hallucinations from propagating by intervening where they first appear, maximizing mitigation.
+ * **In-domain contextual preference learning**: Constructs high-quality preference pairs via detector cross-validation, without relying on proprietary LLMs or manual annotations (a toy loss sketch follows this list).
+ * **Context-aware robustness**: Prioritizes context-coherent positive samples to significantly boost generalization.
+ * **State-of-the-art results**: Reduces hallucinations by over 90% relative to the original models and outperforms prior state-of-the-art methods on hallucination and general-capability benchmarks.
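+
+ As promised above, here is a toy sketch of a sentence-level preference loss in the spirit of DPO; the paper's exact C-DPO formulation may differ in detail (see arXiv:2507.12455):
+
+ ```python
+ import torch.nn.functional as F
+
+ # Toy DPO-style loss over a (non-hallucinated, hallucinated) sentence pair
+ # that shares the same context; an approximation, not the paper's exact C-DPO.
+ def sentence_preference_loss(pi_pos, pi_neg, ref_pos, ref_neg, beta=0.1):
+     # pi_* / ref_*: summed log-probs of each candidate sentence under the
+     # policy and the frozen reference model, given the shared context.
+     margin = beta * ((pi_pos - ref_pos) - (pi_neg - ref_neg))
+     return -F.logsigmoid(margin).mean()
+ ```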
+
+ ## Usage
+
+ This model is a LoRA adapter designed to be plugged into its corresponding base model, `liuhaotian/llava-v1.5-13b`, for inference or further fine-tuning. You can use it with the `transformers` and `peft` libraries.
+
+ First, ensure you have the necessary libraries installed:
+ ```bash
+ pip install transformers peft accelerate bitsandbytes
+ pip install "flash-attn<2.4" --no-build-isolation
+ ```
+
+ ### Example Inference
+
+ Here’s how to load the `LLaVA-v1.5-13B-SENTINEL` LoRA adapter on top of the base `LLaVA-v1.5-13B` model and perform inference:
+
+ ```python
+ import torch
+ from transformers import AutoProcessor, AutoModelForCausalLM
+ from peft import PeftModel
+ from PIL import Image
+ import requests
+
+ # 1. Load the base model and its processor
+ base_model_id = "liuhaotian/llava-v1.5-13b"
+ model = AutoModelForCausalLM.from_pretrained(
+     base_model_id,
+     torch_dtype=torch.float16,
+     device_map="auto",
+     trust_remote_code=True,  # LLaVA models often require this for proper loading
+ )
+ processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)
+
+ # 2. Load the SENTINEL LoRA adapter
+ # This model is specifically designed as a LoRA adapter on top of LLaVA-v1.5-13B.
+ lora_model_id = "psp-dada/LLaVA-v1.5-13B-SENTINEL"
+ model = PeftModel.from_pretrained(model, lora_model_id)
+
+ # Optionally, merge the LoRA weights into the base model for simpler inference
+ model = model.merge_and_unload()
+ model.eval()  # Set model to evaluation mode
+
+ # 3. Prepare your image and prompt
+ # Example image from a public dataset (replace with your own image path or URL)
+ image_url = "https://huggingface.co/datasets/hf-internal-testing/dummy-images/resolve/main/cat-and-dog.jpg"
+ image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
+
+ # LLaVA-v1.5 uses a USER/ASSISTANT prompt format with an `<image>` placeholder.
+ # Adjust the prompt format as per the base model's requirements if different.
+ prompt = "USER: <image>\nWhat are the animals in the picture? ASSISTANT:"
+
+ # 4. Process inputs and generate a response
+ inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)
+
+ with torch.no_grad():
+     output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=False)
+
+ # Decode and print the generated text
+ generated_text = processor.decode(output_ids[0], skip_special_tokens=True)
+ print(generated_text)
+ ```
+
+ For more detailed environment setup, data generation, training, and evaluation instructions, please refer to the [official GitHub repository](https://github.com/pspdada/SENTINEL).
+
+ ## Dataset
+
+ The [**SENTINEL Dataset**](https://huggingface.co/datasets/psp-dada/SENTINEL) is an in-domain multimodal preference dataset for mitigating object hallucination, constructed entirely without human annotation. It includes preference pairs for various LLaVA and Qwen2-VL family models, enabling robust and scalable hallucination mitigation.
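+
+ For a quick look at the data (the default configuration and split names here are assumptions; see the dataset card for the per-model configurations):
+
+ ```python
+ from datasets import load_dataset
+
+ # Configuration/split names are assumptions; check the dataset card.
+ ds = load_dataset("psp-dada/SENTINEL")
+ print(ds)
+ ```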
+
+ ## Model Weights
+
+ This repository contains the LoRA adapter weights for `LLaVA-v1.5-13B-SENTINEL`. All models released in the SENTINEL project are trained using LoRA, and the adapters can be plugged into their corresponding base models for inference or further fine-tuning. For the full list of released SENTINEL models and their base models, see the [Hugging Face collection](https://huggingface.co/collections/psp-dada/sentinel-686ea70912079af142015286).
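+
+ If you prefer a standalone checkpoint over loading the adapter at runtime, the merged model from the usage example above can be saved once and reloaded directly (the output path below is just an example):
+
+ ```python
+ # After `model = model.merge_and_unload()` in the usage example above,
+ # persist the merged weights for adapter-free deployment.
+ merged_dir = "llava-v1.5-13b-sentinel-merged"  # example path
+ model.save_pretrained(merged_dir)
+ processor.save_pretrained(merged_dir)
+ ```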
+
+ ## Acknowledgement
+
+ * [LLaVA](https://github.com/haotian-liu/LLaVA): LLaVA-v1.5 is an excellent open-source MLLM project.
+ * [HA-DPO](https://github.com/opendatalab/HA-DPO): Our code for the LLaVA-v1.5 part is based on HA-DPO, an influential work on object hallucination in MLLMs that provided us with valuable inspiration.
+ * [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory): A unified and efficient fine-tuning framework for LLMs. Our implementations for LLaVA-v1.6, Qwen2-VL, and Qwen2.5-VL are based on this framework.
+
+ ## Citation
+
+ If you find our model, code, data, or paper helpful, please consider citing our paper 📝 and starring us ⭐️!

+ ```bibtex
+ @article{peng2025mitigating,
+   title={Mitigating Object Hallucinations via Sentence-Level Early Intervention},
+   author={Peng, Shangpin and Yang, Senqiao and Jiang, Li and Tian, Zhuotao},
+   journal={arXiv preprint arXiv:2507.12455},
+   year={2025}
+ }
+ ```