Improve model card: Add `library_name`, overview, and usage example
#1
by nielsr (HF Staff)
README.md CHANGED
@@ -1,12 +1,13 @@
 ---
-license: apache-2.0
+base_model:
+- llava-hf/llava-v1.6-vicuna-13b-hf
 datasets:
 - psp-dada/SENTINEL
 language:
 - en
-base_model:
-- llava-hf/llava-v1.6-vicuna-13b-hf
+license: apache-2.0
 pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
 # Model Card for SENTINEL:<br> Mitigating Object Hallucinations via Sentence-Level Early Intervention <!-- omit in toc -->
@@ -15,5 +16,79 @@ pipeline_tag: image-text-to-text
 <img src='https://img.shields.io/badge/Paper-Arxiv-purple'></a>
 <a href='https://github.com/pspdada/SENTINEL'>
 <img src='https://img.shields.io/badge/Github-Repo-Green'></a>
+<a href='https://huggingface.co/datasets/psp-dada/SENTINEL'>
+<img src='https://img.shields.io/badge/Datasets-HF-Green'></a>
+
+## About SENTINEL
+
+**SENTINEL** (**S**entence-level **E**arly i**N**tervention **T**hrough **IN**-domain pr**E**ference **L**earning) is a framework that prevents and mitigates object hallucinations in multimodal large language models (MLLMs). It introduces an automatic, sentence-level early-intervention strategy built on a key observation: hallucinations predominantly emerge during the early stages of text generation and then propagate through subsequent outputs.
+
+SENTINEL eliminates the dependency on human annotations by bootstrapping high-quality in-domain preference pairs. It iteratively samples model outputs, validates object existence by cross-checking with two open-vocabulary detectors, and classifies each sentence as hallucinated or non-hallucinated. From these samples, it iteratively builds context-aware preference data, pairing context-coherent positive samples with hallucinated negative samples. Models are then trained with a context-aware preference loss (C-DPO) that emphasizes discriminative learning at the sentence level, where hallucinations first manifest. Experiments show that SENTINEL reduces hallucinations by over 90% relative to the original model and outperforms previous state-of-the-art methods on both hallucination benchmarks and general-capability benchmarks, demonstrating its effectiveness and generalization ability.
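+
+For intuition, the bootstrapping step can be sketched as follows (illustrative only: the `Detector` callables and the candidate-sentence format are assumptions made for this sketch, not the repository's actual API; see the GitHub repo for the real pipeline):
+
+```python
+from typing import Callable, Iterable
+
+# Hypothetical detector interface: detector(image, object_name) -> bool.
+Detector = Callable[[object, str], bool]
+
+def is_hallucinated(
+    image: object,
+    mentioned_objects: Iterable[str],
+    detector_a: Detector,
+    detector_b: Detector,
+) -> bool:
+    """A sentence counts as hallucinated if either open-vocabulary
+    detector fails to confirm any object it mentions."""
+    return any(
+        not (detector_a(image, obj) and detector_b(image, obj))
+        for obj in mentioned_objects
+    )
+
+def build_preference_pairs(
+    context: str,
+    candidates: list[tuple[str, list[str]]],  # (sentence, mentioned objects)
+    image: object,
+    detector_a: Detector,
+    detector_b: Detector,
+) -> list[tuple[str, str, str]]:
+    """Pair each non-hallucinated continuation (positive) with each
+    hallucinated one (negative) under the shared prefix `context`."""
+    positives = [s for s, objs in candidates
+                 if not is_hallucinated(image, objs, detector_a, detector_b)]
+    negatives = [s for s, objs in candidates
+                 if is_hallucinated(image, objs, detector_a, detector_b)]
+    return [(context, pos, neg) for pos in positives for neg in negatives]
+```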
+
+## Key Features
+
+* **Annotation-free**: SENTINEL constructs high-quality in-domain preference pairs without requiring human labeling.
+* **Model-agnostic**: The framework is compatible with any MLLM architecture.
+* **Efficient**: It achieves state-of-the-art results with lightweight LoRA fine-tuning.
+* **Early intervention**: Halts hallucination propagation by intervening at the sentence level where hallucinations initially emerge.
+* **Context-aware preference learning**: Emphasizes discriminative learning from context-coherent positive samples to boost generalization and robustness (see the loss sketch after this list).
+* **State-of-the-art results**: Achieves a significant reduction in hallucinations and improved general task performance across various benchmarks.
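+
+As a rough illustration of the context-aware preference objective, a DPO-style loss over two continuations of the same context can be written as below (a sketch of the generic DPO formulation; the exact C-DPO loss is defined in the paper):
+
+```python
+import torch
+import torch.nn.functional as F
+
+def cdpo_loss(
+    logp_pos: torch.Tensor,      # policy log-prob of the positive sentence
+    logp_neg: torch.Tensor,      # policy log-prob of the negative sentence
+    ref_logp_pos: torch.Tensor,  # same quantities under the frozen reference
+    ref_logp_neg: torch.Tensor,
+    beta: float = 0.1,
+) -> torch.Tensor:
+    # Implicit rewards are log-ratios against the reference model.
+    reward_pos = beta * (logp_pos - ref_logp_pos)
+    reward_neg = beta * (logp_neg - ref_logp_neg)
+    # Maximize the margin between the positive and negative continuation.
+    return -F.logsigmoid(reward_pos - reward_neg).mean()
+```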
+
+## Usage
+
+You can use this model with the Hugging Face `transformers` library. This repository hosts a LoRA adapter: with `peft` installed, `from_pretrained` automatically loads the base model (`llava-hf/llava-v1.6-vicuna-13b-hf`) and applies the SENTINEL weights.
+
+```python
+import requests
+import torch
+from PIL import Image
+from transformers import AutoProcessor, LlavaNextForConditionalGeneration
+
+# Load the model and processor; with `peft` installed, the SENTINEL LoRA
+# adapter is applied on top of the base model automatically.
+model_id = "psp-dada/LLaVA-v1.6-Vicuna-13B-SENTINEL"
+processor = AutoProcessor.from_pretrained(model_id)
+model = LlavaNextForConditionalGeneration.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16,
+    low_cpu_mem_usage=True,
+)
+# Move model to GPU if available
+if torch.cuda.is_available():
+    model = model.to("cuda")
+
+# Example inference: describe a cat image
+image_url = "https://llava-vl.github.io/static/images/a_picture_of_a_cat.jpg"  # example image
+image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
+
+# Build the prompt with the chat template so the <image> placeholder
+# lands where LLaVA-v1.6 expects it.
+conversation = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image"},
+            {"type": "text", "text": "What is shown in this image?"},
+        ],
+    }
+]
+prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
+
+# Prepare inputs and move them to the model's device
+inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
+
+# Generate output
+with torch.no_grad():
+    output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)
+
+# Decode and print the result
+generated_text = processor.decode(output_ids[0], skip_special_tokens=True)
+print(f"Response: {generated_text}")
+```
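+
+If automatic adapter resolution is not available in your environment, the adapter can also be attached explicitly with `peft` (a sketch using the same identifiers as above):
+
+```python
+import torch
+from peft import PeftModel
+from transformers import LlavaNextForConditionalGeneration
+
+# Load the frozen base model, then apply the SENTINEL LoRA weights on top.
+base = LlavaNextForConditionalGeneration.from_pretrained(
+    "llava-hf/llava-v1.6-vicuna-13b-hf",
+    torch_dtype=torch.float16,
+    low_cpu_mem_usage=True,
+)
+model = PeftModel.from_pretrained(base, "psp-dada/LLaVA-v1.6-Vicuna-13B-SENTINEL")
+```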
+
+## More Details
+
+For further details on data generation, training, and evaluation, please refer to the [official GitHub repository](https://github.com/pspdada/SENTINEL). You can also find additional model weights in the [SENTINEL models collection](https://huggingface.co/collections/psp-dada/sentinel-686ea70912079af142015286).
+
+## Citation
+
+If you find our model, code, data, or paper helpful, please consider citing our paper 📝 and starring the repo ⭐️!
 
-
+```bibtex
+@article{peng2025mitigating,
+  title={Mitigating Object Hallucinations via Sentence-Level Early Intervention},
+  author={Peng, Shangpin and Yang, Senqiao and Jiang, Li and Tian, Zhuotao},
+  journal={arXiv preprint arXiv:2507.12455},
+  year={2025}
+}
+```