Prompt4Trust
This repository contains the official implementation of the paper:
Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models
Anita Kriz*, Elizabeth Laura Janes*, Xing Shen*, Tal Arbel
*Equal contribution
IEEE/CVF International Conference on Computer Vision 2025 Workshop CVAMD
Paper (arXiv preprint)
Code (GitHub)
Overview
Multimodal large language models (MLLMs) show great potential for healthcare applications, but their clinical deployment is challenged by prompt sensitivity and overconfident incorrect responses. To improve trustworthiness in safety-critical settings, we introduce Prompt4Trust, the first reinforcement learning framework for prompt augmentation focused on confidence calibration in MLLMs. A lightweight LLM is trained to generate context-aware auxiliary prompts that guide a downstream MLLM to produce predictions with confidence scores that better reflect true accuracy. By prioritizing clinically meaningful calibration, Prompt4Trust enhances both reliability and task performance, achieving state-of-the-art results on the PMC-VQA benchmark while enabling efficient zero-shot generalization to larger MLLMs.
Usage
As this model (Calibration Guidance Prompt Generator) is a fine-tuned version of Qwen2.5-1.5B-Instruct, we refer users to Qwen's documentation for details on model loading and inference. We also recommend using vLLM for faster inference.
Example code for loading the model with vLLM:
from vllm import LLM
cgp_generator = LLM(model="xingshen/prompt4trust-cgpgenerator-1.5B")
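Once loaded, the generator can be queried like any other vLLM model. The sketch below is illustrative only: the sampling parameters and the example input prompt are placeholder assumptions, not values prescribed by the paper, and since the base model is instruction-tuned you may want to format inputs with Qwen's chat template.

```python
from vllm import LLM, SamplingParams

# Load the Calibration Guidance Prompt generator (1.5B parameters).
cgp_generator = LLM(model="xingshen/prompt4trust-cgpgenerator-1.5B")

# Placeholder sampling settings; tune for your use case.
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# Hypothetical input describing the downstream VQA task context.
prompt = "Generate an auxiliary prompt to guide a medical VQA model toward a well-calibrated answer."

outputs = cgp_generator.generate([prompt], sampling_params)
auxiliary_prompt = outputs[0].outputs[0].text
print(auxiliary_prompt)
```

The generated auxiliary prompt would then be prepended to the downstream MLLM's input, as described in the paper.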
Acknowledgments
This work was supported in part by the Natural Sciences and Engineering Research Council of Canada, in part by the Canadian Institute for Advanced Research (CIFAR) Artificial Intelligence Chairs Program, in part by the Mila - Quebec Artificial Intelligence Institute, in part by the compute resources provided by Mila (mila.quebec), in part by the Mila-Google Research Grant, in part by the Fonds de recherche du Québec, in part by the Canada First Research Excellence Fund, awarded to the Healthy Brains, Healthy Lives initiative at McGill University, and in part by the Department of Electrical and Computer Engineering at McGill University.
Contact
Please raise a GitHub issue here or email us at [email protected] (with the email subject starting with "[Prompt4Trust]") if you have any questions or encounter any issues.