santiviquez 
posted an update Jan 29, 2024
Confidence *may be* all you need.

A simple average of the log probabilities of the output tokens from an LLM might be all it takes to tell if the model is hallucinating.🫨

The idea is that if a model is not confident (low output token probabilities), the model may be inventing random stuff.
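To make the idea concrete, here is a minimal sketch of the scoring step, not the exact procedure from either paper. It assumes you already have the log probability of each generated token (many inference APIs can return these). The function names and the `-1.0` threshold are hypothetical and chosen only for illustration; a real cutoff would need tuning on labeled data.

```python
import math
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Average log probability of the generated tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    return sum(token_logprobs) / len(token_logprobs)


def looks_hallucinated(token_logprobs: Sequence[float], threshold: float = -1.0) -> bool:
    """Flag an output whose average log probability falls below the threshold.

    The default threshold is an illustrative assumption, not a value from the papers.
    """
    return mean_logprob(token_logprobs) < threshold


# Toy per-token probabilities, converted to log probabilities.
confident = [math.log(p) for p in (0.9, 0.8, 0.95)]
unsure = [math.log(p) for p in (0.2, 0.1, 0.3)]

print(looks_hallucinated(confident))
print(looks_hallucinated(unsure))
```

With these toy numbers, the confident output stays above the cutoff and the unsure one falls below it.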

In these two papers:
1. https://aclanthology.org/2023.eacl-main.75/
2. https://arxiv.org/abs/2303.08896

The authors claim that this simple method is the best heuristic for detecting hallucinations. The beauty is that it only uses the generated token probabilities, so it can be implemented at inference time ⚡

Love this paper too! It's simple yet powerful and applicable to black box models.
I actually have a space to demonstrate it: https://huggingface.co/spaces/mithril-security/hallucination_detector

I also dig into it on an HF Blog post: https://huggingface.co/blog/dhuynh95/automatic-hallucination-detection

Ohh that’s so cool! I actually played with the space last week when I was reading the paper. Don’t remember how I found it 🤔

You might be interested in this follow-up work. It shows that fully intrinsic properties, in the form of attribution scores, outperform logprobs, especially on fully detached hallucinations, and match the abilities of supervised hallucination detectors: https://aclanthology.org/2023.acl-long.3/

Nice! Thank you, I'll take a look