cleanup text
docs.md CHANGED
@@ -2,14 +2,6 @@
 keywords: hallucination detection documentation, LLM hallucination benchmark, RAG evaluation guide, Verify API, kluster.ai, retrieval-augmented generation evaluation, large language model accuracy
 -->
 
-# About
-
-As large language models (LLMs) continue to improve, evaluating how well they avoid hallucinations (producing information that is unfaithful or factually incorrect) has become increasingly important. While many models claim to be reliable, their factual grounding can vary significantly across tasks and settings.
-
-This leaderboard provides a standardised evaluation of how different LLMs perform on hallucination detection tasks. Our goal is to help researchers and developers understand which models are more trustworthy in both grounded (context-based) and open-ended (real-world knowledge) settings. We use [Verify](https://platform.kluster.ai/verify) by [kluster.ai](https://platform.kluster.ai/), an automated hallucination detection API, to evaluate the factual consistency of model outputs.
-
----
-
 # Tasks
 
 We evaluate each model using two benchmarks: