Update src/about.py
src/about.py  +26 -5
@@ -20,15 +20,36 @@ NUM_FEWSHOT = 0 # Change with your few shot
 
 
 
-
-TITLE = """<h1 align="center" id="space-title">Demo leaderboard</h1>"""
+TITLE = """<h1 align="center" id="space-title">LLM Hallucination Detection Leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-
+<!--
+keywords: LLM hallucination detection, hallucination leaderboard, RAG hallucination benchmark, UltraChat hallucination rate, Verify API, kluster.ai, factual accuracy of language models, large language model evaluation
+-->
+
+The **LLM Hallucination Detection Leaderboard** is a public, continuously updated comparison of how well popular Large Language Models (LLMs) avoid *hallucinations*: responses that are factually incorrect, fabricated, or unsupported by evidence. By surfacing transparent metrics across tasks, we help practitioners choose models they can trust in production.
+
+### Why does hallucination detection matter?
+
+* **User Trust & Safety** – Hallucinations undermine user confidence and can damage a product's reputation.
+* **Retrieval-Augmented Generation (RAG) Quality** – In enterprise workflows, LLMs must remain faithful to supplied context. Measuring hallucinations highlights which models respect that constraint.
+* **Regulatory & Compliance Pressure** – Upcoming AI regulations require demonstrable accuracy standards. Reliable hallucination metrics can help you meet these requirements.
+
+### How we measure hallucinations
+
+We evaluate each model on two complementary benchmarks and compute a *hallucination rate* (lower = better):
+
+1. **HaluEval-QA (RAG setting)** – Given a question *and* a supporting document, the model must answer *only* using the provided context.
+2. **UltraChat Filtered (Non-RAG setting)** – Open-domain questions with **no** extra context test the model's internal knowledge.
+
+Outputs are automatically verified by [Verify](https://platform.kluster.ai/verify) from [kluster.ai](https://kluster.ai/), which cross-checks claims against the source document or web results.
+
+---
+
+Stay informed as we add new models and tasks, and follow us on [X](https://x.com/klusterai) or join our [Discord](https://discord.com/invite/klusterai) for the latest updates on trustworthy LLMs.
 """
 
-# Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = f"""
 ## How it works
 
@@ -69,4 +90,4 @@ If everything is done, check you can launch the EleutherAIHarness on your model
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
 CITATION_BUTTON_TEXT = r"""
-"""
+# """