Update src/about.py
src/about.py  +26 -5
@@ -20,15 +20,36 @@ NUM_FEWSHOT = 0 # Change with your few shot
 
 
 
-
-TITLE = """<h1 align="center" id="space-title">Demo leaderboard</h1>"""
+TITLE = """<h1 align="center" id="space-title">LLM Hallucination Detection Leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-
+<!--
+keywords: LLM hallucination detection, hallucination leaderboard, RAG hallucination benchmark, UltraChat hallucination rate, Verify API, kluster.ai, factual accuracy of language models, large language model evaluation
+-->
+
+The **LLM Hallucination Detection Leaderboard** is a public, continuously updated comparison of how well popular Large Language Models (LLMs) avoid *hallucinations*: responses that are factually incorrect, fabricated, or unsupported by evidence. By surfacing transparent metrics across tasks, we help practitioners choose models they can trust in production.
+
+### Why does hallucination detection matter?
+
+* **User Trust & Safety** – Hallucinations undermine user confidence and can damage a product's reputation.
+* **Retrieval-Augmented Generation (RAG) Quality** – In enterprise workflows, LLMs must remain faithful to supplied context. Measuring hallucinations highlights which models respect that constraint.
+* **Regulatory & Compliance Pressure** – Upcoming AI regulations require demonstrable accuracy standards. Reliable hallucination metrics can help you meet these requirements.
+
+### How we measure hallucinations
+
+We evaluate each model on two complementary benchmarks and compute a *hallucination rate* (lower = better):
+
+1. **HaluEval-QA (RAG setting)** – Given a question *and* a supporting document, the model must answer *only* using the provided context.
+2. **UltraChat Filtered (Non-RAG setting)** – Open-domain questions with **no** extra context test the model's internal knowledge.
+
+Outputs are automatically verified by [Verify](https://platform.kluster.ai/verify) from [kluster.ai](https://kluster.ai/), which cross-checks claims against the source document or web results.
+
+---
+
+Stay informed as we add new models and tasks, and follow us on [X](https://x.com/klusterai) or join our [Discord](https://discord.com/invite/klusterai) for the latest updates on trustworthy LLMs.
 """
 
-# Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = f"""
 ## How it works
 
@@ -69,4 +90,4 @@ If everything is done, check you can launch the EleutherAIHarness on your model
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
 CITATION_BUTTON_TEXT = r"""
-"""
+# """