rymc committed
Commit 70e0415 (verified)
1 Parent(s): 3ed1a89

Update src/about.py

Files changed (1)
  1. src/about.py +26 -5
src/about.py CHANGED
@@ -20,15 +20,36 @@ NUM_FEWSHOT = 0 # Change with your few shot
 
 
 
-# Your leaderboard name
-TITLE = """<h1 align="center" id="space-title">Demo leaderboard</h1>"""
+TITLE = """<h1 align="center" id="space-title">LLM Hallucination Detection Leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-Intro text
+<!--
+keywords: LLM hallucination detection, hallucination leaderboard, RAG hallucination benchmark, UltraChat hallucination rate, Verify API, kluster.ai, factual accuracy of language models, large language model evaluation
+-->
+
+The **LLM Hallucination Detection Leaderboard** is a public, continuously updated comparison of how well popular Large Language Models (LLMs) avoid *hallucinations*: responses that are factually incorrect, fabricated, or unsupported by evidence. By surfacing transparent metrics across tasks, we help practitioners choose models they can trust in production.
+
+### Why does hallucination detection matter?
+
+* **User Trust & Safety** – Hallucinations undermine confidence and can damage reputation.
+* **Retrieval-Augmented Generation (RAG) Quality** – In enterprise workflows, LLMs must remain faithful to the supplied context. Measuring hallucinations highlights which models respect that constraint.
+* **Regulatory & Compliance Pressure** – Upcoming AI regulations require demonstrable accuracy standards. Reliable hallucination metrics can help you meet these requirements.
+
+### How we measure hallucinations
+
+We evaluate each model on two complementary benchmarks and compute a *hallucination rate* (lower = better):
+
+1. **HaluEval-QA (RAG setting)** – Given a question *and* a supporting document, the model must answer *only* using the provided context.
+2. **UltraChat Filtered (Non-RAG setting)** – Open-domain questions with **no** extra context test the model's internal knowledge.
+
+Outputs are automatically verified by [Verify](https://platform.kluster.ai/verify) from [kluster.ai](https://kluster.ai/), which cross-checks claims against the source document or web results.
+
+---
+
+Stay informed as we add new models and tasks, and follow us on [X](https://x.com/klusterai) or join our [Discord](https://discord.com/invite/klusterai) for the latest updates on trustworthy LLMs.
 """
 
-# Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = f"""
 ## How it works
 
@@ -69,4 +90,4 @@ If everything is done, check you can launch the EleutherAIHarness on your model
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
 CITATION_BUTTON_TEXT = r"""
-"""
+# """