aloe-vera committed
Commit 5484928 · verified · 1 Parent(s): 95be10d

fix kluster.ai name

Files changed (2)
  1. app.py +1 -1
  2. docs.md +2 -2
app.py CHANGED
@@ -171,7 +171,7 @@ with demo:
     Verify
   </a> by
   <a href="https://platform.kluster.ai/" target="_blank" style="color: #0057ff; text-decoration: none;">
-  KlusterAI
+  kluster.ai
   </a>
   </div>
   </div>
docs.md CHANGED
@@ -2,7 +2,7 @@
 
 As large language models (LLMs) continue to improve, evaluating how well they avoid hallucinations (producing information that is unfaithful or factually incorrect) has become increasingly important. While many models claim to be reliable, their factual grounding can vary significantly across tasks and settings.
 
-This leaderboard provides a standardised evaluation of how different LLMs perform on hallucination detection tasks. Our goal is to help researchers and developers understand which models are more trustworthy in both grounded (context-based) and open-ended (real-world knowledge) settings. We use [Verify](https://platform.kluster.ai/verify) by [KlusterAI](https://platform.kluster.ai/), an automated hallucination detection tool, to evaluate the factual consistency of model outputs.
+This leaderboard provides a standardised evaluation of how different LLMs perform on hallucination detection tasks. Our goal is to help researchers and developers understand which models are more trustworthy in both grounded (context-based) and open-ended (real-world knowledge) settings. We use [Verify](https://platform.kluster.ai/verify) by [kluster.ai](https://platform.kluster.ai/), an automated hallucination detection tool, to evaluate the factual consistency of model outputs.
 
 ---
 
@@ -34,7 +34,7 @@ This setting evaluates how factually accurate a model is when **no context is pr
 
 # Evaluation Method
 
-We use **Verify**, a hallucination detection tool built by KlusterAI, to classify model outputs:
+We use **Verify**, a hallucination detection tool built by kluster.ai, to classify model outputs:
 
 - In the **RAG setting**, Verify checks if the output contradicts, fabricates, or strays from the input document.
 - In the **real-world knowledge setting**, Verify uses search queries to fact-check the answer based on current, public information.
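
For readers who want a concrete picture of the two settings described in docs.md, the sketch below shows the general shape of such a hallucination check: pass a source document for the RAG setting, or omit it for the real-world knowledge setting. It is an illustration only, not kluster.ai's documented Verify API: the endpoint URL, environment variable name, payload fields, and response shape are all assumptions; see https://platform.kluster.ai/verify for the real interface.

```python
# Minimal sketch of a Verify-style hallucination check.
# NOTE: the endpoint, payload fields, and response keys below are
# assumptions for illustration -- they are NOT kluster.ai's documented API.
import os
import requests

API_KEY = os.environ["KLUSTER_API_KEY"]          # assumed env var name
VERIFY_URL = "https://api.kluster.ai/v1/verify"  # hypothetical endpoint


def check_answer(answer: str, document: str | None = None) -> dict:
    """Ask a (hypothetical) Verify endpoint whether `answer` is grounded.

    - RAG setting: pass the source `document`; the answer is checked against it.
    - Real-world setting: omit `document`; the check relies on web search.
    """
    payload = {"answer": answer}
    if document is not None:
        payload["document"] = document

    resp = requests.post(
        VERIFY_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed response shape, e.g. {"hallucination": bool, "explanation": str}
    return resp.json()


if __name__ == "__main__":
    result = check_answer(
        answer="The Eiffel Tower is in Berlin.",
        document="The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
    )
    print(result)
```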