fix kluster.ai name
app.py
CHANGED
@@ -171,7 +171,7 @@ with demo:
             Verify
           </a> by
           <a href="https://platform.kluster.ai/" target="_blank" style="color: #0057ff; text-decoration: none;">
-
+            kluster.ai
           </a>
         </div>
       </div>
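For context, here is a minimal sketch of how a footer block like the one touched by this diff can be rendered inside a Gradio Blocks app. This is not the repository's actual app.py: the Verify link URL, surrounding layout, and variable names are assumptions; only the kluster.ai anchor mirrors the change above.

```python
# Minimal sketch (assumed layout, not the repository's app.py): render a footer
# link block inside a Gradio Blocks app using gr.HTML.
import gradio as gr

FOOTER_HTML = """
<div>
  <div>
    <a href="https://platform.kluster.ai/verify" target="_blank" style="color: #0057ff; text-decoration: none;">
      Verify
    </a> by
    <a href="https://platform.kluster.ai/" target="_blank" style="color: #0057ff; text-decoration: none;">
      kluster.ai
    </a>
  </div>
</div>
"""

with gr.Blocks() as demo:
    # gr.HTML inserts the raw HTML at this point in the layout,
    # which is where the anchor text fixed by this commit appears.
    gr.HTML(FOOTER_HTML)

if __name__ == "__main__":
    demo.launch()
```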
docs.md
CHANGED
@@ -2,7 +2,7 @@
 
 As large language models (LLMs) continue to improve, evaluating how well they avoid hallucinations (producing information that is unfaithful or factually incorrect) has become increasingly important. While many models claim to be reliable, their factual grounding can vary significantly across tasks and settings.
 
-This leaderboard provides a standardised evaluation of how different LLMs perform on hallucination detection tasks. Our goal is to help researchers and developers understand which models are more trustworthy in both grounded (context-based) and open-ended (real-world knowledge) settings. We use [Verify](https://platform.kluster.ai/verify) by [
+This leaderboard provides a standardised evaluation of how different LLMs perform on hallucination detection tasks. Our goal is to help researchers and developers understand which models are more trustworthy in both grounded (context-based) and open-ended (real-world knowledge) settings. We use [Verify](https://platform.kluster.ai/verify) by [kluster.ai](https://platform.kluster.ai/), an automated hallucination detection tool, to evaluate the factual consistency of model outputs.
 
 ---
 
@@ -34,7 +34,7 @@ This setting evaluates how factually accurate a model is when **no context is pr
 
 # Evaluation Method
 
-We use **Verify**, a hallucination detection tool built by
+We use **Verify**, a hallucination detection tool built by kluster.ai, to classify model outputs:
 
 - In the **RAG setting**, Verify checks if the output contradicts, fabricates, or strays from the input document.
 - In the **real-world knowledge setting**, Verify uses search queries to fact-check the answer based on current, public information.
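The updated docs describe Verify judging model outputs in two settings: grounded against a supplied document (RAG) and fact-checked against public information (real-world knowledge). The sketch below illustrates what such an evaluation call could look like; the endpoint URL, request fields, response key, and environment variable name are hypothetical placeholders and not kluster.ai's documented Verify API.

```python
# Illustrative sketch only: endpoint URL, payload fields, and response shape are
# hypothetical placeholders, not kluster.ai's documented Verify API. It mirrors
# the flow in docs.md: send a model answer (plus an optional source document)
# and receive a hallucination verdict.
import os
from typing import Optional

import requests

VERIFY_URL = "https://api.kluster.ai/v1/verify"  # placeholder; see the platform docs
API_KEY = os.environ["KLUSTER_API_KEY"]          # assumed environment variable name


def check_hallucination(answer: str, context: Optional[str] = None) -> bool:
    """Return True if Verify flags the answer as hallucinated (hypothetical schema)."""
    payload = {"output": answer}
    if context is not None:
        # RAG setting: the answer is checked against the supplied document.
        payload["context"] = context
    # Real-world knowledge setting: with no context, the answer is fact-checked
    # against current public information.
    resp = requests.post(
        VERIFY_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["is_hallucination"]
```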