fix kluster.ai name
app.py
CHANGED
@@ -171,7 +171,7 @@ with demo:
             Verify
           </a> by
           <a href="https://platform.kluster.ai/" target="_blank" style="color: #0057ff; text-decoration: none;">
-
+            kluster.ai
           </a>
         </div>
       </div>
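For context, here is a minimal sketch of how a footer block like the one touched by this diff can be rendered inside a Gradio Blocks app. This is not the repository's actual app.py: the Verify link URL, surrounding layout, and variable names are assumptions; only the kluster.ai anchor mirrors the change above.

```python
# Minimal sketch (assumed layout, not the repository's app.py): render a footer
# link block inside a Gradio Blocks app using gr.HTML.
import gradio as gr

FOOTER_HTML = """
<div>
  <div>
    <a href="https://platform.kluster.ai/verify" target="_blank" style="color: #0057ff; text-decoration: none;">
      Verify
    </a> by
    <a href="https://platform.kluster.ai/" target="_blank" style="color: #0057ff; text-decoration: none;">
      kluster.ai
    </a>
  </div>
</div>
"""

with gr.Blocks() as demo:
    # gr.HTML inserts the raw HTML at this point in the layout,
    # which is where the anchor text fixed by this commit appears.
    gr.HTML(FOOTER_HTML)

if __name__ == "__main__":
    demo.launch()
```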
docs.md
CHANGED
@@ -2,7 +2,7 @@
 
 As large language models (LLMs) continue to improve, evaluating how well they avoid hallucinations (producing information that is unfaithful or factually incorrect) has become increasingly important. While many models claim to be reliable, their factual grounding can vary significantly across tasks and settings.
 
-This leaderboard provides a standardised evaluation of how different LLMs perform on hallucination detection tasks. Our goal is to help researchers and developers understand which models are more trustworthy in both grounded (context-based) and open-ended (real-world knowledge) settings. We use [Verify](https://platform.kluster.ai/verify) by [
+This leaderboard provides a standardised evaluation of how different LLMs perform on hallucination detection tasks. Our goal is to help researchers and developers understand which models are more trustworthy in both grounded (context-based) and open-ended (real-world knowledge) settings. We use [Verify](https://platform.kluster.ai/verify) by [kluster.ai](https://platform.kluster.ai/), an automated hallucination detection tool, to evaluate the factual consistency of model outputs.
 
 ---
 
@@ -34,7 +34,7 @@ This setting evaluates how factually accurate a model is when **no context is pr
 
 # Evaluation Method
 
-We use **Verify**, a hallucination detection tool built by
+We use **Verify**, a hallucination detection tool built by kluster.ai, to classify model outputs:
 
 - In the **RAG setting**, Verify checks if the output contradicts, fabricates, or strays from the input document.
 - In the **real-world knowledge setting**, Verify uses search queries to fact-check the answer based on current, public information.
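The updated docs describe Verify judging model outputs in two settings: grounded against a supplied document (RAG) and fact-checked against public information (real-world knowledge). The sketch below illustrates what such an evaluation call could look like; the endpoint URL, request fields, response key, and environment variable name are hypothetical placeholders and not kluster.ai's documented Verify API.

```python
# Illustrative sketch only: endpoint URL, payload fields, and response shape are
# hypothetical placeholders, not kluster.ai's documented Verify API. It mirrors
# the flow in docs.md: send a model answer (plus an optional source document)
# and receive a hallucination verdict.
import os
from typing import Optional

import requests

VERIFY_URL = "https://api.kluster.ai/v1/verify"  # placeholder; see the platform docs
API_KEY = os.environ["KLUSTER_API_KEY"]          # assumed environment variable name


def check_hallucination(answer: str, context: Optional[str] = None) -> bool:
    """Return True if Verify flags the answer as hallucinated (hypothetical schema)."""
    payload = {"output": answer}
    if context is not None:
        # RAG setting: the answer is checked against the supplied document.
        payload["context"] = context
    # Real-world knowledge setting: with no context, the answer is fact-checked
    # against current public information.
    resp = requests.post(
        VERIFY_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["is_hallucination"]
```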