rymc committed
Commit 14c29f2 · verified · 1 Parent(s): 0d9790a

cleanup text

Files changed (1)
  1. docs.md +0 -8
docs.md CHANGED
@@ -2,14 +2,6 @@
 keywords: hallucination detection documentation, LLM hallucination benchmark, RAG evaluation guide, Verify API, kluster.ai, retrieval-augmented generation evaluation, large language model accuracy
 -->
 
-# About
-
-As large language models (LLMs) continue to improve, evaluating how well they avoid hallucinations (producing information that is unfaithful or factually incorrect) has become increasingly important. While many models claim to be reliable, their factual grounding can vary significantly across tasks and settings.
-
-This leaderboard provides a standardised evaluation of how different LLMs perform on hallucination detection tasks. Our goal is to help researchers and developers understand which models are more trustworthy in both grounded (context-based) and open-ended (real-world knowledge) settings. We use [Verify](https://platform.kluster.ai/verify) by [kluster.ai](https://platform.kluster.ai/), an automated hallucination detection API, to evaluate the factual consistency of model outputs.
-
----
-
 # Tasks
 
 We evaluate each model using two benchmarks: