kluster-ai/LLM-Hallucination-Detection-Leaderboard
Ryan McConville committed on Jul 8

Commit 27960a9

1 Parent(s): db0d8a2

fix char display

Files changed (2)

introduction.md CHANGED

@@ -6,16 +6,16 @@ The **LLM Hallucination Detection Leaderboard** is a public, continuously update
 ### Why does hallucination detection matter?
-* **User Trust & Safety** – Hallucinations undermine confidence and can damage reputation.
-* **Retrieval-Augmented Generation (RAG) Quality** – In enterprise workflows, LLMs must remain faithful to supplied context. Measuring hallucinations highlights which models respect that constraint.
-* **Regulatory & Compliance Pressure** – Upcoming AI regulations require demonstrable accuracy standards. Reliable hallucination metrics can help you meet these requirements.
 ### How we measure hallucinations
 We evaluate each model on two complementary benchmarks and compute a *hallucination rate* (lower = better):
-1. **HaluEval-QA (RAG setting)** – Given a question *and* a supporting document, the model must answer *only* using the provided context.
-2. **UltraChat Filtered (Non-RAG setting)** – Open-domain questions with **no** extra context test the model's internal knowledge.
 Outputs are automatically verified by [Verify](https://platform.kluster.ai/verify) from [kluster.ai](https://kluster.ai/), which cross-checks claims against the source document or web results.

 ### Why does hallucination detection matter?
+* **User Trust & Safety**: Hallucinations undermine confidence and can damage reputation.
+* **Retrieval-Augmented Generation (RAG) Quality**:  In enterprise workflows, LLMs must remain faithful to supplied context. Measuring hallucinations highlights which models respect that constraint.
+* **Regulatory & Compliance Pressure**: Upcoming AI regulations require demonstrable accuracy standards. Reliable hallucination metrics can help you meet these requirements.
 ### How we measure hallucinations
 We evaluate each model on two complementary benchmarks and compute a *hallucination rate* (lower = better):
+1. **HaluEval-QA (RAG setting)**: Given a question *and* a supporting document, the model must answer *only* using the provided context.
+2. **UltraChat Filtered (Non-RAG setting)**:  Open-domain questions with **no** extra context test the model's internal knowledge.
 Outputs are automatically verified by [Verify](https://platform.kluster.ai/verify) from [kluster.ai](https://kluster.ai/), which cross-checks claims against the source document or web results.
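
Reviewer note: the *hallucination rate* referred to in the changed text can be illustrated with a small sketch. It assumes the rate is simply the fraction of model responses that the verifier flags as hallucinated; the function name and the boolean-flag input are hypothetical and are not taken from the leaderboard's code.

```python
def hallucination_rate(flags):
    """Return the fraction of responses flagged as hallucinated.

    `flags` is an iterable of booleans, one per response
    (True = flagged as a hallucination). Hypothetical input format.
    """
    flags = list(flags)
    if not flags:
        raise ValueError("need at least one response")
    return sum(1 for flagged in flags if flagged) / len(flags)
```

Under this assumption, one flagged response out of four gives a rate of 0.25, and a lower value is better, as the text states.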

submit.md CHANGED

@@ -18,15 +18,15 @@ Please email **[email protected]** with the subject line:
 Attach **one ZIP file** that contains **all of the following**:
-1. **`model_card.md`** – A short Markdown file describing your model:
    • Name and version
    • Architecture / base model
    • Training or finetuning procedure
    • License
    • Intended use & known limitations
    • Contact information
-2. **`results.csv`** – A CSV file with **one row per prompt** and **one column per field** (see schema below).
-3. (Optional) **`extra_notes.md`** – Anything else you would like us to know (e.g., additional analysis).
 ---

 Attach **one ZIP file** that contains **all of the following**:
+1. **`model_card.md`**: A short Markdown file describing your model:
    • Name and version
    • Architecture / base model
    • Training or finetuning procedure
    • License
    • Intended use & known limitations
    • Contact information
+2. **`results.csv`**: A CSV file with **one row per prompt** and **one column per field** (see schema below).
+3. (Optional) **`extra_notes.md`**: Anything else you would like us to know (e.g., additional analysis).
 ---
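
Reviewer note: a submitter could check the ZIP file against the list above before emailing it. The sketch below is one possible check, not part of the leaderboard tooling; the function name is hypothetical, and it assumes files may sit in a subfolder of the archive, so it matches on base names only. It does not validate the contents of `results.csv`, since the schema is not shown in this diff.

```python
import zipfile
from pathlib import PurePosixPath

# Required file names as listed in submit.md; extra_notes.md is optional.
REQUIRED = ("model_card.md", "results.csv")


def missing_files(zip_source):
    """Return the required file names that are absent from the archive.

    `zip_source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        # Compare base names so files inside a top-level folder still count.
        names = {PurePosixPath(name).name for name in zf.namelist()}
    return [required for required in REQUIRED if required not in names]
```

An empty list means both required files are present.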