Ryan McConville committed
Commit 27960a9 · 1 Parent(s): db0d8a2

fix char display

Files changed (2):
  1. introduction.md +5 -5
  2. submit.md +3 -3
introduction.md CHANGED
@@ -6,16 +6,16 @@ The **LLM Hallucination Detection Leaderboard** is a public, continuously update
 
 ### Why does hallucination detection matter?
 
-* **User Trust & Safety** – Hallucinations undermine confidence and can damage reputation.
-* **Retrieval-Augmented Generation (RAG) Quality** – In enterprise workflows, LLMs must remain faithful to supplied context. Measuring hallucinations highlights which models respect that constraint.
-* **Regulatory & Compliance Pressure** – Upcoming AI regulations require demonstrable accuracy standards. Reliable hallucination metrics can help you meet these requirements.
+* **User Trust & Safety**: Hallucinations undermine confidence and can damage reputation.
+* **Retrieval-Augmented Generation (RAG) Quality**: In enterprise workflows, LLMs must remain faithful to supplied context. Measuring hallucinations highlights which models respect that constraint.
+* **Regulatory & Compliance Pressure**: Upcoming AI regulations require demonstrable accuracy standards. Reliable hallucination metrics can help you meet these requirements.
 
 ### How we measure hallucinations
 
 We evaluate each model on two complementary benchmarks and compute a *hallucination rate* (lower = better):
 
-1. **HaluEval-QA (RAG setting)** – Given a question *and* a supporting document, the model must answer *only* using the provided context.
-2. **UltraChat Filtered (Non-RAG setting)** – Open-domain questions with **no** extra context test the model's internal knowledge.
+1. **HaluEval-QA (RAG setting)**: Given a question *and* a supporting document, the model must answer *only* using the provided context.
+2. **UltraChat Filtered (Non-RAG setting)**: Open-domain questions with **no** extra context test the model's internal knowledge.
 
 Outputs are automatically verified by [Verify](https://platform.kluster.ai/verify) from [kluster.ai](https://kluster.ai/), which cross-checks claims against the source document or web results.
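
The *hallucination rate* referenced above is an aggregate over per-prompt checks, with lower values meaning fewer hallucinations. A minimal sketch of that aggregation, assuming binary per-prompt verdicts from the verifier (the function name and verdict format are illustrative assumptions, not taken from the leaderboard code):

```python
# Illustrative only: aggregate per-prompt verdicts into a hallucination rate.
# Assumes each verdict is a bool, True meaning the verifier flagged the answer
# as hallucinated; lower rates are better, matching the leaderboard's metric.
def hallucination_rate(verdicts: list[bool]) -> float:
    if not verdicts:
        raise ValueError("need at least one verdict")
    return sum(verdicts) / len(verdicts)

# Example: 3 flagged answers out of 20 prompts -> 0.15 (15%).
print(hallucination_rate([True] * 3 + [False] * 17))
```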
submit.md CHANGED
@@ -18,15 +18,15 @@ Please email **[email protected]** with the subject line:
 
 Attach **one ZIP file** that contains **all of the following**:
 
-1. **`model_card.md`** – A short Markdown file describing your model:
+1. **`model_card.md`**: A short Markdown file describing your model:
    • Name and version
    • Architecture / base model
    • Training or finetuning procedure
    • License
    • Intended use & known limitations
    • Contact information
-2. **`results.csv`** – A CSV file with **one row per prompt** and **one column per field** (see schema below).
-3. (Optional) **`extra_notes.md`** – Anything else you would like us to know (e.g., additional analysis).
+2. **`results.csv`**: A CSV file with **one row per prompt** and **one column per field** (see schema below).
+3. (Optional) **`extra_notes.md`**: Anything else you would like us to know (e.g., additional analysis).
 
 ---
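
For reference, the three files listed in submit.md can be bundled into the required ZIP with the Python standard library. A minimal sketch, assuming the files already sit in the current working directory (the script and its file layout are illustrative, not part of the submission instructions):

```python
# Illustrative only: bundle the submission files named in submit.md into one ZIP.
import os
import zipfile

def build_submission(zip_path: str = "submission.zip") -> None:
    required = ["model_card.md", "results.csv"]  # listed as mandatory
    optional = ["extra_notes.md"]                # listed as optional
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in required:
            zf.write(name)  # raises FileNotFoundError if a required file is missing
        for name in optional:
            if os.path.exists(name):
                zf.write(name)

if __name__ == "__main__":
    build_submission()
```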