Ryan McConville committed
Commit 07c76e7 · 1 parent: 79c898c

Add submission page

Files changed (1): submit.md (+75 −1)
# LLM Hallucination Detection Leaderboard Submission Guidelines

Thank you for your interest in contributing to the **LLM Hallucination Detection Leaderboard**! We welcome submissions from researchers and practitioners who have built or finetuned language models that can be evaluated on our hallucination benchmarks.

---
## 1. What to Send

Please email **[email protected]** with the subject line:

```
[Verify Leaderboard Submission] <Your-Model-Name>
```

Attach **one ZIP file** that contains **all of the following**:

1. **`model_card.md`** – A short Markdown file describing your model:
   - Name and version
   - Architecture / base model
   - Training or finetuning procedure
   - License
   - Intended use & known limitations
   - Contact information
2. **`results.csv`** – A CSV file with **one row per prompt** and **one column per field** (see schema below).
3. (Optional) **`extra_notes.md`** – Anything else you would like us to know (e.g., additional analysis).
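
The archive above can be assembled with a short script. A minimal sketch in Python, assuming the files sit in your working directory; the archive name `submission.zip` is our own choice, not a requirement:

```python
import zipfile

# Files named in the guidelines; extra_notes.md is optional.
REQUIRED = ["model_card.md", "results.csv"]
OPTIONAL = ["extra_notes.md"]

def build_submission(archive="submission.zip", include_optional=True):
    """Bundle the submission files into a single ZIP archive."""
    names = REQUIRED + (OPTIONAL if include_optional else [])
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            # Raises FileNotFoundError if a listed file is missing.
            zf.write(name)
    return archive
```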

---
## 2. CSV Schema

| Column            | Description                                                    |
|-------------------|----------------------------------------------------------------|
| `request`         | The exact input prompt shown to the model.                     |
| `response`        | The raw output produced by the model.                          |
| `verify_response` | The Verify judgment or explanation regarding hallucination.    |
| `verify_label`    | The final boolean / categorical label (e.g., `TRUE`, `FALSE`). |
| `task`            | The benchmark or dataset name the sample comes from.           |

**Important:** Use UTF-8 encoding and **do not** add columns beyond this schema without prior discussion; any extra information belongs in `extra_notes.md`. You must use Verify by kluster.ai to ensure fairness in the leaderboard.
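
A quick self-check before emailing can catch most format problems. A minimal sketch, assuming `results.csv` follows the schema above (the function name is ours, not part of the guidelines):

```python
import csv

REQUIRED_COLUMNS = ["request", "response", "verify_response", "verify_label", "task"]

def validate_results(path):
    """Check the schema and UTF-8 encoding of a results file; return the row count."""
    # Decoding errors raised while reading mean the file is not valid UTF-8.
    with open(path, encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != REQUIRED_COLUMNS:
            raise ValueError(f"expected columns {REQUIRED_COLUMNS}, got {reader.fieldnames}")
        rows = list(reader)
    if not rows:
        raise ValueError("no data rows found")
    return len(rows)
```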

---
## 3. Evaluation Datasets

Run your model on the following public datasets and include *all* examples in your CSV. You can load them directly from Hugging Face:

| Dataset | Hugging Face Link |
|---------|-------------------|
| HaluEval QA (`qa_samples` subset with Question and Knowledge columns) | https://huggingface.co/datasets/pminervini/HaluEval |
| UltraChat | https://huggingface.co/datasets/kluster-ai/ultrachat-sampled |
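
Putting the pieces together, the sketch below writes a `results.csv` in the required schema. It is illustrative only: `run_model` and `run_verify` are hypothetical stand-ins for your model's inference and the Verify call, and in practice you would iterate over the full datasets above (e.g., loaded with `datasets.load_dataset`) rather than hand-picked prompts:

```python
import csv

COLUMNS = ["request", "response", "verify_response", "verify_label", "task"]

def run_model(prompt):
    # Hypothetical stand-in for your model's inference call.
    return f"Model answer to: {prompt}"

def run_verify(prompt, response):
    # Hypothetical stand-in for the Verify by kluster.ai judgment.
    return "The statement is factually correct.", "TRUE"

def write_results(samples, path="results.csv"):
    """samples: iterable of (task, prompt) pairs; writes one row per prompt."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        for task, prompt in samples:
            response = run_model(prompt)
            judgment, label = run_verify(prompt, response)
            writer.writerow({
                "request": prompt,
                "response": response,
                "verify_response": judgment,
                "verify_label": label,
                "task": task,
            })
    return path
```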

---
## 4. Example Row

```csv
request,response,verify_response,verify_label,task
"What is the capital of the UK?","London is the capital of the UK.","The statement is factually correct.",TRUE,HaluEval QA
```

---

## 5. Review Process

1. We will sanity-check the file format and reproduce a random subset of your results.
2. If everything looks good, your scores will appear on the public leaderboard.
3. We may reach out for clarifications, so please keep an eye on your inbox.

---

## 6. Contact

Questions? Email **[email protected]**.

We look forward to your submissions and to advancing reliable language models together!