Spaces:

kluster-ai
/

LLM-Hallucination-Detection-Leaderboard


aloe-vera committed on Jul 7

Commit

c7b7b88


1 Parent(s): 4cfd7b1

add more details


Files changed (1)

docs.md +119 -0


@@ -20,6 +20,43 @@ RAG evaluates how well a model stays faithful to a provided context when answeri
 - **Temperature**: 0 (to enforce deterministic, grounded outputs)
 - **System Prompt**: Instructs the model to only use the document and avoid guessing.
 ## Real-World Knowledge (Non-RAG setting)
 This setting evaluates how factually accurate a model is when **no context is provided**. The model must rely solely on its internal knowledge to answer a broad range of user questions across many topics. The answers are then verified using web search to determine factual correctness.
@@ -30,6 +67,19 @@ This setting evaluates how factually accurate a model is when **no context is pr
 - **Temperature**: 1 (to reflect natural, fluent generation)
 - **System Prompt**: Encourages helpfulness, accuracy, and honesty when unsure.
 ---
 # Evaluation Method
@@ -46,3 +96,72 @@ Each model's hallucination rate is computed as:
 A **lower** hallucination rate indicates **better** performance.

 - **Temperature**: 0 (to enforce deterministic, grounded outputs)
 - **System Prompt**: Instructs the model to only use the document and avoid guessing.
+### System prompt
+This is the system prompt used to generate LLM output for the RAG setting:
+```
+You are an assistant for question-answering tasks.
+Given the QUESTION and DOCUMENT you must answer the QUESTION using the information in the DOCUMENT.
+You must not offer new information beyond the context provided in the DOCUMENT. Do not add any external knowledge.
+The ANSWER also must not contradict information provided in the DOCUMENT.
+If the DOCUMENT does not contain the facts to answer the QUESTION or you do not know the answer, you truthfully say that you do not know.
+You have access to information provided by the user as DOCUMENT to answer the QUESTION, and nothing else.
+Use three sentences maximum and keep the answer concise.
+```
+### Prompt format
+Each prompt is formatted as follows:
+```
+DOCUMENT:
+{context}
+QUESTION:
+{question}
+```
+### Message structure
+Each model is called with the following message structure:
+```python
+messages = [
+    {"role": "system", "content": system_prompt},
+    {"role": "user", "content": prompt},
+]
+```
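As a rough illustration, the prompt format and message structure above can be combined as in the sketch below. The helper names `format_rag_prompt` and `build_messages` are hypothetical and do not come from the leaderboard code. The system prompt is passed in as a parameter rather than repeated here.

```python
# Illustrative sketch only: the helper names are hypothetical,
# not taken from the leaderboard's evaluation code.

# Template mirroring the DOCUMENT/QUESTION layout shown under "Prompt format".
RAG_PROMPT_TEMPLATE = "DOCUMENT:\n{context}\nQUESTION:\n{question}"


def format_rag_prompt(context: str, question: str) -> str:
    """Fill the DOCUMENT/QUESTION template with a context and a question."""
    return RAG_PROMPT_TEMPLATE.format(context=context, question=question)


def build_messages(system_prompt: str, prompt: str) -> list[dict[str, str]]:
    """Return the two-message structure: system prompt first, then the user prompt."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]


if __name__ == "__main__":
    prompt = format_rag_prompt(
        context="Paris is the capital of France.",
        question="What is the capital of France?",
    )
    messages = build_messages("You are an assistant for question-answering tasks.", prompt)
    print(messages[1]["content"])
```

Keeping the template and the message assembly in separate functions makes it easy to reuse the same message structure for a setting that has no document.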
 ## Real-World Knowledge (Non-RAG setting)
 This setting evaluates how factually accurate a model is when **no context is provided**. The model must rely solely on its internal knowledge to answer a broad range of user questions across many topics. The answers are then verified using web search to determine factual correctness.
 - **Temperature**: 1 (to reflect natural, fluent generation)
 - **System Prompt**: Encourages helpfulness, accuracy, and honesty when unsure.
+### System prompt
+This is the system prompt used to generate LLM output for the Non-RAG setting:
+```
+You are a helpful, factual, and concise assistant.
+Always try to answer the user's question clearly and completely.
+Do not make up information. If you are unsure or lack the knowledge, say so.
+```
+### Message structure
+The message structure for the Non-RAG setting is the same as for the RAG setting.
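A minimal sketch of a Non-RAG request, under two assumptions that this document does not confirm: the user message is the raw question with no DOCUMENT block, and the request is a plain dictionary with `messages` and `temperature` keys. The function name `build_non_rag_request` is hypothetical.

```python
# Illustrative sketch only: the request shape and function name are assumptions.

# System prompt quoted from the Non-RAG section of this document.
NON_RAG_SYSTEM_PROMPT = (
    "You are a helpful, factual, and concise assistant.\n"
    "Always try to answer the user's question clearly and completely.\n"
    "Do not make up information. If you are unsure or lack the knowledge, say so."
)

# Temperatures as listed for each setting in this document.
TEMPERATURE = {"rag": 0, "non_rag": 1}


def build_non_rag_request(question: str) -> dict:
    """Build a system + user message pair for a question asked without context."""
    return {
        "messages": [
            {"role": "system", "content": NON_RAG_SYSTEM_PROMPT},
            # Assumption: the user turn is the question itself, with no template.
            {"role": "user", "content": question},
        ],
        "temperature": TEMPERATURE["non_rag"],
    }
```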
 ---
 # Evaluation Method
 A **lower** hallucination rate indicates **better** performance.
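The exact formula is not shown in this excerpt. As a hedged illustration, the sketch below assumes the hallucination rate is the fraction of a model's responses that were labeled as hallucinations. The function name `hallucination_rate` is hypothetical.

```python
# Illustrative sketch only: assumes rate = hallucinated responses / total responses.

def hallucination_rate(labels: list[bool]) -> float:
    """Return the fraction of responses labeled as hallucinations (True)."""
    if not labels:
        raise ValueError("labels must not be empty")
    # True counts as 1 and False as 0, so the sum is the number of hallucinations.
    return sum(labels) / len(labels)


if __name__ == "__main__":
    # One hallucinated response out of four.
    print(hallucination_rate([True, False, False, False]))
```

Under this assumption, a model with one hallucinated response out of four scores 0.25, and a model with none scores 0.0, which is the best possible result.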
+---
+# Example of Hallucination Detection
+## RAG Example
+Context
+```
+Jonathan Stark (born April 3, 1971) is a former professional tennis player from the United States.
+During his career he won two Grand Slam doubles titles (the 1994 French Open Men's Doubles and
+the 1995 Wimbledon Championships Mixed Doubles). He reached the men's singles final at the French Open in 1988,
+won the French Open men's doubles title in 1984, and helped France win the Davis Cup in 1991.
+```
+Prompt
+```
+Which tennis player won more Grand Slam titles, Henri Leconte or Jonathan Stark?
+```
+LLM Answer
+```
+Jonathan Stark won two Grand Slam titles, while Henri Leconte won four.
+Therefore, Henri Leconte won more Grand Slam titles.
+```
+Verify Label
+```
+Hallucination: Yes
+```
+Verify Response
+```
+The answer introduces new information about Henri Leconte's Grand Slam titles,
+which is not present in the document.
+```
+## Non-RAG Example
+Prompt
+```
+What are the ingredients that made the most expensive donut so expensive?
+```
+LLM Answer
+```
+The most expensive donut, called "The Golden Donut," was made expensive by several factors:
+24-karat gold leaf: Used for decoration.
+```
+Verify Label
+```
+Hallucination: No
+```
+Verify Response
+```
+The response mentions 'The Golden Donut' and states that it was made expensive by several factors,
+including 24-karat gold leaf used for decoration. The search results provide information about expensive donuts,
+including one called 'The Golden Donut' or similar names, which are associated with luxurious ingredients
+like 24-karat gold and Cristal champagne. The response correctly identifies 24-karat gold leaf as
+a factor contributing to the donut's expensiveness, which is supported by multiple search results.
+While the response simplifies the information, it does not introduce factually incorrect
+or fabricated details about the donut's ingredients.
+```
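The label strings in the examples above follow a simple `Hallucination: Yes` / `Hallucination: No` pattern. A minimal sketch of turning such a string into a boolean is shown below. The function name `parse_verify_label` is hypothetical, and the leaderboard's own handling of labels may differ.

```python
# Illustrative sketch only: the parser is not taken from the leaderboard code.

def parse_verify_label(label: str) -> bool:
    """Map 'Hallucination: Yes' to True and 'Hallucination: No' to False."""
    key, sep, value = label.partition(":")
    if not sep or key.strip().lower() != "hallucination":
        raise ValueError(f"unexpected label format: {label!r}")
    value = value.strip().lower()
    if value == "yes":
        return True
    if value == "no":
        return False
    raise ValueError(f"unexpected label value: {label!r}")


if __name__ == "__main__":
    print(parse_verify_label("Hallucination: Yes"))
    print(parse_verify_label("Hallucination: No"))
```

Raising an error on anything other than `Yes` or `No` avoids silently counting a malformed label as either outcome.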