aloe-vera committed on
Commit c7b7b88 · verified · 1 Parent(s): 4cfd7b1

add more details

Files changed (1)
  1. docs.md +119 -0
docs.md CHANGED
@@ -20,6 +20,43 @@ RAG evaluates how well a model stays faithful to a provided context when answeri
20
  - **Temperature**: 0 (to enforce deterministic, grounded outputs)
21
  - **System Prompt**: Instructs the model to only use the document and avoid guessing.
22

23
  ## Real-World Knowledge (Non-RAG setting)
24
 
25
  This setting evaluates how factually accurate a model is when **no context is provided**. The model must rely solely on its internal knowledge to answer a broad range of user questions across many topics. The answers are then verified using web search to determine factual correctness.
@@ -30,6 +67,19 @@ This setting evaluates how factually accurate a model is when **no context is pr
30
  - **Temperature**: 1 (to reflect natural, fluent generation)
31
  - **System Prompt**: Encourages helpfulness, accuracy, and honesty when unsure.
32

33
  ---
34
 
35
  # Evaluation Method
@@ -46,3 +96,72 @@ Each model's hallucination rate is computed as:
46
 
47
  A **lower** hallucination rate indicates **better** performance.
48

20
  - **Temperature**: 0 (to enforce deterministic, grounded outputs)
21
  - **System Prompt**: Instructs the model to only use the document and avoid guessing.
22
 
23
+ ### System prompt
24
+
25
+ This is the system prompt used to generate LLM output for the RAG setting:
26
+
27
+ ```
28
+ You are an assistant for question-answering tasks.
29
+ Given the QUESTION and DOCUMENT you must answer the QUESTION using the information in the DOCUMENT.
30
+ You must not offer new information beyond the context provided in the DOCUMENT. Do not add any external knowledge.
31
+ The ANSWER also must not contradict information provided in the DOCUMENT.
32
+ If the DOCUMENT does not contain the facts to answer the QUESTION or you do not know the answer, you truthfully say that you do not know.
33
+ You have access to information provided by the user as DOCUMENT to answer the QUESTION, and nothing else.
34
+ Use three sentences maximum and keep the answer concise.
35
+ ```
36
+
37
+ ### Prompt format
38
+
39
+ Each prompt is formatted as:
40
+
41
+ ```
42
+ DOCUMENT:
43
+ {context}
44
+
45
+ QUESTION:
46
+ {question}
47
+ ```
48
+
49
+ ### Message structure
50
+
51
+ The models use the following message structure:
52
+
53
+ ```python
54
+ messages = [{"role": "system", "content": system_prompt},
55
+             {"role": "user", "content": prompt}]
56
+
57
+ ```
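As a minimal sketch of how the prompt format and message structure above fit together, the snippet below fills the `DOCUMENT`/`QUESTION` template and wraps it in the two-message list. The names `PROMPT_TEMPLATE` and `build_rag_messages` are hypothetical, chosen for illustration, and are not taken from the benchmark code.

```python
# Hypothetical helper; names are illustrative, not from the benchmark code.
PROMPT_TEMPLATE = "DOCUMENT:\n{context}\n\nQUESTION:\n{question}"


def build_rag_messages(system_prompt: str, context: str, question: str) -> list[dict]:
    """Fill the DOCUMENT/QUESTION template and wrap it in system + user messages."""
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]


messages = build_rag_messages(
    system_prompt="You are an assistant for question-answering tasks.",
    context="(document text goes here)",
    question="(question text goes here)",
)
```

Because only the template string is passed through `str.format`, braces that happen to appear in the document or question text are left untouched.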
58
+
59
+
60
  ## Real-World Knowledge (Non-RAG setting)
61
 
62
  This setting evaluates how factually accurate a model is when **no context is provided**. The model must rely solely on its internal knowledge to answer a broad range of user questions across many topics. The answers are then verified using web search to determine factual correctness.
 
67
  - **Temperature**: 1 (to reflect natural, fluent generation)
68
  - **System Prompt**: Encourages helpfulness, accuracy, and honesty when unsure.
69
 
70
+ ### System prompt
71
+
72
+ This is the system prompt used to generate LLM output for the Non-RAG setting:
73
+
74
+ ```
75
+ You are a helpful, factual, and concise assistant.
76
+ Always try to answer the user's question clearly and completely.
77
+ Do not make up information. If you are unsure or lack the knowledge, say so.
78
+ ```
79
+
80
+ ### Message structure
81
+ The message structure for the Non-RAG setting is the same as for the RAG setting.
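
A hedged sketch of that structure for the Non-RAG setting follows. It assumes the user message is the bare question with no template around it, which this document does not state explicitly; `build_non_rag_messages` is a hypothetical name.

```python
NON_RAG_SYSTEM_PROMPT = (
    "You are a helpful, factual, and concise assistant.\n"
    "Always try to answer the user's question clearly and completely.\n"
    "Do not make up information. If you are unsure or lack the knowledge, say so."
)


def build_non_rag_messages(question: str) -> list[dict]:
    """Build the system + user message pair for a question with no context."""
    # Assumption: the question is sent as-is, since no DOCUMENT is provided.
    return [
        {"role": "system", "content": NON_RAG_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
```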
82
+
83
  ---
84
 
85
  # Evaluation Method
 
96
 
97
  A **lower** hallucination rate indicates **better** performance.
98
 
99
+ ---
100
+ # Example of Hallucination Detection
101
+
102
+ ## RAG Example
103
+
104
+ Context
105
+ ```
106
+ Jonathan Stark (born April 3, 1971) is a former professional tennis player from the United States.
107
+ During his career he won two Grand Slam doubles titles (the 1994 French Open Men's Doubles and
108
+ the 1995 Wimbledon Championships Mixed Doubles). He reached the men's singles final at the French Open in 1988,
109
+ won the French Open men's doubles title in 1984, and helped France win the Davis Cup in 1991.
110
+ ```
111
+
112
+ Prompt
113
+
114
+ ```
115
+ Which tennis player won more Grand Slam titles, Henri Leconte or Jonathan Stark?
116
+ ```
117
+
118
+ LLM Answer
119
+ ```
120
+ Jonathan Stark won two Grand Slam titles, while Henri Leconte won four.
121
+ Therefore, Henri Leconte won more Grand Slam titles.
122
+ ```
123
+
124
+ Verify Label
125
+ ```
126
+ Hallucination: Yes
127
+ ```
128
+
129
+
130
+ Verify Response
131
+ ```
132
+ The answer introduces new information about Henri Leconte's Grand Slam titles,
133
+ which is not present in the document.
134
+ ```
135
+
136
+ ## Non-RAG Example
137
+
138
+ Prompt
139
+
140
+ ```
141
+ What are the ingredients that made the most expensive donut so expensive?
142
+ ```
143
+
144
+ LLM Answer
145
+ ```
146
+ The most expensive donut, called "The Golden Donut," was made expensive by several factors:
147
+
148
+ 24-karat gold leaf: Used for decoration.
149
+
150
+ ```
151
+
152
+ Verify Label
153
+ ```
154
+ Hallucination: No
155
+ ```
156
+
157
+
158
+ Verify Response
159
+ ```
160
+ The response mentions 'The Golden Donut' and states that it was made expensive by several factors,
161
+ including 24-karat gold leaf used for decoration. The search results provide information about expensive donuts,
162
+ including one called 'The Golden Donut' or similar names, which are associated with luxurious ingredients
163
+ like 24-karat gold and Cristal champagne. The response correctly identifies 24-karat gold leaf as
164
+ a factor contributing to the donut's expensiveness, which is supported by multiple search results.
165
+ While the response simplifies the information, it does not introduce factually incorrect
166
+ or fabricated details about the donut's ingredients.
167
+ ```
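
The `Verify Label` strings shown in both examples could be turned into a per-model rate along the following lines. This is only a sketch: the exact formula is not shown in this excerpt, so the code assumes the rate is the fraction of responses labeled `Hallucination: Yes`, and both function names are hypothetical.

```python
def is_hallucination(verify_label: str) -> bool:
    """Parse a label such as 'Hallucination: Yes' into a boolean."""
    key, _, value = verify_label.partition(":")
    answer = value.strip().lower()
    if key.strip().lower() != "hallucination" or answer not in ("yes", "no"):
        raise ValueError(f"unexpected label: {verify_label!r}")
    return answer == "yes"


def hallucination_rate(verify_labels: list[str]) -> float:
    """Assumed definition: share of responses flagged as hallucinations."""
    if not verify_labels:
        raise ValueError("no labels to score")
    flagged = sum(is_hallucination(label) for label in verify_labels)
    return flagged / len(verify_labels)


# The two labels from the examples above: one flagged, one not.
labels = ["Hallucination: Yes", "Hallucination: No"]
print(hallucination_rate(labels))  # 0.5
```

Raising on an unrecognized label, rather than silently counting it as "No", keeps a malformed verifier output from lowering a model's rate.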