kgreenewald committed · Commit db225b0 · verified · 1 parent: c05f354

Update README.md

Files changed (1): README.md (+37 −15)

README.md CHANGED
@@ -22,6 +22,9 @@ providing access to the Uncertainty, Hallucination Detection, and Safety Excepti
  - **Model type:** LoRA adapter for [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct)
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

  ### Uncertainty Intrinsic
  The Uncertainty intrinsic is designed to provide a Certainty score for model responses to user questions.
@@ -33,7 +36,8 @@ This percentage is *calibrated* in the following sense: given a set of answers a
  The Hallucination Detection intrinsic is designed to detect when an assistant response to a user question with supporting documents is not supported by those documents. A response of `Y` indicates hallucination, and `N` indicates no hallucination.

  ### Safety Exception Intrinsic
- The Safety Exception Intrinsic is designed to raise an exception when the user query is unsafe. This exception is raised by responding with `Y` (unsafe), and `N` otherwise.

  ## Usage
@@ -83,8 +87,8 @@ we can evaluate the certainty and hallucination status of this reply by invoking
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/HpitI-3zeutXqduC2eUES.png)

- ### PDL Implementation
- Given a hosted instance of **Granite Intrinsics 3.0 8b Instruct v1**, this uses the [PDL language](https://github.com/IBM/prompt-declaration-language) to implement the RAG intrinsic invocation scenario described above.
  ```python
  defs:
  apply_template:
@@ -111,10 +115,10 @@ defs:
  def: mycontext
  args:
  context: ${ context }
- - model: granite-8b-intrinsics-v2-20241201
  parameters:
  api_key: EMPTY
- api_base: http://aims-01.sl.res.ibm.com:21001/v1
  temperature: 0
  max_tokens: 1
  custom_llm_provider: text-completion-openai
@@ -157,13 +161,13 @@ text:
  intrinsic: safety
  - role: system
  text: ${ system_prompt }
- - if: ${ safety != "N" }
  then:
  text:
  - "\n\nDocuments: ${ document }\n\n ${ query }"
- - model: openai/granite-8b-intrinsics-v2-20241201
  def: answer
- parameters: {api_key: EMPTY, api_base: http://aims-01.sl.res.ibm.com:21001/v1, temperature: 0, stop: "\n"}
  - call: get_intrinsic
  def: certainty
  contribute: []
@@ -179,7 +183,8 @@ text:
  - "\nCertainty: ${ certainty }"
  - "\nHallucination: ${ hallucination }"
  ```
-
@@ -197,16 +202,19 @@ Additionally, certainty scores are *distributional* quantities, and so will do w
  red-teamed examples.

  ## Evaluation

- The model was evaluated on the [MMLU](https://huggingface.co/datasets/cais/mmlu) datasets (not used in training). Shown are the [Expected Calibration Error (ECE)](https://towardsdatascience.com/expected-calibration-error-ece-a-step-by-step-visual-explanation-with-python-code-c3e9aa12937d) for each task, for the base model (Granite-3.0-8b-instruct) and Granite-Uncertainty-3.0-8b.
- The average ECE across tasks for our method is 0.064 (out of 1) and is consistently low across tasks (maximum task ECE 0.10), compared to the base model average ECE of 0.20 and maximum task ECE of 0.60. Note that our ECE of 0.064 is smaller than the gap between the quantized certainty outputs (10% quantization steps). Additionally, the zero-shot performance on the MMLU tasks does not degrade, averaging at 89%.
- <!-- This section describes the evaluation protocols and provides the results. -->
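The ECE referenced above can be sketched as follows. This is a generic binned-calibration computation under assumed defaults (10 equal-width bins, bin-size weighting), not the evaluation script used for the numbers reported here.

```python
# Hedged sketch: Expected Calibration Error (ECE) over (confidence, correct) pairs.
# Bin count and weighting are assumptions, not the exact protocol used above.

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin size / N) * |accuracy(bin) - mean confidence(bin)|."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Assign each prediction to one bin (upper edge inclusive for the last bin).
        idx = [i for i, c in enumerate(confidences)
               if (lo <= c < hi) or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - avg_conf)
    return ece

# Perfectly calibrated toy example: 80% confidence, 4 of 5 correct.
print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))  # 0.0
```

A model that is systematically over- or under-confident accumulates a large gap term in every populated bin, which is why the base model's 0.20 average ECE is the figure to compare against.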

- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/2MwP7DRZlNBtWSKWFvXOI.png)

  ## Training Details
  The **Granite Intrinsics 3.0 8b v1** model is a LoRA adapter finetuned to provide 3 desired intrinsic outputs - Uncertainty Quantification, Hallucination Detection, and Safety.
@@ -238,13 +246,27 @@ The following datasets were used for calibration and/or finetuning. Certainty sc
  * [piqa](https://huggingface.co/datasets/ybisk/piqa)

  ### RAG Hallucination Training Data
- The following public datasets were used for finetuning the RAG model. The details of data creation for RAG response generation are available in the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf).
  For creating the hallucination labels for responses, the technique described in [Achintalwar, et al.](https://arxiv.org/pdf/2403.06009) was used.

-
  * [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
  * [QuAC](https://huggingface.co/datasets/allenai/quac)

  ## Model Card Authors

  Kristjan Greenewald
  - **Model type:** LoRA adapter for [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct)
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/ornGz5BdtfIXLYxDzUgi9.png)
+
  ### Uncertainty Intrinsic
  The Uncertainty intrinsic is designed to provide a Certainty score for model responses to user questions.
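As an illustration of consuming the Certainty score downstream, the single-digit output can be mapped to a percentage bucket. The midpoint mapping below is an assumption inferred from the 0-9 outputs and the 10% quantization steps discussed in this card's Evaluation section, not a documented decoding rule:

```python
# Hedged sketch: interpret the intrinsic's single certainty digit (0-9) as a
# calibrated percentage bucket. The midpoint mapping is an assumption based on
# the 10% quantization steps mentioned in the Evaluation section.

def certainty_to_percent(token: str) -> int:
    digit = int(token)
    if not 0 <= digit <= 9:
        raise ValueError(f"expected a digit 0-9, got {token!r}")
    return digit * 10 + 5  # bucket midpoint: '0' -> 5%, ..., '9' -> 95%

print(certainty_to_percent("8"))  # 85
```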

  The Hallucination Detection intrinsic is designed to detect when an assistant response to a user question with supporting documents is not supported by those documents. A response of `Y` indicates hallucination, and `N` indicates no hallucination.

  ### Safety Exception Intrinsic
+ The Safety Exception Intrinsic is designed to raise an exception when the user query is unsafe. This exception is raised by responding with `Y` (unsafe), and `N` otherwise.
+ The Safety Exception intrinsic was designed as a binary classifier that analyzes the user's prompt to detect a variety of harms, including violence, threats, sexual and explicit content, and requests to obtain personally identifiable information.
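The way these Y/N outputs gate a response (answer only when the Safety Exception intrinsic returns `N`, then attach the certainty and hallucination flags) can be sketched in plain Python. The refusal string here is a placeholder assumption, not model output:

```python
# Hedged sketch of the control flow around the intrinsic outputs: refuse when
# the Safety Exception intrinsic returns "Y" (unsafe), otherwise return the
# answer annotated with certainty and hallucination flags. The refusal text
# is a placeholder, not produced by the model.

def guarded_reply(safety: str, answer: str, certainty: str, hallucination: str) -> str:
    if safety == "Y":  # unsafe query -> raise the safety exception
        return "Safety exception raised: query refused."
    return f"{answer}\nCertainty: {certainty}\nHallucination: {hallucination}"

print(guarded_reply("N", "Paris is the capital of France.", "9", "N"))
```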

  ## Usage

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/HpitI-3zeutXqduC2eUES.png)

+ ### Intrinsics Example with PDL
+ Given a hosted instance of **Granite Intrinsics 3.0 8b Instruct v1** at `API_BASE` (insert the host address here), this example uses the [PDL language](https://github.com/IBM/prompt-declaration-language) to implement the RAG intrinsic invocation scenario described above.
  ```python
  defs:
  apply_template:
 
  def: mycontext
  args:
  context: ${ context }
+ - model: granite-8b-intrinsics-v1
  parameters:
  api_key: EMPTY
+ api_base: API_BASE
  temperature: 0
  max_tokens: 1
  custom_llm_provider: text-completion-openai
 
  intrinsic: safety
  - role: system
  text: ${ system_prompt }
+ - if: ${ safety != "Y" }
  then:
  text:
  - "\n\nDocuments: ${ document }\n\n ${ query }"
+ - model: openai/granite-8b-intrinsics-v1
  def: answer
+ parameters: {api_key: EMPTY, api_base: API_BASE, temperature: 0, stop: "\n"}
  - call: get_intrinsic
  def: certainty
  contribute: []

  - "\nCertainty: ${ certainty }"
  - "\nHallucination: ${ hallucination }"
  ```
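For readers not using PDL, the model call in the program above amounts to a raw text completion against an OpenAI-compatible endpoint, as `custom_llm_provider: text-completion-openai` suggests. The sketch below only assembles the request payload; `API_BASE` and the prompt string are placeholders, and nothing is sent over the network:

```python
# Hedged sketch: build the text-completion payload mirroring the PDL
# parameters (greedy decoding, one output token: Y/N or a digit 0-9).
# API_BASE is a placeholder to be replaced with your hosted instance.

API_BASE = "API_BASE"  # e.g. "http://host:port/v1" for your deployment

def build_intrinsic_request(prompt: str) -> dict:
    return {
        "model": "granite-8b-intrinsics-v1",
        "prompt": prompt,
        "temperature": 0,
        "max_tokens": 1,
    }

# The actual call would POST this JSON to f"{API_BASE}/completions"
# with the EMPTY api key, per the PDL parameters above.
payload = build_intrinsic_request("<intrinsic prompt assembled as in the PDL program>")
print(payload["max_tokens"])  # 1
```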
+ ### Intrinsics Example with SGLang
+ The SGLang implementation below uses the SGLang fork at [https://github.com/frreiss/sglang/tree/granite](https://github.com/frreiss/sglang/tree/granite), which supports Granite models.

  red-teamed examples.

  ## Evaluation
+ We evaluate both the performance of the intrinsics themselves and the RAG performance of the model.

+ We first find that the performance of the intrinsics in our shared model **Granite Intrinsics 3.0 8b v1** is not degraded versus the baseline procedure of maintaining 3 separate intrinsic models. Here, percent error is shown for the Hallucination Detection and Safety Exception intrinsics, as they have binary outputs, and Mean Absolute Error (MAE) is shown for the Uncertainty intrinsic, as it outputs integers from 0 to 9. For all metrics, lower is better. Performance is calculated on a randomly drawn 400-sample validation set from each intrinsic's dataset.
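The two error metrics named above can be sketched as follows; these are the generic definitions, not the exact evaluation code:

```python
# Hedged sketch of the metrics described above: percent error for the binary
# (Y/N) intrinsics, and Mean Absolute Error for the 0-9 certainty outputs.

def percent_error(predicted, actual):
    """Fraction of mismatched Y/N labels, as a percentage."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return 100.0 * wrong / len(actual)

def mean_absolute_error(predicted, actual):
    """MAE between predicted and reference certainty digits (0-9)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

print(percent_error(["Y", "N", "N", "Y"], ["Y", "N", "Y", "Y"]))  # 25.0
print(mean_absolute_error([9, 7, 5], [9, 8, 3]))                  # 1.0
```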

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/NsvMpweFjmjIhWFaKtI-K.png)

+ We then find that the RAG performance of **Granite Intrinsics 3.0 8b v1** does not suffer with respect to the base model [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct). Here we evaluate on the RAGBench benchmark, using the RAGAS faithfulness and correctness metrics.

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/hyOlQmXPirlCYeILLBXhc.png)

  ## Training Details
  The **Granite Intrinsics 3.0 8b v1** model is a LoRA adapter finetuned to provide 3 desired intrinsic outputs - Uncertainty Quantification, Hallucination Detection, and Safety.
 
  * [piqa](https://huggingface.co/datasets/ybisk/piqa)

  ### RAG Hallucination Training Data
+ The following public datasets were used for finetuning. The details of data creation for RAG response generation are available in the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf).
  For creating the hallucination labels for responses, the technique described in [Achintalwar, et al.](https://arxiv.org/pdf/2403.06009) was used.

  * [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
  * [QuAC](https://huggingface.co/datasets/allenai/quac)

+ ### Safety Exception Training Data
+ The following public datasets were used for finetuning.
+
+ * [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned)
+ * [nvidia/Aegis-AI-Content-Safety-Dataset-1.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-1.0)
+ * A subset of [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
+ * ibm/AttaQ
+ * [google/civil_comments](https://huggingface.co/datasets/google/civil_comments)
+ * [allenai/social_bias_frames](https://huggingface.co/datasets/allenai/social_bias_frames)
  ## Model Card Authors

  Kristjan Greenewald