kgreenewald committed: Update README.md

README.md CHANGED
- **Model type:** LoRA adapter for [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/ornGz5BdtfIXLYxDzUgi9.png)

### Uncertainty Intrinsic
The Uncertainty intrinsic is designed to provide a Certainty score for model responses to user questions.
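
The certainty score is returned as a single token from 0 to 9 (see the Evaluation section below). As a purely illustrative sketch of consuming that output, the snippet below maps the digit to a percentage using an assumed bucket-midpoint convention; the exact mapping is defined by the model's documentation, not by this sketch.

```python
# Illustrative sketch only: parse the Uncertainty intrinsic's single-digit output
# and map it to a percentage. The midpoint mapping (digit d -> 10*d + 5 percent)
# is an assumption for illustration, not this card's official definition.
def certainty_to_percent(token: str) -> float:
    digit = int(token.strip())      # the intrinsic emits one digit, 0-9
    if not 0 <= digit <= 9:
        raise ValueError(f"unexpected certainty token: {token!r}")
    return 10.0 * digit + 5.0       # e.g. "8" -> 85.0

print(certainty_to_percent("8"))    # 85.0
```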

### Hallucination Detection Intrinsic
The Hallucination Detection intrinsic is designed to detect when an assistant response to a user question with supporting documents is not supported by those documents. A response of `Y` indicates hallucination, and `N` no hallucination.

### Safety Exception Intrinsic
The Safety Exception Intrinsic is designed to raise an exception when the user query is unsafe. This exception is raised by responding with `Y` (unsafe) and `N` otherwise.
The Safety Exception intrinsic was designed as a binary classifier that analyzes the user's prompt to detect a variety of harms, including violence, threats, sexual and explicit content, and requests to obtain personally identifiable information.

## Usage

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/HpitI-3zeutXqduC2eUES.png)
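
As a minimal sketch of the invocation pattern (mirroring the parameters used in the PDL example below: an OpenAI-compatible completions endpoint, `temperature: 0`, `max_tokens: 1`), one could query a hosted instance as follows. The endpoint URL, model name, and especially the prompt string are placeholders, not values defined by this card; the actual prompt must follow the intrinsics' prompt format, which is not shown in this snippet.

```python
# Hedged sketch: query the hosted intrinsics model through an OpenAI-compatible
# completions endpoint, as the PDL example below does. API_BASE, MODEL, and
# PROMPT are placeholders/assumptions, not values defined by this model card.
from openai import OpenAI

API_BASE = "http://localhost:8000/v1"   # replace with your host address
MODEL = "granite-8b-intrinsics-v1"      # name used in the PDL example; may differ
client = OpenAI(base_url=API_BASE, api_key="EMPTY")

# PROMPT must already contain the templated system prompt, documents, question,
# and (for certainty/hallucination) the assistant response to be checked.
PROMPT = "..."

resp = client.completions.create(
    model=MODEL,
    prompt=PROMPT,
    temperature=0,
    max_tokens=1,        # each intrinsic replies with a single token
)
print(resp.choices[0].text.strip())  # "Y"/"N" for safety or hallucination, 0-9 for certainty
```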

### Intrinsics Example with PDL
Given a hosted instance of **Granite Intrinsics 3.0 8b Instruct v1** at `API_BASE` (insert the host address here), this uses the [PDL language](https://github.com/IBM/prompt-declaration-language) to implement the RAG intrinsic invocation scenario described above.
```yaml
defs:
  apply_template:
  # ... lines not shown in this diff ...
        def: mycontext
        args:
          context: ${ context }
      - model: granite-8b-intrinsics-v1
        parameters:
          api_key: EMPTY
          api_base: API_BASE
          temperature: 0
          max_tokens: 1
          custom_llm_provider: text-completion-openai
# ... lines not shown in this diff ...
          intrinsic: safety
- role: system
  text: ${ system_prompt }
- if: ${ safety != "Y" }
  then:
    text:
    - "\n\nDocuments: ${ document }\n\n ${ query }"
    - model: openai/granite-8b-intrinsics-v1
      def: answer
      parameters: {api_key: EMPTY, api_base: API_BASE, temperature: 0, stop: "\n"}
- call: get_intrinsic
  def: certainty
  contribute: []
# ... lines not shown in this diff ...
- "\nCertainty: ${ certainty }"
- "\nHallucination: ${ hallucination }"
```

### Intrinsics Example with SGLang
The SGLang implementation below uses the SGLang fork at [https://github.com/frreiss/sglang/tree/granite](https://github.com/frreiss/sglang/tree/granite), which supports Granite models.
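
The SGLang code itself is not included in this excerpt. As a hedged sketch only, such a deployment would typically launch the fork's server and query its OpenAI-compatible endpoint; the launch command, port, and model name below assume the fork keeps SGLang's standard entry point and defaults, which this card does not confirm.

```python
# Hedged sketch (not the card's SGLang example): serve the model with the SGLang
# fork and query it. Assumes the fork keeps SGLang's usual launch command and
# OpenAI-compatible endpoint; adjust the model path, port, and model name as needed.
#
#   python -m sglang.launch_server --model-path <granite-intrinsics-model> --port 30000
#
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="default",     # SGLang's server typically accepts a generic model name
    prompt="...",        # placeholder; use the intrinsics' prompt format
    temperature=0,
    max_tokens=1,
)
print(resp.choices[0].text.strip())
```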

## Evaluation

We evaluate the performance of the intrinsics themselves and the RAG performance of the model.

We first find that the performance of the intrinsics in our shared model **Granite Intrinsics 3.0 8b v1** is not degraded
versus the baseline procedure of maintaining 3 separate intrinsic models. Here, percent error is shown for the Hallucination Detection and Safety Exception intrinsics, as they have
binary output, and Mean Absolute Error (MAE) is shown for the Uncertainty intrinsic, as it outputs integers 0 to 9. For all metrics, lower is better. Performance is calculated on a randomly drawn 400-sample validation set from each intrinsic's dataset.
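
For clarity, the two error metrics reduce to the following (a toy illustration with made-up arrays, not the evaluation harness used for the table below):

```python
# Toy illustration of the reported metrics; the arrays are made up.
import numpy as np

# Binary intrinsics (Hallucination Detection, Safety Exception): percent error.
pred = np.array(["Y", "N", "N", "Y"])
gold = np.array(["Y", "N", "Y", "Y"])
percent_error = 100.0 * np.mean(pred != gold)    # -> 25.0

# Uncertainty intrinsic (integer scores 0-9): mean absolute error.
pred_score = np.array([7, 3, 9, 5])
gold_score = np.array([8, 3, 6, 5])
mae = np.mean(np.abs(pred_score - gold_score))   # -> 1.0

print(percent_error, mae)
```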

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/NsvMpweFjmjIhWFaKtI-K.png)

We then find that the RAG performance of **Granite Intrinsics 3.0 8b v1** does not suffer with respect to the base model [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct). Here we evaluate on the RAGBench benchmark using the RAGAS faithfulness and correctness metrics.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/hyOlQmXPirlCYeILLBXhc.png)

## Training Details
The **Granite Intrinsics 3.0 8b v1** model is a LoRA adapter finetuned to provide 3 desired intrinsic outputs - Uncertainty Quantification, Hallucination Detection, and Safety.
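
Since the release is a LoRA adapter rather than a full model, it would typically be attached to the base model with PEFT along the following lines. This is a hedged sketch: the adapter identifier below is a placeholder, not the published repository name.

```python
# Hedged sketch: attach the LoRA adapter to the base Granite model with PEFT.
# ADAPTER_ID is a placeholder; substitute the published adapter repo or a local path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "ibm-granite/granite-3.0-8b-instruct"
ADAPTER_ID = "<path-or-repo-of-this-LoRA-adapter>"   # placeholder

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attaches the LoRA weights
model.eval()
```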

* [piqa](https://huggingface.co/datasets/ybisk/piqa)

### RAG Hallucination Training Data
The following public datasets were used for finetuning. The details of data creation for RAG response generation are available in the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf).
For creating the hallucination labels for responses, the technique described in [Achintalwar, et al.](https://arxiv.org/pdf/2403.06009) was used.

* [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
* [QuAC](https://huggingface.co/datasets/allenai/quac)

### Safety Exception Training Data
The following public datasets were used for finetuning.

* [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned/discussions)
* [nvidia/Aegis-AI-Content-Safety-Dataset-1.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-1.0/viewer/default/train)
* A subset of [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
* ibm/AttaQ
* [google/civil_comments](https://huggingface.co/datasets/google/civil_comments/blob/5cb696158f7a49c75722fd0c16abded746da3ea3/civil_comments.py)
* [allenai/social_bias_frames](https://huggingface.co/datasets/allenai/social_bias_frames)

## Model Card Authors

Kristjan Greenewald