kgreenewald committed · Commit db225b0 · verified · 1 parent: c05f354

Update README.md

Files changed (1): README.md (+37 −15)

README.md CHANGED
@@ -22,6 +22,9 @@ providing access to the Uncertainty, Hallucination Detection, and Safety Excepti
  - **Model type:** LoRA adapter for [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct)
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

  ### Uncertainty Intrinsic
  The Uncertainty intrinsic is designed to provide a Certainty score for model responses to user questions.
@@ -33,7 +36,8 @@ This percentage is *calibrated* in the following sense: given a set of answers a
  The Hallucination Detection intrinsic is designed to detect when an assistant response to a user question with supporting documents is not supported by those documents. A response of `Y` indicates hallucination, and `N` indicates no hallucination.

  ### Safety Exception Intrinsic
- The Safety Exception Intrinsic is designed to raise an exception when the user query is unsafe. This exception is raised by responding with `Y` (unsafe), and `N` otherwise.

  ## Usage
@@ -83,8 +87,8 @@ we can evaluate the certainty and hallucination status of this reply by invoking
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/HpitI-3zeutXqduC2eUES.png)

- ### PDL Implementation
- Given a hosted instance of **Granite Intrinsics 3.0 8b Instruct v1**, this uses the [PDL language](https://github.com/IBM/prompt-declaration-language) to implement the RAG intrinsic invocation scenario described above.
  ```python
  defs:
  apply_template:
@@ -111,10 +115,10 @@ defs:
  def: mycontext
  args:
  context: ${ context }
- - model: granite-8b-intrinsics-v2-20241201
  parameters:
  api_key: EMPTY
- api_base: http://aims-01.sl.res.ibm.com:21001/v1
  temperature: 0
  max_tokens: 1
  custom_llm_provider: text-completion-openai
@@ -157,13 +161,13 @@ text:
  intrinsic: safety
  - role: system
  text: ${ system_prompt }
- - if: ${ safety != "N" }
  then:
  text:
  - "\n\nDocuments: ${ document }\n\n ${ query }"
- - model: openai/granite-8b-intrinsics-v2-20241201
  def: answer
- parameters: {api_key: EMPTY, api_base: http://aims-01.sl.res.ibm.com:21001/v1, temperature: 0, stop: "\n"}
  - call: get_intrinsic
  def: certainty
  contribute: []
@@ -179,7 +183,8 @@ text:
  - "\nCertainty: ${ certainty }"
  - "\nHallucination: ${ hallucination }"
  ```
-
@@ -197,16 +202,19 @@ Additionally, certainty scores are *distributional* quantities, and so will do w
  red-teamed examples.

  ## Evaluation

- The model was evaluated on the [MMLU](https://huggingface.co/datasets/cais/mmlu) datasets (not used in training). Shown are the [Expected Calibration Error (ECE)](https://towardsdatascience.com/expected-calibration-error-ece-a-step-by-step-visual-explanation-with-python-code-c3e9aa12937d) for each task, for the base model (Granite-3.0-8b-instruct) and Granite-Uncertainty-3.0-8b.
- The average ECE across tasks for our method is 0.064 (out of 1) and is consistently low across tasks (maximum task ECE 0.10), compared to the base model average ECE of 0.20 and maximum task ECE of 0.60. Note that our ECE of 0.064 is smaller than the gap between the quantized certainty outputs (10% quantization steps). Additionally, the zero-shot performance on the MMLU tasks does not degrade, averaging at 89%.
- <!-- This section describes the evaluation protocols and provides the results. -->
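The ECE referenced above can be sketched as follows. This is a generic binned-calibration computation under assumed defaults (10 equal-width bins, bin-size weighting), not the evaluation script used for the numbers reported here.

```python
# Hedged sketch: Expected Calibration Error (ECE) over (confidence, correct) pairs.
# Bin count and weighting are assumptions, not the exact protocol used above.

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin size / N) * |accuracy(bin) - mean confidence(bin)|."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Assign each prediction to one bin (upper edge inclusive for the last bin).
        idx = [i for i, c in enumerate(confidences)
               if (lo <= c < hi) or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - avg_conf)
    return ece

# Perfectly calibrated toy example: 80% confidence, 4 of 5 correct.
print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))  # 0.0
```

A model that is systematically over- or under-confident accumulates a large gap term in every populated bin, which is why the base model's 0.20 average ECE is the figure to compare against.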

- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/2MwP7DRZlNBtWSKWFvXOI.png)

  ## Training Details
  The **Granite Intrinsics 3.0 8b v1** model is a LoRA adapter finetuned to provide 3 desired intrinsic outputs - Uncertainty Quantification, Hallucination Detection, and Safety.
@@ -238,13 +246,27 @@ The following datasets were used for calibration and/or finetuning. Certainty sc
  * [piqa](https://huggingface.co/datasets/ybisk/piqa)

  ### RAG Hallucination Training Data
- The following public datasets were used for finetuning the RAG model. The details of data creation for RAG response generation are available in the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf).
  For creating the hallucination labels for responses, the technique described in [Achintalwar, et al.](https://arxiv.org/pdf/2403.06009) was used.

-
  * [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
  * [QuAC](https://huggingface.co/datasets/allenai/quac)

  ## Model Card Authors

  Kristjan Greenewald
  - **Model type:** LoRA adapter for [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct)
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/ornGz5BdtfIXLYxDzUgi9.png)
+
  ### Uncertainty Intrinsic
  The Uncertainty intrinsic is designed to provide a Certainty score for model responses to user questions.
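As an illustration of consuming the Certainty score downstream, the single-digit output can be mapped to a percentage bucket. The midpoint mapping below is an assumption inferred from the 0-9 outputs and the 10% quantization steps discussed in this card's Evaluation section, not a documented decoding rule:

```python
# Hedged sketch: interpret the intrinsic's single certainty digit (0-9) as a
# calibrated percentage bucket. The midpoint mapping is an assumption based on
# the 10% quantization steps mentioned in the Evaluation section.

def certainty_to_percent(token: str) -> int:
    digit = int(token)
    if not 0 <= digit <= 9:
        raise ValueError(f"expected a digit 0-9, got {token!r}")
    return digit * 10 + 5  # bucket midpoint: '0' -> 5%, ..., '9' -> 95%

print(certainty_to_percent("8"))  # 85
```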

  The Hallucination Detection intrinsic is designed to detect when an assistant response to a user question with supporting documents is not supported by those documents. A response of `Y` indicates hallucination, and `N` indicates no hallucination.

  ### Safety Exception Intrinsic
+ The Safety Exception Intrinsic is designed to raise an exception when the user query is unsafe. This exception is raised by responding with `Y` (unsafe), and `N` otherwise.
+ The Safety Exception intrinsic was designed as a binary classifier that analyzes the user's prompt to detect a variety of harms, including violence, threats, sexual and explicit content, and requests to obtain personally identifiable information.
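The way these Y/N outputs gate a response (answer only when the Safety Exception intrinsic returns `N`, then attach the certainty and hallucination flags) can be sketched in plain Python. The refusal string here is a placeholder assumption, not model output:

```python
# Hedged sketch of the control flow around the intrinsic outputs: refuse when
# the Safety Exception intrinsic returns "Y" (unsafe), otherwise return the
# answer annotated with certainty and hallucination flags. The refusal text
# is a placeholder, not produced by the model.

def guarded_reply(safety: str, answer: str, certainty: str, hallucination: str) -> str:
    if safety == "Y":  # unsafe query -> raise the safety exception
        return "Safety exception raised: query refused."
    return f"{answer}\nCertainty: {certainty}\nHallucination: {hallucination}"

print(guarded_reply("N", "Paris is the capital of France.", "9", "N"))
```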

  ## Usage

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/HpitI-3zeutXqduC2eUES.png)

+ ### Intrinsics Example with PDL
+ Given a hosted instance of **Granite Intrinsics 3.0 8b Instruct v1** at `API_BASE` (insert the host address here), this example uses the [PDL language](https://github.com/IBM/prompt-declaration-language) to implement the RAG intrinsic invocation scenario described above.
  ```python
  defs:
  apply_template:
 
  def: mycontext
  args:
  context: ${ context }
+ - model: granite-8b-intrinsics-v1
  parameters:
  api_key: EMPTY
+ api_base: API_BASE
  temperature: 0
  max_tokens: 1
  custom_llm_provider: text-completion-openai
 
  intrinsic: safety
  - role: system
  text: ${ system_prompt }
+ - if: ${ safety != "Y" }
  then:
  text:
  - "\n\nDocuments: ${ document }\n\n ${ query }"
+ - model: openai/granite-8b-intrinsics-v1
  def: answer
+ parameters: {api_key: EMPTY, api_base: API_BASE, temperature: 0, stop: "\n"}
  - call: get_intrinsic
  def: certainty
  contribute: []

  - "\nCertainty: ${ certainty }"
  - "\nHallucination: ${ hallucination }"
  ```
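For readers not using PDL, the model call in the program above amounts to a raw text completion against an OpenAI-compatible endpoint, as `custom_llm_provider: text-completion-openai` suggests. The sketch below only assembles the request payload; `API_BASE` and the prompt string are placeholders, and nothing is sent over the network:

```python
# Hedged sketch: build the text-completion payload mirroring the PDL
# parameters (greedy decoding, one output token: Y/N or a digit 0-9).
# API_BASE is a placeholder to be replaced with your hosted instance.

API_BASE = "API_BASE"  # e.g. "http://host:port/v1" for your deployment

def build_intrinsic_request(prompt: str) -> dict:
    return {
        "model": "granite-8b-intrinsics-v1",
        "prompt": prompt,
        "temperature": 0,
        "max_tokens": 1,
    }

# The actual call would POST this JSON to f"{API_BASE}/completions"
# with the EMPTY api key, per the PDL parameters above.
payload = build_intrinsic_request("<intrinsic prompt assembled as in the PDL program>")
print(payload["max_tokens"])  # 1
```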
+ ### Intrinsics Example with SGLang
+ The SGLang implementation below uses the SGLang fork at [https://github.com/frreiss/sglang/tree/granite](https://github.com/frreiss/sglang/tree/granite), which supports Granite models.

  red-teamed examples.

  ## Evaluation
+ We evaluate both the performance of the intrinsics themselves and the RAG performance of the model.

+ We first find that the performance of the intrinsics in our shared model **Granite Intrinsics 3.0 8b v1** is not degraded versus the baseline procedure of maintaining 3 separate intrinsic models. Here, percent error is shown for the Hallucination Detection and Safety Exception intrinsics, as they have binary outputs, and Mean Absolute Error (MAE) is shown for the Uncertainty intrinsic, as it outputs integers from 0 to 9. For all metrics, lower is better. Performance is calculated on a randomly drawn 400-sample validation set from each intrinsic's dataset.
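The two error metrics named above can be sketched as follows; these are the generic definitions, not the exact evaluation code:

```python
# Hedged sketch of the metrics described above: percent error for the binary
# (Y/N) intrinsics, and Mean Absolute Error for the 0-9 certainty outputs.

def percent_error(predicted, actual):
    """Fraction of mismatched Y/N labels, as a percentage."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return 100.0 * wrong / len(actual)

def mean_absolute_error(predicted, actual):
    """MAE between predicted and reference certainty digits (0-9)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

print(percent_error(["Y", "N", "N", "Y"], ["Y", "N", "Y", "Y"]))  # 25.0
print(mean_absolute_error([9, 7, 5], [9, 8, 3]))                  # 1.0
```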

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/NsvMpweFjmjIhWFaKtI-K.png)

+ We then find that the RAG performance of **Granite Intrinsics 3.0 8b v1** does not suffer with respect to the base model [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct). Here we evaluate on the RAGBench benchmark, using the RAGAS faithfulness and correctness metrics.

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/hyOlQmXPirlCYeILLBXhc.png)

  ## Training Details
  The **Granite Intrinsics 3.0 8b v1** model is a LoRA adapter finetuned to provide 3 desired intrinsic outputs - Uncertainty Quantification, Hallucination Detection, and Safety.
 
  * [piqa](https://huggingface.co/datasets/ybisk/piqa)

  ### RAG Hallucination Training Data
+ The following public datasets were used for finetuning. The details of data creation for RAG response generation are available in the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf).
  For creating the hallucination labels for responses, the technique described in [Achintalwar, et al.](https://arxiv.org/pdf/2403.06009) was used.

  * [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
  * [QuAC](https://huggingface.co/datasets/allenai/quac)

+ ### Safety Exception Training Data
+ The following public datasets were used for finetuning.
+
+ * [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned)
+ * [nvidia/Aegis-AI-Content-Safety-Dataset-1.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-1.0)
+ * A subset of [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
+ * ibm/AttaQ
+ * [google/civil_comments](https://huggingface.co/datasets/google/civil_comments)
+ * [allenai/social_bias_frames](https://huggingface.co/datasets/allenai/social_bias_frames)
  ## Model Card Authors

  Kristjan Greenewald