Update README.md
README.md CHANGED

```diff
@@ -16,7 +16,7 @@ keep an eye out for feedback and questions in the [Community section](https://hu
 
 ## Model Summary
 
-**Granite
+**Granite 3.0 8B Instruct - Intrinsics LoRA v0.1** is a LoRA adapter for [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct),
 providing access to the Uncertainty, Hallucination Detection, and Safety Exception intrinsics in addition to retaining the full abilities of the [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct) model.
 
 - **Developer:** IBM Research
@@ -49,7 +49,7 @@ The Safety Exception intrinsic was designed as a binary classifier that analyses
 
 This is an experimental LoRA testing new functionality being developed for IBM's Granite LLM family. We welcome the community to test it out and give us feedback, but we do NOT recommend this model for real deployments at this time. Stay tuned for more updates on the Granite roadmap.
 
-**Granite
+**Granite 3.0 8B Instruct - Intrinsics LoRA v0.1** is lightly tuned so that its behavior closely mimics that of [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct),
 with the added ability to generate the three specified intrinsics.
 
 
@@ -89,7 +89,7 @@ we can evaluate the certainty and hallucination status of this reply by invoking
 
 
 ### Intrinsics Example with PDL
-Given a hosted instance of **Granite
+Given a hosted instance of **Granite 3.0 8B Instruct - Intrinsics LoRA v0.1** at `API_BASE` (insert the host address here), this uses the [PDL language](https://github.com/IBM/prompt-declaration-language) to implement the RAG intrinsic invocation scenario described above.
 Note that the hosted instance must be supported by LiteLLM ([https://docs.litellm.ai/docs/providers](https://docs.litellm.ai/docs/providers)).
 
 First, create a file `intrinsics.pdl` with the following content.
@@ -288,7 +288,7 @@ def main_chat_flow (s, doc, query):
 
 
 if __name__ == "__main__":
-    model_path = "ibm-granite/granite-
+    model_path = "ibm-granite/granite-3.0-8b-lora-intrinsics-v0.1"
 
     # Setting the model_path to the granite model, and chat template to be the granite template
     # This assumes the "granite3-instruct" chat template has been registered in "sglang/lang/chat_template.py"
@@ -317,20 +317,20 @@ red-teamed examples.
 ## Evaluation
 We evaluate the performance of the intrinsics themselves and the RAG performance of the model.
 
-We first find that the performance of the intrinsics in our shared model **Granite
+We first find that the performance of the intrinsics in our shared model **Granite 3.0 8B Instruct - Intrinsics LoRA v0.1** is not degraded
 versus the baseline procedure of maintaining 3 separate intrinsic models. Here, percent error is shown for the Hallucination Detection and Safety Exception intrinsics as they have
 binary output, and Mean Absolute Error (MAE) is shown for the Uncertainty intrinsic as it outputs numbers 0 to 9. For all, lower is better. Performance is calculated on a randomly drawn 400-sample validation set from each intrinsic's dataset.
 
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/NsvMpweFjmjIhWFaKtI-K.png)
 
-We then find that RAG performance of **Granite
+We then find that the RAG performance of **Granite 3.0 8B Instruct - Intrinsics LoRA v0.1** does not suffer with respect to the base model [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct). Here we evaluate on the RAGBench benchmark using the RAGAS faithfulness and correctness metrics.
 
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/hyOlQmXPirlCYeILLBXhc.png)
 
 ## Training Details
-The **Granite
+The **Granite 3.0 8B Instruct - Intrinsics LoRA v0.1** model is a LoRA adapter finetuned to provide three desired intrinsic outputs: Uncertainty Quantification, Hallucination Detection, and Safety.
 
 
 
```
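The changed README describes the Uncertainty intrinsic as outputting a score from 0 to 9 and the Hallucination Detection and Safety Exception intrinsics as binary classifiers. Below is a minimal sketch of client-side post-processing for such replies; the Y/N reply format and the percentage reading of the 0-9 score are illustrative assumptions, not details confirmed by this model card.

```python
# Hypothetical post-processing for the three intrinsics described above.
# Assumed reply formats: a single 0-9 digit for Uncertainty, and a
# binary Y/N token for Hallucination Detection and Safety Exception.

def parse_certainty(reply: str) -> int:
    """Read the Uncertainty intrinsic's 0-9 score as a midpoint percentage."""
    score = int(reply.strip())
    if not 0 <= score <= 9:
        raise ValueError(f"expected a score in 0-9, got {reply!r}")
    # Treat score k as the 10k to 10(k+1) percent bucket; return its midpoint.
    return score * 10 + 5

def parse_flag(reply: str) -> bool:
    """Interpret a binary intrinsic reply ('Y'/'N') as a boolean."""
    token = reply.strip().upper()
    if token not in {"Y", "N"}:
        raise ValueError(f"expected Y or N, got {reply!r}")
    return token == "Y"

if __name__ == "__main__":
    print(parse_certainty("7"))  # 75, i.e. the 70-80% certainty bucket
    print(parse_flag("N"))       # False
```

In practice these helpers would be applied to the raw completion returned by the hosted model for each intrinsic call.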
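The Evaluation section of the updated README reports percent error for the two binary intrinsics and Mean Absolute Error for the 0-9 uncertainty score. For reference, the two metrics can be sketched as follows; the data values in the usage example are illustrative, not the model's actual predictions.

```python
# Sketch of the two error metrics used in the Evaluation section.

def percent_error(preds, labels):
    """Share of binary predictions that disagree with the labels, in percent."""
    wrong = sum(p != y for p, y in zip(preds, labels))
    return 100.0 * wrong / len(labels)

def mean_absolute_error(preds, labels):
    """MAE for integer scores, such as the 0-9 Uncertainty output."""
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)

if __name__ == "__main__":
    # Illustrative values only.
    print(percent_error(["Y", "N", "N", "Y"], ["Y", "N", "Y", "Y"]))  # 25.0
    print(mean_absolute_error([7, 3, 5, 9], [6, 3, 7, 9]))            # 0.75
```

Lower is better for both, matching the plots referenced in the README.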