Update README.md
README.md
CHANGED
@@ -23,6 +23,14 @@ Made with Exllamav2 0.2.8 with the default dataset. Granite3 models require Exll
 Exl2 models don't support native RAM offloading, so the model has to fully fit into GPU VRAM.
 It's also required to use Nvidia RTX on Windows or Nvidia RTX/AMD ROCm on Linux.
 
+In case you downloaded the model and it answers only Yes/No: that's [intended behavior](https://github.com/ibm-granite/granite-guardian/tree/main#scope-of-use).
+This is hardcoded in the model's Jinja2 chat template, which can be viewed in the tokenizer_config.json file.
+By default, in chat mode it evaluates whether the user's or assistant's message is harmful in a general sense, according to the model's risk definitions.
+However, it also lets you choose a different predefined risk, set custom harm definitions, or detect risks in RAG or function-calling pipelines.
+If you're using TabbyAPI, you can set either risk_name or risk_definition via [template variables](https://github.com/theroyallab/tabbyAPI/wiki/04.-Chat-Completions#template-variables).
+For example, you can switch to violence detection by adding ``"template_vars": {"guardian_config": {"risk_name": "violence"}}`` to a v1/chat/completions request (see the request sketch below).
+For more information, refer to the Granite Guardian [documentation](https://github.com/ibm-granite/granite-guardian) and the model's Jinja2 template.
+
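For illustration, here are two short Python sketches. The first just dumps the Jinja2 chat template so you can see the hardcoded Yes/No logic; it assumes you run it from the model directory and that the template sits under the usual chat_template key of tokenizer_config.json.

```python
import json

# Print the Jinja2 chat template that drives the Yes/No answers.
# Assumes the current directory is the model directory.
with open("tokenizer_config.json", encoding="utf-8") as f:
    print(json.load(f)["chat_template"])
```

The second is a minimal sketch of the TabbyAPI request described above, using the requests library. The endpoint URL, x-api-key header, API key, and model name are placeholders for a typical local TabbyAPI setup; only the template_vars payload shape comes from the example line in the diff.

```python
import requests

TABBY_URL = "http://127.0.0.1:5000/v1/chat/completions"  # adjust to your instance
API_KEY = "your-tabbyapi-key"                            # placeholder

payload = {
    "model": "granite-guardian-3.1-8b-exl2",  # whatever name your loaded model has
    "messages": [{"role": "user", "content": "Message to be screened goes here."}],
    "max_tokens": 5,    # the guardian only answers Yes/No
    "temperature": 0,
    # TabbyAPI template variables: switch from the default general-harm check
    # to the predefined "violence" risk. A custom definition would presumably go
    # in "risk_definition" instead of "risk_name" (see the Granite Guardian docs).
    "template_vars": {"guardian_config": {"risk_name": "violence"}},
}

resp = requests.post(TABBY_URL, json=payload, headers={"x-api-key": API_KEY}, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"].strip())
```

If everything is wired up, the printed reply should be just Yes or No, as described in the scope-of-use note above.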
 # Original model card
 # Granite Guardian 3.1 8B
 