Update `pipeline_tag` and correct `project_page` URL in metadata
This PR improves the model card metadata by:
- Changing the `pipeline_tag` from `text-generation` to `text-classification`. This more accurately reflects the model's core function of evaluating text for policy compliance and returning a PASS/FAIL classification, and it improves discoverability at https://huggingface.co/models?pipeline_tag=text-classification (a quick verification sketch follows the description).
- Correcting the `project_page` URL from `https://github.com/taruschirag/DynaGuard` to `https://taruschirag.github.io/DynaGuard/` to point to the official project website, ensuring consistency and accuracy.
No changes are made to the markdown content, as it is already comprehensive and well-structured.
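For reviewers who want to confirm the discoverability effect after merging, here is a minimal sketch using `huggingface_hub`. The repo id `tomg-group-umd/DynaGuard-4B` and the availability of the `pipeline_tag` filter (present in recent `huggingface_hub` versions) are assumptions, not something stated in this PR:

```python
# Sketch: confirm the model surfaces under the new pipeline tag.
# Assumed repo id: tomg-group-umd/DynaGuard-4B (not stated in this PR).
from huggingface_hub import list_models

hits = [m.id for m in list_models(search="DynaGuard", pipeline_tag="text-classification")]
print("DynaGuard models listed under text-classification:", hits)
```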
README.md (CHANGED):

```diff
@@ -1,8 +1,12 @@
 ---
-license: apache-2.0
+base_model:
+- Qwen/Qwen3-4B
+datasets:
+- tomg-group-umd/DynaBench
 language: en
 library_name: transformers
-pipeline_tag: text-generation
+license: apache-2.0
+pipeline_tag: text-classification
 tags:
 - guardrail
 - safety
@@ -11,13 +15,9 @@ tags:
 - umd
 - qwen3
 - llm
-datasets:
-- tomg-group-umd/DynaBench
-base_model:
-- Qwen/Qwen3-4B
 repo_url: https://github.com/montehoover/DynaGuard
 paper_url: https://arxiv.org/abs/2509.02563
-project_page: https://github.com/taruschirag/DynaGuard
+project_page: https://taruschirag.github.io/DynaGuard/
 ---
 
 # DynaGuard-4B 🛡️
```
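Once merged, the front matter can be spot-checked programmatically. A small sketch with `huggingface_hub.ModelCard`, again assuming the `tomg-group-umd/DynaGuard-4B` repo id:

```python
# Sketch: spot-check the merged model card metadata.
from huggingface_hub import ModelCard

card = ModelCard.load("tomg-group-umd/DynaGuard-4B")  # assumed repo id
data = card.data.to_dict()

assert data.get("pipeline_tag") == "text-classification"
assert data.get("project_page") == "https://taruschirag.github.io/DynaGuard/"
print("base_model:", data.get("base_model"), "| datasets:", data.get("datasets"))
```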
The surrounding README content shown in the diff is unchanged (the PR touches only the metadata block). For reference, the sections surfaced by the diff viewer:

```markdown
## Model Details

* **Developed by:** University of Maryland, Capital One
* **Base Model:** [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
* **Model Type:** Decoder-only Transformer
* **Training Data:** Fine-tuned on a mixture of the **[DynaBench](https://huggingface.co/tomg-group-umd/DynaBench)** dataset and several safety benchmarks (WildGuard, BeaverTails, ToxicChat, Aegis 2.0).
* **Training Procedure:** The model was trained using Supervised Fine-Tuning (SFT) for one epoch, followed by GRPO.

### Key Features

* **Dynamic Policies:** Accepts arbitrary guardrail policies written in natural language, allowing for bespoke and application-specific moderation.
* **Interpretability:** Can generate detailed, natural-language explanations for why a policy was violated, enabling chatbot recovery and human-in-the-loop refinement.
* **Dual-Mode Inference:** Supports two modes for flexibility:
1. **Fast Inference:** Provides a direct `PASS` or `FAIL` classification for minimal latency.
2. **Chain-of-Thought (CoT):** Generates a reasoning trace before giving the final classification, offering interpretability.
```

The diff also shows the tail of the card's fast-inference example:

```python
inputs = tokenizer(fast_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.1)
print("\n--- Fast Inference Mode Output ---")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
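The excerpt above references `tokenizer`, `model`, and `fast_prompt`, which the full card defines earlier. A hedged end-to-end sketch of fast-inference mode follows; the repo id and the exact prompt template are assumptions based on the visible context, not quotes from the card:

```python
# Sketch: end-to-end fast-inference usage around the excerpt above.
# Assumptions: repo id tomg-group-umd/DynaGuard-4B; prompt template
# reconstructed from the visible context line, not quoted from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tomg-group-umd/DynaGuard-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

policy = "The agent must never reveal a customer's account number."
dialogue = "User: What's my account number?\nAgent: Sure, it's 12345678."

fast_prompt = f"""Evaluate the following dialogue for compliance with the given policy. Provide the final answer as PASS or FAIL.

Policy:
{policy}

Dialogue:
{dialogue}
"""

inputs = tokenizer(fast_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.1)
print("\n--- Fast Inference Mode Output ---")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```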