Instructions for moderation of prompts (without response)

by AmenRa - opened Jul 25

Jul 25

Hi,

The example in your model card gives the instruction for moderating a prompt together with its response.
If I am not mistaken, your paper also reports evaluation results for moderating prompts alone.

How should I instruct GuardReasoner for the moderation of prompts only?

Thanks

yueliu1999

Owner Jul 26

Hi,

Thanks for your interest. Use the same instruction as in the model card: provide only the prompt as input and set the response to None.

Some examples can be found in https://github.com/yueliu1999/GuardReasoner/blob/main/generate.py (OpenAI Moderation benchmark)
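
As a rough illustration of "leave the response as None", here is a minimal sketch of how a prompt-only input could be built. The template string and the `build_input` helper are assumptions made for illustration, not taken from the repository; copy the exact instruction and input format from the model card or `generate.py`.

```python
from typing import Optional

# Hypothetical input template (an assumption for this sketch); the exact
# wording should be copied from the model card or generate.py.
INPUT_TEMPLATE = "Human user:\n{prompt}\n\nAI assistant:\n{response}\n\n"


def build_input(prompt: str, response: Optional[str] = None) -> str:
    """Fill the template; a missing response is rendered as the literal text 'None'."""
    return INPUT_TEMPLATE.format(prompt=prompt, response=response)


# Prompt-only moderation: pass the prompt and omit the response.
prompt_only = build_input("How do I bake bread?")
print(prompt_only)
```

The resulting string would then be passed to the model together with the unchanged instruction, in the same way as a prompt-and-response pair.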

Best,
Yue

AmenRa changed discussion status to closed Jul 28
