Instructions for moderation of prompts (without response)

#4
by AmenRa - opened

Hi,

The example in your model card gives the instructions for moderating both a prompt and a response.
If I am not mistaken, your paper also reports evaluation results for moderating prompts only.

How should I instruct GuardReasoner for the moderation of prompts only?

Thanks

Hi,

Thanks for your interest. Use the same instruction: pass only the prompt as input and leave the response as None.

Some examples can be found in https://github.com/yueliu1999/GuardReasoner/blob/main/generate.py (see the OpenAI Moderation benchmark part).
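As a rough illustration of "same instruction, response left as None", here is a minimal sketch. The `INSTRUCTION` placeholder, the `INPUT_TEMPLATE` layout, and the helper name `build_moderation_input` are assumptions made for this example, not the exact strings used by GuardReasoner; copy the real instruction from the model card and check the template against generate.py.

```python
from typing import Optional

# Placeholder (assumption): paste the exact instruction text from the model card here.
INSTRUCTION = "<instruction text copied from the model card>"

# Assumed layout of the input; verify against generate.py before relying on it.
INPUT_TEMPLATE = "Human user:\n{prompt}\n\nAI assistant:\n{response}\n\n"


def build_moderation_input(prompt: str, response: Optional[str] = None) -> str:
    """Build the model input; a missing response is rendered as the text 'None'."""
    # str.format converts the Python value None to the string "None",
    # so prompt-only moderation reuses the prompt+response template unchanged.
    return INSTRUCTION + "\n" + INPUT_TEMPLATE.format(prompt=prompt, response=response)


if __name__ == "__main__":
    # Prompt-only moderation: omit the response argument.
    print(build_moderation_input("How do I pick a lock?"))
```

The resulting string would then be passed to the model in the same way as in the prompt-and-response case.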

Best,
Yue

AmenRa changed discussion status to closed
