Instructions for moderating ONLY user prompts

#4
by AmenRa - opened

Hi and thanks for this model.

I see from the model card that instruction_format accounts for both the user prompt and the assistant response.
I was wondering whether:

  1. I should use specific instructions for classifying prompts only (e.g., at the beginning of a conversation before having a response from the assistant)
  2. I can use the provided instructions with an empty response:
       model_input = instruction_format.format(prompt="How can I rob the bank?", response="")
  3. the model is meant to be used ONLY with both a prompt and a response
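To make options 1 and 2 concrete, here is a minimal sketch of how each would be wired. It assumes a hypothetical template with {prompt} and {response} placeholders; the real instruction_format text from the model card is not reproduced, and prompt_only_format and build_input are made-up names for illustration only.

```python
# Hypothetical stand-in for the model card's instruction_format: the real
# template text differs; only the {prompt} and {response} placeholders are assumed.
instruction_format = (
    "Classify the following exchange as safe or unsafe.\n"
    "User: {prompt}\n"
    "Assistant: {response}\n"
)

# Option 1 (hypothetical): a dedicated template for classifying prompts only.
prompt_only_format = (
    "Classify the following user message as safe or unsafe.\n"
    "User: {prompt}\n"
)


def build_input(prompt, response=None, prompt_only=False):
    """Build the classifier input for a prompt, with or without a response."""
    if response is None and prompt_only:
        # Option 1: use the prompt-only template, with no assistant turn at all.
        return prompt_only_format.format(prompt=prompt)
    # Option 2: reuse the prompt+response template and leave the response empty.
    return instruction_format.format(prompt=prompt, response=response or "")


print(build_input("How can I rob the bank?"))
print(build_input("How can I rob the bank?", prompt_only=True))
```

Which of the two the model actually expects is exactly the open question; the sketch only shows how each option would be plugged in.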

If the first option is correct, could you provide those instructions?

Thanks,

Elias
