Can you share more information about the Writer Reward Model training?

by cizhenshi

such as the dataset size and data sources

Knowledge Engineering Group @ Tsinghua University

Our writing reward model (RM) is trained in a manner similar to the RM training approach described in https://arxiv.org/abs/2404.00934, except that our dataset includes a larger number of writing-related prompts and a more diverse collection of model responses.
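
For readers who want a concrete picture of what pairwise RM training looks like in general, here is a minimal sketch using a standard Bradley-Terry preference loss. The backbone model, toy data, and hyperparameters below are placeholders for illustration only and are not the actual recipe, which follows the paper linked above.

```python
# Minimal, illustrative sketch of pairwise reward-model (RM) training with a
# Bradley-Terry loss. Backbone, data, and hyperparameters are placeholders,
# not the actual setup described in this thread.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

backbone = "distilbert-base-uncased"  # illustrative small encoder with a scalar head
tokenizer = AutoTokenizer.from_pretrained(backbone)
model = AutoModelForSequenceClassification.from_pretrained(backbone, num_labels=1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy preference data: (prompt, chosen response, rejected response).
preference_pairs = [
    ("Write a haiku about autumn.",
     "Red leaves drift and fall / a cold wind carries them home / the year exhales",
     "Autumn is a season that comes after summer and before winter."),
]

def score(prompt: str, response: str) -> torch.Tensor:
    """Return the scalar reward for a (prompt, response) pair."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    return model(**inputs).logits.squeeze(-1)

model.train()
for prompt, chosen, rejected in preference_pairs:
    r_chosen, r_rejected = score(prompt, chosen), score(prompt, rejected)
    # Bradley-Terry objective: the chosen response should score higher than the rejected one.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```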

Thank you for your detailed response! It's very helpful to understand the training approach and the diversity of the dataset in the writing domain. I was wondering if it might be possible to share a general idea of the scale of reward pairs used in this area—would it be in the range of a few thousand, tens of thousands, or perhaps even hundreds of thousands? Of course, I completely understand if this information is sensitive. I really appreciate the insights you've shared so far!

Knowledge Engineering Group @ Tsinghua University

Thanks for your understanding. The number of reward pairs exceeds hundreds of thousands.
