Can you share more information about the Writer Reward Model training?

by cizhenshi

such as the dataset size and data sources

Knowledge Engineering Group @ Tsinghua University

Our writing reward model (RM) is trained in a manner similar to the RM training approach described in https://arxiv.org/abs/2404.00934, except that our dataset includes a larger number of writing-related prompts and a more diverse collection of model responses.
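
For readers who want a concrete picture of what pairwise RM training looks like in general, here is a minimal sketch using a standard Bradley-Terry preference loss. The backbone model, toy data, and hyperparameters below are placeholders for illustration only and are not the actual recipe, which follows the paper linked above.

```python
# Minimal, illustrative sketch of pairwise reward-model (RM) training with a
# Bradley-Terry loss. Backbone, data, and hyperparameters are placeholders,
# not the actual setup described in this thread.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

backbone = "distilbert-base-uncased"  # illustrative small encoder with a scalar head
tokenizer = AutoTokenizer.from_pretrained(backbone)
model = AutoModelForSequenceClassification.from_pretrained(backbone, num_labels=1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy preference data: (prompt, chosen response, rejected response).
preference_pairs = [
    ("Write a haiku about autumn.",
     "Red leaves drift and fall / a cold wind carries them home / the year exhales",
     "Autumn is a season that comes after summer and before winter."),
]

def score(prompt: str, response: str) -> torch.Tensor:
    """Return the scalar reward for a (prompt, response) pair."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    return model(**inputs).logits.squeeze(-1)

model.train()
for prompt, chosen, rejected in preference_pairs:
    r_chosen, r_rejected = score(prompt, chosen), score(prompt, rejected)
    # Bradley-Terry objective: the chosen response should score higher than the rejected one.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```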

Thank you for your detailed response! It's very helpful to understand the training approach and the diversity of the dataset in the writing domain. I was wondering if it might be possible to share a general idea of the scale of reward pairs used in this area—would it be in the range of a few thousand, tens of thousands, or perhaps even hundreds of thousands? Of course, I completely understand if this information is sensitive. I really appreciate the insights you've shared so far!

Knowledge Engineering Group @ Tsinghua University

Thanks for your understanding. The number of reward pairs exceeds hundreds of thousands.
