Qwen2.5 7b GRPO RM Train (Writing Demo)


This is the Qwen2.5-7B base model after an experimental round of reward-model-driven RL training (GRPO) on a subset of the Erebus creative writing dataset.
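For readers unfamiliar with GRPO, the core idea is that several completions are sampled for the same prompt, each is scored (here, by a reward model), and every score is normalized against the rest of its group. The snippet below is a minimal sketch of that normalization step only. It is not the training code used for this model; the function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: illustrates GRPO-style group normalization,
    not the exact procedure used to train this model.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    # eps keeps the division defined when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Made-up reward-model scores for four sampled continuations of one prompt
    scores = [0.2, 0.5, 0.9, 0.4]
    for score, adv in zip(scores, group_relative_advantages(scores)):
        print(f"reward={score:.2f} advantage={adv:+.3f}")
```

Completions that score above their group's mean get a positive advantage and are reinforced; those below the mean get a negative one.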

Model Output Example (from 768 token prefix)

[image: model output example]
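A comparable sample can probably be produced by feeding the model the opening of a text and letting it continue. The sketch below assumes the standard Hugging Face `transformers` causal LM API and the repository ID `Quest-AI/qwen-writerdemo-7b-s500`. The helper names, the choice to keep the first 768 tokens as the prefix, and the sampling settings are all assumptions, not the settings used for the example above.

```python
def truncate_prefix(token_ids, max_prefix_tokens=768):
    """Keep only the first `max_prefix_tokens` token IDs (hypothetical helper)."""
    return list(token_ids[:max_prefix_tokens])


def continue_story(text,
                   model_id="Quest-AI/qwen-writerdemo-7b-s500",
                   max_prefix_tokens=768,
                   max_new_tokens=256):
    """Generate a continuation of `text` from a truncated token prefix."""
    # Imported lazily so truncate_prefix stays usable without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
        device_map="auto",           # requires the `accelerate` package
    )

    prefix_ids = truncate_prefix(tokenizer(text)["input_ids"], max_prefix_tokens)
    input_ids = torch.tensor([prefix_ids], device=model.device)

    # Sampling settings are illustrative assumptions, not tuned values.
    output = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=1.0,
    )
    # Drop the prompt tokens and decode only the newly generated part.
    return tokenizer.decode(output[0][len(prefix_ids):], skip_special_tokens=True)


if __name__ == "__main__":
    print(continue_story("The rain had not let up for three days when"))
```

Since this is a base model without chat tuning, it is prompted with raw text rather than a chat template.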

Other

Reward function files can be found here: verifiers

This model was trained using my chunked pref reward model baseline: pretrain-rm-baseline-7b

Model size: 7.62B params
Tensor type: BF16
Format: Safetensors

Model tree for Quest-AI/qwen-writerdemo-7b-s500

Base model: Qwen/Qwen2.5-7B
Finetuned: this model
Quantizations: 2 models

Dataset used to train Quest-AI/qwen-writerdemo-7b-s500