Qwen2.5 7b GRPO RM Train (Writing Demo)
This is a Qwen2.5 7B base model fine-tuned with an experimental reward-model-guided RL (GRPO) run over a subset of the Erebus dataset (creative writing).
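For context, GRPO scores a group of sampled completions per prompt with the reward model and normalizes each reward against its group. The following is a minimal sketch of that group-relative advantage step only, not the training code used for this model; names like `grpo_advantages` are illustrative.

```python
# Sketch of GRPO's group-relative advantage computation:
# each completion's reward is normalized against the rewards of the
# other completions sampled for the same prompt.
from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-8):
    """Return (r - mean) / (std + eps) for each reward in the group."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# Example: four completions sampled for one prompt.
advs = grpo_advantages([0.2, 0.8, 0.5, 0.5])
# Advantages within a group always sum to ~0.
```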
Model Output Example (from 768 token prefix)
Other
Reward function files can be found here: verifiers
This model was trained using my chunked preference reward model baseline: pretrain-rm-baseline-7b
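A chunked preference reward model of this kind typically splits a long completion into fixed-size chunks, scores each chunk, and aggregates. The sketch below assumes that shape; `score_chunk` is a hypothetical stand-in for the reward model's forward pass, and the chunk size is illustrative.

```python
# Hedged sketch of chunked reward scoring: split text into fixed-size
# chunks, score each with a reward model, and average the scores.
def chunk_text(text, chunk_size=512):
    """Split `text` into consecutive chunks of at most `chunk_size` chars."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def chunked_reward(text, score_chunk, chunk_size=512):
    """Mean per-chunk score; `score_chunk` stands in for the RM call."""
    chunks = chunk_text(text, chunk_size)
    if not chunks:
        return 0.0
    return sum(score_chunk(c) for c in chunks) / len(chunks)
```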