Qwen2.5 7b GRPO RM Train (Writing Demo)
This is a Qwen2.5 7B base model fine-tuned with an experimental reward-model-guided RL (GRPO) run over a subset of the Erebus dataset (creative writing).
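For context, GRPO scores a group of sampled completions per prompt with the reward model and normalizes each reward against its group. The following is a minimal sketch of that group-relative advantage step only, not the training code used for this model; names like `grpo_advantages` are illustrative.

```python
# Sketch of GRPO's group-relative advantage computation:
# each completion's reward is normalized against the rewards of the
# other completions sampled for the same prompt.
from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-8):
    """Return (r - mean) / (std + eps) for each reward in the group."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# Example: four completions sampled for one prompt.
advs = grpo_advantages([0.2, 0.8, 0.5, 0.5])
# Advantages within a group always sum to ~0.
```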
Model Output Example (from 768 token prefix)
Other
Reward function files can be found here: verifiers
This model was trained using my chunked preference reward model baseline: pretrain-rm-baseline-7b
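A chunked preference reward model of this kind typically splits a long completion into fixed-size chunks, scores each chunk, and aggregates. The sketch below assumes that shape; `score_chunk` is a hypothetical stand-in for the reward model's forward pass, and the chunk size is illustrative.

```python
# Hedged sketch of chunked reward scoring: split text into fixed-size
# chunks, score each with a reward model, and average the scores.
def chunk_text(text, chunk_size=512):
    """Split `text` into consecutive chunks of at most `chunk_size` chars."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def chunked_reward(text, score_chunk, chunk_size=512):
    """Mean per-chunk score; `score_chunk` stands in for the RM call."""
    chunks = chunk_text(text, chunk_size)
    if not chunks:
        return 0.0
    return sum(score_chunk(c) for c in chunks) / len(chunks)
```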