POLAR 🐻‍❄️ New reward modeling by Shanghai AI Lab
internlm/polar-68693f829d2e83ac5e6e124a
✨ 1.8B/7B - Apache 2.0
✨ Scalable policy discriminative pretraining
✨ Easy RLHF with minimal preference data