
RLHFlow/Qwen2.5-Math-7B-Zero-RAFTpp
Text Generation • 8B
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/