RLHFlow

university

AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

Recent Activity

baohao submitted a paper 19 days ago

Self-Hinting Language Models Enhance Reinforcement Learning

baohao updated a collection 4 months ago

Papers

Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

submitted a paper to Daily Papers 19 days ago

Self-Hinting Language Models Enhance Reinforcement Learning

Paper • 2602.03143 • Published 20 days ago • 29

updated a collection 4 months ago

Reinforce-Ada

Training & test sets and finetuned models • 19 items • Updated Oct 26, 2025 • 3

updated 2 models 4 months ago

RLHFlow/Qwen2.5-Math-1.5B-DAPO-easy

2B • Updated Oct 26, 2025 • 4

RLHFlow/Qwen2.5-Math-1.5B-GRPO-n8-easy

2B • Updated Oct 26, 2025 • 71

published 2 models 4 months ago

RLHFlow/Qwen2.5-Math-1.5B-DAPO-easy

2B • Updated Oct 26, 2025 • 4

RLHFlow/Qwen2.5-Math-1.5B-GRPO-n8-easy

2B • Updated Oct 26, 2025 • 71

updated 2 datasets 4 months ago

RLHFlow/reinforce_ada_hard_prompt_1-5b

Viewer • Updated Oct 16, 2025 • 13.3k • 36

RLHFlow/reinforce_ada_simple_prompt_1-5b

Viewer • Updated Oct 16, 2025 • 25k • 25

updated a model 4 months ago

RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-hard

Updated Oct 15, 2025 • 1

published a model 5 months ago

RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-hard

Updated Oct 15, 2025 • 1

updated a model 5 months ago

RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-easy

2B • Updated Oct 11, 2025 • 1

published a model 5 months ago

RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-easy

2B • Updated Oct 11, 2025 • 1

updated a dataset 5 months ago

RLHFlow/reinforce_ada_simple_prompt_1-5b

Viewer • Updated Oct 16, 2025 • 25k • 25

published a dataset 5 months ago

RLHFlow/reinforce_ada_simple_prompt_1-5b

Viewer • Updated Oct 16, 2025 • 25k • 25

updated a dataset 5 months ago

RLHFlow/reinforce_ada_hard_prompt_1-5b

Viewer • Updated Oct 16, 2025 • 13.3k • 36