Rl - a anujga Collection

Rl

updated Mar 18

RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment

Paper • 2307.12950 • Published Jul 24, 2023 • 10
HumanLLMs/Human-Like-DPO-Dataset

Viewer • Updated Jan 12 • 10.9k • 1.53k • 225
sam-paech/gutenberg3-generalfiction-scifi-fantasy-romance-adventure-dpo

Viewer • Updated Oct 23, 2024 • 5.65k • 86 • 26
RLHFlow/Deepseek-PRM-Data

Viewer • Updated Nov 9, 2024 • 253k • 72 • 15
RLHFlow/DS-and-Mistral-PRM-Data

Viewer • Updated Nov 10, 2024 • 526k • 29
TIGER-Lab/WebInstruct-CFT

Viewer • Updated Feb 2 • 654k • 177 • 52
deu05232/promptriever-ours2-filtered_FN

Viewer • Updated Feb 10 • 1.31M • 6
argilla/distilabel-intel-orca-dpo-pairs

Viewer • Updated Mar 19 • 12.9k • 4.2k • 174