anujga's Collections
Rl

updated Mar 18

  • RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment

    Paper • 2307.12950 • Published Jul 24, 2023 • 10

  • HumanLLMs/Human-Like-DPO-Dataset

    Viewer • Updated Jan 12 • 10.9k • 1.41k • 223

  • sam-paech/gutenberg3-generalfiction-scifi-fantasy-romance-adventure-dpo

    Viewer • Updated Oct 23, 2024 • 5.65k • 88 • 25

  • RLHFlow/Deepseek-PRM-Data

    Viewer • Updated Nov 9, 2024 • 253k • 108 • 13

  • RLHFlow/DS-and-Mistral-PRM-Data

    Viewer • Updated Nov 10, 2024 • 526k • 44

  • TIGER-Lab/WebInstruct-CFT

    Viewer • Updated Feb 2 • 654k • 180 • 51

  • deu05232/promptriever-ours2-filtered_FN

    Viewer • Updated Feb 10 • 1.31M • 14

  • argilla/distilabel-intel-orca-dpo-pairs

    Viewer • Updated Mar 19 • 12.9k • 1.93k • 173