PERSONA: A Reproducible Testbed for Pluralistic Alignment Paper • 2407.17387 • Published Jul 24, 2024 • 19
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5, 2024 • 53
OpenVLA: An Open-Source Vision-Language-Action Model Paper • 2406.09246 • Published Jun 13, 2024 • 37
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms Paper • 2406.02900 • Published Jun 5, 2024 • 12
Offline Regularised Reinforcement Learning for Large Language Models Alignment Paper • 2405.19107 • Published May 29, 2024 • 14
Dual RL: Unification and New Methods for Reinforcement and Imitation Learning Paper • 2302.08560 • Published Feb 16, 2023 • 1
SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning Paper • 2311.02013 • Published Nov 3, 2023
Diffusion Model Alignment Using Direct Preference Optimization Paper • 2311.12908 • Published Nov 21, 2023 • 50
Contrastive Preference Learning: Learning from Human Feedback without RL Paper • 2310.13639 • Published Oct 20, 2023 • 25
An Emulator for Fine-Tuning Large Language Models using Small Language Models Paper • 2310.12962 • Published Oct 19, 2023 • 13