- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 59
- Towards Efficient and Exact Optimization of Language Model Alignment
  Paper • 2402.00856 • Published
- A General Theoretical Paradigm to Understand Learning from Human Preferences
  Paper • 2310.12036 • Published • 16
- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 14
Yiming Zheng (ZYM666)
AI & ML interests: None yet
Recent Activity
- liked a dataset 9 days ago: mio/sukasuka-anime-vocal-dataset
- liked a model 10 days ago: bosonai/Higgs-Llama-3-70B
- liked a model 11 days ago: black-forest-labs/FLUX.1-dev
Organizations: None yet
Collections: 1
Papers: 1
Spaces: 1
Models: 7
- ZYM666/swin-spe-model • Updated • 8
- ZYM666/q-FrozenLake-v1-4x4-noSlippery • Reinforcement Learning • Updated
- ZYM666/Alpaca • Updated
- ZYM666/ChatDoctor_change • Text Generation • Updated • 14 • 1
- ZYM666/text2vec-large-chinese-support-sentence-transformer • Updated
- ZYM666/text2vec-large-chinese-support-sentence • Updated
- ZYM666/flower_yolov5 • Updated
Datasets: 0
None public yet