- Secrets of RLHF in Large Language Models Part II: Reward Modeling
  Paper • 2401.06080 • Published • 29
- Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
  Paper • 2406.02900 • Published • 14
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
  Paper • 2406.04151 • Published • 23
- Understanding and Diagnosing Deep Reinforcement Learning
  Paper • 2406.16979 • Published • 9
Yuquan Xie
xieyuquan
AI & ML interests: LLM, multi-modal
Recent Activity
updated a Space about 1 month ago: xieyuquan/werewolf_1
published a Space about 1 month ago: xieyuquan/werewolf_1
updated a model about 2 months ago: xieyuquan/Optimus3-Policy