Case2Code: Learning Inductive Reasoning with Synthetic Data • arXiv:2407.12504 • Published Jul 17, 2024
Secrets of RLHF in Large Language Models Part I: PPO • arXiv:2307.04964 • Published Jul 11, 2023
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning • arXiv:2310.11971 • Published Oct 18, 2023
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment • arXiv:2312.09979 • Published Dec 15, 2023
Secrets of RLHF in Large Language Models Part II: Reward Modeling • arXiv:2401.06080 • Published Jan 11, 2024
Human-Instruction-Free LLM Self-Alignment with Limited Samples • arXiv:2401.06785 • Published Jan 6, 2024
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback • arXiv:2401.11458 • Published Jan 21, 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback • arXiv:2402.01391 • Published Feb 2, 2024
Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards • arXiv:2403.07708 • Published Mar 12, 2024
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments • arXiv:2406.04151 • Published Jun 6, 2024