Async RLHF Paper Checkpoints
Checkpoints for "Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models" (https://arxiv.org/abs/2410.18252)
vwxyzjn/online_dpo_async Updated Feb 5 • 9
vwxyzjn/online_dpo_sync Updated Feb 5 • 6
vwxyzjn/ppo_async Updated Feb 5 • 8
vwxyzjn/ppo_sync Updated Feb 5 • 5
lm-human-preference-details
vwxyzjn/train_policy_accelerate__sentiment_offline_5k.json__seed1__1696447674 Text Generation • Updated Oct 4, 2023 • 1
lm-human-preference-details/train_policy_accelerate__sentiment_offline_5k.json__seed1 Text Generation • Updated Oct 4, 2023
vwxyzjn/acecoder_sft_gpt4o_test_cases_then_impl_no_system_message Viewer • Updated 3 days ago • 41.6k • 18