LLMs Align Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking Paper • 2312.09244 • Published Dec 14, 2023 • 7
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking Paper • 2312.09244 • Published Dec 14, 2023 • 7