yaswanthchittepu/ultrafeedback-binarized-llama3-8b-pop-margin-data-full • Updated Jul 25, 2024 • 63.7k • 55
yaswanthchittepu/ultrafeedback-binarized-llama3-8b-standard-margin-data-full • Updated Jul 25, 2024 • 63.7k • 45
yaswanthchittepu/ultrafeedback-binarized-pop-margin-data-full • Updated Jul 7, 2024 • 63.7k • 64
yaswanthchittepu/ultrafeedback-binarized-standard-margin-data-full • Updated Jul 7, 2024 • 63.7k • 69
yaswanthchittepu/pythia2.8b-ultrafeedback-binarized-pop-rm Text Classification • Updated Jul 5, 2024 • 7
yaswanthchittepu/pythia2.8b-ultrafeedback-binarized-standard-rm Text Classification • Updated Jul 5, 2024 • 7
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms Paper • 2406.02900 • Published Jun 5, 2024 • 12