JudgeBench: A Benchmark for Evaluating LLM-based Judges Paper • 2410.12784 • Published 26 days ago • 42
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge Paper • 2407.19594 • Published Jul 28 • 19
Chain-of-Verification Reduces Hallucination in Large Language Models Paper • 2309.11495 • Published Sep 20, 2023 • 38
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model Paper • 2310.09520 • Published Oct 14, 2023 • 10