Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge Paper • 2407.19594 • Published Jul 28, 2024 • 21
Some of the Papers I've Read Collection A few of the research papers that I've read. • 8 items • Updated Jul 2, 2024
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models Paper • 2406.04271 • Published Jun 6, 2024 • 31
Preference Datasets for DPO Collection This collection contains a list of curated preference datasets for DPO fine-tuning for intent alignment of LLMs • 7 items • Updated Dec 11, 2024 • 42