Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023 • 58 • 3
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey Paper • 2403.14608 • Published Mar 21, 2024 • 3
Small Models Struggle to Learn from Strong Reasoners Paper • 2502.12143 • Published Feb 17 • 38 • 6