anakin87's Collections
📝 Cool LLM papers
Paper • 2412.15115 • Published • 340
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Paper • 2412.11605 • Published • 17
📈 Scaling test-time compute
Reverse Thinking Makes LLMs Stronger Reasoners
Paper • 2411.19865 • Published • 20
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper • 2411.15124 • Published • 58
Scaling Laws for Precision
Paper • 2411.04330 • Published • 7
LoRA vs Full Fine-tuning: An Illusion of Equivalence
Paper • 2410.21228 • Published • 2
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Paper • 2406.09279 • Published • 2
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Paper • 2404.14367 • Published • 1
Direct Language Model Alignment from Online AI Feedback
Paper • 2402.04792 • Published • 30
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Paper • 2404.10719 • Published • 5