How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench Paper • 2508.20931 • Published 8 days ago • 15
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks Paper • 2404.14723 • Published Apr 23, 2024 • 10
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models Paper • 2401.01335 • Published Jan 2, 2024 • 68