Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key • Paper • 2501.09695 • Published Jan 16
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs • Paper • 2505.12929 • Published May 19