Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs Paper • 2503.14286 • Published 16 days ago
Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference Paper • 2306.12509 • Published Jun 21, 2023