Accelerating Nash Learning from Human Feedback via Mirror Prox Paper • 2505.19731 • Published 17 days ago • 6
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? Paper • 2502.14502 • Published Feb 20 • 91
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators Paper • 2502.06394 • Published Feb 10 • 90