Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback Paper • 2310.05199 • Published Oct 8, 2023 • 1
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning Paper • 2310.11971 • Published Oct 18, 2023 • 1
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment Paper • 2312.09979 • Published Dec 15, 2023 • 1
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios Paper • 2401.00741 • Published Jan 1, 2024 • 1
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback Paper • 2401.11458 • Published Jan 21, 2024 • 2
Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey Paper • 2308.01191 • Published Aug 2, 2023 • 1
The Rise and Potential of Large Language Model Based Agents: A Survey Paper • 2309.07864 • Published Sep 14, 2023 • 7
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models Paper • 2403.12171 • Published Mar 18, 2024
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance Paper • 2406.18118 • Published Jun 26, 2024
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle Paper • 2406.11190 • Published Jun 17, 2024
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study Paper • 2407.06153 • Published Jul 8, 2024
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment Paper • 2410.09893 • Published Oct 13, 2024
DocFusion: A Unified Framework for Document Parsing Tasks Paper • 2412.12505 • Published Dec 17, 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning Paper • 2402.05808 • Published Feb 8, 2024
Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric Paper • 2502.17184 • Published Feb 24, 2025 • 1
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning Paper • 2504.13914 • Published Apr 10, 2025 • 3
Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published Jul 2025 • 33