SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild • Paper • 2503.18892 • Published 13 days ago • 28
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems • Paper • 2502.19328 • Published Feb 26 • 22
TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space • Paper • 2501.12224 • Published Jan 21 • 48
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model • Paper • 2501.12368 • Published Jan 21 • 46
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement • Paper • 2501.12273 • Published Jan 21 • 14
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models • Paper • 2411.19477 • Published Nov 29, 2024 • 6
Auto-Evolve: Enhancing Large Language Model's Performance via Self-Reasoning Framework • Paper • 2410.06328 • Published Oct 8, 2024 • 2
RedPajama: an Open Dataset for Training Large Language Models • Paper • 2411.12372 • Published Nov 19, 2024 • 55
Drowning in Documents: Consequences of Scaling Reranker Inference • Paper • 2411.11767 • Published Nov 18, 2024 • 17
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists • Paper • 2410.23331 • Published Oct 30, 2024 • 8