-
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 137 -
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 134 -
Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published • 86
Collections
Discover the best community collections!
Collections including paper arxiv:2508.13167
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 20 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 12 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 14 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 49
-
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 17 -
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Paper • 2404.03648 • Published • 29 -
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Paper • 2405.19893 • Published • 32 -
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
Paper • 2405.19888 • Published • 7
-
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 137 -
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 134 -
Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published • 86
-
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 17 -
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Paper • 2404.03648 • Published • 29 -
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Paper • 2405.19893 • Published • 32 -
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
Paper • 2405.19888 • Published • 7
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 20 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 12 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 14 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 49