SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published 2 days ago • 46
Towards General-Purpose Model-Free Reinforcement Learning Paper • 2501.16142 • Published 3 days ago • 19
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 8 days ago • 270
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Paper • 2501.12368 • Published 9 days ago • 39
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Paper • 2501.10893 • Published 12 days ago • 22
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published 10 days ago • 84
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Paper • 2411.04282 • Published Nov 6, 2024 • 33
BIG release by DeepSeek AI 🔥🔥🔥 DeepSeek-R1 & DeepSeek-R1-Zero: two 660B reasoning models are here, alongside 6 distilled dense models (based on Llama & Qwen) for the community! https://huggingface.co/deepseek-ai deepseek-ai/DeepSeek-R1
✨ MIT License: enabling distillation for custom models
✨ 32B & 70B models match OpenAI o1-mini in multiple capabilities
✨ API live now! Access Chain of Thought reasoning with model='deepseek-reasoner'