A Simple Summary of DeepSeek-R1 from DeepSeek AI

The RL stage is very important.
↳ However, it is difficult to create a truly helpful AI for people solely through RL.
↳ So, we applied a four-stage training pipeline: a cold start that provides a good starting point, reasoning RL, SFT, and safety RL. The resulting model achieved performance comparable to o1.
↳ Simply fine-tuning other open models on data generated by R1 (distillation) resulted in performance comparable to o1-mini.
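
To make the distillation idea above a bit more concrete, here is a minimal sketch of how teacher-generated reasoning traces might be turned into plain SFT training text. Everything in it is an assumption for illustration: the `TeacherSample` schema, the `is_correct` flag (imagined as coming from some external answer checker), the filtering step, and the `User:`/`Assistant:` template are hypothetical and not taken from the paper or the repository. Only the `<think>` tags mirror how R1 marks its reasoning.

```python
from dataclasses import dataclass


@dataclass
class TeacherSample:
    """One reasoning trace sampled from the teacher model (hypothetical schema)."""
    prompt: str
    reasoning: str
    answer: str
    is_correct: bool  # assumed to be set by an external answer checker


def to_sft_text(sample: TeacherSample) -> str:
    # Assumed template: reasoning wrapped in <think> tags, then the final answer.
    return (
        f"User: {sample.prompt}\n"
        f"Assistant: <think>\n{sample.reasoning}\n</think>\n{sample.answer}"
    )


def build_distillation_set(samples: list[TeacherSample]) -> list[str]:
    # Keep only traces judged correct, then render each as SFT training text.
    return [to_sft_text(s) for s in samples if s.is_correct]
```

The resulting strings could then be fed to any ordinary supervised fine-tuning setup for a smaller open model. No RL is involved at this step, which is what makes the approach "simple".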

Of course, this is just a brief overview and may not be of much help. All models are available on Hugging Face, and the paper can be found in the GitHub repository.

Model: deepseek-ai
Paper: https://github.com/deepseek-ai/DeepSeek-R1

liked a model 4 months ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated Mar 27 • 984k • 12.2k

updated a model 6 months ago

UnstableLlama/Marco-o1-exl2

Text Generation • Updated Nov 26, 2024 • 3

liked a model 6 months ago

turboderp/pixtral-12b-exl2

Updated Nov 11, 2024 • 15 • 8

reacted to ezgikorkmaz's post with 🚀 7 months ago

I recently wrote a survey on deep reinforcement learning. The paper is a compact guide to understanding some of the key concepts in reinforcement learning. Find the paper below:

Paper: https://arxiv.org/pdf/2401.02349v2
Twitter: https://x.com/EzgiKorkmazAI/status/1851934161138798615

updated a model 7 months ago

UnstableLlama/Rombos-LLM-V2.6-Nemotron-70b-exl2

Updated Oct 18, 2024

liked 4 models 8 months ago

liked a model 10 months ago

turboderp/turbcat-instruct-72b

Text Generation • Updated Jul 19, 2024 • 16 • 29

liked a model 11 months ago

turboderp/llama3-turbcat-instruct-8b-exl2

Updated Jun 20, 2024 • 4 • 5

updated a model 11 months ago

UnstableLlama/L3-MS-Astoria-70b-exl2-default-cal

Updated Jun 7, 2024 • 5