UnstableLlama

AI & ML interests

Local AI

Recent Activity

liked a model about 2 months ago
turboderp/gemma-3-27b-it-exl2
liked a model 2 months ago
nvidia/audio-flamingo-2-0.5B
published a model 3 months ago
UnstableLlama/Mistral-Small-24B-Base-2501-exl2

Organizations

None yet

UnstableLlama's activity

reacted to chansung's post with 👍 4 months ago
A simple summary of DeepSeek-R1 from DeepSeek AI

The RL stage is very important.
↳ However, RL alone is not enough to produce an AI that is truly helpful to people.
↳ So DeepSeek used a four-stage training pipeline (providing a good starting point, reasoning RL, SFT, and safety RL) and achieved performance comparable to o1.
↳ Simply fine-tuning other open models on data generated by R1 (distillation) yielded performance comparable to o1-mini.

Of course, this is only a brief overview and may not be of much help. All models are available on Hugging Face, and the paper is available in the GitHub repository.


Model: deepseek-ai
Paper: https://github.com/deepseek-ai/DeepSeek-R1
reacted to ezgikorkmaz's post with 🚀 7 months ago