siyeng feng

siyengfeng

AI & ML interests

None yet

Recent Activity

liked a model 2 days ago

Qwen/Qwen2.5-VL-72B-Instruct

liked a model 2 days ago

tencent/Hunyuan3D-2

siyengfeng's activity

upvoted a paper about 6 hours ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published 2 days ago • 46

liked 4 models 2 days ago

upvoted a paper 2 days ago

Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published 3 days ago • 19

upvoted 3 papers 7 days ago

Autonomy-of-Experts Models

Paper • 2501.13074 • Published 8 days ago • 38

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published 9 days ago • 76

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published 8 days ago • 270

liked a model 7 days ago

bespokelabs/Bespoke-Stratos-32B

Text Generation • Updated 7 days ago • 282 • 27

upvoted 2 papers 8 days ago

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

Paper • 2501.12368 • Published 9 days ago • 39

Reasoning Language Models: A Blueprint

Paper • 2501.11223 • Published 11 days ago • 30

upvoted an article 8 days ago

Process Reinforcement through Implicit Rewards

Article • 28 days ago • 20

upvoted 2 papers 8 days ago

Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments

Paper • 2501.10893 • Published 12 days ago • 22

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Paper • 2501.11425 • Published 10 days ago • 84

upvoted a paper 10 days ago

Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Paper • 2411.04282 • Published Nov 6, 2024 • 33

reacted to AdinaY's post with 🔥 10 days ago

Post

2786

BIG release by DeepSeek AI🔥🔥🔥

DeepSeek-R1 & DeepSeek-R1-Zero: two 660B reasoning models are here, alongside 6 distilled dense models (based on Llama & Qwen) for the community!
https://huggingface.co/deepseek-ai
deepseek-ai/DeepSeek-R1

✨ MIT License : enabling distillation for custom models
✨ 32B & 70B models match OpenAI o1-mini in multiple capabilities
✨ API live now! Access Chain of Thought reasoning with model='deepseek-reasoner'

liked a model 10 days ago

deepseek-ai/DeepSeek-R1-Zero

Text Generation • Updated 4 days ago • 17.1k • 622

upvoted a paper 10 days ago

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published 14 days ago • 100

liked a model 10 days ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 4 days ago • 498k • 5.24k