
Peng Wang

stillarrow

https://peter-peng-w.github.io/

AI & ML interests

None yet

Organizations

None yet

Recent Activity

liked a dataset 2 days ago

m-a-p/SuperGPQA

Viewer • Updated Apr 30, 2025 • 26.5k • 9.06k • 80

liked a dataset 7 days ago

LLM360/guru-RL-92k

Viewer • Updated Aug 20, 2025 • 91.9k • 1.56k • 42

upvoted an article 21 days ago

From GRPO to DAPO and GSPO: What, Why, and How

Article • Aug 9, 2025

upvoted an article about 1 month ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 390

liked a dataset about 1 month ago

zwhe99/DeepMath-103K

Viewer • Updated May 29, 2025 • 103k • 9k • 286

liked a model about 1 month ago

deepseek-ai/DeepSeek-Math-V2

Text Generation • 685B • Updated Nov 27, 2025 • 2.25k • 677

upvoted a paper about 2 months ago

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9, 2025 • 132

liked 2 models about 2 months ago

WeiboAI/VibeThinker-1.5B

Text Generation • 2B • Updated Nov 24, 2025 • 1.86k • 507

nvidia/Nemotron-Research-Reasoning-Qwen-1.5B

Text Generation • 2B • Updated Nov 21, 2025 • 1.49k • 234

liked a dataset 2 months ago

open-r1/DAPO-Math-17k-Processed

Viewer • Updated Nov 10, 2025 • 34.8k • 5.12k • 53

upvoted 2 papers 3 months ago

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2, 2025 • 80

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30, 2025 • 55

liked 3 models 3 months ago

liked a dataset 3 months ago

jupyter-agent/jupyter-agent-dataset

Viewer • Updated Sep 10, 2025 • 95.8k • 1.44k • 154

liked a model 3 months ago

jinaai/jina-embeddings-v4

Visual Document Retrieval • 4B • Updated Sep 2, 2025 • 113k • 447

upvoted a collection 3 months ago

Qwen3-VL

Collection

37 items • Updated 10 days ago • 561

liked 2 models 3 months ago

Qwen/Qwen3-VL-235B-A22B-Thinking

Image-Text-to-Text • 236B • Updated Nov 26, 2025 • 46.5k • 358
