Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26, 2025
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Paper • 2504.11343 • Published Apr 15, 2025
KV Caching Explained: Optimizing Transformer Inference Efficiency Article by not-lain • Published Jan 30, 2025
OpenMath Collection • Models and datasets introduced in "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" • 15 items
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13, 2025
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3, 2024
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning Paper • 2407.04078 • Published Jul 4, 2024
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1, 2024
Reward models on the hub Collection (unmaintained; see RewardBench) • A place to collect reward models, an artifact of RLHF that is often not released • 18 items • Updated Apr 13, 2024
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning Paper • 2312.01552 • Published Dec 4, 2023
LiPO: Listwise Preference Optimization through Learning-to-Rank Paper • 2402.01878 • Published Feb 2, 2024