Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published 6 days ago • 79
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models Paper • 2504.20157 • Published 7 days ago • 34
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space Paper • 2504.13835 • Published 17 days ago • 36
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published 17 days ago • 118
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published 18 days ago • 88
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients Paper • 2504.10766 • Published 21 days ago • 40
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published Feb 25 • 74
Teaching Language Models to Critique via Reinforcement Learning Paper • 2502.03492 • Published Feb 5 • 24
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 98
Search-o1: Agentic Search-Enhanced Large Reasoning Models Paper • 2501.05366 • Published Jan 9 • 101
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 276
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published Jan 8 • 97
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published Jan 2 • 53
Progressive Multimodal Reasoning via Active Retrieval Paper • 2412.14835 • Published Dec 19, 2024 • 74
What indeed can GPT models do in chemistry? A comprehensive benchmark on eight tasks Paper • 2305.18365 • Published May 27, 2023 • 4
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25, 2024 • 49
Power-LM Collection Dense & MoE LLMs trained with power learning rate scheduler. • 4 items • Updated Oct 17, 2024 • 15