Mitigating Overthinking through Reasoning Shaping
Abstract
Group Relative Segment Penalization (GRSP) regularizes reasoning at the step level, improving token efficiency in large reasoning models without significantly reducing accuracy, especially on complex problems.
Large reasoning models (LRMs) boosted by Reinforcement Learning from Verifier Reward (RLVR) have shown great power in problem solving, yet they often exhibit overthinking: excessive, meandering reasoning that inflates computational cost. Prior penalization schemes in RLVR reduce token consumption but often harm model performance, a shortcoming that arises from the oversimplicity of token-level supervision. In this paper, we argue that the granularity of supervision plays a crucial role in balancing efficiency and accuracy, and propose Group Relative Segment Penalization (GRSP), a step-level method for regularizing reasoning. Since preliminary analyses show that reasoning segments are strongly correlated with token consumption and model performance, we design a length-aware weighting mechanism across segment clusters. Extensive experiments demonstrate that GRSP achieves superior token efficiency without heavily compromising accuracy, with the advantage most pronounced on harder problems. Moreover, GRSP stabilizes RL training and scales effectively across model sizes.
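To make the idea concrete, below is a minimal sketch of what a segment-level, group-relative length penalty could look like. The paper's exact formulation is not given here, so everything beyond "segment-level penalty with length-aware weighting across segment clusters" is an assumption: the function name `grsp_shaped_advantages`, the `alpha` and `n_clusters` hyperparameters, the `"\n\n"` segment delimiter, character counts as a stand-in for token counts, and the quantile-bucket clustering are all hypothetical choices for illustration.

```python
import numpy as np

def grsp_shaped_advantages(responses, rewards, alpha=0.1, n_clusters=3):
    """Hypothetical sketch of a GRSP-style shaping rule (not the paper's exact method).

    responses: list of reasoning traces (strings), one rollout group for a prompt
    rewards: verifier reward per rollout (e.g., 1.0 if correct else 0.0)
    alpha: penalty strength -- a hypothetical hyperparameter
    n_clusters: number of segment-length clusters -- also hypothetical
    """
    # 1. Split each trace into reasoning segments; "\n\n" is a stand-in for
    #    whatever step delimiter is used, and len() in characters is a
    #    stand-in for token counts.
    seg_lens = [np.array([len(s) for s in r.split("\n\n")], dtype=float)
                for r in responses]

    # 2. Cluster all segments in the group by length using simple quantile
    #    buckets, as a crude proxy for the paper's segment clusters.
    all_lens = np.concatenate(seg_lens)
    edges = np.quantile(all_lens, np.linspace(0, 1, n_clusters + 1)[1:-1])

    # 3. Length-aware weighting: segments falling in longer clusters
    #    contribute more to a rollout's penalty.
    def penalty(lens):
        clusters = np.digitize(lens, edges)      # cluster index 0 .. n_clusters-1
        weights = (clusters + 1) / n_clusters    # heavier weight for longer clusters
        return float((weights * lens).sum() / (all_lens.sum() + 1e-8))

    # 4. Group-relative advantages (GRPO-style normalization within the
    #    rollout group), shaped by subtracting the segment penalty.
    rewards = np.asarray(rewards, dtype=float)
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return np.array([a - alpha * penalty(lens) for a, lens in zip(adv, seg_lens)])
```

Under this reading, a correct but verbose rollout keeps a positive advantage yet is nudged below a correct and concise one, which is one plausible way a step-level penalty could trim tokens without collapsing accuracy.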
Community
Good work! This paper gives us more insight into the exact issues behind overlong CoTs, and a segment-level penalty may well be an efficient and effective way to alleviate them.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Beyond Token Length: Step Pruner for Efficient and Accurate Reasoning in Large Language Models (2025)
- Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking (2025)
- PEAR: Phase Entropy Aware Reward for Efficient Reasoning (2025)
- ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code (2025)
- Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling (2025)
- BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens (2025)
- RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning (2025)