arxiv:2512.02807

SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment

Published on Dec 2 · Submitted by yixuan on Dec 4

Abstract

AI-generated summary: Stable rank, an intrinsic quality signal derived from model representations, improves LLM alignment with human preferences through reinforcement learning without external supervision.

Aligning Large Language Models (LLMs) with human preferences typically relies on external supervision, which faces critical limitations: human annotations are scarce and subjective, reward models are vulnerable to reward hacking, and self-evaluation methods suffer from prompt sensitivity and biases. In this work, we propose stable rank, an intrinsic, annotation-free quality signal derived from model representations. Stable rank measures the effective dimensionality of hidden states by computing the ratio of total variance to dominant-direction variance, capturing quality through how information distributes across representation dimensions. Empirically, stable rank achieves 84.04% accuracy on RewardBench and improves task accuracy by an average of 11.3 percentage points over greedy decoding via Best-of-N sampling. Leveraging this insight, we introduce Stable Rank Group Relative Policy Optimization (SR-GRPO), which uses stable rank as a reward signal for reinforcement learning. Without external supervision, SR-GRPO improves Qwen2.5-1.5B-Instruct by 10% on STEM and 19% on mathematical reasoning, outperforming both learned reward models and self-evaluation baselines. Our findings demonstrate that quality signals can be extracted from internal model geometry, offering a path toward scalable alignment without external supervision.
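The "ratio of total variance to dominant-direction variance" has a standard matrix reading: the sum of squared singular values of the token hidden-state matrix over the largest squared singular value. Below is a minimal sketch of that computation; the centering step and the choice of which hidden states to feed in are assumptions for illustration, not details taken from the abstract.

```python
# Hedged sketch of the stable-rank signal described in the abstract.
import torch

def stable_rank(hidden_states: torch.Tensor, eps: float = 1e-8) -> float:
    """hidden_states: (num_tokens, hidden_dim) token representations of one response."""
    h = hidden_states.float()
    h = h - h.mean(dim=0, keepdim=True)   # center tokens (variance reading; an assumption)
    s = torch.linalg.svdvals(h)           # singular values, in descending order
    total_var = (s ** 2).sum()            # total variance across all directions
    dominant_var = s[0] ** 2              # variance along the dominant direction
    return (total_var / (dominant_var + eps)).item()
```

A higher value means the response's representations spread information across many directions rather than collapsing onto one dominant axis, which is the geometric property the abstract ties to output quality.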

Community

Paper author · Paper submitter

🤯 We know RLHF relies heavily on external rewards. But what if the model already knows when it's reasoning well?

The paper SR-GRPO introduces a simple intrinsic metric, stable rank, which serves as an annotation-free quality signal for LLM outputs. It measures how richly information is spread across the dimensions of the hidden states.

It is like checking the complexity of the model's brain signal to see if it is reasoning clearly.
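Used as a quality signal, stable rank can rerank sampled candidates, which is how the abstract's Best-of-N gains are obtained. Here is a hedged sketch that reuses the stable_rank helper from the sketch above; the model name matches the paper's experiments, but scoring the full prompt+response with last-layer hidden states is an assumption.

```python
# Hypothetical Best-of-N reranking with stable rank as the selection score.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-1.5B-Instruct"   # model reported in the paper's experiments
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True).eval()

def score(prompt: str, response: str) -> float:
    ids = tok(prompt + response, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    h = out.hidden_states[-1].squeeze(0)   # (num_tokens, hidden_dim); last layer is an assumption
    return stable_rank(h)                  # helper defined in the earlier sketch

def best_of_n(prompt: str, candidates: list[str]) -> str:
    # Keep the candidate whose hidden states have the highest stable rank.
    return max(candidates, key=lambda c: score(prompt, c))
```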

The result: SR-GRPO improves Qwen2.5-1.5B-Instruct by 19% on mathematical reasoning and 10% on STEM, without any external reward model or expensive human preference data.
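On the RL side, SR-GRPO plugs stable rank in where GRPO would otherwise use an external reward. A minimal sketch of the group-relative advantage step with stable-rank scores as rewards is below; the clipping and KL terms of the full objective are omitted, and the numbers are illustrative only.

```python
# Sketch: group-relative advantages from stable-rank rewards (GRPO-style).
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """rewards: (group_size,) stable-rank scores for responses sampled from one prompt."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Illustrative stable-rank scores for four responses to the same prompt.
rewards = torch.tensor([3.1, 4.7, 2.9, 5.2])
advantages = group_relative_advantages(rewards)
# Responses with above-average stable rank get positive advantages and are reinforced
# in the policy-gradient step; no reward model or human labels are involved.
```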

Just look inside! 🧐

Paper: SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment


