arxiv:2510.10023

Skill-Targeted Adaptive Training

Published on Oct 11
· Submitted by Yinghui He on Oct 14

Abstract

A new fine-tuning strategy, STAT, uses a teacher model's metacognition to identify and address skill gaps in a student model, leading to improved performance on both in-distribution and out-of-distribution benchmarks.

AI-generated summary

Language models often show little to no improvement (i.e., "saturation") when trained via vanilla supervised fine-tuning (SFT) on data similar to what they saw in their training set (e.g., MATH). We introduce a new fine-tuning strategy, STAT, to train such a student model by using the metacognition ability of a stronger large language model (LLM) as the teacher. The teacher uses the task dataset to create a list of skills needed for the task, and then labels each data point with its required skills (Didolkar et al., 2024). By monitoring the student's answers, the teacher creates a Missing-Skill-Profile for the student, tracking how often it failed to apply each skill in its responses. We use this idea to build a modified training set in one of two ways. In STAT-Sel, the teacher uses an existing set of training examples but adaptively reweights them according to the Missing-Skill-Profile. In STAT-Syn, the teacher synthesizes additional examples involving missing skills. Across extensive experiments on Llama and Qwen models, our methods yield improvements of up to 7.5% on MATH, whereas SFT provides only limited gains. Furthermore, STAT enhances performance on out-of-distribution benchmarks (e.g., AIME24/25, AMC23, etc.) by an average of 4.6%. Crucially, we find that STAT is complementary to RL via GRPO (Shao et al., 2024): after the model is improved using STAT to address skill gaps, GRPO continues to add further gains. We conclude that skill-targeted adaptive training should broadly improve current training pipelines. Our code is available at: https://github.com/princeton-pli/STAT.

Community

Paper submitter

We introduce a new training paradigm, Skill-Targeted Adaptive Training (STAT), which offers a principled path to overcoming SFT saturation and advancing generalization in LLMs.

1️⃣ Current Bottleneck
Supervised fine-tuning (SFT) often plateaus when models are trained on data similar to their pretraining distribution — a phenomenon of saturation seen on benchmarks like MATH.

2️⃣ Our Approach: STAT
We introduce Skill-Targeted Adaptive Training (STAT), a new fine-tuning paradigm that leverages the metacognition of a stronger LLM as a teacher. The teacher identifies the skills required for a task, tracks where the student model struggles, and builds a Missing-Skill Profile.
• STAT-Sel adaptively reweights existing examples based on missing skills.
• STAT-Syn synthesizes new examples targeting those gaps.
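To make the two steps above concrete, here is a minimal sketch of how a Missing-Skill Profile could be built and used for STAT-Sel-style reweighting. This is an illustrative outline, not the paper's implementation: the `skills` labels, the `grader` callback (standing in for the teacher judging the student's answer), and the averaging scheme are all assumptions; see the official repo at https://github.com/princeton-pli/STAT for the actual method.

```python
from collections import Counter

def build_missing_skill_profile(examples, student_answers, grader):
    """Hypothetical sketch: track how often the student fails to apply
    each teacher-labeled skill. `grader(ex, ans)` returns True if the
    student's answer is judged correct (a stand-in for the teacher LLM)."""
    missed, total = Counter(), Counter()
    for ex, ans in zip(examples, student_answers):
        for skill in ex["skills"]:       # skills labeled by the teacher
            total[skill] += 1
            if not grader(ex, ans):      # failed answer -> skill counted as missed
                missed[skill] += 1
    # missing-rate per skill: 0.0 = always applied, 1.0 = always missed
    return {s: missed[s] / total[s] for s in total}

def stat_sel_weights(examples, missing_profile):
    """STAT-Sel-style sketch: weight each example by the average
    missing-rate of its required skills, so examples exercising the
    student's weakest skills are upsampled during fine-tuning."""
    weights = []
    for ex in examples:
        rates = [missing_profile.get(s, 0.0) for s in ex["skills"]]
        weights.append(sum(rates) / len(rates) if rates else 0.0)
    return weights
```

In practice these weights would feed a weighted sampler (or per-example loss weights) in the SFT loop, concentrating gradient signal on the skills the student most often fails to apply.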

3️⃣ Results
Across Llama and Qwen models, STAT achieves:
• +7.5% improvement on MATH (vs. minimal SFT gains)
• +4.6% average boost on out-of-distribution benchmarks (AIME24/25, AMC23, etc.)
Moreover, STAT is complementary to reinforcement learning (e.g., GRPO): addressing skill gaps before RL further amplifies downstream gains.
