arxiv:2509.23873

Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning

Published on Sep 28

· Submitted by

jasmineWang on Oct 1

alibaba

Upvote

Authors:

Shaobo Wang ,

Jiaming Wang ,

Jiajun Zhang ,

Abstract

Quadrant-based Tuning (Q-Tuning) optimizes both sample and token pruning in supervised fine-tuning of large language models, achieving superior performance with reduced data.

AI-generated summary

As supervised fine-tuning (SFT) evolves from a lightweight post-training step into a compute-intensive phase rivaling mid-training in scale, data efficiency has become critical for aligning large language models (LLMs) under tight budgets. Existing data pruning methods suffer from a fragmented design: they operate either at the sample level or the token level in isolation, failing to jointly optimize both dimensions. This disconnect leads to significant inefficiencies--high-value samples may still contain redundant tokens, while token-level pruning often discards crucial instructional or corrective signals embedded in individual examples. To address this bottleneck, we introduce the Error-Uncertainty (EU) Plane, a diagnostic framework that jointly characterizes the heterogeneous utility of training data across samples and tokens. Guided by this insight, we propose Quadrant-based Tuning (Q-Tuning), a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning employs a two-stage strategy: first, it performs sample-level triage to retain examples rich in informative misconceptions or calibration signals; second, it applies an asymmetric token-pruning policy, using a context-aware scoring mechanism to trim less salient tokens exclusively from misconception samples while preserving calibration samples in their entirety. Our method sets a new state of the art across five diverse benchmarks. Remarkably, on SmolLM2-1.7B, Q-Tuning achieves a +38\% average improvement over the full-data SFT baseline using only 12.5\% of the original training data. As the first dynamic pruning approach to consistently outperform full-data training, Q-Tuning provides a practical and scalable blueprint for maximizing data utilization in budget-constrained LLM SFT.

View arXiv page View PDF Project page Add to collection

Community

Jessamine

Paper author Paper submitter 9 days ago

As supervised fine-tuning (SFT) evolves from a lightweight post-training step into a compute-intensive phase rivaling mid-training in scale, data efficiency has become critical for aligning large language models (LLMs) under tight budgets. Existing data pruning methods suffer from a fragmented design: they operate either at the sample level or the token level in isolation, failing to jointly optimize both dimensions. This disconnect leads to significant inefficiencies--high-value samples may still contain redundant tokens, while token-level pruning often discards crucial instructional or corrective signals embedded in individual examples. To address this bottleneck, we introduce the Error-Uncertainty (EU) Plane, a diagnostic framework that jointly characterizes the heterogeneous utility of training data across samples and tokens. Guided by this insight, we propose Quadrant-based Tuning (Q-Tuning), a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning employs a two-stage strategy: first, it performs sample-level triage to retain examples rich in informative misconceptions or calibration signals; second, it applies an asymmetric token-pruning policy, using a context-aware scoring mechanism to trim less salient tokens exclusively from misconception samples while preserving calibration samples in their entirety. Our method sets a new state of the art across five diverse benchmarks. Remarkably, on SmolLM2-1.7B, Q-Tuning achieves a +38% average improvement over the full-data SFT baseline using only 12.5% of the original training data. As the first dynamic pruning approach to consistently outperform full-data training, Q-Tuning provides a practical and scalable blueprint for maximizing data utilization in budget-constrained LLM SFT.

Jessamine

Paper author Paper submitter 9 days ago

Method

Pipeline

Experimental Results
on Instruction Datasets
With 12.5% samples and 50% tokens, it outperforms the best pruning baselines by 3.3 points on LLaMA2-7B and 2.7 points on Mistral-7B. At larger budgets with 50% samples and 70% tokens, Q-Tuning further widens the margin, exceeding the strongest baselines by 2.4 and 3.7 points, respectively, while closely matching full-dataset performance.

on Reasoning Datasets Q-Tuning delivers consistent gains across reasoning benchmarks. On GSM8K, it largely improves LLaMA3-8B, Mistral-7B, and SmolLM-1.7B under the 25% × 70% budget, all surpassing their full-data counterparts. On the more challenging MATH benchmark, Q-Tuning also exceeds the strongest baselines on both LLaMA3-8B and Mistral-7B. Q_Tuning62_1920x2163