Improving Knowledge Distillation Under Unknown Covariate Shift Through Confidence-Guided Data Augmentation
Abstract
A diffusion-based data augmentation strategy improves robustness in knowledge distillation by generating challenging samples, enhancing accuracy and resilience to spurious features.
Large foundation models trained on extensive datasets demonstrate strong zero-shot capabilities across various domains. To replicate their success when data and model size are constrained, knowledge distillation has become an established tool for transferring knowledge from foundation models to small student networks. However, the effectiveness of distillation is critically limited by the available training data. This work addresses the common practical issue of covariate shift in knowledge distillation, where spurious features appear during training but not at test time. We ask the question: when these spurious features are unknown, yet a robust teacher is available, is it possible for a student to also become robust to them? We address this problem by introducing a novel diffusion-based data augmentation strategy that generates images by maximizing the disagreement between the teacher and the student, effectively creating challenging samples that the student struggles with. Experiments demonstrate that our approach significantly improves worst-group and mean-group accuracy on CelebA and SpuCo Birds, as well as the spurious mAUC on Spurious ImageNet under covariate shift, outperforming state-of-the-art diffusion-based data augmentation baselines.
Community
Knowledge distillation is widely regarded as an effective method for training compact models in resource-constrained environments when a strong teacher model is available. However, a crucial limitation is its reliance on the available training data. We ask the question: can a student learn knowledge from the teacher that lies outside the training distribution? Specifically, we examine how covariate shift, such as the presence of spurious features during training, affects distillation.
To address this problem, we introduce ConfiG, a confidence-guided approach to synthetic data augmentation. ConfiG leverages diffusion models to generate samples that expose mismatches between teacher and student predictions, encouraging the student to generalize beyond spurious correlations.
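To make the confidence-guided idea concrete, below is a minimal sketch of a confidence-weighted teacher-student disagreement score. This is an illustrative post-hoc filtering variant, not the paper's implementation: ConfiG uses the disagreement signal to guide the diffusion process itself, whereas this sketch only ranks already-generated candidates. The names `teacher`, `student`, and `candidates` are hypothetical (two classifiers returning logits and a batch of diffusion-generated images).

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def disagreement_score(teacher, student, images):
    # Softmax probabilities of both models on the candidate images.
    t_probs = F.softmax(teacher(images), dim=-1)
    s_probs = F.softmax(student(images), dim=-1)
    # Teacher's confidence and predicted class for each image.
    t_conf, t_pred = t_probs.max(dim=-1)
    # Student's probability mass on the teacher's predicted class.
    s_on_t = s_probs.gather(-1, t_pred.unsqueeze(-1)).squeeze(-1)
    # High score: the teacher is confident and the student disagrees.
    return t_conf * (1.0 - s_on_t)

@torch.no_grad()
def select_hard_samples(teacher, student, candidates, k):
    # Keep the k generated images the student struggles with most;
    # these become augmentation samples for distillation.
    scores = disagreement_score(teacher, student, candidates)
    return candidates[scores.topk(k).indices]
```

The selected samples would then be mixed into the training set, with the student distilled against the teacher's soft labels on them as usual.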
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models (2025)
- ShortcutProbe: Probing Prediction Shortcuts for Learning Robust Models (2025)
- Dataset Distillation with Probabilistic Latent Features (2025)
- Learning from Reasoning Failures via Synthetic Data Generation (2025)
- PCoreSet: Effective Active Learning through Knowledge Distillation from Vision-Language Models (2025)
- CAE-DFKD: Bridging the Transferability Gap in Data-Free Knowledge Distillation (2025)
- Provably Improving Generalization of Few-Shot Models with Synthetic Data (2025)