Improving Knowledge Distillation Under Unknown Covariate Shift Through Confidence-Guided Data Augmentation
Abstract
A diffusion-based data augmentation strategy improves robustness in knowledge distillation by generating challenging samples, enhancing accuracy and resilience to spurious features.
Large foundation models trained on extensive datasets demonstrate strong zero-shot capabilities across various domains. To replicate their success when data and model size are constrained, knowledge distillation has become an established tool for transferring knowledge from foundation models to small student networks. However, the effectiveness of distillation is critically limited by the available training data. This work addresses the common practical issue of covariate shift in knowledge distillation, where spurious features appear during training but not at test time. We ask the question: when these spurious features are unknown, yet a robust teacher is available, is it possible for a student to also become robust to them? We address this problem by introducing a novel diffusion-based data augmentation strategy that generates images by maximizing the disagreement between the teacher and the student, effectively creating challenging samples that the student struggles with. Experiments demonstrate that our approach significantly improves worst-group and mean-group accuracy on CelebA and SpuCo Birds, as well as the spurious mAUC on Spurious ImageNet under covariate shift, outperforming state-of-the-art diffusion-based data augmentation baselines.
Community
Knowledge distillation is widely regarded as an effective method for training compact models in resource-constrained environments when a strong teacher model is available. However, a crucial limitation is its reliance on the available training data. We ask the question: can a student learn knowledge from the teacher that lies outside the training distribution? Specifically, we examine how covariate shift, such as the presence of spurious features during training, affects distillation.
To address this problem, we introduce ConfiG, a confidence-guided approach to synthetic data augmentation. ConfiG leverages diffusion models to generate samples that expose mismatches between teacher and student predictions, encouraging the student to generalize beyond spurious correlations.
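To make the confidence-guided idea concrete, below is a minimal sketch of a confidence-weighted teacher-student disagreement score. This is an illustrative post-hoc filtering variant, not the paper's implementation: ConfiG uses the disagreement signal to guide the diffusion process itself, whereas this sketch only ranks already-generated candidates. The names `teacher`, `student`, and `candidates` are hypothetical (two classifiers returning logits and a batch of diffusion-generated images).

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def disagreement_score(teacher, student, images):
    # Softmax probabilities of both models on the candidate images.
    t_probs = F.softmax(teacher(images), dim=-1)
    s_probs = F.softmax(student(images), dim=-1)
    # Teacher's confidence and predicted class for each image.
    t_conf, t_pred = t_probs.max(dim=-1)
    # Student's probability mass on the teacher's predicted class.
    s_on_t = s_probs.gather(-1, t_pred.unsqueeze(-1)).squeeze(-1)
    # High score: the teacher is confident and the student disagrees.
    return t_conf * (1.0 - s_on_t)

@torch.no_grad()
def select_hard_samples(teacher, student, candidates, k):
    # Keep the k generated images the student struggles with most;
    # these become augmentation samples for distillation.
    scores = disagreement_score(teacher, student, candidates)
    return candidates[scores.topk(k).indices]
```

The selected samples would then be mixed into the training set, with the student distilled against the teacher's soft labels on them as usual.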
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models (2025)
- ShortcutProbe: Probing Prediction Shortcuts for Learning Robust Models (2025)
- Dataset Distillation with Probabilistic Latent Features (2025)
- Learning from Reasoning Failures via Synthetic Data Generation (2025)
- PCoreSet: Effective Active Learning through Knowledge Distillation from Vision-Language Models (2025)
- CAE-DFKD: Bridging the Transferability Gap in Data-Free Knowledge Distillation (2025)
- Provably Improving Generalization of Few-Shot Models with Synthetic Data (2025)