Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation
NeurIPS 2025
Xin Zhang, Ziruo Zhang, Jiawei Du, Zuozhu Liu, Joey Tianyi Zhou
Agency for Science, Technology and Research (A*STAR), Singapore
National University of Singapore, Singapore
Zhejiang University, China
📖 Introduction
Multimodal embedding distributions across distillation methods: we extract image and text embeddings from a fine-tuned CLIP and project them into a shared representation space using DOSNES. Red triangles and blue circles denote image and text embeddings, respectively. Left: Embeddings of data randomly sampled from the original dataset are well spread and modality-aligned. Middle: The distilled dataset generated by a state-of-the-art MDD method (LoRS) suffers from Modality Collapse, where image and text embeddings are poorly aligned and concentrated in distinct regions. Right: Our method effectively mitigates modality collapse, yielding a distribution that better preserves cross-modal alignment and exhibits greater representational diversity.
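A visualization of this kind can be reproduced with a short script. The sketch below is a minimal approximation, not the paper's exact pipeline: it assumes the off-the-shelf Hugging Face `openai/clip-vit-base-patch32` checkpoint in place of the fine-tuned CLIP, and substitutes scikit-learn's t-SNE for DOSNES, which has no standard library implementation.

```python
# Minimal sketch of the embedding visualization (t-SNE stands in for DOSNES;
# the CLIP checkpoint and the dummy data below are assumptions for illustration).
import torch
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.manifold import TSNE
from transformers import CLIPModel, CLIPProcessor

# Placeholder paired data; in practice, load real image-caption pairs.
images = [Image.new("RGB", (224, 224), color=(i * 16, 0, 0)) for i in range(16)]
captions = [f"a photo, sample number {i}" for i in range(16)]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

inputs = processor(text=captions, images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])

# L2-normalize, then project both modalities into one shared 2-D space.
emb = torch.nn.functional.normalize(torch.cat([img_emb, txt_emb]), dim=-1).numpy()
proj = TSNE(n_components=2, metric="cosine", init="random",
            perplexity=5).fit_transform(emb)

n = len(images)
plt.scatter(proj[:n, 0], proj[:n, 1], marker="^", c="red", label="image")
plt.scatter(proj[n:, 0], proj[n:, 1], marker="o", c="blue", label="text")
plt.legend()
plt.show()
```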
⚙️ Installation
To get started, follow these instructions to set up the environment and install dependencies.
Clone this repository:
```bash
git clone https://github.com/zhangxin-xd/RepBlend.git
cd RepBlend
```
Install required packages:
```bash
conda create -n RepBlend python=3.10
conda activate RepBlend
pip install -r requirements.txt
```
🚀 Usage
Here's how to use RepBlend for multimodal dataset distillation:
First, download the pretrained weights and datasets and place them into their respective folders.
Pretrained Weights
The checkpoints for all experimental networks are available from their respective official repositories. For convenience, we have also provided them together 🤗 here. Once downloaded, place them in `distill_utils/checkpoints/`.
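If you want to verify the downloads before training, a quick sanity check like the following can help. This is a minimal sketch; the file pattern under `distill_utils/checkpoints/` is an assumption, so adjust it to the files you actually placed there.

```python
# Sanity-check downloaded checkpoints (file pattern is an assumption).
import glob
import torch

for path in glob.glob("distill_utils/checkpoints/*.pt*"):
    state = torch.load(path, map_location="cpu")
    # Checkpoints are typically a state_dict, or a dict wrapping one.
    keys = list(state.keys()) if isinstance(state, dict) else []
    print(f"{path}: {len(keys)} top-level entries")
```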
Experimental Datasets
Our method has been validated on various benchmarks; the datasets can be downloaded from the links below. Once downloaded, place them in `distill_utils/data/`.
| Datasets | Links |
| --- | --- |
| Flickr30K | images, 🤗 annotations |
| COCO | images, 🤗 annotations |
| LLaVA-cc3m | images, 🤗 annotations |
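Once the files are in place, a quick check that a download is complete can look like the sketch below. The paths and the JSON annotation layout are assumptions, not guaranteed by the repo, so adjust them to the files you actually downloaded.

```python
# Quick completeness check for a downloaded dataset
# (paths and JSON format are hypothetical; adjust to your layout).
import json
import os

root = "distill_utils/data/Flickr30K"
with open(os.path.join(root, "annotations.json")) as f:
    annotations = json.load(f)
print(f"{len(annotations)} annotation records")
print(f"{len(os.listdir(os.path.join(root, 'images')))} image files")
```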
Generate Expert Trajectories
You can generate expert trajectories by running `scripts/buffer.sh`, or download our 🤗 [pre-generated trajectories](https://huggingface.co/xinxin66/RepBlend) for faster reproduction.

```bash
bash scripts/buffer.sh
```
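To confirm a buffer was written (or downloaded) correctly, you can inspect it with plain PyTorch. This is a minimal sketch: the path and the assumption that each trajectory is a list of per-epoch parameter snapshots follow the common MTT-style convention, not a guarantee of this repo's exact format.

```python
# Inspect an expert-trajectory buffer (path and layout are assumptions:
# MTT-style buffers are usually lists of per-epoch parameter snapshots).
import torch

buffer = torch.load("buffers/replay_buffer_0.pt", map_location="cpu")
print(f"loaded {len(buffer)} expert trajectories")
print(f"first trajectory holds {len(buffer[0])} parameter snapshots")
```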
Distill Multimodal Dataset
You can distill multimodal datasets with RepBlend by running `scripts/distill_coco_repblend.sh` and `scripts/distill_flickr_repblend.sh`.

```bash
bash scripts/distill_coco_repblend.sh
bash scripts/distill_flickr_repblend.sh
```
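After distillation finishes, the synthetic set can be reloaded for evaluation. The sketch below assumes the distilled images and text features are saved as tensors with `torch.save`; the file names are hypothetical, so check the save paths logged by the distillation script.

```python
# Reload a distilled multimodal set (file names are hypothetical;
# use the paths actually logged by the distillation script).
import torch

distilled_images = torch.load("logged_files/images_best.pt", map_location="cpu")
distilled_texts = torch.load("logged_files/text_best.pt", map_location="cpu")
print(distilled_images.shape, distilled_texts.shape)
```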
📊 Results
Our experiments demonstrate the effectiveness of the proposed approach across various benchmarks.
For detailed experimental results and further analysis, please refer to the full paper.
📝 Citation
If you find this code useful in your research, please consider citing our work:
```bibtex
@inproceedings{RepBlend2025neurips,
  title={Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation},
  author={Zhang, Xin and Zhang, Ziruo and Du, Jiawei and Liu, Zuozhu and Zhou, Joey Tianyi},
  booktitle={Adv. Neural Inf. Process. Syst. (NeurIPS)},
  year={2025}
}
```
🔗 Reference
Our code builds upon the following previous works: