Fashion-RAG

Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation


International Joint Conference on Neural Networks (IJCNN) 2025
Oral Presentation

Fulvio Sanguigni1,2,*, Davide Morelli1,2,*, Marcella Cornia1, Rita Cucchiara1
1University of Modena and Reggio Emilia, 2University of Pisa

Table of Contents
  1. About The Project
  2. Getting Started
  3. Data and Models
  4. Inference

About The Project

Fashion-RAG is a novel approach in the fashion domain that handles multimodal virtual dressing with a new Retrieval-Augmented Generation (RAG) pipeline for visual data. Our approach retrieves garments aligned with a given textual description and uses the retrieved information as conditioning to generate the dressed person, with Stable Diffusion (SD) as the generative model. We finetune the SD U-Net and an additional adapter module (Inversion Adapter) to handle the retrieved information.
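
To make the retrieval step concrete, here is a minimal sketch (illustrative only, not the repository's code; the function name and variables are assumptions). It ranks a gallery of garment images against a textual description with OpenCLIP ViT-L-14 and the laion2b_s32b_b82k weights, the same retrieval model and weights selected by the --clip_retrieve_model and --clip_retrieve_weights flags of the inference script. The top-ranked garments are then used as visual conditioning for generation.

# Minimal retrieval sketch (illustrative only, not the repository's code).
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="laion2b_s32b_b82k")
tokenizer = open_clip.get_tokenizer("ViT-L-14")
model.eval()

def retrieve_garments(prompt, garment_paths, k=3):
    """Return the paths of the k garment images most similar to the text prompt."""
    with torch.no_grad():
        text_feat = model.encode_text(tokenizer([prompt]))
        images = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in garment_paths])
        image_feats = model.encode_image(images)
        # Cosine similarity between the prompt and every candidate garment.
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
        scores = (image_feats @ text_feat.T).squeeze(-1)
    top = scores.topk(min(k, len(garment_paths))).indices
    return [garment_paths[i] for i in top]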

(back to top)

✨ Key Features

Our contribution can be summarized as follows:

  • πŸ” Retrieval Enhanced Generation for Visual Items. We present a unified framework capable of generating Virtual Dressing without the need of a user-defined garment image. Instead, our method succesfully leverages textual information and retrieves coherent garments to perform the task
  • πŸ‘—πŸ‘šπŸ§₯ Multiple Garments Conditioning. We introduce a plug-and-play adapter module that is flexible to the number of retrieved items, allowing to retrieve up to 3 garments per text prompt.
  • πŸ“Š Extensive experiments. Experiments on the Dress Code datasets demonstrate that Fahion-RAG outweights previous competitors.
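
The multiple-garments conditioning idea can be pictured with the simplified sketch below. It is not the repository's Inversion Adapter implementation: the class name, layer sizes, and number of pseudo-tokens are illustrative assumptions. Each retrieved garment's CLIP image feature is mapped to a small set of pseudo-word embeddings, which are concatenated with the text-prompt embeddings so the SD cross-attention can attend to however many garments were retrieved.

# Illustrative sketch only (names, dimensions, and token counts are assumptions,
# not the repository's actual Inversion Adapter implementation).
import torch
import torch.nn as nn

class InversionAdapterSketch(nn.Module):
    """Map each retrieved garment's CLIP image feature to pseudo-word embeddings."""
    def __init__(self, clip_dim=1024, token_dim=1024, n_tokens=16):
        super().__init__()
        self.n_tokens = n_tokens
        self.token_dim = token_dim
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, token_dim * n_tokens),
            nn.GELU(),
            nn.Linear(token_dim * n_tokens, token_dim * n_tokens),
        )

    def forward(self, garment_feats):                       # (k, clip_dim), k = retrieved garments
        out = self.proj(garment_feats)                      # (k, n_tokens * token_dim)
        return out.view(-1, self.n_tokens, self.token_dim)  # (k, n_tokens, token_dim)

adapter = InversionAdapterSketch()
text_emb = torch.randn(1, 77, 1024)                 # frozen text-encoder output (batch of 1)
garment_feats = torch.randn(3, 1024)                # CLIP features of 3 retrieved garments
pseudo_tokens = adapter(garment_feats).reshape(1, -1, 1024)   # (1, 3 * 16, 1024)
conditioning = torch.cat([text_emb, pseudo_tokens], dim=1)    # fed to SD cross-attention
print(conditioning.shape)                           # torch.Size([1, 125, 1024])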

Getting Started

Prerequisites

Clone the repository:

git clone Fashion-RAG.git

Installation

  1. We recommend installing the required packages using Python's native virtual environment (venv) as follows:
    python -m venv fashion-rag
    source fashion-rag/bin/activate
    
  2. Upgrade pip and install dependencies
    pip install --upgrade pip
    pip install -r requirements.txt
    
  3. Create a .env file like the following:
    export WANDB_API_KEY="ENTER YOUR WANDB TOKEN"
    export TORCH_HOME="ENTER YOUR TORCH PATH TO SAVE TORCH MODELS USED FOR METRICS COMPUTING"
    export HF_TOKEN="ENTER YOUR HUGGINGFACE TOKEN"
    export HF_CACHE_DIR="PATH WHERE YOU WANT TO SAVE THE HF MODELS (NEED CUSTOM VARIABLE TO ACCOUNT FOR OLD TRANSFORMERS PACKAGES, OTHERWISE USE HF_HOME)"
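
Since the .env file consists of export statements, it can be sourced in every new shell before launching inference (a minimal example, assuming the file is saved as .env in the repository root):

source fashion-rag/bin/activate
source .env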
    

Data and Models

Download DressCode from the original repository. Download the finetuned U-Net and Inversion Adapter from this source and place them in your experiment folder as follows:

<experiment folder>/
β”‚
β”œβ”€β”€ unet_120000.pth
β”œβ”€β”€ inversion_adapter_120000.pth

Copy the provided retrieval file-paths folder dataset/dresscode-retrieval into your retrieval path, or use it directly.
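
A possible way to lay everything out (all paths below are placeholders, not paths required by the repository):

# Placeholder paths; adapt them to your setup.
EXP_DIR=/path/to/experiment_folder
mkdir -p "$EXP_DIR"
mv /path/to/downloads/unet_120000.pth "$EXP_DIR"/
mv /path/to/downloads/inversion_adapter_120000.pth "$EXP_DIR"/

# Retrieval file paths shipped with the repository: copy them to a custom location...
cp -r dataset/dresscode-retrieval /path/to/retrieve_path
# ...or pass dataset/dresscode-retrieval directly to --retrieve_path at inference time.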

Inference

Let's generate virtual dressing images using the finetuned Fashion-RAG model.

source fashion-rag/bin/activate

python evaluate_RAG.py \
    --pretrained_model_name_or_path stabilityai/stable-diffusion-2-inpainting \
    --output_dir "output directory path" \
    --finetuned_models_dir "U-Net and inversion adapter directory weights path" \
    --unet_name unet_120000.pth --inversion_adapter_name inversion_adapter_120000.pth \
    --dataset dresscode --dresscode_dataroot <data path>/DressCode \
    --category "garment category" \
    --test_order "specify paired or unpaired" --mask_type mask \
    --phase test --num_inference_steps 50 \
    --test_batch_size 8 --num_workers_test 8 --metrics_batch_size 8 --mixed_precision fp16 \
    --text_usage prompt_noun_chunks \
    --retrieve_path "dataset/dresscode-retrieval or your custom path" \
    --clip_retrieve_model ViT-L-14 --clip_retrieve_weights laion2b_s32b_b82k \
    --n_chunks "number of text chunks 1 or 3" \
    --n_retrieved "number of retrieved images 1 to 3" \
    --metrics fid_score kid_score retrieved_score clip_score lpips_score ssim_score \
    --attention_layers_fine_list '-1' '0 1 2 3' \
    --compute_metrics
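
As a filled-in example, the following call runs the paired setting with 3 noun chunks and 3 retrieved garments. The output, checkpoint, and data paths are placeholders to adapt to your setup, and upper_body is assumed here as one of the Dress Code garment categories (upper_body, lower_body, dresses):

python evaluate_RAG.py \
    --pretrained_model_name_or_path stabilityai/stable-diffusion-2-inpainting \
    --output_dir out_dir \
    --finetuned_models_dir /path/to/experiment_folder \
    --unet_name unet_120000.pth --inversion_adapter_name inversion_adapter_120000.pth \
    --dataset dresscode --dresscode_dataroot /path/to/DressCode \
    --category upper_body \
    --test_order paired --mask_type mask \
    --phase test --num_inference_steps 50 \
    --test_batch_size 8 --num_workers_test 8 --metrics_batch_size 8 --mixed_precision fp16 \
    --text_usage prompt_noun_chunks \
    --retrieve_path dataset/dresscode-retrieval \
    --clip_retrieve_model ViT-L-14 --clip_retrieve_weights laion2b_s32b_b82k \
    --n_chunks 3 \
    --n_retrieved 3 \
    --metrics fid_score kid_score retrieved_score clip_score lpips_score ssim_score \
    --attention_layers_fine_list '-1' '0 1 2 3' \
    --compute_metrics

With these values the results are written to out_dir/pte_paired_nc_3_nr_3/, following the output structure shown below.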

The final output folder structure will look like this:

out_dir/pte_paired_nc_<number_of_chunks>_nr_<number_of_retrieved_images>/
β”‚
β”œβ”€β”€ lower_body/
β”œβ”€β”€ upper_body/
β”œβ”€β”€ dresses/
└── metrics_all.json