---
license: openrail++
library_name: diffusers
tags:
- text-to-image
- stable-diffusion
---

# Conceptrol: Concept Control of Zero-shot Personalized Image Generation

## Model Card

This model implements Conceptrol, a training-free method that boosts zero-shot personalized image generation across Stable Diffusion, SDXL, and FLUX. It works without additional training, data, or models.

<p align="center">
  <img src="demo/teaser.png">
</p>

[Conceptrol: Concept Control of Zero-shot Personalized Image Generation](https://huggingface.co/papers/2503.06568)

**Abstract:**

Personalized image generation with text-to-image diffusion models generates unseen images based on reference image content. Zero-shot adapter methods such as IP-Adapter and OminiControl are especially interesting because they do not require test-time fine-tuning. However, they struggle to balance preserving personalized content and adherence to the text prompt. We identify a critical design flaw resulting in this performance gap: current adapters inadequately integrate personalization images with the textual descriptions. The generated images, therefore, replicate the personalized content rather than adhere to the text prompt instructions. Yet the base text-to-image model has strong conceptual understanding capabilities that can be leveraged.

We propose Conceptrol, a simple yet effective framework that enhances zero-shot adapters without adding computational overhead. Conceptrol constrains the attention of visual specification with a textual concept mask that improves subject-driven generation capabilities. It achieves as much as 89% improvement on personalization benchmarks over the vanilla IP-Adapter and can even outperform fine-tuning approaches such as Dreambooth LoRA.
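
To make the idea of a textual concept mask concrete, here is a toy NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes an IP-Adapter-style decoupled cross-attention (a text branch plus a reference-image branch), assumes the index of the prompt token naming the subject is known, and uses a simple max-normalization to turn that token's attention into a mask. All function and variable names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention; returns (output, attention weights)."""
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights


def concept_masked_adapter(q, text_k, text_v, img_k, img_v, concept_idx, scale=1.0):
    """Toy decoupled cross-attention gated by a textual concept mask.

    q:           (n_latent, d) queries from the image latents
    text_k/v:    (n_text, d)   keys/values from the prompt tokens
    img_k/v:     (n_img, d)    keys/values from the reference-image tokens
    concept_idx: index of the prompt token naming the subject (assumed known)
    """
    text_out, text_w = attention(q, text_k, text_v)
    img_out, _ = attention(q, img_k, img_v)
    # How strongly each latent position attends to the concept token,
    # rescaled so the largest value is (almost exactly) 1.
    concept_attn = text_w[:, concept_idx]
    mask = concept_attn / (concept_attn.max() + 1e-8)
    # The mask gates the reference-image branch per latent position.
    return text_out + scale * mask[:, None] * img_out, mask


# Random tensors stand in for real model activations.
rng = np.random.default_rng(0)
d = 8
q = rng.normal(size=(16, d))
text_k, text_v = rng.normal(size=(5, d)), rng.normal(size=(5, d))
img_k, img_v = rng.normal(size=(4, d)), rng.normal(size=(4, d))

out, mask = concept_masked_adapter(q, text_k, text_v, img_k, img_v, concept_idx=2)
print(out.shape, mask.shape)
```

In this sketch, latent positions that pay little attention to the concept token receive little of the reference-image signal, so the rest of the image is driven mainly by the text prompt. Setting `scale=0` recovers plain text cross-attention.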

## Quick Start

#### 1. Environment Setup

``` bash
conda create -n conceptrol python=3.10
conda activate conceptrol
pip install -r requirements.txt
```
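
After installation, a quick import check can confirm the environment is usable before opening the demos. The sketch below is only a convenience; the package list is an assumption, since the contents of `requirements.txt` are not shown here, so adjust it to match.

```python
import importlib.util
import sys

# Assumed package list; edit to match requirements.txt.
EXPECTED_PACKAGES = ["torch", "diffusers", "transformers"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages(EXPECTED_PACKAGES)
    print("Missing:", ", ".join(missing) if missing else "none")
```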

#### 2. Run `demo_sd.ipynb`, `demo_sdxl.ipynb`, or `demo_flux.py`

## Local Setup using Gradio

#### 1. Start Gradio Interface
``` bash
pip install gradio
gradio gradio_src/app.py
```

#### 2. Use the GUI!

## Supported Models

| Model Name            |  Link                                             |
|-----------------------|-------------------------------------------------------------|
| Stable Diffusion 1.5  | [stable-diffusion-v1-5/stable-diffusion-v1-5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5)   |
| Realistic Vision V5.1 | [SG161222/Realistic_Vision_V5.1_noVAE](https://huggingface.co/SG161222/Realistic_Vision_V5.1_noVAE) |
| Stable Diffusion XL-1024   | [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) |
| Animagine XL v4.0 |   [cagliostrolab/animagine-xl-4.0](https://huggingface.co/cagliostrolab/animagine-xl-4.0)|
| Realistic Vision XL V5.0 | [SG161222/RealVisXL_V5.0](https://huggingface.co/SG161222/RealVisXL_V5.0) |
| FLUX-schnell | [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) |

| Adapter Name            |  Link                                             |
|-----------------------|-------------------------------------------------------------|
| IP-Adapter  | [h94/IP-Adapter](https://huggingface.co/h94/IP-Adapter/tree/main)  |
| OminiControl | [Yuanshi/OminiControl](https://huggingface.co/Yuanshi/OminiControl) |
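
For scripting, the repository IDs in the tables above can be kept in a small lookup. The sketch below is hypothetical: the short keys and helper names are illustrative, and `load_base_pipeline` loads only the base model through the generic `DiffusionPipeline.from_pretrained` call from diffusers. It does not apply an adapter or Conceptrol itself, and some checkpoints may need extra components (for example a separately supplied VAE); see the demo notebooks for the actual setup.

```python
# Hub repo IDs copied from the tables above; the short keys are illustrative.
BASE_MODELS = {
    "sd15": "stable-diffusion-v1-5/stable-diffusion-v1-5",
    "realistic-vision-5.1": "SG161222/Realistic_Vision_V5.1_noVAE",
    "sdxl": "stabilityai/stable-diffusion-xl-base-1.0",
    "animagine-xl-4.0": "cagliostrolab/animagine-xl-4.0",
    "realvis-xl-5.0": "SG161222/RealVisXL_V5.0",
    "flux-schnell": "black-forest-labs/FLUX.1-schnell",
}

ADAPTERS = {
    "ip-adapter": "h94/IP-Adapter",
    "ominicontrol": "Yuanshi/OminiControl",
}


def hub_url(repo_id):
    """Build the Hugging Face Hub page URL for a repo ID."""
    return f"https://huggingface.co/{repo_id}"


def load_base_pipeline(key):
    """Load a base pipeline by short key (downloads weights on first call)."""
    # Imported lazily so the lookup tables work without diffusers installed.
    from diffusers import DiffusionPipeline

    return DiffusionPipeline.from_pretrained(BASE_MODELS[key])


print(hub_url(BASE_MODELS["sdxl"]))
```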


## Source Code

https://github.com/QY-H00/Conceptrol

## Citation

``` bibtex
@article{he2025conceptrol,
  title={Conceptrol: Concept Control of Zero-shot Personalized Image Generation},
  author={Qiyuan He and Angela Yao},
  journal={arXiv preprint arXiv:2503.06568},
  year={2025}
}
```

## Acknowledgement

We thank the following repositories for their great work: 

[diffusers](https://github.com/huggingface/diffusers), 
[transformers](https://github.com/huggingface/transformers), 
[IP-Adapter](https://github.com/tencent-ailab/IP-Adapter), 
[OminiControl](https://github.com/Yuanshi9815/OminiControl)