|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
base_model: |
|
- black-forest-labs/FLUX.1-dev |
|
library_name: transformers |
|
pipeline_tag: image-to-image |
|
tags: |
|
- image-generation |
|
- subject-personalization |
|
- style-transfer |
|
- Diffusion-Transformer |
|
--- |
|
|
|
<h3 align="center"> |
|
<img src="assets/uso.webp" alt="Logo" style="vertical-align: middle; width: 95px; height: auto;"> |
|
<br>
|
Unified Style and Subject-Driven Generation via Disentangled and Reward Learning |
|
</h3> |
|
|
|
<p align="center"> |
|
<a href="https://github.com/bytedance/USO"><img alt="Build" src="https://img.shields.io/github/stars/bytedance/USO"></a> |
|
<a href="https://bytedance.github.io/USO/"><img alt="Build" src="https://img.shields.io/badge/Project%20Page-USO-blue"></a> |
|
<a href="https://arxiv.org/abs/2508.18966"><img alt="Build" src="https://img.shields.io/badge/Tech%20Report-USO-b31b1b.svg"></a> |
|
<a href="https://huggingface.co/bytedance-research/USO"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=green"></a> |
|
</p> |
|
|
|
 |
|
|
|
## 📖 Introduction |
|
Existing literature typically treats style-driven and subject-driven generation as two disjoint tasks: the former prioritizes stylistic similarity, whereas the latter insists on subject consistency, resulting in an apparent antagonism. We argue that both objectives can be unified under a single framework, because they ultimately concern the disentanglement and re-composition of “content” and “style”, a long-standing theme in style-driven research. To this end, we present USO, a Unified framework for Style-driven and subject-driven GeneratiOn. First, we construct a large-scale triplet dataset consisting of content images, style images, and their corresponding stylized content images. Second, we introduce a disentangled learning scheme that simultaneously aligns style features and disentangles content from style through two complementary objectives: style-alignment training and content–style disentanglement training. Third, we incorporate a style reward-learning paradigm to further enhance the model’s performance.
|
|
|
## ⚡️ Quick Start |
|
|
|
### 🔧 Requirements and Installation |
|
|
|
Clone our [GitHub repo](https://github.com/bytedance/USO)
|
|
|
|
|
Install the requirements:
|
```bash |
|
# create a virtual environment with Python >= 3.10 and <= 3.12, e.g.:
# python -m venv uso_env
# source uso_env/bin/activate

# then install the dependencies
pip install -r requirements.txt
|
``` |
|
|
|
Then download the checkpoints in one of three ways:
|
1. Directly run the inference scripts; the checkpoints will be downloaded automatically by the `hf_hub_download` function in the code to your `$HF_HOME` (the default value is `~/.cache/huggingface`).
|
2. Use `huggingface-cli download <repo name>` to download `black-forest-labs/FLUX.1-dev`, `xlabs-ai/xflux_text_encoders`, `openai/clip-vit-large-patch14`, and `bytedance-research/USO`, then run the inference scripts.
|
3. Use `huggingface-cli download <repo name> --local-dir <LOCAL_DIR>` to download all the checkpoints mentioned in 2 to directories of your choice, set the environment variable `TODO`, and then run the inference scripts. Both CLI options are sketched below.
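For options 2 and 3, the commands look roughly as follows. This is a minimal sketch: the repository names are taken from the list above, but the environment variable the inference scripts read is still marked `TODO`, so `<ENV_VAR>` below is a placeholder; check the repository README for its final name.

```bash
# Option 2: download into the default HF cache ($HF_HOME).
# Note: black-forest-labs/FLUX.1-dev is a gated repo, so you may need to
# accept its license on Hugging Face and run `huggingface-cli login` first.
huggingface-cli download black-forest-labs/FLUX.1-dev
huggingface-cli download xlabs-ai/xflux_text_encoders
huggingface-cli download openai/clip-vit-large-patch14
huggingface-cli download bytedance-research/USO

# Option 3: download into local directories instead, e.g.
huggingface-cli download bytedance-research/USO --local-dir ./ckpts/uso
# ...repeat for the other repositories, then point the scripts at them:
# export <ENV_VAR>=./ckpts   # placeholder; see step 3 above
```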
|
|
|
### 🌟 Gradio Demo |
|
|
|
```bash |
|
python app.py |
|
``` |
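If you launch the demo on a remote machine, Gradio's standard `GRADIO_SERVER_NAME` and `GRADIO_SERVER_PORT` environment variables can make it reachable from your local browser. This assumes `app.py` uses Gradio's default launch settings; adjust if the script sets its own host or port.

```bash
# bind the Gradio server to all interfaces on a fixed port
export GRADIO_SERVER_NAME=0.0.0.0
export GRADIO_SERVER_PORT=7860
python app.py
# then open http://<server-ip>:7860 in a browser
```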
|
|
|
## 📄 Disclaimer |
|
<p> |
|
We open-source this project for academic research. The vast majority of images |
|
used in this project are either generated or from open-source datasets. If you have any concerns, |
|
please contact us, and we will promptly remove any inappropriate content. |
|
Our project is released under the Apache 2.0 License. If you apply it to other base models,
|
please ensure that you comply with the original licensing terms. |
|
<br><br>This research aims to advance the field of generative AI. Users are free to |
|
create images using this tool, provided they comply with local laws and exercise |
|
responsible usage. The developers are not liable for any misuse of the tool by users.</p> |
|
|
|
## Citation |
|
We would also appreciate it if you could give a star ⭐ to our [GitHub repository](https://github.com/bytedance/USO). Thanks a lot!
|
|
|
If you find this project useful for your research, please consider citing our paper: |
|
```bibtex |
|
@article{wu2025uso,
  title={USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning},
  author={Shaojin Wu and Mengqi Huang and Yufeng Cheng and Wenxu Wu and Jiahe Tian and Yiming Luo and Fei Ding and Qian He},
  year={2025},
  eprint={2508.18966},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
}
|
``` |