Diffusers
Safetensors
English
XionghuiWang committed on
Commit
c7e514a
·
verified ·
1 Parent(s): ef7c2e4

Update README.md

  1. README.md +203 -3
README.md CHANGED
@@ -1,3 +1,203 @@
- ---
- license: cc-by-nc-4.0
- ---
+ ---
+ license: cc-by-nc-4.0
+ base_model:
+ - black-forest-labs/FLUX.1-Fill-dev
+ language:
+ - en
+ ---
+ # OneReward
+
+ Official implementation of **[OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning](https://arxiv.org/abs/2508.21066)**
+
+ [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2508.21066) [![model](https://img.shields.io/badge/🤗-Model-yellow)](https://huggingface.co/bytedance-research/OneReward) <br>
+
+ <p align="center">
+   <img src="assets/show.jpg" alt="assert" width="800">
+ </p>
+
+
+ ## Introduction
+ We propose **OneReward**, a novel RLHF methodology for the visual domain that employs Qwen2.5-VL as a generative reward model to enhance multi-task reinforcement learning, significantly improving the policy model’s generation ability across multiple subtasks. Building on OneReward, we develop **Seedream 3.0 Fill**, a unified SOTA image editing model capable of effectively handling diverse tasks including image fill, image extend, object removal, and text rendering. It surpasses several leading commercial and open-source systems, including Ideogram, Adobe Photoshop, and FLUX Fill [Pro]. Finally, based on FLUX Fill [dev], we are thrilled to release **FLUX.1-Fill-dev-OneReward**, which outperforms closed-source FLUX Fill [Pro] in inpainting and outpainting tasks, serving as a powerful new baseline for future research in unified image editing.
+
+ <table>
+   <tr>
+     <td>
+       <img src="assets/radius_inpaint.png" width="512">
+       <p align="center"><b>Image Fill</b></p>
+     </td>
+     <td>
+       <img src="assets/radius_outpaint_w.png" width="512">
+       <p align="center"><b>Image Extend with Prompt</b></p>
+     </td>
+   </tr>
+   <tr>
+     <td>
+       <img src="assets/radius_outpaint_wo.png" width="512">
+       <p align="center"><b>Image Extend without Prompt</b></p>
+     </td>
+     <td>
+       <img src="assets/radius_eraser.png" width="512">
+       <p align="center"><b>Object Removal</b></p>
+     </td>
+   </tr>
+   <caption align="bottom" style="font-weight: bold; margin-top: 10px;">Seedream 3.0 Fill Performance Overview</caption>
+ </table>
+
+ ## Quick Start
+
+ 1. Make sure `transformers>=4.51.3` is installed (required for Qwen2.5-VL support).
+
+ 2. Install the latest version of diffusers:
+ ```
+ pip install -U diffusers
+ ```
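The version requirement in step 1 can be checked from Python before loading any model. The following is a minimal sketch, not part of the OneReward code: the helper names `version_tuple`, `meets_minimum`, and `check_transformers` are hypothetical, and the comparison only looks at the leading numeric components, so a pre-release such as `4.51.3rc1` is treated the same as `4.51.3`.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '4.52.0.dev0' -> (4, 52, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings by their numeric components only."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_transformers(minimum: str = "4.51.3") -> bool:
    """Return True if transformers is installed and at least `minimum` (assumed helper)."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)


if __name__ == "__main__":
    print("transformers OK:", check_transformers())
```

If the check returns `False`, upgrading with `pip install -U transformers` is the usual remedy.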
+
+ The snippet below shows how to generate images from a text prompt and an input mask. The model supports inpainting (image fill), outpainting (image extend), and erasing (object removal). Because the model is fully trained, it must be run through `FluxFillCFGPipeline` with classifier-free guidance (CFG).
+
+ ```python
+ import torch
+ from diffusers import FluxTransformer2DModel
+ from diffusers.utils import load_image
+
+ # Local import from the src package
+ from src.pipeline_flux_fill_with_cfg import FluxFillCFGPipeline
+
+ # Load the OneReward transformer weights
+ transformer_onereward = FluxTransformer2DModel.from_pretrained(
+     "bytedance-research/OneReward",
+     subfolder="flux.1-fill-dev-OneReward-transformer",
+     torch_dtype=torch.bfloat16,
+ )
+
+ # Build the pipeline on FLUX.1-Fill-dev with the OneReward transformer swapped in
+ pipe = FluxFillCFGPipeline.from_pretrained(
+     "black-forest-labs/FLUX.1-Fill-dev",
+     transformer=transformer_onereward,
+     torch_dtype=torch.bfloat16,
+ ).to("cuda")
+
+ # Image Fill
+ image = load_image('assets/image.png')
+ mask = load_image('assets/mask_fill.png')
+ image = pipe(
+     prompt='the words "ByteDance", and in the next line "OneReward"',
+     negative_prompt="nsfw",
+     image=image,
+     mask_image=mask,
+     height=image.height,
+     width=image.width,
+     guidance_scale=1.0,
+     true_cfg=4.0,
+     num_inference_steps=50,
+     generator=torch.Generator("cpu").manual_seed(0),  # fixed seed for reproducibility
+ ).images[0]
+ image.save("image_fill.jpg")
+ ```
+
+ <table>
+   <tr>
+     <td>
+       <img src="assets/image.png" width="512">
+       <p align="center"><b>input</b></p>
+     </td>
+     <td>
+       <img src="assets/result_fill.jpg" width="512">
+       <p align="center"><b>output</b></p>
+     </td>
+   </tr>
+ </table>
+
+ ## Model
+ ### FLUX.1-Fill-dev[OneReward], trained with Alg. 1 in the paper
+ ```python
+ import torch
+ from diffusers import FluxTransformer2DModel
+ from src.pipeline_flux_fill_with_cfg import FluxFillCFGPipeline
+
+ transformer_onereward = FluxTransformer2DModel.from_pretrained(
+     "bytedance-research/OneReward",
+     subfolder="flux.1-fill-dev-OneReward-transformer",
+     torch_dtype=torch.bfloat16,
+ )
+
+ pipe = FluxFillCFGPipeline.from_pretrained(
+     "black-forest-labs/FLUX.1-Fill-dev",
+     transformer=transformer_onereward,
+     torch_dtype=torch.bfloat16,
+ ).to("cuda")
+ ```
+
+ ### FLUX.1-Fill-dev[OneRewardDynamic], trained with Alg. 2 in the paper
+ ```python
+ import torch
+ from diffusers import FluxTransformer2DModel
+ from src.pipeline_flux_fill_with_cfg import FluxFillCFGPipeline
+
+ transformer_onereward_dynamic = FluxTransformer2DModel.from_pretrained(
+     "bytedance-research/OneReward",
+     subfolder="flux.1-fill-dev-OneRewardDynamic-transformer",
+     torch_dtype=torch.bfloat16,
+ )
+
+ pipe = FluxFillCFGPipeline.from_pretrained(
+     "black-forest-labs/FLUX.1-Fill-dev",
+     transformer=transformer_onereward_dynamic,
+     torch_dtype=torch.bfloat16,
+ ).to("cuda")
+ ```
+
+ ### Object Removal
+ ```python
+ import torch
+ from diffusers.utils import load_image
+
+ # `pipe` is one of the FluxFillCFGPipeline instances created above
+ image = load_image('assets/image.png')
+ mask = load_image('assets/mask_remove.png')
+ image = pipe(
+     prompt='remove',  # object removal uses a fixed prompt
+     negative_prompt="nsfw",
+     image=image,
+     mask_image=mask,
+     height=image.height,
+     width=image.width,
+     guidance_scale=1.0,
+     true_cfg=4.0,
+     num_inference_steps=50,
+     generator=torch.Generator("cpu").manual_seed(0),
+ ).images[0]
+ image.save("object_removal.jpg")
+ ```
+
+ ### Image Extend with prompt
+ ```python
+ import torch
+ from diffusers.utils import load_image
+
+ # `pipe` is one of the FluxFillCFGPipeline instances created above
+ image = load_image('assets/image2.png')
+ mask = load_image('assets/mask_extend.png')
+ image = pipe(
+     prompt='Deep in the forest, surrounded by colorful flowers',
+     negative_prompt="nsfw",
+     image=image,
+     mask_image=mask,
+     height=image.height,
+     width=image.width,
+     guidance_scale=1.0,
+     true_cfg=4.0,
+     num_inference_steps=50,
+     generator=torch.Generator("cpu").manual_seed(0),
+ ).images[0]
+ image.save("image_extend_w_prompt.jpg")
+ ```
+
+ ### Image Extend without prompt
+ ```python
+ import torch
+ from diffusers.utils import load_image
+
+ # `pipe` is one of the FluxFillCFGPipeline instances created above
+ image = load_image('assets/image2.png')
+ mask = load_image('assets/mask_extend.png')
+ image = pipe(
+     prompt='high-definition, perfect composition',  # image extend without a prompt uses a fixed prompt
+     negative_prompt="nsfw",
+     image=image,
+     mask_image=mask,
+     height=image.height,
+     width=image.width,
+     guidance_scale=1.0,
+     true_cfg=4.0,
+     num_inference_steps=50,
+     generator=torch.Generator("cpu").manual_seed(0),
+ ).images[0]
+ image.save("image_extend_wo_prompt.jpg")
+ ```
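The four task examples share every argument except the prompt, and two of them use a fixed prompt. One way to keep those settings in a single place is sketched below. This is only an illustration under stated assumptions: `build_call_kwargs`, `FIXED_PROMPTS`, and the task labels (`"fill"`, `"remove"`, `"extend_with_prompt"`, `"extend_without_prompt"`) are hypothetical names introduced here, while the argument values are copied from the examples in this README.

```python
from typing import Any, Dict, Optional

# Fixed prompts copied from the README examples; the task labels are hypothetical.
FIXED_PROMPTS = {
    "remove": "remove",
    "extend_without_prompt": "high-definition, perfect composition",
}
PROMPTED_TASKS = ("fill", "extend_with_prompt")


def build_call_kwargs(task: str, image: Any, mask: Any,
                      prompt: Optional[str] = None) -> Dict[str, Any]:
    """Assemble the keyword arguments used by the README examples for one task.

    `image` only needs `.height` and `.width` attributes (a PIL image has both).
    """
    if task in FIXED_PROMPTS:
        # Fixed-prompt tasks ignore any user-supplied prompt.
        prompt = FIXED_PROMPTS[task]
    elif task in PROMPTED_TASKS:
        if not prompt:
            raise ValueError(f"task {task!r} requires a prompt")
    else:
        raise ValueError(f"unknown task {task!r}")
    return {
        "prompt": prompt,
        "negative_prompt": "nsfw",
        "image": image,
        "mask_image": mask,
        "height": image.height,
        "width": image.width,
        "guidance_scale": 1.0,
        "true_cfg": 4.0,
        "num_inference_steps": 50,
    }

# Usage with a pipeline created as shown earlier (requires torch and a loaded `pipe`):
#   kwargs = build_call_kwargs("remove", image, mask)
#   result = pipe(**kwargs, generator=torch.Generator("cpu").manual_seed(0)).images[0]
```

The random generator is left to the caller so that the helper has no dependency on torch.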
+
+
+ ## License Agreement
+ The code is licensed under Apache 2.0. The model is licensed under CC BY-NC 4.0.
+
+ ## Citation
+ ```
+ @article{gong2025onereward,
+   title={OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning},
+   author={Gong, Yuan and Wang, Xionghui and Wu, Jie and Wang, Shiyin and Wang, Yitong and Wu, Xinglong},
+   journal={arXiv preprint arXiv:2508.21066},
+   year={2025}
+ }
+ ```