naykun committed on
Commit 7d4ee71 · 1 Parent(s): 71f5363

update demo

Files changed (2)
  1. app.py +338 -4
  2. requirements.txt +7 -0
app.py CHANGED
@@ -1,7 +1,341 @@
  import gradio as gr

- def greet(name):
-     return "Hello " + name + "!!"

- demo = gr.Interface(fn=greet, inputs="text", outputs="text")
- demo.launch()
  import gradio as gr
+ import numpy as np
+ import random
+ import torch
+ import spaces
+
+ from PIL import Image
+ from diffusers import QwenImageEditPlusPipeline
+
+ import os
+ import io
+ import base64
+ import json
+
+ SYSTEM_PROMPT = '''
+ # Edit Instruction Rewriter
+ You are a professional edit instruction rewriter. Your task is to generate a precise, concise, and visually achievable professional-level edit instruction based on the user-provided instruction and the image to be edited.
+
+ Please strictly follow the rewriting rules below:
+
+ ## 1. General Principles
+ - Keep the rewritten prompt **concise and comprehensive**. Avoid overly long sentences and unnecessary descriptive language.
+ - If the instruction is contradictory, vague, or unachievable, prioritize reasonable inference and correction, and supplement details when necessary.
+ - Keep the main part of the original instruction unchanged, only enhancing its clarity, rationality, and visual feasibility.
+ - All added objects or modifications must align with the logic and style of the scene in the input images.
+ - If multiple sub-images are to be generated, describe the content of each sub-image individually.
+
+ ## 2. Task-Type Handling Rules
+
+ ### 1. Add, Delete, Replace Tasks
+ - If the instruction is clear (already includes task type, target entity, position, quantity, attributes), preserve the original intent and only refine the grammar.
+ - If the description is vague, supplement with minimal but sufficient details (category, color, size, orientation, position, etc.). For example:
+     > Original: "Add an animal"
+     > Rewritten: "Add a light-gray cat in the bottom-right corner, sitting and facing the camera"
+ - Remove meaningless instructions: e.g., "Add 0 objects" should be ignored or flagged as invalid.
+ - For replacement tasks, specify "Replace Y with X" and briefly describe the key visual features of X.
+
+ ### 2. Text Editing Tasks
+ - All text content must be enclosed in English double quotes `" "`. Keep the original language of the text and keep the capitalization.
+ - Both adding new text and replacing existing text are text replacement tasks. For example:
+     - Replace "xx" with "yy"
+     - Replace the mask / bounding box with "yy"
+     - Replace the visual object with "yy"
+ - Specify text position, color, and layout only if the user has required them.
+ - If a font is specified, keep the original language of the font.
+
+ ### 3. Human Editing Tasks
+ - Make the smallest possible changes to the user's prompt.
+ - If changes to background, action, expression, camera shot, or ambient lighting are required, list each modification individually.
+ - **Edits to makeup or facial features / expression must be subtle, not exaggerated, and must preserve the subject’s identity consistency.**
+     > Original: "Add eyebrows to the face"
+     > Rewritten: "Slightly thicken the person’s eyebrows with little change, so they look natural."
+
+ ### 4. Style Conversion or Enhancement Tasks
+ - If a style is specified, describe it concisely using key visual features. For example:
+     > Original: "Disco style"
+     > Rewritten: "1970s disco style: flashing lights, disco ball, mirrored walls, vibrant colors"
+ - For style references, analyze the original image and extract its key characteristics (color, composition, texture, lighting, artistic style, etc.), integrating them into the instruction.
+ - **Colorization tasks (including old photo restoration) must use the fixed template:**
+   "Restore and colorize the old photo."
+ - Clearly specify the object to be modified. For example:
+     > Original: Modify the subject in Picture 1 to match the style of Picture 2.
+     > Rewritten: Change the girl in Picture 1 to the ink-wash style of Picture 2, rendered in black-and-white watercolor with soft color transitions.
+
+ ### 5. Material Replacement
+ - Clearly specify the object and the material. For example: "Change the material of the apple to papercut style."
+ - For text material replacement, use the fixed template:
+   "Change the material of text "xxxx" to laser style"
+
+ ### 6. Logo/Pattern Editing
+ - Material replacement should preserve the original shape and structure as much as possible. For example:
+     > Original: "Convert to sapphire material"
+     > Rewritten: "Convert the main subject in the image to sapphire material, preserving similar shape and structure"
+ - When migrating logos/patterns to new scenes, ensure shape and structure consistency. For example:
+     > Original: "Migrate the logo in the image to a new scene"
+     > Rewritten: "Migrate the logo in the image to a new scene, preserving similar shape and structure"
+
+ ### 7. Multi-Image Tasks
+ - Rewritten prompts must clearly point out which image’s element is being modified. For example:
+     > Original: "Replace the subject of picture 1 with the subject of picture 2"
+     > Rewritten: "Replace the girl of picture 1 with the boy of picture 2, keeping picture 2’s background unchanged"
+ - For stylization tasks, describe the reference image’s style in the rewritten prompt, while preserving the visual content of the source image.
+
+ ## 3. Rationale and Logic Check
+ - Resolve contradictory instructions: e.g., “Remove all trees but keep all trees” requires logical correction.
+ - Supplement missing critical information: e.g., if the position is unspecified, choose a reasonable area based on the composition (near the subject, blank space, center/edge, etc.).
+
+ # Output Format Example
+ ```json
+ {
+     "Rewritten": "..."
+ }
+ ```
+ '''
+
+ def polish_prompt(prompt, img, max_retries=3):
+     """Ask the rewriter model to polish the edit instruction.
+
+     Falls back to the user's original prompt if the API keeps failing,
+     rather than retrying forever.
+     """
+     full_prompt = f"{SYSTEM_PROMPT}\n\nUser Input: {prompt}\n\nRewritten Prompt:"
+     for attempt in range(max_retries):
+         try:
+             result = api(full_prompt, [img])
+             if isinstance(result, str):
+                 # The model may wrap its JSON answer in a ```json fence; strip it.
+                 result = result.replace('```json', '').replace('```', '')
+             result = json.loads(result)
+             polished_prompt = result['Rewritten'].strip().replace("\n", " ")
+             return polished_prompt
+         except Exception as e:
+             print(f"[Warning] Error during API call (attempt {attempt + 1}/{max_retries}): {e}")
+     return prompt
+
+
+ def encode_image(pil_image):
+     """Base64-encode a PIL image as PNG for the dashscope API."""
+     buffered = io.BytesIO()
+     pil_image.save(buffered, format="PNG")
+     return base64.b64encode(buffered.getvalue()).decode("utf-8")
+
+
+
+ def api(prompt, img_list, model="qwen-vl-max-latest", kwargs=None):
+     import dashscope
+     kwargs = kwargs or {}  # avoid a mutable default argument
+     api_key = os.environ.get('DASH_API_KEY')
+     if not api_key:
+         raise EnvironmentError("DASH_API_KEY is not set")
+     assert model in ["qwen-vl-max-latest"], f"Not implemented model {model}"
+     sys_prompt = "You are a helpful assistant; you should provide useful answers to users."
+     messages = [
+         {"role": "system", "content": sys_prompt},
+         {"role": "user", "content": []},
+     ]
+     for img in img_list:
+         messages[1]["content"].append(
+             {"image": f"data:image/png;base64,{encode_image(img)}"})
+     messages[1]["content"].append({"text": f"{prompt}"})
+
+     response_format = kwargs.get('response_format', None)
+
+     response = dashscope.MultiModalConversation.call(
+         api_key=api_key,
+         model=model,
+         messages=messages,
+         result_format='message',
+         response_format=response_format,
+     )
+
+     if response.status_code == 200:
+         return response.output.choices[0].message.content[0]['text']
+     else:
+         raise Exception(f'Failed to post: {response}')
+
+ # --- Model Loading ---
+ dtype = torch.bfloat16
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # Load the model pipeline once at startup
+ pipe = QwenImageEditPlusPipeline.from_pretrained("Qwen/Qwen-Image-Edit-2509", torch_dtype=dtype).to(device)
+
+ # --- UI Constants and Helpers ---
+ MAX_SEED = np.iinfo(np.int32).max
+
+ # --- Main Inference Function (with hardcoded negative prompt) ---
+ @spaces.GPU(duration=300)
+ def infer(
+     images,
+     prompt,
+     seed=42,
+     randomize_seed=False,
+     true_guidance_scale=1.0,
+     num_inference_steps=50,
+     height=None,
+     width=None,
+     rewrite_prompt=True,
+     num_images_per_prompt=1,
+     progress=gr.Progress(track_tqdm=True),
+ ):
+     """
+     Edits the input images using the local Qwen-Image diffusers pipeline.
+     """
+     # The negative prompt is hardcoded; its UI element was removed.
+     negative_prompt = " "
+
+     if randomize_seed:
+         seed = random.randint(0, MAX_SEED)
+
+     # Set up the generator for reproducibility
+     generator = torch.Generator(device=device).manual_seed(seed)
+
+     # Normalize gallery items (PIL images, file paths, or file objects) into RGB PIL images
+     pil_images = []
+     if images is not None:
+         for item in images:
+             try:
+                 if isinstance(item[0], Image.Image):
+                     pil_images.append(item[0].convert("RGB"))
+                 elif isinstance(item[0], str):
+                     pil_images.append(Image.open(item[0]).convert("RGB"))
+                 elif hasattr(item, "name"):
+                     pil_images.append(Image.open(item.name).convert("RGB"))
+             except Exception:
+                 continue
+
+     # Both sliders at their 256 minimum means "let the pipeline pick the size"
+     if height == 256 and width == 256:
+         height, width = None, None
+     print(f"Calling pipeline with prompt: '{prompt}'")
+     print(f"Negative Prompt: '{negative_prompt}'")
+     print(f"Seed: {seed}, Steps: {num_inference_steps}, Guidance: {true_guidance_scale}, Size: {width}x{height}")
+     if rewrite_prompt and len(pil_images) > 0:
+         prompt = polish_prompt(prompt, pil_images[0])
+         print(f"Rewritten Prompt: {prompt}")
+
+     # Generate the edited image(s)
+     edited_images = pipe(
+         image=pil_images if len(pil_images) > 0 else None,
+         prompt=prompt,
+         height=height,
+         width=width,
+         negative_prompt=negative_prompt,
+         num_inference_steps=num_inference_steps,
+         generator=generator,
+         true_cfg_scale=true_guidance_scale,
+         num_images_per_prompt=num_images_per_prompt,
+     ).images
+
+     return edited_images, seed
+
+ # --- Examples and UI Layout ---
+ examples = []
+
+ css = """
+ #col-container {
+     margin: 0 auto;
+     max-width: 1024px;
+ }
+ #edit_text{margin-top: -62px !important}
+ """
+
+ with gr.Blocks(css=css) as demo:
+     with gr.Column(elem_id="col-container"):
+         gr.HTML('<img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_edit_logo.png" alt="Qwen-Image Logo" width="400" style="display: block; margin: 0 auto;">')
+         gr.Markdown("[Learn more](https://github.com/QwenLM/Qwen-Image) about the Qwen-Image series. Try on [Qwen Chat](https://chat.qwen.ai/), or [download model](https://huggingface.co/Qwen/Qwen-Image-Edit) to run locally with ComfyUI or diffusers.")
+         with gr.Row():
+             with gr.Column():
+                 input_images = gr.Gallery(label="Input Images", show_label=False, type="pil", interactive=True)
+
+             # result = gr.Image(label="Result", show_label=False, type="pil")
+             result = gr.Gallery(label="Result", show_label=False, type="pil")
+         with gr.Row():
+             prompt = gr.Text(
+                 label="Prompt",
+                 show_label=False,
+                 placeholder="Describe the edit instruction",
+                 container=False,
+             )
+             run_button = gr.Button("Edit!", variant="primary")
+
+         with gr.Accordion("Advanced Settings", open=False):
+             # The negative prompt UI element was removed; it is hardcoded in infer()
+
+             seed = gr.Slider(
+                 label="Seed",
+                 minimum=0,
+                 maximum=MAX_SEED,
+                 step=1,
+                 value=0,
+             )
+
+             randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
+
+             with gr.Row():
+                 true_guidance_scale = gr.Slider(
+                     label="True guidance scale",
+                     minimum=1.0,
+                     maximum=10.0,
+                     step=0.1,
+                     value=4.0,
+                 )
+
+                 num_inference_steps = gr.Slider(
+                     label="Number of inference steps",
+                     minimum=1,
+                     maximum=50,
+                     step=1,
+                     value=40,
+                 )
+
+                 height = gr.Slider(
+                     label="Height",
+                     minimum=256,
+                     maximum=2048,
+                     step=8,
+                     value=None,
+                 )
+
+                 width = gr.Slider(
+                     label="Width",
+                     minimum=256,
+                     maximum=2048,
+                     step=8,
+                     value=None,
+                 )
+
+             num_images_per_prompt = gr.Slider(
+                 label="Number of images per prompt",
+                 minimum=1,
+                 maximum=4,
+                 step=1,
+                 value=1,
+             )
+
+             rewrite_prompt = gr.Checkbox(label="Rewrite prompt", value=False)
+
+         # gr.Examples(examples=examples, inputs=[prompt], outputs=[result, seed], fn=infer, cache_examples=False)
+
+     gr.on(
+         triggers=[run_button.click, prompt.submit],
+         fn=infer,
+         inputs=[
+             input_images,
+             prompt,
+             seed,
+             randomize_seed,
+             true_guidance_scale,
+             num_inference_steps,
+             height,
+             width,
+             rewrite_prompt,
+             num_images_per_prompt,
+         ],
+         outputs=[result, seed],
+     )
+
+ if __name__ == "__main__":
+     demo.launch()
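
For reference, here is a minimal headless sketch of the same pipeline call that `infer` makes, useful for smoke-testing the model path without the Gradio UI. The model ID and call signature are taken from the diff above; the input path and edit instruction are placeholders:

```python
import torch
from PIL import Image
from diffusers import QwenImageEditPlusPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = QwenImageEditPlusPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16
).to(device)

# "input.png" is a placeholder path; any RGB image works.
source = Image.open("input.png").convert("RGB")

edited = pipe(
    image=[source],
    prompt="Restore and colorize the old photo.",  # placeholder edit instruction
    negative_prompt=" ",                           # same hardcoded value the demo uses
    num_inference_steps=40,
    true_cfg_scale=4.0,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
edited.save("edited.png")
```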
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ git+https://github.com/huggingface/diffusers.git
+ transformers
+ accelerate
+ safetensors
+ sentencepiece
+ dashscope
+ kernels
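
Since `QwenImageEditPlusPipeline` is imported from a diffusers build installed straight from GitHub (presumably because the class was not yet in a tagged release at the time of this commit), a quick import check is a cheap way to verify the environment before launching the Space; a minimal sketch:

```python
# Minimal environment check: confirms the git install of diffusers
# actually exposes the pipeline class this demo depends on.
import diffusers

print("diffusers version:", diffusers.__version__)
assert hasattr(diffusers, "QwenImageEditPlusPipeline"), \
    "QwenImageEditPlusPipeline not found; install diffusers from git as in requirements.txt"
```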