tchung1970 and Claude committed
Commit bdb5e40 · 1 Parent(s): c09ff4c

Localize UI to Korean and add documentation


- Translate all UI elements to Korean (labels, buttons, error messages)
- Add American Gothic painting as first example
- Add CLAUDE.md with codebase architecture and development guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Files changed (2)
  1. CLAUDE.md +130 -0
  2. app.py +33 -32
CLAUDE.md ADDED
@@ -0,0 +1,130 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

This is a Hugging Face Gradio Space for camera-angle control in image editing using the Qwen Image Edit 2509 model. The application allows users to control camera rotation, forward movement, vertical tilt, and lens settings through a web interface. It uses optimized 4-step inference with a multiple-camera-angles LoRA fused into the model.
## Architecture

### Core Components

1. **app.py** - Main Gradio application
   - Loads the Qwen Image Edit pipeline with a custom transformer and FlashAttention-3 processor
   - Loads and fuses LoRA weights from `dx8152/Qwen-Edit-2509-Multiple-angles` at scale 1.25
   - Provides the camera-control UI (rotation, forward movement, vertical tilt, wide-angle lens)
   - Generates bilingual (Chinese/English) prompts from the camera controls
   - Integrates with an external video generation service (`multimodalart/wan-2-2-first-last-frame`)
   - Implements live inference with auto-reset on image upload

2. **optimization.py** - Pipeline optimization module
   - Uses `spaces.aoti_compile()` for ahead-of-time (AOT) compilation of the transformer
   - Defines dynamic shapes for image and text sequence lengths
   - Configures TorchInductor with coordinate-descent tuning and CUDA graphs
   - Float8 quantization code is present but commented out (line 59)

3. **qwenimage/** - Custom Qwen model implementations
   - **pipeline_qwenimage_edit_plus.py** - Custom diffusion pipeline for Qwen Image Edit
   - **transformer_qwenimage.py** - QwenImageTransformer2DModel with a double-stream architecture
   - **qwen_fa3_processor.py** - FlashAttention-3 attention processor for joint text-image attention
   - **__init__.py** - Package initialization (minimal)

### Key Technical Details

- **Model**: Uses the `Qwen/Qwen-Image-Edit-2509` base with the `linoyts/Qwen-Image-Edit-Rapid-AIO` transformer for fast 4-step inference
- **LoRA**: Camera-angle control LoRA from `dx8152/Qwen-Edit-2509-Multiple-angles` (镜头转换.safetensors) fused at scale 1.25
- **Attention**: FlashAttention-3 via the Hugging Face `kernels` package (`kernels-community/vllm-flash-attn3`)
- **Optimization**: AOT compilation with dynamic shapes and CUDA graphs, within the ~1500 s GPU duration allocation
- **Device**: CUDA if available, falling back to CPU
- **Dtype**: bfloat16 throughout
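
As a rough sketch of how the FA3 kernel is obtained (the `get_kernel` call is the documented `kernels` API; how the kernel is wired into the attention processor is specific to qwen_fa3_processor.py and not shown):

```python
# Minimal sketch: fetching the FlashAttention-3 kernel via the HF
# `kernels` package. Downloads and caches the compiled kernel from the Hub.
from kernels import get_kernel

vllm_flash_attn3 = get_kernel("kernels-community/vllm-flash-attn3")
```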

### Camera Prompt Building

The `build_camera_prompt` function (app.py:70-99) converts slider values to bilingual prompts:
- **Rotation**: ±45° or ±90° left/right
- **Forward movement**: 0 (none), 1-4 (move forward), 5-10 (close-up)
- **Vertical tilt**: -1 (bird's-eye), 0 (neutral), +1 (worm's-eye)
- **Wide-angle**: Boolean checkbox

Prompts are generated in both Chinese and English (e.g., "将镜头向左旋转45度 Rotate the camera 45 degrees to the left.").
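
A minimal sketch of this mapping (illustrative only; the exact phrases, sign conventions, and thresholds live in app.py:70-99):

```python
# Illustrative sketch of build_camera_prompt; the exact wording in
# app.py:70-99 may differ. The "no camera movement" default is the
# sentinel that app.py checks before running inference.
def build_camera_prompt(rotate_deg: int, move_forward: int,
                        vertical_tilt: int, wideangle: bool) -> str:
    parts = []
    if rotate_deg != 0:
        side_zh, side_en = ("左", "left") if rotate_deg < 0 else ("右", "right")
        deg = abs(rotate_deg)
        parts.append(f"将镜头向{side_zh}旋转{deg}度 "
                     f"Rotate the camera {deg} degrees to the {side_en}.")
    if 1 <= move_forward <= 4:
        parts.append("将镜头向前移动 Move the camera forward.")
    elif move_forward >= 5:
        parts.append("将镜头转为特写 Turn the camera into a close-up.")
    if vertical_tilt == -1:
        parts.append("切换到鸟瞰视角 Switch to a bird's-eye view.")
    elif vertical_tilt == 1:
        parts.append("切换到仰视视角 Switch to a worm's-eye view.")
    if wideangle:
        parts.append("切换为广角镜头 Switch to a wide-angle lens.")
    return " ".join(parts) if parts else "no camera movement"
```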

## Common Commands

### Running the Application

```bash
# Install dependencies
pip install -r requirements.txt

# Run the Gradio app (launches on default port 7860)
python app.py
```

### Development

The app is designed to run on Hugging Face Spaces with ZeroGPU support. The `@spaces.GPU` decorator allocates GPU resources for inference and compilation.

Key environment notes:
- Requires a CUDA GPU for optimal performance
- FlashAttention-3 requires the `kernels` package with `kernels-community/vllm-flash-attn3`
- Pipeline warmup happens at startup with dummy 1024x1024 images (app.py:54)
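
For reference, a minimal sketch of how the decorator is typically applied (the duration echoes the ~1500 s figure above; the function signature is this document's shorthand, not necessarily app.py's):

```python
# Minimal sketch of ZeroGPU allocation via the `spaces` package.
import spaces

@spaces.GPU(duration=1500)  # GPU time budget for inference + compilation
def infer_camera_edit(image, prompt, seed, num_inference_steps=4):
    # ... run the compiled pipeline on the allocated GPU ...
    ...
```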

### Model Loading Flow

1. Load base pipeline from `Qwen/Qwen-Image-Edit-2509`
2. Swap transformer with rapid version from `linoyts/Qwen-Image-Edit-Rapid-AIO`
3. Load LoRA weights for camera angles
4. Fuse LoRA at scale 1.25 and unload weights
5. Set custom transformer class and FlashAttention-3 processor
6. Optimize pipeline with AOT compilation
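
A condensed sketch of these steps, assuming stock diffusers APIs (the Space actually substitutes its own pipeline/transformer classes from qwenimage/):

```python
# Sketch of the six-step loading flow above; class names assume stock
# diffusers, and steps 2 and 5-6 are only outlined in comments.
import torch
from diffusers import QwenImageEditPlusPipeline

pipe = QwenImageEditPlusPipeline.from_pretrained(          # step 1
    "Qwen/Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16
)
# step 2: swap in the rapid transformer from linoyts/Qwen-Image-Edit-Rapid-AIO
pipe.load_lora_weights(                                    # step 3
    "dx8152/Qwen-Edit-2509-Multiple-angles",
    weight_name="镜头转换.safetensors",
)
pipe.fuse_lora(lora_scale=1.25)                            # step 4
pipe.unload_lora_weights()
# steps 5-6: set the custom transformer class + FA3 processor,
# then AOT-compile via optimization.py (not shown).
pipe.to("cuda" if torch.cuda.is_available() else "cpu")
```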

## Important Implementation Details

### Image Dimensions

- Input images are automatically resized, preserving aspect ratio, to a maximum dimension of 1024
- Dimensions are rounded to multiples of 8 (required by the VAE)
- See `update_dimensions_on_upload()` (app.py:191-210)
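
The arithmetic is simple enough to sketch (a hypothetical helper, not the code in app.py):

```python
# Hypothetical helper mirroring the rules above: cap the longer side at
# 1024, keep aspect ratio, round both sides down to multiples of 8.
def fit_dimensions(w: int, h: int, max_dim: int = 1024, multiple: int = 8):
    scale = min(max_dim / max(w, h), 1.0)  # never upscale
    return (int(w * scale) // multiple * multiple,
            int(h * scale) // multiple * multiple)

# e.g. a 2048x1536 input -> (1024, 768)
```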

### Live Inference

- Control sliders trigger inference on `.release()` events
- Wide-angle checkbox triggers on `.input()` event
- Reset flag prevents inference during control resets
- Previous output is stored for chaining edits
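
In Gradio terms the wiring looks roughly like this (component names follow this document; the `inputs`/`outputs` lists are as defined in app.py):

```python
# Rough sketch of the live-inference wiring described above.
for ctrl in (rotate_deg, move_forward, vertical_tilt):
    ctrl.release(fn=infer_camera_edit, inputs=inputs, outputs=outputs)
wideangle.input(fn=infer_camera_edit, inputs=inputs, outputs=outputs)
```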

### Video Generation

- Optional feature to create video transitions between input and output images
- Uses external Gradio client: `multimodalart/wan-2-2-first-last-frame`
- Requires `x-ip-token` header from incoming request
- Saves temporary files for API communication
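
A sketch of the external call using the standard `gradio_client` API (the remote endpoint's parameter names are assumptions):

```python
# Sketch of calling the external Space; the remote API's argument names
# are assumptions, and error handling is omitted.
from gradio_client import Client, handle_file

def _generate_video_segment(first_path, last_path, prompt, request):
    # Forward the caller's ZeroGPU quota token to the remote Space.
    client = Client(
        "multimodalart/wan-2-2-first-last-frame",
        headers={"x-ip-token": request.headers["x-ip-token"]},
    )
    return client.predict(
        start_image=handle_file(first_path),
        end_image=handle_file(last_path),
        prompt=prompt,
    )
```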

### Attention Processor Limitations

The FlashAttention-3 processor (qwen_fa3_processor.py) does NOT support:
- Arbitrary attention masks
- Causal masking
- Windowed attention or sink tokens (not plumbed through)

If you need these features, you must modify the processor or fall back to standard attention.
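
For the fallback, one option is resetting the processors on the transformer (assuming the custom model exposes the standard diffusers attention-processor hooks):

```python
# Assumption: QwenImageTransformer2DModel inherits the standard diffusers
# attention-processor API; if so, this restores default attention.
pipe.transformer.set_default_attn_processor()
```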

## Dependencies

Core dependencies from requirements.txt:
- diffusers (git+https://github.com/huggingface/diffusers.git)
- transformers
- accelerate
- safetensors
- peft
- torchao==0.11.0
- kernels (for FlashAttention-3)

## Gradio Space Configuration

From README.md:
- SDK: gradio 5.49.1
- App file: app.py
- License: Apache 2.0
- Inference: 4 steps (configurable via slider; default 4)
app.py CHANGED
@@ -133,7 +133,7 @@ def infer_camera_edit(
         pil_images.append(prev_output.convert("RGB"))
 
     if len(pil_images) == 0:
-        raise gr.Error("Please upload an image first.")
+        raise gr.Error("먼저 이미지를 업로드해주세요.")
 
     if prompt == "no camera movement":
         return image, seed, prompt
@@ -153,28 +153,28 @@
 def create_video_between_images(input_image, output_image, prompt: str, request: gr.Request) -> str:
     """Create a video between the input and output images."""
     if input_image is None or output_image is None:
-        raise gr.Error("Both input and output images are required to create a video.")
+        raise gr.Error("비디오 생성을 위해 입력 및 출력 이미지가 모두 필요합니다.")
 
     try:
         with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmp:
             input_image.save(tmp.name)
             input_image_path = tmp.name
 
         output_pil = Image.fromarray(output_image.astype('uint8'))
         with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmp:
             output_pil.save(tmp.name)
             output_image_path = tmp.name
 
         video_path = _generate_video_segment(
             input_image_path,
             output_image_path,
-            prompt if prompt else "Camera movement transformation",
+            prompt if prompt else "카메라 움직임 변환",
             request
         )
         return video_path
     except Exception as e:
-        raise gr.Error(f"Video generation failed: {e}")
+        raise gr.Error(f"비디오 생성 실패: {e}")
 
 
 # --- UI ---
@@ -212,42 +212,42 @@ def update_dimensions_on_upload(image)
 
 with gr.Blocks(theme=gr.themes.Citrus(), css=css) as demo:
     with gr.Column(elem_id="col-container"):
-        gr.Markdown("## 🎬 Qwen Image Edit — Camera Angle Control")
+        gr.Markdown("## 🎬 Qwen Image Edit — 카메라 앵글 컨트롤")
         gr.Markdown("""
-        Qwen Image Edit 2509 for Camera Control ✨
-        Using [dx8152's Qwen-Edit-2509-Multiple-angles LoRA](https://huggingface.co/dx8152/Qwen-Edit-2509-Multiple-angles) and [Phr00t/Qwen-Image-Edit-Rapid-AIO](https://huggingface.co/Phr00t/Qwen-Image-Edit-Rapid-AIO/tree/main) for 4-step inference 💨
+        카메라 컨트롤을 위한 Qwen Image Edit 2509 ✨
+        4단계 추론을 위한 [dx8152's Qwen-Edit-2509-Multiple-angles LoRA](https://huggingface.co/dx8152/Qwen-Edit-2509-Multiple-angles)와 [Phr00t/Qwen-Image-Edit-Rapid-AIO](https://huggingface.co/Phr00t/Qwen-Image-Edit-Rapid-AIO/tree/main) 사용 💨
         """
         )
 
         with gr.Row():
             with gr.Column():
-                image = gr.Image(label="Input Image", type="pil")
+                image = gr.Image(label="입력 이미지", type="pil")
                 prev_output = gr.Image(value=None, visible=False)
                 is_reset = gr.Checkbox(value=False, visible=False)
 
-                with gr.Tab("Camera Controls"):
-                    rotate_deg = gr.Slider(label="Rotate Right-Left (degrees °)", minimum=-90, maximum=90, step=45, value=0)
-                    move_forward = gr.Slider(label="Move Forward → Close-Up", minimum=0, maximum=10, step=5, value=0)
-                    vertical_tilt = gr.Slider(label="Vertical Angle (Bird ↔ Worm)", minimum=-1, maximum=1, step=1, value=0)
-                    wideangle = gr.Checkbox(label="Wide-Angle Lens", value=False)
+                with gr.Tab("카메라 컨트롤"):
+                    rotate_deg = gr.Slider(label="좌우 회전 (각도 °)", minimum=-90, maximum=90, step=45, value=0)
+                    move_forward = gr.Slider(label="전진 → 클로즈업", minimum=0, maximum=10, step=5, value=0)
+                    vertical_tilt = gr.Slider(label="수직 앵글 (조감 ↔ 앙각)", minimum=-1, maximum=1, step=1, value=0)
+                    wideangle = gr.Checkbox(label="광각 렌즈", value=False)
                 with gr.Row():
-                    reset_btn = gr.Button("Reset")
-                    run_btn = gr.Button("Generate", variant="primary")
+                    reset_btn = gr.Button("초기화")
+                    run_btn = gr.Button("생성", variant="primary")
 
-                with gr.Accordion("Advanced Settings", open=False):
-                    seed = gr.Slider(label="Seed", minimum=0, maximum=MAX_SEED, step=1, value=0)
-                    randomize_seed = gr.Checkbox(label="Randomize Seed", value=True)
-                    true_guidance_scale = gr.Slider(label="True Guidance Scale", minimum=1.0, maximum=10.0, step=0.1, value=1.0)
-                    num_inference_steps = gr.Slider(label="Inference Steps", minimum=1, maximum=40, step=1, value=4)
-                    height = gr.Slider(label="Height", minimum=256, maximum=2048, step=8, value=1024)
-                    width = gr.Slider(label="Width", minimum=256, maximum=2048, step=8, value=1024)
+                with gr.Accordion("고급 설정", open=False):
+                    seed = gr.Slider(label="시드", minimum=0, maximum=MAX_SEED, step=1, value=0)
+                    randomize_seed = gr.Checkbox(label="랜덤 시드", value=True)
+                    true_guidance_scale = gr.Slider(label="가이던스 스케일", minimum=1.0, maximum=10.0, step=0.1, value=1.0)
+                    num_inference_steps = gr.Slider(label="추론 단계", minimum=1, maximum=40, step=1, value=4)
+                    height = gr.Slider(label="높이", minimum=256, maximum=2048, step=8, value=1024)
+                    width = gr.Slider(label="너비", minimum=256, maximum=2048, step=8, value=1024)
 
             with gr.Column():
-                result = gr.Image(label="Output Image", interactive=False)
-                prompt_preview = gr.Textbox(label="Processed Prompt", interactive=False)
-                create_video_button = gr.Button("🎥 Create Video Between Images", variant="secondary", visible=False)
+                result = gr.Image(label="출력 이미지", interactive=False)
+                prompt_preview = gr.Textbox(label="처리된 프롬프트", interactive=False)
+                create_video_button = gr.Button("🎥 이미지 간 비디오 생성", variant="secondary", visible=False)
                 with gr.Group(visible=False) as video_group:
-                    video_output = gr.Video(label="Generated Video", show_download_button=True, autoplay=True)
+                    video_output = gr.Video(label="생성된 비디오", show_download_button=True, autoplay=True)
 
         inputs = [
             image,rotate_deg, move_forward,
@@ -292,6 +292,7 @@ with gr.Blocks(theme=gr.themes.Citrus(), css=css) as demo:
     # Examples
     gr.Examples(
         examples=[
+            ["https://upload.wikimedia.org/wikipedia/commons/thumb/c/cc/Grant_Wood_-_American_Gothic_-_Google_Art_Project.jpg/1697px-Grant_Wood_-_American_Gothic_-_Google_Art_Project.jpg", 0, 0, 0, False, 0, True, 1.0, 4, 1024, 768],
             ["tool_of_the_sea.png", 90, 0, 0, False, 0, True, 1.0, 4, 568, 1024],
             ["monkey.jpg", -90, 0, 0, False, 0, True, 1.0, 4, 704, 1024],
             ["metropolis.jpg", 0, 0, -1, False, 0, True, 1.0, 4, 816, 1024],