Spaces: Running on Zero

Commit · bdb5e40
Parent: c09ff4c
Localize UI to Korean and add documentation

- Translate all UI elements to Korean (labels, buttons, error messages)
- Add American Gothic painting as first example
- Add CLAUDE.md with codebase architecture and development guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
CLAUDE.md
ADDED
@@ -0,0 +1,130 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

This is a Hugging Face Gradio Space for camera-angle control in image editing using the Qwen Image Edit 2509 model. The application lets users control camera rotation, forward movement, vertical tilt, and lens settings through a web interface. It uses optimized 4-step inference with a fused LoRA model for multiple camera angles.
## Architecture

### Core Components

1. **app.py** - Main Gradio application
   - Loads the Qwen Image Edit pipeline with a custom transformer and FlashAttention-3 processor
   - Loads and fuses LoRA weights from `dx8152/Qwen-Edit-2509-Multiple-angles` at scale 1.25
   - Provides the camera-control UI (rotation, forward movement, vertical tilt, wide-angle lens)
   - Generates bilingual (Chinese/English) prompts from the camera controls
   - Integrates with an external video generation service (`multimodalart/wan-2-2-first-last-frame`)
   - Implements live inference with auto-reset on image upload
2. **optimization.py** - Pipeline optimization module
   - Uses `spaces.aoti_compile()` for ahead-of-time (AOT) compilation of the transformer
   - Defines dynamic shapes for image and text sequence lengths (see the sketch after this list)
   - Configures TorchInductor with coordinate-descent tuning and CUDA graphs
   - Float8 quantization code is present but commented out (line 59)
3. **qwenimage/** - Custom Qwen model implementations
   - **pipeline_qwenimage_edit_plus.py** - Custom diffusion pipeline for Qwen Image Edit
   - **transformer_qwenimage.py** - QwenImageTransformer2DModel with a double-stream architecture
   - **qwen_fa3_processor.py** - FlashAttention-3 attention processor for joint text-image attention
   - **__init__.py** - Package initialization (minimal)
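As a rough illustration of the dynamic-shapes idea in optimization.py, the sketch below declares variable image and text sequence lengths with `torch.export.Dim`. The dimension names, bounds, and argument mapping are assumptions for illustration, not the module's actual values.

```python
from torch.export import Dim

# Hypothetical bounds; optimization.py picks its own min/max lengths.
image_seq_len = Dim("image_seq_len", min=256, max=4096)
text_seq_len = Dim("text_seq_len", min=1, max=1024)

# Map each dynamic dimension to the transformer argument and axis it
# constrains (argument names assumed for illustration).
dynamic_shapes = {
    "hidden_states": {1: image_seq_len},         # image tokens vary with resolution
    "encoder_hidden_states": {1: text_seq_len},  # text tokens vary with prompt length
}
```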
### Key Technical Details

- **Model**: Uses the `Qwen/Qwen-Image-Edit-2509` base with the `linoyts/Qwen-Image-Edit-Rapid-AIO` transformer for fast 4-step inference
- **LoRA**: Camera-angle control LoRA from `dx8152/Qwen-Edit-2509-Multiple-angles` (镜头转换.safetensors) fused at scale 1.25
- **Attention**: FlashAttention-3 via the Hugging Face `kernels` package (`kernels-community/vllm-flash-attn3`)
- **Optimization**: AOT compilation with dynamic shapes and CUDA graphs, run within a ~1500 s GPU allocation
- **Device**: CUDA if available, falling back to CPU
- **Dtype**: bfloat16 throughout
### Camera Prompt Building

The `build_camera_prompt` function (app.py:70-99) converts slider values to bilingual prompts:
- **Rotation**: ±45° or ±90° left/right
- **Forward movement**: 0 (none), 1-4 (move forward), 5-10 (close-up)
- **Vertical tilt**: -1 (bird's-eye), 0 (neutral), +1 (worm's-eye)
- **Wide-angle**: Boolean checkbox

Prompts are generated in both Chinese and English (e.g., "将镜头向左旋转45度 Rotate the camera 45 degrees to the left."). A sketch of the mapping follows.
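This minimal sketch assumes the thresholds in the bullet list above and emits only the English half; the exact strings in app.py differ (the real function also prepends the Chinese phrasing).

```python
def build_camera_prompt(rotate_deg: int, move_forward: int,
                        vertical_tilt: int, wideangle: bool) -> str:
    """Illustrative version of the slider-to-prompt mapping."""
    parts = []
    if rotate_deg != 0:
        side = "left" if rotate_deg < 0 else "right"
        parts.append(f"Rotate the camera {abs(rotate_deg)} degrees to the {side}.")
    if 1 <= move_forward <= 4:
        parts.append("Move the camera forward.")
    elif move_forward >= 5:
        parts.append("Move the camera forward into a close-up.")
    if vertical_tilt == -1:
        parts.append("Switch to a bird's-eye view.")
    elif vertical_tilt == 1:
        parts.append("Switch to a worm's-eye view.")
    if wideangle:
        parts.append("Switch to a wide-angle lens.")
    # app.py checks for this sentinel to skip inference when nothing changed.
    return " ".join(parts) if parts else "no camera movement"
```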
## Common Commands

### Running the Application

```bash
# Install dependencies
pip install -r requirements.txt

# Run the Gradio app (launches on the default port 7860)
python app.py
```
### Development

The app is designed to run on Hugging Face Spaces with ZeroGPU support. The `@spaces.GPU` decorator allocates GPU resources for inference and compilation.

Key environment notes:
- Requires a CUDA GPU for optimal performance
- FlashAttention-3 requires the `kernels` package with `kernels-community/vllm-flash-attn3`
- Pipeline warmup happens at startup with dummy 1024x1024 images (app.py:54)
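The decorator pattern, in a minimal sketch (the function name and body here are placeholders; the ~1500 s duration is taken from the optimization notes above):

```python
import spaces

@spaces.GPU(duration=1500)  # long allocation to cover AOT compilation and warmup
def infer(pipe, image, prompt, steps=4):
    # ZeroGPU attaches a GPU only while this function runs.
    return pipe(image=image, prompt=prompt, num_inference_steps=steps).images[0]
```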
### Model Loading Flow

1. Load the base pipeline from `Qwen/Qwen-Image-Edit-2509`
2. Swap the transformer with the rapid version from `linoyts/Qwen-Image-Edit-Rapid-AIO`
3. Load the LoRA weights for camera angles
4. Fuse the LoRA at scale 1.25 and unload the weights
5. Set the custom transformer class and FlashAttention-3 processor
6. Optimize the pipeline with AOT compilation

A sketch of steps 1, 3, and 4 follows.
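This sketch uses the standard diffusers LoRA API under the assumption that the generic pipeline loader resolves the custom pipeline class; the repo's actual code also swaps in its own transformer (steps 2 and 5), which is omitted here.

```python
import torch
from diffusers import DiffusionPipeline

# Step 1: base pipeline
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16
)

# Steps 3-4: load the camera-angle LoRA, bake it in at scale 1.25,
# then drop the now-redundant adapter weights.
pipe.load_lora_weights(
    "dx8152/Qwen-Edit-2509-Multiple-angles", weight_name="镜头转换.safetensors"
)
pipe.fuse_lora(lora_scale=1.25)
pipe.unload_lora_weights()

pipe.to("cuda" if torch.cuda.is_available() else "cpu")
```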
## Important Implementation Details

### Image Dimensions

- Input images are automatically resized to maintain aspect ratio with a maximum dimension of 1024
- Dimensions are rounded to multiples of 8 (required by the VAE)
- See `update_dimensions_on_upload()` (app.py:191-210); a sketch of the rule follows
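The resize rule as a self-contained sketch; the no-upscaling clamp is an assumption, since the documented behavior is only the 1024 cap and the rounding to multiples of 8.

```python
def compute_dimensions(width: int, height: int, max_dim: int = 1024) -> tuple[int, int]:
    """Cap the longer side at max_dim, keep aspect ratio, round to multiples of 8."""
    scale = min(max_dim / max(width, height), 1.0)  # 1.0 clamp = assumed no upscaling
    return (int(width * scale) // 8 * 8,
            int(height * scale) // 8 * 8)
```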
### Live Inference

- Control sliders trigger inference on `.release()` events (see the sketch below)
- The wide-angle checkbox triggers on the `.input()` event
- A reset flag prevents inference during control resets
- The previous output is stored for chaining edits
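A rough sketch of that wiring; the component and handler names come from the app.py diff below, while the exact outputs list is an assumption.

```python
# Each slider re-runs inference when the user releases it; the checkbox
# re-runs on any input. All reuse the same handler and component lists.
for slider in (rotate_deg, move_forward, vertical_tilt):
    slider.release(fn=infer_camera_edit, inputs=inputs,
                   outputs=[result, seed, prompt_preview])
wideangle.input(fn=infer_camera_edit, inputs=inputs,
                outputs=[result, seed, prompt_preview])
```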
### Video Generation

- Optional feature to create video transitions between the input and output images
- Uses an external Gradio client: `multimodalart/wan-2-2-first-last-frame` (see the sketch below)
- Requires the `x-ip-token` header from the incoming request
- Saves temporary files for API communication
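A hedged sketch of the external call: the Space id and the x-ip-token forwarding are documented above, while the endpoint name and parameter names are assumptions.

```python
from gradio_client import Client, handle_file

def generate_video_segment(first_path: str, last_path: str,
                           prompt: str, request) -> str:
    # Forward the caller's ZeroGPU quota token to the remote Space.
    token = request.headers.get("x-ip-token")
    client = Client("multimodalart/wan-2-2-first-last-frame",
                    headers={"x-ip-token": token} if token else None)
    return client.predict(
        start_image=handle_file(first_path),
        end_image=handle_file(last_path),
        prompt=prompt,
        api_name="/generate",  # hypothetical endpoint name
    )
```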
### Attention Processor Limitations

The FlashAttention-3 processor (qwen_fa3_processor.py) does NOT support:
- Arbitrary attention masks
- Causal masking
- Windowed attention or sink tokens (not plumbed through)

If you need these features, you must modify the processor or fall back to standard attention.
## Dependencies

Core dependencies from requirements.txt:
- diffusers (git+https://github.com/huggingface/diffusers.git)
- transformers
- accelerate
- safetensors
- peft
- torchao==0.11.0
- kernels (for FlashAttention-3)
## Gradio Space Configuration

From README.md:
- SDK: gradio 5.49.1
- App file: app.py
- License: Apache 2.0
- Inference: 4 steps (configurable via slider, default = 4)
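These settings correspond to the standard Hugging Face Space front matter at the top of README.md; a sketch under that assumption (title, emoji, and color fields omitted):

```yaml
---
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
license: apache-2.0
---
```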
app.py
CHANGED
@@ -133,7 +133,7 @@ def infer_camera_edit(
         pil_images.append(prev_output.convert("RGB"))

     if len(pil_images) == 0:
-        raise gr.Error("
+        raise gr.Error("먼저 이미지를 업로드해주세요.")  # "Please upload an image first."

     if prompt == "no camera movement":
         return image, seed, prompt

@@ -153,28 +153,28 @@ def infer_camera_edit(

 def create_video_between_images(input_image, output_image, prompt: str, request: gr.Request) -> str:
     """Create a video between the input and output images."""
     if input_image is None or output_image is None:
-        raise gr.Error("
+        raise gr.Error("비디오 생성을 위해 입력 및 출력 이미지가 모두 필요합니다.")  # "Both input and output images are required for video generation."

     try:
         with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmp:
             input_image.save(tmp.name)
             input_image_path = tmp.name

         output_pil = Image.fromarray(output_image.astype('uint8'))
         with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmp:
             output_pil.save(tmp.name)
             output_image_path = tmp.name

         video_path = _generate_video_segment(
             input_image_path,
             output_image_path,
-            prompt if prompt else "
+            prompt if prompt else "카메라 움직임 변화",  # "camera movement change"
             request
         )
         return video_path
     except Exception as e:
-        raise gr.Error(f"
+        raise gr.Error(f"비디오 생성 실패: {e}")  # "Video generation failed: {e}"

 # --- UI ---

@@ -212,42 +212,42 @@ def update_dimensions_on_upload(image):

 with gr.Blocks(theme=gr.themes.Citrus(), css=css) as demo:
     with gr.Column(elem_id="col-container"):
-        gr.Markdown("## 🎬 Qwen Image Edit —
+        gr.Markdown("## 🎬 Qwen Image Edit — 카메라 앵글 컨트롤")  # "Camera Angle Control"
         gr.Markdown("""
-        Qwen Image Edit 2509
-
+        카메라 컨트롤을 위한 Qwen Image Edit 2509 ✨
+        4단계 추론을 위한 [dx8152's Qwen-Edit-2509-Multiple-angles LoRA](https://huggingface.co/dx8152/Qwen-Edit-2509-Multiple-angles)와 [Phr00t/Qwen-Image-Edit-Rapid-AIO](https://huggingface.co/Phr00t/Qwen-Image-Edit-Rapid-AIO/tree/main) 사용 🎨
         """
         )

         with gr.Row():
             with gr.Column():
-                image = gr.Image(label="
+                image = gr.Image(label="입력 이미지", type="pil")  # "Input image"
                 prev_output = gr.Image(value=None, visible=False)
                 is_reset = gr.Checkbox(value=False, visible=False)

-                with gr.Tab("
-                    rotate_deg = gr.Slider(label="
-                    move_forward = gr.Slider(label="
-                    vertical_tilt = gr.Slider(label="
-                    wideangle = gr.Checkbox(label="
+                with gr.Tab("카메라 컨트롤"):  # "Camera Controls"
+                    rotate_deg = gr.Slider(label="좌우 회전 (각도 °)", minimum=-90, maximum=90, step=45, value=0)  # "Rotate left/right (degrees)"
+                    move_forward = gr.Slider(label="전진 → 클로즈업", minimum=0, maximum=10, step=5, value=0)  # "Move forward → close-up"
+                    vertical_tilt = gr.Slider(label="수직 앵글 (조감 ↔ 앙각)", minimum=-1, maximum=1, step=1, value=0)  # "Vertical angle (bird's-eye ↔ worm's-eye)"
+                    wideangle = gr.Checkbox(label="광각 렌즈", value=False)  # "Wide-angle lens"
                 with gr.Row():
-                    reset_btn = gr.Button("
-                    run_btn = gr.Button("
+                    reset_btn = gr.Button("초기화")  # "Reset"
+                    run_btn = gr.Button("생성", variant="primary")  # "Generate"

-                with gr.Accordion("
-                    seed = gr.Slider(label="
-                    randomize_seed = gr.Checkbox(label="
-                    true_guidance_scale = gr.Slider(label="
-                    num_inference_steps = gr.Slider(label="
-                    height = gr.Slider(label="
-                    width = gr.Slider(label="
+                with gr.Accordion("고급 설정", open=False):  # "Advanced settings"
+                    seed = gr.Slider(label="시드", minimum=0, maximum=MAX_SEED, step=1, value=0)  # "Seed"
+                    randomize_seed = gr.Checkbox(label="랜덤 시드", value=True)  # "Randomize seed"
+                    true_guidance_scale = gr.Slider(label="가이던스 스케일", minimum=1.0, maximum=10.0, step=0.1, value=1.0)  # "Guidance scale"
+                    num_inference_steps = gr.Slider(label="추론 단계", minimum=1, maximum=40, step=1, value=4)  # "Inference steps"
+                    height = gr.Slider(label="높이", minimum=256, maximum=2048, step=8, value=1024)  # "Height"
+                    width = gr.Slider(label="너비", minimum=256, maximum=2048, step=8, value=1024)  # "Width"

             with gr.Column():
-                result = gr.Image(label="
-                prompt_preview = gr.Textbox(label="
-                create_video_button = gr.Button("🎥
+                result = gr.Image(label="출력 이미지", interactive=False)  # "Output image"
+                prompt_preview = gr.Textbox(label="처리된 프롬프트", interactive=False)  # "Processed prompt"
+                create_video_button = gr.Button("🎥 이미지 간 비디오 생성", variant="secondary", visible=False)  # "Generate video between images"
                 with gr.Group(visible=False) as video_group:
-                    video_output = gr.Video(label="
+                    video_output = gr.Video(label="생성된 비디오", show_download_button=True, autoplay=True)  # "Generated video"

     inputs = [
         image, rotate_deg, move_forward,

@@ -292,6 +292,7 @@ with gr.Blocks(theme=gr.themes.Citrus(), css=css) as demo:

     # Examples
     gr.Examples(
         examples=[
+            ["https://upload.wikimedia.org/wikipedia/commons/thumb/c/cc/Grant_Wood_-_American_Gothic_-_Google_Art_Project.jpg/1697px-Grant_Wood_-_American_Gothic_-_Google_Art_Project.jpg", 0, 0, 0, False, 0, True, 1.0, 4, 1024, 768],
             ["tool_of_the_sea.png", 90, 0, 0, False, 0, True, 1.0, 4, 568, 1024],
             ["monkey.jpg", -90, 0, 0, False, 0, True, 1.0, 4, 704, 1024],
             ["metropolis.jpg", 0, 0, -1, False, 0, True, 1.0, 4, 816, 1024],