# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
This is a Hugging Face Gradio Space for camera angle control in image editing using the Qwen Image Edit 2509 model. The application allows users to control camera rotation, movement, vertical tilt, and lens settings through a web interface. It uses optimized 4-step inference with a fused LoRA model for multiple camera angles.
## Architecture

### Core Components

**app.py** - Main Gradio application
- Loads the Qwen Image Edit pipeline with custom transformer and FlashAttention-3 processor
- Loads and fuses LoRA weights from `dx8152/Qwen-Edit-2509-Multiple-angles` at scale 1.25
- Provides the camera control UI (rotation, forward movement, vertical tilt, wide-angle lens)
- Generates bilingual (Chinese/English) prompts from camera controls
- Integrates with an external video generation service (`multimodalart/wan-2-2-first-last-frame`)
- Implements live inference with auto-reset on image upload
**optimization.py** - Pipeline optimization module
- Uses `spaces.aoti_compile()` for ahead-of-time (AOT) compilation of the transformer
- Defines dynamic shapes for the image and text sequence lengths
- Configures TorchInductor with coordinate descent tuning and CUDA graphs
- Float8 quantization code is present but commented out (line 59)
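A minimal sketch of that AOT path, assuming the `spaces` AOTI helpers (`aoti_capture`, `aoti_compile`, `aoti_apply`) and these Inductor option keys; the real `optimization.py` additionally pins dynamic shapes for the image and text sequence lengths:

```python
import spaces
import torch

# Inductor options matching the tuning described above (keys assumed).
INDUCTOR_CONFIGS = {
    "coordinate_descent_tuning": True,
    "triton.cudagraphs": True,
}

def optimize_pipeline(pipe, *example_args, **example_kwargs):
    # Capture the transformer's real call signature during a warmup run.
    with spaces.aoti_capture(pipe.transformer) as call:
        pipe(*example_args, **example_kwargs)
    exported = torch.export.export(pipe.transformer, args=call.args, kwargs=call.kwargs)
    compiled = spaces.aoti_compile(exported, INDUCTOR_CONFIGS)
    spaces.aoti_apply(compiled, pipe.transformer)  # swap in the compiled module
```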
**qwenimage/** - Custom Qwen model implementations
- `pipeline_qwenimage_edit_plus.py` - Custom diffusion pipeline for Qwen Image Edit
- `transformer_qwenimage.py` - QwenImageTransformer2DModel with a double-stream architecture
- `qwen_fa3_processor.py` - FlashAttention-3 attention processor for joint text-image attention
- `__init__.py` - Package initialization (minimal)
### Key Technical Details

- Model: Uses the `Qwen/Qwen-Image-Edit-2509` base with the `linoyts/Qwen-Image-Edit-Rapid-AIO` transformer for fast 4-step inference
- LoRA: Camera angle control LoRA from `dx8152/Qwen-Edit-2509-Multiple-angles` (`镜头转换.safetensors`) fused at scale 1.25
- Attention: FlashAttention-3 via the Hugging Face `kernels` package (`kernels-community/vllm-flash-attn3`)
- Optimization: AOT compilation with dynamic shapes and CUDA graphs, run under a ~1500 s GPU duration allocation
- Device: CUDA if available, falls back to CPU
- Dtype: bfloat16 throughout
### Camera Prompt Building

The `build_camera_prompt` function (app.py:70-99) converts slider values to bilingual prompts:
- Rotation: ±45° or ±90° left/right
- Forward movement: 0 (none), 1-4 (move forward), 5-10 (close-up)
- Vertical tilt: -1 (bird's-eye), 0 (neutral), +1 (worm's-eye)
- Wide-angle: Boolean checkbox
Prompts are generated in both Chinese and English (e.g., "将镜头向左旋转45度 Rotate the camera 45 degrees to the left.").
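A minimal sketch of that mapping; only the rotation wording is taken from the documented example, and the remaining Chinese/English phrases are assumptions:

```python
# Illustrative sketch of build_camera_prompt; the real app.py wording
# and control-value encoding may differ.
def build_camera_prompt(rotation_deg: int, forward: int, tilt: int, wide_angle: bool) -> str:
    parts = []
    if rotation_deg != 0:
        zh, en = ("左", "left") if rotation_deg < 0 else ("右", "right")
        deg = abs(rotation_deg)
        parts.append(f"将镜头向{zh}旋转{deg}度 Rotate the camera {deg} degrees to the {en}.")
    if 1 <= forward <= 4:
        parts.append("将镜头向前移动 Move the camera forward.")
    elif forward >= 5:
        parts.append("将镜头转为特写 Turn the camera to a close-up.")
    if tilt < 0:
        parts.append("将镜头转为鸟瞰视角 Turn the camera to a bird's-eye view.")
    elif tilt > 0:
        parts.append("将镜头转为仰视视角 Turn the camera to a worm's-eye view.")
    if wide_angle:
        parts.append("将镜头转为广角 Turn the camera to a wide-angle lens.")
    return " ".join(parts)
```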
## Common Commands

### Running the Application

```bash
# Install dependencies
pip install -r requirements.txt

# Run the Gradio app (launches on default port 7860)
python app.py
```
### Development

The app is designed to run on Hugging Face Spaces with ZeroGPU support. The `@spaces.GPU` decorator allocates GPU resources for inference and compilation.

Key environment notes:
- Requires a CUDA GPU for optimal performance
- FlashAttention-3 requires the `kernels` package with `kernels-community/vllm-flash-attn3`
- Pipeline warmup happens at startup with dummy 1024x1024 images (app.py:54)
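A hedged sketch of the decorator in use; the duration echoes the ~1500 s figure above, and the function name and body are illustrative rather than the actual app.py code:

```python
import spaces
import torch

@spaces.GPU(duration=1500)  # generous window to cover inference plus compilation
def infer(pipe, image, prompt, num_inference_steps=4):
    # On ZeroGPU, a GPU is attached only while this function executes.
    generator = torch.Generator(device="cuda").manual_seed(0)
    result = pipe(
        image=image,
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        generator=generator,
    )
    return result.images[0]
```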
### Model Loading Flow

1. Load the base pipeline from `Qwen/Qwen-Image-Edit-2509`
2. Swap the transformer for the rapid version from `linoyts/Qwen-Image-Edit-Rapid-AIO`
3. Load the camera angle LoRA weights
4. Fuse the LoRA at scale 1.25 and unload the weights
5. Set the custom transformer class and the FlashAttention-3 processor
6. Optimize the pipeline with AOT compilation
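A minimal sketch of steps 1-4 using stock diffusers calls, assuming the class names and `subfolder` layout; app.py then installs the custom transformer class and FA3 processor from `qwenimage/` before optimization:

```python
import torch
from diffusers import QwenImageEditPlusPipeline, QwenImageTransformer2DModel

# Repo IDs and the 1.25 scale come from this document; the subfolder
# name is an assumption about the repo layout.
transformer = QwenImageTransformer2DModel.from_pretrained(
    "linoyts/Qwen-Image-Edit-Rapid-AIO",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
pipe = QwenImageEditPlusPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights(
    "dx8152/Qwen-Edit-2509-Multiple-angles",
    weight_name="镜头转换.safetensors",
)
pipe.fuse_lora(lora_scale=1.25)  # bake the LoRA into the base weights
pipe.unload_lora_weights()       # drop the now-redundant adapter weights
pipe.to("cuda" if torch.cuda.is_available() else "cpu")
```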
## Important Implementation Details

### Image Dimensions

- Input images are automatically resized, preserving aspect ratio, to a maximum dimension of 1024
- Dimensions are rounded to multiples of 8 (required by the VAE)
- See `update_dimensions_on_upload()` (app.py:191-210)
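Illustrative resize math, as a sketch only; `update_dimensions_on_upload()` may differ in its rounding details:

```python
def compute_dimensions(width: int, height: int, max_dim: int = 1024) -> tuple[int, int]:
    # Scale so the longer side becomes max_dim, preserving aspect ratio.
    scale = max_dim / max(width, height)
    w, h = round(width * scale), round(height * scale)
    # Round down to multiples of 8, as the VAE requires.
    return (w // 8) * 8, (h // 8) * 8
```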
### Live Inference

- Control sliders trigger inference on `.release()` events
- The wide-angle checkbox triggers on its `.input()` event
- A reset flag prevents inference during control resets
- The previous output is stored for chaining edits
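A hedged sketch of that event wiring; the component and handler names are assumptions, not the ones in app.py:

```python
import gradio as gr

def run_inference(rotation_value, wide_value):
    ...  # build the camera prompt and call the pipeline

with gr.Blocks() as demo:
    rotation = gr.Slider(-90, 90, step=45, value=0, label="Rotation")
    wide = gr.Checkbox(label="Wide-angle lens")
    output = gr.Image(label="Result")

    # .release() fires once when the user lets go of the slider handle,
    # avoiding a flood of inference calls while dragging.
    rotation.release(run_inference, inputs=[rotation, wide], outputs=output)
    # Checkboxes fire immediately on direct user input.
    wide.input(run_inference, inputs=[rotation, wide], outputs=output)
```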
### Video Generation

- Optional feature that creates a video transition between the input and output images
- Uses an external Gradio client: `multimodalart/wan-2-2-first-last-frame`
- Requires the `x-ip-token` header from the incoming request
- Saves temporary files for API communication
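A hedged sketch of the external call via `gradio_client`; the endpoint name and its argument names are assumptions about the remote Space's API:

```python
from gradio_client import Client, handle_file

def generate_video(first_frame_path: str, last_frame_path: str, ip_token: str):
    client = Client(
        "multimodalart/wan-2-2-first-last-frame",
        headers={"x-ip-token": ip_token},  # forwarded from the incoming request
    )
    return client.predict(
        start_image=handle_file(first_frame_path),
        end_image=handle_file(last_frame_path),
        api_name="/generate_video",  # assumption: actual endpoint name may differ
    )
```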
### Attention Processor Limitations

The FlashAttention-3 processor (`qwen_fa3_processor.py`) does NOT support:
- Arbitrary attention masks
- Causal masking
- Windowed attention or sink tokens (not plumbed through)
If you need these features, you must modify the processor or fall back to standard attention.
## Dependencies

Core dependencies from `requirements.txt`:
- diffusers (git+https://github.com/huggingface/diffusers.git)
- transformers
- accelerate
- safetensors
- peft
- torchao==0.11.0
- kernels (for FlashAttention-3)
## Gradio Space Configuration
From README.md:
- SDK: gradio 5.49.1
- App file: app.py
- License: Apache 2.0
- Inference: 4 steps (configurable via slider, default=4)