The checkpoint conversion has not been fully validated. If you encounter pipeline loading failures or undesired output, please contact me at bili_sakura@zju.edu.cn

BiliSakura/AeroGen

Aerial image generation conditioned on bounding boxes (horizontal or rotated) and object categories. AeroGen is the first model to support generation conditioned on both horizontal and rotated bounding boxes for remote sensing imagery.

Converted to diffusers format. Self-contained: no external code repo is needed, since all required code is bundled.

Model Details

  • Model type: Latent diffusion with UNet + VAE + CLIP text encoder + RBoxEncoder (condition encoder)
  • Conditioning: Bounding boxes (8 coords for rotated, 4 for axis-aligned), category CLIP embeddings, spatial masks
  • Scheduler: DDIMScheduler, 1000 steps, scaled_linear
  • Output: 512×512 RGB aerial images
  • License: Apache 2.0
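
As a rough illustration of the scheduler entry above, the sketch below computes a "scaled_linear" beta schedule over 1000 training steps in plain Python: betas are spaced linearly in square-root space and then squared, which is how diffusers defines that schedule name. The `beta_start` and `beta_end` values are assumptions (they are the values commonly used in Stable Diffusion configs); the values actually used by this model are in the config under scheduler/. The latent size assumes a VAE downsampling factor of 8, which is consistent with the 64×64 masks used for 512px output.

```python
def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Betas spaced linearly in sqrt-space, then squared ("scaled_linear").

    beta_start / beta_end are assumed defaults, not read from this repo.
    """
    lo, hi = beta_start ** 0.5, beta_end ** 0.5
    step = (hi - lo) / (num_steps - 1)
    return [(lo + i * step) ** 2 for i in range(num_steps)]


def alphas_cumprod(betas):
    """Cumulative product of (1 - beta_t), the signal level kept at each step."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


betas = scaled_linear_betas()
acp = alphas_cumprod(betas)

# Assumed VAE downsampling factor of 8: 512px images map to 64x64 latents.
latent_size = 512 // 8
```

The cumulative product decreases monotonically from just under 1, so later timesteps correspond to noisier latents.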

Repository Structure

Component            Path
Pipeline             pipeline.py
UNet                 unet/
VAE                  vae/
Text encoder         text_encoder/
Condition encoder    condition_encoder/
Scheduler            scheduler/
Config               model_index.json

Inference

Dependencies: pip install diffusers transformers torch einops safetensors pyyaml

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/AeroGen",
    custom_pipeline="pipeline.py",
    trust_remote_code=True,
)
pipe = pipe.to("cuda")

Conditioning Format

Input                  Shape             Description
bboxes                 (B, N, 8)         Rotated box corners [x1,y1,x2,y2,x3,y3,x4,y4], normalized
bboxes                 (B, N, 4)         Axis-aligned [x1,y1,x2,y2], normalized
category_conditions    (B, N, 768)       CLIP text embeddings per object (e.g. encode class name)
mask_conditions        (B, N, 64, 64)    Spatial mask per object (64×64 for 512px output)
mask_vector            (B, N)            1 = valid object, 0 = padding
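
The following is a minimal sketch of how a variable-length list of pixel-space boxes might be turned into fixed-size `bboxes` and `mask_vector` entries for one sample. The helper names (`to_corners`, `normalize`, `pad_layout`) are hypothetical and not part of this repository. Two things are assumptions: the corner order used when expanding an axis-aligned box (clockwise from top-left), and the choice to expand 4-coordinate boxes to 8 coordinates so that both kinds fit in one array.

```python
def to_corners(box):
    """Return 8 corner coords; expands [x1, y1, x2, y2] clockwise from top-left.

    The corner order is an assumption; check the original repo's convention.
    """
    if len(box) == 8:
        return list(box)
    x1, y1, x2, y2 = box
    return [x1, y1, x2, y1, x2, y2, x1, y2]


def normalize(corners, width, height):
    """Scale pixel coords to [0, 1]: even indices are x, odd indices are y."""
    return [v / (width if i % 2 == 0 else height) for i, v in enumerate(corners)]


def pad_layout(boxes_px, width, height, max_objects):
    """Build (N, 8) boxes and an (N,) validity vector for a single sample."""
    bboxes = [normalize(to_corners(b), width, height) for b in boxes_px[:max_objects]]
    mask_vector = [1] * len(bboxes)
    while len(bboxes) < max_objects:
        bboxes.append([0.0] * 8)   # zero-filled padding slot
        mask_vector.append(0)      # 0 marks the slot as padding
    return bboxes, mask_vector


boxes_px = [
    [64, 128, 192, 256],                        # axis-aligned box
    [100, 100, 200, 120, 180, 220, 80, 200],    # rotated box, 4 corners
]
bboxes, mask_vector = pad_layout(boxes_px, width=512, height=512, max_objects=4)
```

Wrapping the results as `torch.tensor([bboxes])` and `torch.tensor([mask_vector])` adds the batch dimension, giving shapes (1, N, 8) and (1, N).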

For layout preparation and DIOR-R format, see the original AeroGen repo.
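
For quick experiments without the original repo, a per-object spatial mask can be approximated by filling the interior of each box on a 64×64 grid. The sketch below does this with an even-odd ray casting test on cell centers. Treating the mask as a filled box interior is an assumption, the function names are hypothetical, and the original AeroGen layout code may build its masks differently.

```python
def point_in_polygon(px, py, poly):
    """Even-odd ray casting test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize_box(corners, size=64):
    """Fill a size x size mask from normalized [x1, y1, ..., x4, y4] corners."""
    poly = list(zip(corners[0::2], corners[1::2]))
    mask = []
    for row in range(size):
        cy = (row + 0.5) / size  # sample at cell centers
        mask.append([
            1.0 if point_in_polygon((col + 0.5) / size, cy, poly) else 0.0
            for col in range(size)
        ])
    return mask


# Axis-aligned square covering the central half of the image.
square = rasterize_box([0.25, 0.25, 0.75, 0.25, 0.75, 0.75, 0.25, 0.75])
# Same extent rotated by 45 degrees (a diamond).
diamond = rasterize_box([0.5, 0.25, 0.75, 0.5, 0.5, 0.75, 0.25, 0.5])
```

Stacking one such mask per object, with all-zero masks for padded slots, gives the (N, 64, 64) layout listed under `mask_conditions`.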

Model Sources

Citation

@inproceedings{tangAeroGenEnhancingRemote2025,
  title = {{{AeroGen}}: {{Enhancing Remote Sensing Object Detection}} with {{Diffusion-Driven Data Generation}}},
  shorttitle = {{{AeroGen}}},
  booktitle = {{{CVPR}}},
  author = {Tang, Datao and Cao, Xiangyong and Wu, Xuan and Li, Jialin and Yao, Jing and Bai, Xueru and Jiang, Dongsheng and Li, Yin and Meng, Deyu},
  year = 2025,
  pages = {3614--3624},
  urldate = {2025-11-20}
}