Note: the checkpoint conversion has not been fully validated. If the pipeline fails to load or produces undesired output, please contact me at bili_sakura@zju.edu.cn.

BiliSakura/AeroGen

Aerial image generation conditioned on bounding boxes (horizontal or rotated) and object categories. AeroGen is the first model to simultaneously support horizontal and rotated bounding box condition generation for remote sensing imagery.

Converted to diffusers format. Self-contained: no external code repo is needed, and all required code is bundled.

Model Details

Model type: Latent diffusion with UNet + VAE + CLIP text encoder + RBoxEncoder (condition encoder)
Conditioning: Bounding boxes (8 coords for rotated, 4 for axis-aligned), category CLIP embeddings, spatial masks
Scheduler: DDIMScheduler, 1000 steps, scaled_linear
Output: 512×512 RGB aerial images
License: Apache 2.0
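
The `scaled_linear` schedule named above can be illustrated in a few lines of NumPy. This is a sketch of how diffusers defines that schedule (betas linear in square-root space, then squared), not code taken from this repository. The `beta_start` and `beta_end` values are assumed Stable Diffusion defaults and are not stated in this card, so check the config under `scheduler/` for the values the checkpoint actually uses.

```python
import numpy as np

def scaled_linear_betas(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    """Betas that are linear in sqrt space, then squared (the 'scaled_linear' rule)."""
    # beta_start / beta_end are ASSUMED defaults, not read from this checkpoint.
    return np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_timesteps) ** 2

betas = scaled_linear_betas()
# Cumulative product of (1 - beta): the fraction of signal kept at each timestep.
alphas_cumprod = np.cumprod(1.0 - betas)
```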

Repository Structure

Component	Path
Pipeline	`pipeline.py`
UNet	`unet/`
VAE	`vae/`
Text encoder	`text_encoder/`
Condition encoder	`condition_encoder/`
Scheduler	`scheduler/`
Config	`model_index.json`
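
If loading fails, one quick check is whether a local copy of the repository contains every entry from the table above. The helper below is a minimal sketch using only the standard library; `missing_components` is a hypothetical name, and it checks only that the paths exist, not that their contents are valid.

```python
from pathlib import Path

# Entries taken from the repository structure table above.
EXPECTED_DIRS = ["unet", "vae", "text_encoder", "condition_encoder", "scheduler"]
EXPECTED_FILES = ["pipeline.py", "model_index.json"]

def missing_components(snapshot_dir):
    """Return the expected entries that are absent from a local snapshot directory."""
    root = Path(snapshot_dir)
    # Directories are reported with a trailing slash to match the table.
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty list means the layout matches the table; anything else names what to download again.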

Inference

Dependencies: pip install diffusers transformers torch einops safetensors pyyaml

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/AeroGen",
    custom_pipeline="pipeline.py",
    trust_remote_code=True,
)
pipe = pipe.to("cuda")

Conditioning Format

Input	Shape	Description
`bboxes`	(B, N, 8)	Rotated box corners [x1,y1,x2,y2,x3,y3,x4,y4], normalized
`bboxes`	(B, N, 4)	Axis-aligned [x1,y1,x2,y2], normalized
`category_conditions`	(B, N, 768)	CLIP text embeddings per object (e.g. encode class name)
`mask_conditions`	(B, N, 64, 64)	Spatial mask per object (64×64 for 512px output)
`mask_vector`	(B, N)	1 = valid object, 0 = padding
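
The shapes in the table can be assembled for a single image (B = 1) roughly as follows. This is a sketch under stated assumptions, not the original preprocessing: coordinates are assumed to be normalized to [0, 1], each mask is assumed to be the filled box region, corners are assumed to be listed in order around the box, and `build_layout` and `rasterize_corners` are hypothetical helper names. `category_conditions` is not covered here.

```python
import numpy as np

def hbox_to_corners(box):
    """Expand an axis-aligned [x1, y1, x2, y2] box into 8 corner coordinates."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2, y1, x2, y2, x1, y2]

def rasterize_corners(corners, size=64):
    """Fill a convex quadrilateral (normalized corners) on a size x size grid."""
    pts = np.asarray(corners, dtype=np.float32).reshape(4, 2)
    # Cell centres in normalized [0, 1] coordinates.
    centres = (np.arange(size, dtype=np.float32) + 0.5) / size
    xs, ys = np.meshgrid(centres, centres)  # xs varies by column, ys by row
    signs = []
    for i in range(4):
        (ax, ay), (bx, by) = pts[i], pts[(i + 1) % 4]
        # Cross product sign: which side of edge a->b each cell centre lies on.
        signs.append((bx - ax) * (ys - ay) - (by - ay) * (xs - ax))
    signs = np.stack(signs)
    # Inside a convex polygon means being on the same side of all four edges.
    inside = np.all(signs >= 0, axis=0) | np.all(signs <= 0, axis=0)
    return inside.astype(np.float32)

def build_layout(boxes, max_objects, mask_size=64):
    """Pad one image's boxes (all 4-coord or all 8-coord) to max_objects."""
    box_dim = len(boxes[0]) if boxes else 4
    bboxes = np.zeros((1, max_objects, box_dim), dtype=np.float32)
    masks = np.zeros((1, max_objects, mask_size, mask_size), dtype=np.float32)
    mask_vector = np.zeros((1, max_objects), dtype=np.float32)
    for i, box in enumerate(boxes[:max_objects]):
        corners = hbox_to_corners(box) if box_dim == 4 else list(box)
        bboxes[0, i] = box
        masks[0, i] = rasterize_corners(corners, mask_size)
        mask_vector[0, i] = 1.0  # 1 = valid object, 0 = padding
    return bboxes, masks, mask_vector
```

The returned NumPy arrays can be converted with `torch.from_numpy` and moved to the pipeline's device before use.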

For layout preparation and DIOR-R format, see the original AeroGen repo.
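
When annotations are in pixel coordinates, they need to be normalized before they match the `(B, N, 8)` format. The sketch below assumes a box given as centre, size, and rotation angle in radians, and assumes normalization means dividing by image width and height; the DIOR-R schema itself is not described in this card, and `rbox_to_normalized_corners` is a hypothetical helper. If the annotations already list four corner points, only the division step applies.

```python
import math

def rbox_to_normalized_corners(cx, cy, w, h, angle_rad, image_w, image_h):
    """Convert a centre/size/angle box in pixels to 8 normalized corner coordinates."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    corners = []
    # Visit the four corners in order around the box, starting at (-w/2, -h/2).
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]:
        # Rotate the offset about the centre, then shift to image coordinates.
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        # ASSUMPTION: "normalized" means divided by image width / height.
        corners += [x / image_w, y / image_h]
    return corners
```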

Model Sources

Source: Sonetto702/AeroGen
Paper: AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation
Original repo: Sonettoo/AeroGen
Conversion: Checkpoint converted to diffusers format (self-contained, no external repo)

Citation

@inproceedings{tangAeroGenEnhancingRemote2025,
  title = {{{AeroGen}}: {{Enhancing Remote Sensing Object Detection}} with {{Diffusion-Driven Data Generation}}},
  shorttitle = {{{AeroGen}}},
  booktitle = {{{CVPR}}},
  author = {Tang, Datao and Cao, Xiangyong and Wu, Xuan and Li, Jialin and Yao, Jing and Bai, Xueru and Jiang, Dongsheng and Li, Yin and Meng, Deyu},
  year = 2025,
  pages = {3614--3624},
  urldate = {2025-11-20}
}

Paper page: AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation (arXiv 2411.15497, published Nov 23, 2024)