Note: the checkpoint conversion has not been fully validated. If you encounter pipeline loading failures or undesired output, please contact me at bili_sakura@zju.edu.cn.
Aerial image generation conditioned on bounding boxes (horizontal or rotated) and object categories. AeroGen is the first model to simultaneously support horizontal and rotated bounding box condition generation for remote sensing imagery.
Converted to diffusers format. Self-contained: no external code repo is needed, and all required code is bundled.
| Component | Path |
|---|---|
| Pipeline | pipeline.py |
| UNet | unet/ |
| VAE | vae/ |
| Text encoder | text_encoder/ |
| Condition encoder | condition_encoder/ |
| Scheduler | scheduler/ |
| Config | model_index.json |
Dependencies: `pip install diffusers transformers torch einops safetensors pyyaml`
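Before loading the pipeline, it can help to confirm that the dependencies listed above are importable. The following is a minimal sketch using only the standard library; the `REQUIRED` mapping and the `missing_packages` helper are illustrative names, not part of this repository. Note that `pyyaml` is imported as `yaml`.

```python
from importlib.util import find_spec

# Map pip package names to their import names (pyyaml installs the "yaml" module).
REQUIRED = {
    "diffusers": "diffusers",
    "transformers": "transformers",
    "torch": "torch",
    "einops": "einops",
    "safetensors": "safetensors",
    "pyyaml": "yaml",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```

This only checks that each module can be located; it does not verify versions.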
```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/AeroGen",
    custom_pipeline="pipeline.py",
    trust_remote_code=True,
)
pipe = pipe.to("cuda")
```
| Input | Shape | Description |
|---|---|---|
| `bboxes` | `(B, N, 8)` | Rotated box corners `[x1,y1,x2,y2,x3,y3,x4,y4]`, normalized |
| `bboxes` | `(B, N, 4)` | Axis-aligned `[x1,y1,x2,y2]`, normalized |
| `category_conditions` | `(B, N, 768)` | CLIP text embeddings per object (e.g. encode class name) |
| `mask_conditions` | `(B, N, 64, 64)` | Spatial mask per object (64×64 for 512px output) |
| `mask_vector` | `(B, N)` | 1 = valid object, 0 = padding |
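To make the shapes in the table concrete, here is a rough sketch of how the layout inputs for one image could be padded to a fixed number of objects `N`. It is written in plain Python so it runs without extra packages. The helper names (`box_to_mask`, `pad_layout`) are hypothetical, and filling each box as a binary rectangle is an assumption about the mask convention; the original AeroGen preparation code is the authority on the exact format.

```python
import math


def box_to_mask(box, size=64):
    """Rasterize a normalized axis-aligned box [x1, y1, x2, y2] into a size x size 0/1 grid.

    Assumption: the mask is a filled rectangle covering the box.
    """
    x1, y1, x2, y2 = (min(1.0, max(0.0, v)) for v in box)
    c0, c1 = math.floor(x1 * size), math.ceil(x2 * size)
    r0, r1 = math.floor(y1 * size), math.ceil(y2 * size)
    return [
        [1.0 if (r0 <= r < r1 and c0 <= c < c1) else 0.0 for c in range(size)]
        for r in range(size)
    ]


def pad_layout(boxes, max_objects, size=64):
    """Pad one image's boxes to max_objects; return (bboxes, masks, mask_vector)."""
    if len(boxes) > max_objects:
        raise ValueError("more boxes than max_objects")
    n_pad = max_objects - len(boxes)
    # Padding slots get an all-zero box and an all-zero mask.
    bboxes = [list(b) for b in boxes] + [[0.0] * 4 for _ in range(n_pad)]
    masks = [box_to_mask(b, size) for b in boxes]
    masks += [[[0.0] * size for _ in range(size)] for _ in range(n_pad)]
    # 1 marks a valid object, 0 marks padding.
    mask_vector = [1.0] * len(boxes) + [0.0] * n_pad
    return bboxes, masks, mask_vector


bboxes, masks, mask_vector = pad_layout([[0.25, 0.25, 0.75, 0.5]], max_objects=4)
print(len(bboxes), len(masks), mask_vector)
```

Wrapping each result in an outer list adds the batch dimension `B`, after which `torch.tensor(...)` yields tensors of shape `(1, N, 4)`, `(1, N, 64, 64)`, and `(1, N)`.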
For layout preparation and DIOR-R format, see the original AeroGen repo.
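As a rough illustration of the rotated-box input, the sketch below converts a box given as center, size, and angle in pixel coordinates into eight normalized corner values. The function name `rotated_box_corners`, the corner order (top-left, top-right, bottom-right, bottom-left before rotation), and the angle convention (radians, positive from the x-axis toward the y-axis) are all assumptions made for this example; check the original repo for the order and convention the model actually expects.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_rad, img_w, img_h):
    """Return [x1, y1, x2, y2, x3, y3, x4, y4], normalized by image width and height.

    Assumed corner order before rotation: top-left, top-right,
    bottom-right, bottom-left.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    corners = []
    for dx, dy in offsets:
        # Rotate the offset around the box center, then translate.
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        corners += [x / img_w, y / img_h]
    return corners


# A 128x64 box centered in a 512x512 image, with no rotation.
print(rotated_box_corners(256, 256, 128, 64, 0.0, 512, 512))
```

With a zero angle the result reduces to the corners of the axis-aligned box, which gives a quick sanity check on the chosen conventions.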
@inproceedings{tangAeroGenEnhancingRemote2025,
title = {{{AeroGen}}: {{Enhancing Remote Sensing Object Detection}} with {{Diffusion-Driven Data Generation}}},
shorttitle = {{{AeroGen}}},
booktitle = {{{CVPR}}},
author = {Tang, Datao and Cao, Xiangyong and Wu, Xuan and Li, Jialin and Yao, Jing and Bai, Xueru and Jiang, Dongsheng and Li, Yin and Meng, Deyu},
year = 2025,
pages = {3614--3624},
urldate = {2025-11-20}
}