Satellite Building Segmentation

A satellite image segmentation model based on an enhanced U-Net architecture. It labels six land-cover classes, including buildings, and achieves 65.62% mean IoU on the ISPRS Potsdam dataset.

Model Performance

Mean IoU: 65.62%
Pixel Accuracy: 82.45%
Training: 43 epochs with early stopping
Architecture: Enhanced U-Net with multi-scale features
Dataset: ISPRS Potsdam (6-class segmentation)

Class Performance

Class	IoU	Description
Impervious	0.78	Roads, parking, concrete
Buildings	0.69	Houses, structures
Low Vegetation	0.65	Grass, crops, lawns
Trees	0.72	Forests, large trees
Cars	0.45	Vehicles
Clutter	0.35	Mixed/background
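Mean IoU and pixel accuracy can both be derived from a confusion matrix accumulated over the evaluation set. The NumPy sketch below assumes integer label and prediction arrays with values 0-5 (as in the class mapping) and averages IoU only over classes that occur in the labels or predictions. The card does not say how its figures were aggregated (per image or over the whole dataset, or how absent classes were handled), so treat this as one common convention. `confusion_matrix` and `segmentation_metrics` are hypothetical helper names.

```python
import numpy as np

NUM_CLASSES = 6  # Impervious, Buildings, Low Vegetation, Trees, Cars, Clutter


def confusion_matrix(labels, preds, num_classes=NUM_CLASSES):
    """Rows = ground-truth class, columns = predicted class."""
    labels = np.asarray(labels).ravel()
    preds = np.asarray(preds).ravel()
    idx = labels * num_classes + preds  # unique bin per (label, pred) pair
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def segmentation_metrics(cm):
    """Per-class IoU, mean IoU and pixel accuracy from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as class c, labeled as another class
    fn = cm.sum(axis=1) - tp  # labeled as class c, predicted as another class
    union = tp + fp + fn
    # IoU is undefined (NaN) for classes absent from both labels and predictions
    iou = np.divide(tp, union, out=np.full_like(tp, np.nan), where=union > 0)
    return {
        "per_class_iou": iou,
        "mean_iou": float(np.nanmean(iou)),
        "pixel_accuracy": float(tp.sum() / cm.sum()),
    }
```

For example, with labels `[[0, 0, 1], [1, 2, 2]]` and predictions `[[0, 1, 1], [1, 2, 0]]`, classes 0-2 get IoU 1/3, 2/3 and 1/2, giving a mean IoU of 0.5 and a pixel accuracy of 4/6.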

Model Details

Architecture

Base: Enhanced U-Net
Features: Multi-scale blocks, skip connections
Input: RGB satellite images (512x512)
Output: 6-class segmentation masks
Parameters: ~31M
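The parameter count can be checked on a loaded model by summing the element counts of its parameter tensors. This sketch assumes `model` is an `nn.Module` such as the one loaded in Quick Start; `count_parameters` is a hypothetical helper, and the small convolution below is only a stand-in to show the arithmetic.

```python
import torch.nn as nn


def count_parameters(model: nn.Module, trainable_only: bool = False) -> int:
    """Total number of scalar parameters in a PyTorch module."""
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


# Stand-in module: a 3x3 conv from 3 to 64 channels has 3*64*9 weights + 64 biases
print(count_parameters(nn.Conv2d(3, 64, kernel_size=3)))  # 1792
```

For the full model, `count_parameters(model) / 1e6` should come out at roughly 31.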

Training Details

Dataset: ISPRS Potsdam 2D Semantic Labeling
Resolution: 5cm per pixel
Epochs: 43 (early stopping)
Batch Size: 4 (kept small to manage GPU temperature on the RTX 3090)
Loss: Combined Focal + Dice with class weights
Optimizer: Adam with differential learning rates
Hardware: NVIDIA RTX 3090
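The card does not give the exact focal/Dice formulation, the loss weighting, or how the learning rates were split, so the PyTorch sketch below is an illustration under stated assumptions, not the training code. It combines a focal loss that uses per-class `alpha` weights with a soft Dice loss, mixed by a hypothetical `dice_weight`, and builds Adam with two parameter groups (a lower rate for an encoder, a higher one for the rest). `FocalDiceLoss`, `make_optimizer`, `gamma`, `dice_weight` and the learning-rate values are all illustrative; `class_weights` must be ordered by class index 0-5.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FocalDiceLoss(nn.Module):
    """Hypothetical combined loss: class-weighted focal loss + soft Dice loss."""

    def __init__(self, class_weights, gamma=2.0, dice_weight=1.0, eps=1e-6):
        super().__init__()
        self.register_buffer("class_weights", torch.as_tensor(class_weights, dtype=torch.float32))
        self.gamma = gamma
        self.dice_weight = dice_weight
        self.eps = eps

    def forward(self, logits, target):
        # logits: (N, C, H, W) raw scores; target: (N, H, W) int64 class indices
        num_classes = logits.shape[1]
        log_probs = F.log_softmax(logits, dim=1)
        probs = log_probs.exp()

        # Focal term: per-pixel cross-entropy scaled by alpha_c * (1 - p_t)^gamma
        log_pt = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)  # (N, H, W)
        pt = log_pt.exp()
        alpha = self.class_weights[target]  # class weight looked up per pixel
        focal = (-alpha * (1 - pt) ** self.gamma * log_pt).mean()

        # Soft Dice term: per-class overlap over the whole batch, averaged over classes
        one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
        dims = (0, 2, 3)
        intersection = (probs * one_hot).sum(dims)
        cardinality = probs.sum(dims) + one_hot.sum(dims)
        dice = (2 * intersection + self.eps) / (cardinality + self.eps)
        dice_loss = 1 - dice.mean()

        return focal + self.dice_weight * dice_loss


def make_optimizer(encoder: nn.Module, rest: nn.Module, encoder_lr=1e-4, rest_lr=1e-3):
    """Adam with differential learning rates via parameter groups (values are illustrative)."""
    return torch.optim.Adam([
        {"params": encoder.parameters(), "lr": encoder_lr},
        {"params": rest.parameters(), "lr": rest_lr},
    ])
```

In a U-Net with a pretrained encoder, `encoder` would typically be the backbone and `rest` the decoder and head; whether this model splits its parameters that way is not stated.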

Usage

Quick Start

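import numpy as np
import torch
from PIL import Image

# Load model. This assumes pytorch_model.bin stores the full pickled nn.Module
# (not just a state_dict). PyTorch >= 2.6 defaults to weights_only=True, which
# rejects full modules, so the flag is set explicitly. Only load trusted files.
model = torch.load('pytorch_model.bin', map_location='cpu', weights_only=False)
model.eval()

# Load image, force 3-channel RGB, and resize to the 512x512 training resolution
image = Image.open('satellite_image.tif').convert('RGB')
image = image.resize((512, 512))
# HWC uint8 -> CHW float in [0, 1]
image_tensor = torch.from_numpy(np.array(image)).float().permute(2, 0, 1) / 255.0

# Normalize with ImageNet mean/std (assumed to match the training preprocessing)
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
image_tensor = (image_tensor - mean) / std

# Predict: add a batch dimension; the output is expected to be (1, 6, 512, 512) logits
with torch.no_grad():
    outputs = model(image_tensor.unsqueeze(0))
    # softmax is monotonic, so argmax over the raw logits gives the same classes
    predictions = outputs.argmax(dim=1)

# Convert to a (512, 512) NumPy array of class indices 0-5
segmentation = predictions.cpu().numpy()[0]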

Class Mapping

CLASS_COLORS = {
    0: [255, 255, 255],  # Impervious (white)
    1: [255, 0, 0],      # Buildings (red)
    2: [0, 255, 0],      # Low vegetation (green)
    3: [0, 255, 255],    # Trees (cyan)
    4: [255, 255, 0],    # Cars (yellow)
    5: [255, 0, 255],    # Clutter (magenta)
}
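To visualize a prediction, the class indices can be mapped to these colors with a NumPy lookup table. The sketch assumes `segmentation` is the (512, 512) integer array from Quick Start and repeats `CLASS_COLORS` so it runs on its own; `colorize` is a hypothetical helper name.

```python
import numpy as np

CLASS_COLORS = {
    0: [255, 255, 255],  # Impervious (white)
    1: [255, 0, 0],      # Buildings (red)
    2: [0, 255, 0],      # Low vegetation (green)
    3: [0, 255, 255],    # Trees (cyan)
    4: [255, 255, 0],    # Cars (yellow)
    5: [255, 0, 255],    # Clutter (magenta)
}


def colorize(segmentation: np.ndarray) -> np.ndarray:
    """Map an (H, W) array of class indices 0-5 to an (H, W, 3) uint8 RGB image."""
    # Row i of the palette is the color of class i, so fancy indexing does the lookup
    palette = np.array([CLASS_COLORS[i] for i in range(len(CLASS_COLORS))], dtype=np.uint8)
    return palette[segmentation]
```

With Pillow, the result can be saved for inspection via `Image.fromarray(colorize(segmentation)).save('segmentation_color.png')`.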

Technical Specifications

Input Requirements

Format: RGB TIFF or PNG images
Size: Any size (resize to 512x512 before inference, as in Quick Start)
Channels: 3 (RGB)
Bit Depth: 8-bit recommended

Output Format

Type: Integer class indices (0-5)
Size: 512x512
Classes: 6 semantic classes
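Because the input is resized to 512x512, the predicted mask is also 512x512. If a mask at the original image size is needed, one option (an assumption, not something this card describes) is to upsample the class indices with nearest-neighbour resampling, which never invents in-between labels the way bilinear interpolation would. `resize_mask` is a hypothetical helper name.

```python
import numpy as np
from PIL import Image


def resize_mask(segmentation: np.ndarray, size: tuple) -> np.ndarray:
    """Resize an (H, W) class-index mask to size=(width, height) with nearest-neighbour sampling."""
    mask = Image.fromarray(segmentation.astype(np.uint8))  # 8-bit grayscale; 6 classes fit easily
    return np.array(mask.resize(size, Image.NEAREST))
```

With the Quick Start variables, capture `original_size = Image.open('satellite_image.tif').size` (width, height) before resizing, then call `resize_mask(segmentation, original_size)`.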

Performance Characteristics

Inference Speed: ~50ms per image (GPU)
Memory Usage: ~2GB GPU memory
Accuracy: Best on urban/suburban scenes
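The card does not say which GPU or method produced the ~50ms figure. The sketch below is one way to measure comparable forward-pass latency under these assumptions: a random 512x512 input, a few warm-up passes, and explicit CUDA synchronization (kernels launch asynchronously, so timing without it under-reports). It excludes image loading and preprocessing. `time_inference` is a hypothetical helper name.

```python
import time

import torch
import torch.nn as nn


def time_inference(model: nn.Module, input_shape=(1, 3, 512, 512),
                   warmup: int = 3, runs: int = 10) -> float:
    """Average forward-pass latency in milliseconds, on GPU if available, else CPU."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)  # random input: timing only, not accuracy
    with torch.no_grad():
        for _ in range(warmup):  # exclude one-time setup costs from the measurement
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
        elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0
```

On GPU, `torch.cuda.max_memory_allocated()` after the runs reports the peak memory held by tensors, which can be compared with the ~2GB figure; the CUDA context and caching allocator add overhead on top of that.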

Citation

If you use this model in your research, please cite:

@misc{satellite-building-segmentation-2024,
  title={Satellite Building Segmentation using Enhanced U-Net},
  author={Your Name},
  year={2024},
  howpublished={Hugging Face Hub},
  url={https://huggingface.co/your-username/satellite-building-segmentation}
}

Contributing

Contributions welcome! Areas for improvement:

Multi-scale inference
Attention mechanism optimization
Additional datasets
Model compression
Real-time inference

License

MIT License - See LICENSE file for details.

Acknowledgments

ISPRS for the Potsdam dataset
PyTorch community
Satellite imagery research community
Enhanced U-Net architecture research

mdranias1/satellite-building-segmentation