Satellite Building Segmentation

A satellite building segmentation model based on an enhanced U-Net architecture. It achieves 65.62% mean IoU on the ISPRS Potsdam dataset.

Model Performance

  • Mean IoU: 65.62%
  • Pixel Accuracy: 82.45%
  • Training: 43 epochs with early stopping
  • Architecture: Enhanced U-Net with multi-scale features
  • Dataset: ISPRS Potsdam (6-class segmentation)

Class Performance

Class           IoU   Description
Impervious      0.78  Roads, parking, concrete
Buildings       0.69  Houses, structures
Low Vegetation  0.65  Grass, crops, lawns
Trees           0.72  Forests, large trees
Cars            0.45  Vehicles
Clutter         0.35  Mixed/background
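
For readers who want to reproduce numbers like these on their own predictions, here is a minimal sketch of per-class IoU, mean IoU, and pixel accuracy computed from a confusion matrix. The function names are illustrative and not part of this repository, and the card does not state how its own figures were aggregated (for example per image or over the whole test set), so treat this as one common convention rather than the exact evaluation used.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=6):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = num_classes * y_true.ravel().astype(np.int64) + y_pred.ravel().astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

def iou_metrics(cm):
    """Return (per-class IoU, mean IoU, pixel accuracy) from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but labeled otherwise
    fn = cm.sum(axis=1) - tp  # labeled as the class, but predicted otherwise
    union = tp + fp + fn
    # A class absent from both labels and predictions gets NaN and is skipped by nanmean.
    iou = np.where(union > 0, tp / np.maximum(union, 1), np.nan)
    return iou, float(np.nanmean(iou)), float(tp.sum() / cm.sum())
```

Accumulating one confusion matrix over all evaluation tiles and calling iou_metrics once gives dataset-level scores; averaging per-tile scores instead can give different values.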

Model Details

Architecture

  • Base: Enhanced U-Net
  • Features: Multi-scale blocks, skip connections
  • Input: RGB satellite images (512x512)
  • Output: 6-class segmentation masks
  • Parameters: ~31M
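
The card does not describe the internals of the multi-scale blocks. As a rough illustration only, the sketch below shows one common way to build such a block: parallel dilated 3x3 convolutions whose outputs are concatenated and fused by a 1x1 convolution. The class name MultiScaleBlock, the dilation rates, and the helper count_parameters are assumptions for this example, not the released architecture.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Hypothetical multi-scale block: parallel dilated convs, then a 1x1 fusion conv."""

    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        # padding == dilation keeps the spatial size unchanged for a 3x3 kernel.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d) for d in dilations]
        )
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        feats = [self.act(branch(x)) for branch in self.branches]
        return self.act(self.fuse(torch.cat(feats, dim=1)))

def count_parameters(module):
    """Number of trainable parameters, useful for checking a figure such as ~31M."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)
```

Calling count_parameters on the loaded model is a quick way to confirm the parameter count listed above.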

Training Details

  • Dataset: ISPRS Potsdam 2D Semantic Labeling
  • Resolution: 5 cm per pixel
  • Epochs: 43 (early stopping)
  • Batch Size: 4 (chosen to limit thermal load on the RTX 3090)
  • Loss: Combined Focal + Dice with class weights
  • Optimizer: Adam with differential learning rates
  • Hardware: NVIDIA RTX 3090
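
The training script is not included in this card, so the following is only a sketch of what a combined focal + Dice loss with class weights and an Adam optimizer with differential learning rates might look like. The values of gamma, dice_weight, and both learning rates, as well as the encoder/decoder split, are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def focal_dice_loss(logits, target, class_weights=None, gamma=2.0, dice_weight=0.5, eps=1e-6):
    """logits: (N, C, H, W) raw scores; target: (N, H, W) integer class indices."""
    num_classes = logits.shape[1]
    log_probs = F.log_softmax(logits, dim=1)
    probs = log_probs.exp()

    # Focal term: scale cross-entropy by (1 - p_t)^gamma to down-weight easy pixels.
    log_pt = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)
    focal = -((1.0 - log_pt.exp()) ** gamma) * log_pt
    if class_weights is not None:
        focal = focal * class_weights[target]  # per-pixel weight of the true class
    focal = focal.mean()

    # Soft Dice term, computed per class over the batch and then averaged.
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * one_hot).sum(dim=(0, 2, 3))
    denom = probs.sum(dim=(0, 2, 3)) + one_hot.sum(dim=(0, 2, 3))
    dice = 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()

    return (1.0 - dice_weight) * focal + dice_weight * dice

def make_optimizer(encoder_params, decoder_params, encoder_lr=1e-5, decoder_lr=1e-4):
    """Adam with one learning rate per parameter group (hypothetical encoder/decoder split)."""
    return torch.optim.Adam([
        {"params": encoder_params, "lr": encoder_lr},
        {"params": decoder_params, "lr": decoder_lr},
    ])
```

Rare classes such as Cars would typically receive larger entries in class_weights; the weights actually used for this model are not documented here.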

Usage

Quick Start

import torch
from PIL import Image
import numpy as np

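# Load model (assumes the file holds the full pickled model object, not just a state_dict).
# PyTorch 2.6+ defaults to weights_only=True, so full-object loading must be requested
# explicitly; only do this for checkpoint files you trust.
model = torch.load('pytorch_model.bin', map_location='cpu', weights_only=False)
model.eval()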

# Load and preprocess image
image = Image.open('satellite_image.tif').convert('RGB')
image = image.resize((512, 512))
image_tensor = torch.from_numpy(np.array(image)).float().permute(2, 0, 1) / 255.0

# Normalize
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
image_tensor = (image_tensor - mean) / std

# Predict (argmax over raw logits; softmax does not change the winning class)
with torch.no_grad():
    outputs = model(image_tensor.unsqueeze(0))
    predictions = torch.argmax(outputs, dim=1)

# Convert to numpy
segmentation = predictions.cpu().numpy()[0]

Class Mapping

CLASS_COLORS = {
    0: [255, 255, 255],  # Impervious (white)
    1: [255, 0, 0],      # Buildings (red)
    2: [0, 255, 0],      # Low vegetation (green)
    3: [0, 255, 255],    # Trees (cyan)
    4: [255, 255, 0],    # Cars (yellow)
    5: [255, 0, 255],    # Clutter (magenta)
}
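
To view a prediction, the class indices can be mapped to the colors above. This is a small sketch using NumPy fancy indexing; the function name colorize is illustrative and not part of this repository.

```python
import numpy as np

CLASS_COLORS = {
    0: [255, 255, 255],  # Impervious (white)
    1: [255, 0, 0],      # Buildings (red)
    2: [0, 255, 0],      # Low vegetation (green)
    3: [0, 255, 255],    # Trees (cyan)
    4: [255, 255, 0],    # Cars (yellow)
    5: [255, 0, 255],    # Clutter (magenta)
}

def colorize(segmentation):
    """Map an (H, W) array of class indices to an (H, W, 3) uint8 RGB image."""
    palette = np.array([CLASS_COLORS[i] for i in range(len(CLASS_COLORS))], dtype=np.uint8)
    return palette[segmentation]
```

The result can be saved with PIL, for example Image.fromarray(colorize(segmentation)).save('mask.png').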

Technical Specifications

Input Requirements

  • Format: RGB TIFF or PNG images
  • Size: Any size (the Quick Start code resizes inputs to 512x512)
  • Channels: 3 (RGB)
  • Bit Depth: 8-bit recommended

Output Format

  • Type: Integer class indices (0-5)
  • Size: 512x512
  • Classes: 6 semantic classes
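
Because the output mask is 512x512 regardless of the input size, it usually has to be scaled back to the original image dimensions before overlaying. Class indices must not be interpolated, so nearest-neighbor sampling is the safe choice. The helper below is a NumPy-only sketch; its name is an assumption for this example.

```python
import numpy as np

def resize_mask_nearest(mask, out_h, out_w):
    """Resize an (H, W) integer class mask to (out_h, out_w) by nearest-neighbor lookup."""
    # For each output row/column, pick the source index it falls into.
    rows = np.arange(out_h) * mask.shape[0] // out_h
    cols = np.arange(out_w) * mask.shape[1] // out_w
    return mask[rows[:, None], cols[None, :]]
```

With the Quick Start variables, resize_mask_nearest(segmentation, original_height, original_width) returns a mask aligned with the source image, where the two size arguments are whatever dimensions the image had before resizing.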

Performance Characteristics

  • Inference Speed: ~50ms per image (GPU)
  • Memory Usage: ~2GB GPU memory
  • Accuracy: Best on urban/suburban scenes

Citation

If you use this model in your research, please cite:

@misc{satellite-building-segmentation-2024,
  title={Satellite Building Segmentation using Enhanced U-Net},
  author={Your Name},
  year={2024},
  howpublished={Hugging Face Hub},
  url={https://huggingface.co/your-username/satellite-building-segmentation}
}

Contributing

Contributions welcome! Areas for improvement:

  • Multi-scale inference
  • Attention mechanism optimization
  • Additional datasets
  • Model compression
  • Real-time inference

License

MIT License - See LICENSE file for details.

Acknowledgments

  • ISPRS for the Potsdam dataset
  • PyTorch community
  • Satellite imagery research community
  • Enhanced U-Net architecture research