# Satellite Building Segmentation

A satellite building segmentation model built on an enhanced U-Net architecture, achieving 65.62% mean IoU on the ISPRS Potsdam dataset.
## Model Performance

- Mean IoU: 65.62%
- Pixel Accuracy: 82.45%
- Training: 43 epochs with early stopping
- Architecture: Enhanced U-Net with multi-scale features
- Dataset: ISPRS Potsdam (6-class segmentation)
## Class Performance

| Class          | IoU  | Description              |
|----------------|------|--------------------------|
| Impervious     | 0.78 | Roads, parking, concrete |
| Buildings      | 0.69 | Houses, structures       |
| Low Vegetation | 0.65 | Grass, crops, lawns      |
| Trees          | 0.72 | Forests, large trees     |
| Cars           | 0.45 | Vehicles                 |
| Clutter        | 0.35 | Mixed/background         |
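The per-class numbers above are standard intersection-over-union scores. For reference, a minimal sketch (not this repository's evaluation script) of how per-class and mean IoU can be computed from integer label maps:

```python
import numpy as np

def compute_iou(pred: np.ndarray, target: np.ndarray, num_classes: int = 6):
    """Per-class IoU and mean IoU from (H, W) arrays of class indices."""
    # Build a num_classes x num_classes confusion matrix.
    mask = (target >= 0) & (target < num_classes)
    conf = np.bincount(
        num_classes * target[mask].astype(int) + pred[mask],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)

    # IoU = TP / (TP + FP + FN), computed per class.
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)  # guard against empty classes
    return iou, iou.mean()
```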
## Model Details

### Architecture

- Base: Enhanced U-Net
- Features: Multi-scale blocks, skip connections (see the sketch below)
- Input: RGB satellite images (512x512)
- Output: 6-class segmentation masks
- Parameters: ~31M
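The architecture code is not bundled with this card. Purely as an illustration, a multi-scale block in the spirit of the description might run parallel 3x3 convolutions at several dilation rates and fuse them; the class name `MultiScaleBlock` and all hyperparameters below are assumptions, not the model's actual definition:

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Hypothetical multi-scale block: parallel dilated 3x3 convs,
    concatenated and fused back to `out_ch` channels with a 1x1 conv."""

    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        # Each branch preserves spatial size (padding == dilation for k=3).
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```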
### Training Details

- Dataset: ISPRS Potsdam 2D Semantic Labeling
- Resolution: 5 cm per pixel
- Epochs: 43 (early stopping)
- Batch Size: 4 (kept small to manage GPU thermals on the RTX 3090)
- Loss: Combined Focal + Dice with class weights (sketched below)
- Optimizer: Adam with differential learning rates
- Hardware: NVIDIA RTX 3090
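The training code is not included here. As a non-authoritative sketch, a combined Focal + Dice loss with per-class weights could look like the following; the equal weighting of the two terms and `gamma=2.0` are assumptions:

```python
import torch
import torch.nn.functional as F

def focal_dice_loss(logits, target, class_weights, gamma=2.0, eps=1e-6):
    """Combined Focal + Dice loss (sketch).

    logits: (B, C, H, W) raw outputs; target: (B, H, W) int64 class indices;
    class_weights: (C,) float tensor.
    """
    # Focal term: per-pixel CE scaled by (1 - p_t)^gamma and class weight.
    ce = F.cross_entropy(logits, target, reduction='none')  # (B, H, W)
    pt = torch.exp(-ce)                  # probability of the true class
    w = class_weights[target]            # per-pixel weight from true class
    focal = (w * (1 - pt) ** gamma * ce).mean()

    # Dice term over softmax probabilities and one-hot targets.
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    union = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    dice = 1 - ((2 * inter + eps) / (union + eps)).mean()

    # Equal weighting of the two terms is an assumption.
    return focal + dice

# Differential learning rates could then pair a lower LR for the encoder
# with a higher one for the decoder, e.g. (attribute names and values are
# assumptions about the model structure):
#   optimizer = torch.optim.Adam([
#       {'params': model.encoder.parameters(), 'lr': 1e-5},
#       {'params': model.decoder.parameters(), 'lr': 1e-4},
#   ])
```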
## Usage

### Quick Start

```python
import torch
from PIL import Image
import numpy as np

# Load model (assumes pytorch_model.bin stores the full pickled module;
# if it stores a state_dict, instantiate the model class first and call
# model.load_state_dict(torch.load(...)) instead)
model = torch.load('pytorch_model.bin', map_location='cpu')
model.eval()

# Load and preprocess image
image = Image.open('satellite_image.tif').convert('RGB')
image = image.resize((512, 512))
image_tensor = torch.from_numpy(np.array(image)).float().permute(2, 0, 1) / 255.0

# Normalize with ImageNet statistics
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
image_tensor = (image_tensor - mean) / std

# Predict
with torch.no_grad():
    outputs = model(image_tensor.unsqueeze(0))
    predictions = torch.argmax(torch.softmax(outputs, dim=1), dim=1)

# Convert to a (512, 512) numpy array of class indices
segmentation = predictions.cpu().numpy()[0]
```
### Class Mapping

```python
CLASS_COLORS = {
    0: [255, 255, 255],  # Impervious (white)
    1: [255, 0, 0],      # Buildings (red)
    2: [0, 255, 0],      # Low vegetation (green)
    3: [0, 255, 255],    # Trees (cyan)
    4: [255, 255, 0],    # Cars (yellow)
    5: [255, 0, 255],    # Clutter (magenta)
}
```
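To visualize a prediction, the class indices can be mapped through these colors. A small sketch, reusing the `segmentation` array produced in Quick Start:

```python
import numpy as np
from PIL import Image

# Build a (6, 3) lookup table from CLASS_COLORS, then index it with the
# (512, 512) array of class indices to obtain an RGB image.
palette = np.array([CLASS_COLORS[i] for i in range(6)], dtype=np.uint8)
rgb = palette[segmentation]
Image.fromarray(rgb).save('segmentation_rgb.png')
```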
## Technical Specifications

### Input Requirements

- Format: RGB TIFF or PNG images
- Size: Any size (resize to 512x512 before inference, as in Quick Start)
- Channels: 3 (RGB)
- Bit Depth: 8-bit recommended

### Output Format

- Type: Integer class indices (0-5)
- Size: 512x512
- Classes: 6 semantic classes

### Performance Characteristics

- Inference Speed: ~50 ms per image (GPU; see the timing sketch below)
- Memory Usage: ~2 GB GPU memory
- Accuracy: Best on urban/suburban scenes
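A minimal way to check the speed figure on your own hardware, reusing `model` from Quick Start (a sketch; warm-up and synchronization matter for GPU timing):

```python
import time
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
x = torch.randn(1, 3, 512, 512, device=device)

with torch.no_grad():
    # Warm up so one-time CUDA initialization doesn't skew the measurement.
    for _ in range(5):
        model(x)
    if device == 'cuda':
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(20):
        model(x)
    if device == 'cuda':
        torch.cuda.synchronize()

print(f"{(time.perf_counter() - start) / 20 * 1000:.1f} ms per image")
```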
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{satellite-building-segmentation-2024,
  title={Satellite Building Segmentation using Enhanced U-Net},
  author={Your Name},
  year={2024},
  howpublished={Hugging Face Hub},
  url={https://huggingface.co/your-username/satellite-building-segmentation}
}
```
## Contributing

Contributions are welcome! Areas for improvement:

- Multi-scale inference
- Attention mechanism optimization
- Additional datasets
- Model compression
- Real-time inference

## License

MIT License - see the LICENSE file for details.

## Acknowledgments

- ISPRS for the Potsdam dataset
- PyTorch community
- Satellite imagery research community
- Enhanced U-Net architecture research