GridNet-HD Baseline: Image semantic segmentation and LiDAR projection framework
Overview
This repository provides a reproducible implementation of a semantic segmentation pipeline and 3D projection baseline used in our NeurIPS submission introducing the GridNet-HD dataset. The framework includes:
- A transformer-based semantic segmentation pipeline built on UperNetForSemanticSegmentation (via HuggingFace Transformers).
- Support for high-resolution aerial imagery using random crops during training and sliding-window inference at test time.
- Post-processing for projection of 2D semantic predictions onto LiDAR point clouds and depth-based visibility filtering.
- JAX-accelerated operations for efficient 3D projection.
- Logging and experiment tracking with Weights & Biases.
This implementation serves as one of the official baselines for GridNet-HD.
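For orientation, the sliding-window inference mentioned above boils down to scoring overlapping crops and averaging their logits. The snippet below is a minimal sketch of that idea, assuming the HuggingFace UperNet interface (`model(pixel_values=...).logits`); the actual implementation lives in inference/sliding_window.py and may differ in details such as overlap handling.

```python
import torch
import torch.nn.functional as F

def sliding_window_predict(model, image, crop_size=512, stride=256, num_classes=11):
    """Minimal sketch of sliding-window inference: score overlapping crops,
    average the logits, and take the argmax. image is a normalized (1, 3, H, W) tensor."""
    _, _, H, W = image.shape
    logits = torch.zeros(1, num_classes, H, W, device=image.device)
    counts = torch.zeros(1, 1, H, W, device=image.device)
    ys = sorted({*range(0, max(H - crop_size, 0) + 1, stride), max(H - crop_size, 0)})
    xs = sorted({*range(0, max(W - crop_size, 0) + 1, stride), max(W - crop_size, 0)})
    for y in ys:
        for x in xs:
            crop = image[:, :, y:y + crop_size, x:x + crop_size]
            with torch.no_grad():
                out = model(pixel_values=crop).logits          # assumes the HF UperNet API
            out = F.interpolate(out, size=crop.shape[-2:], mode="bilinear", align_corners=False)
            logits[:, :, y:y + crop_size, x:x + crop_size] += out
            counts[:, :, y:y + crop_size, x:x + crop_size] += 1
    return (logits / counts.clamp(min=1)).argmax(dim=1)        # (1, H, W) class indices
```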
Table of Contents
- Project Structure
- Configuration
- Environment
- Dataset Structure
- Setup & Installation
- Supported Modes
- Results
- Pretrained Weights
- Usage Examples
- Weights & Biases Integration
- License
- Contact
- Citation
Project Structure
project_root/
├── main.py                     # Pipeline entry point
├── config.yaml                 # Main configuration file
├── datasets/
│   └── semantic_dataset.py     # Semantic segmentation dataset class
├── models/
│   └── upernet_wrapper.py      # Model loading utility
├── train/
│   ├── train.py                # Training loop
│   └── eval.py                 # Evaluation loop
├── inference/
│   ├── inference.py            # Sliding-window inference and output saving
│   ├── sliding_window.py       # Core logic for windowed inference
│   └── export_logits.py        # Export of softmax probabilities
├── projection/
│   ├── lidar_projection.py     # Projection of predictions to LiDAR space
│   └── fast_proj.py            # Projection utilities (Agisoft conventions), accelerated with JAX
├── utils/
│   ├── logging_utils.py        # Logging setup
│   ├── metrics.py              # Evaluation metrics (IoU, F1)
│   └── seed.py                 # Reproducibility utilities
├── best_model.pth              # Weights of the best model
└── requirements.txt            # Python dependencies
Configuration
All parameters are managed in config.yaml. Key sections include:
- data: paths, input dimensions, normalization statistics, class remapping.
- training: optimizer settings, learning rate schedule, checkpoint directory.
- val: batch sizes, projection parameters.
- model: pretrained backbone, number of classes, ignore index.
- wandb: project and entity names for Weights & Biases tracking.
Adjust these settings to match your dataset and compute environment.
Example config.yaml:
data:
  root_dir: "/path/to/GridNet-HD"              # Root folder containing t1z4, t2z5, etc.
  split_file: "/path/to/GridNet-HD/split.json" # JSON split file listing train/val/test folders
  resize_size: [1760, 1318]                    # Resize image and mask, PIL style (width, height)
  crop_size: [512, 512]                        # Random crop (train) or sliding window (val/test) of this size
  # Image normalization
  mean: [0.5, 0.5, 0.5]
  std: [0.5, 0.5, 0.5]
  class_map:
    - keys: [0, 1, 2, 3, 4]  # original values
      value: 0               # new (remapped) value
    - keys: [5]
      value: 1
    - keys: [6, 7]
      value: 2
    - keys: [8, 9, 10, 11]
      value: 3
    - keys: [14]
      value: 4
    - keys: [15]
      value: 5
    - keys: [16]
      value: 6
    - keys: [17, 18]
      value: 7
    - keys: [19]
      value: 8
    - keys: [20]
      value: 9
    - keys: [21]
      value: 10
    - keys: [12, 13, 255]
      value: 255
model:
  pretrained_model: "openmmlab/upernet-swin-tiny" # swin-small and swin-base variants are also available on HuggingFace
  num_classes: 11     # target classes
  ignore_index: 255   # 'ignore' value in loss & metrics
training:
  output_dir: "./outputs/run" # Where to save checkpoints & logs
  seed: 42
  batch_size: 32
  num_workers: 8              # parallel workers for the DataLoader
  lr: 0.0001                  # Initial learning rate
  sched_step: 10              # Scheduler: step every N epochs
  sched_gamma: 0.5            # Multiply LR by this gamma
  epochs: 60
  eval_every: 5               # Evaluate every N epochs
val:
  batch_size: 8               # Number of images per batch during validation and test
  num_workers: 8              # Parallel workers for the DataLoader
  batch_size_proj: 5000000    # Number of points per batch to project onto images
wandb:
  project: "GridNet-HD-ImageOnly" # Only used for training and validation
  entity: "your-team"
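To make the class_map section concrete, a remap like the one above can be applied to a label mask with a simple lookup table. This is an illustrative sketch only; the dataset class in datasets/semantic_dataset.py may implement the remapping differently.

```python
import numpy as np

def build_lut(class_map, ignore_value=255):
    """Build a 256-entry lookup table from the class_map list in config.yaml.
    Original values not listed in class_map fall back to the ignore value."""
    lut = np.full(256, ignore_value, dtype=np.uint8)
    for entry in class_map:
        for key in entry["keys"]:
            lut[key] = entry["value"]
    return lut

# Usage (mask is a uint8 label image; class_map is read from config.yaml, e.g. via yaml.safe_load):
# remapped = build_lut(class_map)[mask]
```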
Environment
The following environment was used to train and evaluate the baseline model:
| Component | Details |
|---|---|
| GPU | NVIDIA A40 (48 GB VRAM) |
| CUDA Version | 12.x |
| OS | Ubuntu 22.04 LTS |
| Python Version | 3.12 |
| PyTorch Version | 2.7+cu126 |
| Transformers | 🤗 Transformers 4.51 |
| JAX | jax==0.6.0 |
| laspy | >= 2.0 |
| RAM | 256 GB (≥ 64 GB recommended) |
⚠️ Batched sliding-window inference and JAX-based 3D projection on large scenes require substantial VRAM. If you hit a CUDA out-of-memory error, decrease batch_size and batch_size_proj in the val section of config.yaml.
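In practice, batch_size_proj caps how many LiDAR points are projected at once; looping over the cloud in chunks of that size is one simple way to stay within memory. The helper below is a hypothetical sketch (project_fn stands in for the real routine in projection/fast_proj.py).

```python
import numpy as np

def project_in_chunks(points, project_fn, batch_size_proj=5_000_000):
    """Sketch: apply a projection routine to at most batch_size_proj points at a time
    so that very large scenes do not exhaust (V)RAM."""
    results = []
    for start in range(0, len(points), batch_size_proj):
        results.append(project_fn(points[start:start + batch_size_proj]))
    return np.concatenate(results)
```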
Dataset Structure
The input data is structured by geographic zone, with RGB images, semantic masks, LiDAR scans, and camera pose files. The structure of the GridNet-HD dataset is kept unchanged (see the GridNet-HD dataset for more information).
Setup & Installation
Clone the repository:
git clone https://huggingface.co/heig-vd-geo/ImageVote_GridNet-HD_baseline
cd ImageVote_GridNet-HD_baseline
Create a conda virtual environment:
conda create -n gridnet_hd_image python=3.12
conda activate gridnet_hd_image
Install dependencies:
pip install --upgrade pip
pip install -r requirements.txt
Supported Modes
Each mode is selected via the --mode
argument in main.py
.
| Mode | Description |
|---|---|
| train | Train the image segmentation model |
| val | Evaluate the model on the validation set (2D) and return image-level metrics |
| test | Run inference on the test set (saves predicted masks) |
| test3d | Run inference and reproject predictions onto the LiDAR point cloud (3D); results are saved in the classif field of the LAS file |
| val3d | Evaluate predictions projected onto the LiDAR point cloud (3D) and return 3D-level metrics |
| export_probs | Export softmax probabilities for each input image |
| project_probs_3d | Project softmax probabilities of each input image onto each LiDAR point cloud (used to train the third baseline) |
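The test3d and val3d modes depend on projecting LiDAR points into each image and keeping only the points actually visible from that camera. The sketch below illustrates the depth-based visibility filtering idea with a per-pixel z-buffer; it is a simplified NumPy stand-in (uv, depth, and depth_tol are assumed inputs), whereas the repository's version in projection/fast_proj.py is JAX-accelerated and follows Agisoft camera conventions.

```python
import numpy as np

def visible_point_labels(uv, depth, pred_mask, depth_tol=0.1):
    """Sketch of depth-based visibility filtering for one image.
    uv:        (N, 2) integer pixel coordinates of LiDAR points projected into the image
    depth:     (N,)   distance of each point from the camera
    pred_mask: (H, W) predicted class ids for that image
    Returns per-point labels; points outside the image or hidden behind closer
    geometry keep the ignore value 255."""
    H, W = pred_mask.shape
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    idx = np.flatnonzero(inside)
    uv_in, depth_in = uv[inside], depth[inside]

    # Per-pixel minimum depth (z-buffer) over all points that fall on the same pixel
    zbuffer = np.full((H, W), np.inf)
    np.minimum.at(zbuffer, (uv_in[:, 1], uv_in[:, 0]), depth_in)

    # A point is visible if its depth is close to the smallest depth seen at its pixel
    visible = depth_in <= zbuffer[uv_in[:, 1], uv_in[:, 0]] + depth_tol

    labels = np.full(uv.shape[0], 255, dtype=np.uint8)
    labels[idx[visible]] = pred_mask[uv_in[visible, 1], uv_in[visible, 0]]
    return labels
```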
Results
The following table summarizes the per-class Intersection over Union (IoU) scores on the test set at the 3D level. The model was trained with the configuration specified in config.yaml.

| Class | IoU on test set (%) |
|---|---|
| Pylon | 85.09 |
| Conductor cable | 64.82 |
| Structural cable | 45.06 |
| Insulator | 71.07 |
| High vegetation | 83.86 |
| Low vegetation | 63.43 |
| Herbaceous vegetation | 84.45 |
| Rock, gravel, soil | 38.62 |
| Impervious soil (Road) | 80.69 |
| Water | 74.87 |
| Building | 68.09 |
| Mean IoU (mIoU) | 69.10 |
Pretrained Weights
Pretrained weights for the best-performing model are available for download directly in this repo.
This checkpoint corresponds to the model trained with the configuration in config.yaml, achieving a mean IoU of 69.10% on the test set.
Usage Examples
Training
python main.py --mode train --config config.yaml
2D Validation
python main.py --mode val --weights_path best_model.pth
2D Inference
python main.py --mode test --weights_path best_model.pth
3D Inference (with LiDAR projection)
python main.py --mode test3d --weights_path best_model.pth
3D Validation
python main.py --mode val3d --weights_path best_model.pth
Export Softmax Probabilities
python main.py --mode export_probs --weights_path best_model.pth
Project Softmax Probabilities onto LiDAR
python main.py --mode project_probs_3d --weights_path best_model.pth
Weights & Biases Integration
To log training and evaluation to Weights & Biases:
wandb login
Set the project and entity fields in your config.yaml file.
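Under the hood, those fields are presumably passed to wandb.init; a minimal sketch, assuming the config is loaded with PyYAML (the actual setup lives in utils/logging_utils.py):

```python
import yaml
import wandb

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# Start a W&B run using the project/entity fields from config.yaml
run = wandb.init(project=cfg["wandb"]["project"],
                 entity=cfg["wandb"]["entity"],
                 config=cfg)
```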
License
This project is open-sourced under the MIT License.
Contact
For questions, issues, or contributions, please open an issue on the repository.
Citation
If you use this repo in research, please cite:
GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure
Masked Authors
Submitted to NeurIPS 2025.