GridNet-HD Baseline: Image semantic segmentation and LiDAR projection framework
Overview
This repository provides a reproducible implementation of a semantic segmentation pipeline and 3D projection baseline used in our NeurIPS submission introducing the GridNet-HD dataset. The framework includes:
- A transformer-based semantic segmentation pipeline built on UperNetForSemanticSegmentation (via HuggingFace Transformers).
- Support for high-resolution aerial imagery using random crops during training and sliding-window inference at test time.
- Post-processing for projection of 2D semantic predictions onto LiDAR point clouds and depth-based visibility filtering.
- JAX-accelerated operations for efficient 3D projection.
- Logging and experiment tracking with Weights & Biases.
This implementation serves as one of the official baselines for GridNet-HD.
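For orientation, the sliding-window inference mentioned above boils down to scoring overlapping crops and averaging their logits. The snippet below is a minimal sketch of that idea, assuming the HuggingFace UperNet interface (`model(pixel_values=...).logits`); the actual implementation lives in inference/sliding_window.py and may differ in details such as overlap handling.

```python
import torch
import torch.nn.functional as F

def sliding_window_predict(model, image, crop_size=512, stride=256, num_classes=11):
    """Minimal sketch of sliding-window inference: score overlapping crops,
    average the logits, and take the argmax. image is a normalized (1, 3, H, W) tensor."""
    _, _, H, W = image.shape
    logits = torch.zeros(1, num_classes, H, W, device=image.device)
    counts = torch.zeros(1, 1, H, W, device=image.device)
    ys = sorted({*range(0, max(H - crop_size, 0) + 1, stride), max(H - crop_size, 0)})
    xs = sorted({*range(0, max(W - crop_size, 0) + 1, stride), max(W - crop_size, 0)})
    for y in ys:
        for x in xs:
            crop = image[:, :, y:y + crop_size, x:x + crop_size]
            with torch.no_grad():
                out = model(pixel_values=crop).logits          # assumes the HF UperNet API
            out = F.interpolate(out, size=crop.shape[-2:], mode="bilinear", align_corners=False)
            logits[:, :, y:y + crop_size, x:x + crop_size] += out
            counts[:, :, y:y + crop_size, x:x + crop_size] += 1
    return (logits / counts.clamp(min=1)).argmax(dim=1)        # (1, H, W) class indices
```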
Table of Contents
- Project Structure
- Configuration
- Environment
- Dataset Structure
- Setup & Installation
- Supported Modes
- Results
- Pretrained Weights
- Usage Examples
- Weights & Biases Integration
- License
- Contact
- Citation
Project Structure
project_root/
├── main.py                     # Pipeline entry point
├── config.yaml                 # Main configuration file
├── datasets/
│   └── semantic_dataset.py     # Semantic segmentation dataset class
├── models/
│   └── upernet_wrapper.py      # Model loading utility
├── train/
│   ├── train.py                # Training loop
│   └── eval.py                 # Evaluation loop
├── inference/
│   ├── inference.py            # Sliding-window inference and output saving
│   ├── sliding_window.py       # Core logic for windowed inference
│   └── export_logits.py        # Export of softmax probabilities
├── projection/
│   ├── lidar_projection.py     # Projection of predictions to LiDAR space
│   └── fast_proj.py            # Projection utilities (Agisoft conventions), accelerated with JAX
├── utils/
│   ├── logging_utils.py        # Logging setup
│   ├── metrics.py              # Evaluation metrics (IoU, F1)
│   └── seed.py                 # Reproducibility utilities
├── best_model.pth              # Weights of the best model
└── requirements.txt            # Python dependencies
Configuration
All parameters are managed in config.yaml. Key sections include:
- data: paths, input dimensions, normalization statistics, class remapping.
- training: optimizer settings, learning rate schedule, checkpoint directory.
- val: batch sizes, projection parameters.
- model: pretrained backbone, number of classes, ignore index.
- wandb: project and entity names for Weights & Biases tracking.
Adjust these settings to match your dataset and compute environment.
Example config.yaml:
data:
  root_dir: "/path/to/GridNet-HD"              # Root folder containing t1z4, t2z5, etc.
  split_file: "/path/to/GridNet-HD/split.json" # JSON split file listing train/val/test folders
  resize_size: [1760, 1318]                    # Resize image and mask, PIL style (width, height)
  crop_size: [512, 512]                        # Random crop (train) or sliding window (val/test) of this size
  # Image normalization
  mean: [0.5, 0.5, 0.5]
  std: [0.5, 0.5, 0.5]
  class_map:
    - keys: [0, 1, 2, 3, 4]  # original values
      value: 0               # new (remapped) value
    - keys: [5]
      value: 1
    - keys: [6, 7]
      value: 2
    - keys: [8, 9, 10, 11]
      value: 3
    - keys: [14]
      value: 4
    - keys: [15]
      value: 5
    - keys: [16]
      value: 6
    - keys: [17, 18]
      value: 7
    - keys: [19]
      value: 8
    - keys: [20]
      value: 9
    - keys: [21]
      value: 10
    - keys: [12, 13, 255]
      value: 255
model:
  pretrained_model: "openmmlab/upernet-swin-tiny" # swin-small and swin-base variants are also available on HuggingFace
  num_classes: 11     # target classes
  ignore_index: 255   # 'ignore' value in loss & metrics
training:
  output_dir: "./outputs/run" # Where to save checkpoints & logs
  seed: 42
  batch_size: 32
  num_workers: 8              # parallel workers for the DataLoader
  lr: 0.0001                  # Initial learning rate
  sched_step: 10              # Scheduler: step every N epochs
  sched_gamma: 0.5            # Multiply LR by this gamma
  epochs: 60
  eval_every: 5               # Evaluate every N epochs
val:
  batch_size: 8               # Number of images per batch during validation and test
  num_workers: 8              # Parallel workers for the DataLoader
  batch_size_proj: 5000000    # Number of points per batch to project onto images
wandb:
  project: "GridNet-HD-ImageOnly" # Only used for training and validation
  entity: "your-team"
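To make the class_map section concrete, a remap like the one above can be applied to a label mask with a simple lookup table. This is an illustrative sketch only; the dataset class in datasets/semantic_dataset.py may implement the remapping differently.

```python
import numpy as np

def build_lut(class_map, ignore_value=255):
    """Build a 256-entry lookup table from the class_map list in config.yaml.
    Original values not listed in class_map fall back to the ignore value."""
    lut = np.full(256, ignore_value, dtype=np.uint8)
    for entry in class_map:
        for key in entry["keys"]:
            lut[key] = entry["value"]
    return lut

# Usage (mask is a uint8 label image; class_map is read from config.yaml, e.g. via yaml.safe_load):
# remapped = build_lut(class_map)[mask]
```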
Environment
The following environment was used to train and evaluate the baseline model:
| Component | Details |
|---|---|
| GPU | NVIDIA A40 (48 GB VRAM) |
| CUDA Version | 12.x |
| OS | Ubuntu 22.04 LTS |
| Python Version | 3.12 |
| PyTorch Version | 2.7+cu126 |
| Transformers | 🤗 Transformers 4.51 |
| JAX | jax==0.6.0 |
| laspy | >= 2.0 |
| RAM | 256 GB (≥ 64 GB recommended) |
⚠️ Batched sliding-window inference and JAX-based 3D projection on large scenes require substantial VRAM. If you hit a CUDA out-of-memory error, decrease batch_size and batch_size_proj in the val section of config.yaml.
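In practice, batch_size_proj caps how many LiDAR points are projected at once; looping over the cloud in chunks of that size is one simple way to stay within memory. The helper below is a hypothetical sketch (project_fn stands in for the real routine in projection/fast_proj.py).

```python
import numpy as np

def project_in_chunks(points, project_fn, batch_size_proj=5_000_000):
    """Sketch: apply a projection routine to at most batch_size_proj points at a time
    so that very large scenes do not exhaust (V)RAM."""
    results = []
    for start in range(0, len(points), batch_size_proj):
        results.append(project_fn(points[start:start + batch_size_proj]))
    return np.concatenate(results)
```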
Dataset Structure
The input data is structured by geographic zone, with RGB images, semantic masks, LiDAR scans, and camera pose files. The structure of the GridNet-HD dataset is kept unchanged (see the GridNet-HD dataset for more information).
Setup & Installation
Clone the repository:
git clone https://huggingface.co/heig-vd-geo/ImageVote_GridNet-HD_baseline
cd ImageVote_GridNet-HD_baseline
Create a conda virtual environment:
conda create -n gridnet_hd_image python=3.12
conda activate gridnet_hd_image
Install dependencies:
pip install --upgrade pip
pip install -r requirements.txt
Supported Modes
Each mode is selected via the --mode
argument in main.py
.
| Mode | Description |
|---|---|
| train | Train the image segmentation model |
| val | Evaluate the model on the validation set (2D) and return image-level metrics |
| test | Run inference on the test set (saves predicted masks) |
| test3d | Run inference and reproject predictions onto the LiDAR point cloud (3D); results are saved in the classif field of the LAS file |
| val3d | Evaluate predictions projected onto the LiDAR point cloud (3D) and return 3D-level metrics |
| export_probs | Export softmax probabilities for each input image |
| project_probs_3d | Project softmax probabilities of each input image onto each LiDAR point cloud (used to train the third baseline) |
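The test3d and val3d modes depend on projecting LiDAR points into each image and keeping only the points actually visible from that camera. The sketch below illustrates the depth-based visibility filtering idea with a per-pixel z-buffer; it is a simplified NumPy stand-in (uv, depth, and depth_tol are assumed inputs), whereas the repository's version in projection/fast_proj.py is JAX-accelerated and follows Agisoft camera conventions.

```python
import numpy as np

def visible_point_labels(uv, depth, pred_mask, depth_tol=0.1):
    """Sketch of depth-based visibility filtering for one image.
    uv:        (N, 2) integer pixel coordinates of LiDAR points projected into the image
    depth:     (N,)   distance of each point from the camera
    pred_mask: (H, W) predicted class ids for that image
    Returns per-point labels; points outside the image or hidden behind closer
    geometry keep the ignore value 255."""
    H, W = pred_mask.shape
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    idx = np.flatnonzero(inside)
    uv_in, depth_in = uv[inside], depth[inside]

    # Per-pixel minimum depth (z-buffer) over all points that fall on the same pixel
    zbuffer = np.full((H, W), np.inf)
    np.minimum.at(zbuffer, (uv_in[:, 1], uv_in[:, 0]), depth_in)

    # A point is visible if its depth is close to the smallest depth seen at its pixel
    visible = depth_in <= zbuffer[uv_in[:, 1], uv_in[:, 0]] + depth_tol

    labels = np.full(uv.shape[0], 255, dtype=np.uint8)
    labels[idx[visible]] = pred_mask[uv_in[visible, 1], uv_in[visible, 0]]
    return labels
```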
Results
The following table summarizes the per-class Intersection over Union (IoU) scores on the test set at the 3D level. The model was trained with the configuration specified in config.yaml.

| Class | IoU on test set (%) |
|---|---|
| Pylon | 85.09 |
| Conductor cable | 64.82 |
| Structural cable | 45.06 |
| Insulator | 71.07 |
| High vegetation | 83.86 |
| Low vegetation | 63.43 |
| Herbaceous vegetation | 84.45 |
| Rock, gravel, soil | 38.62 |
| Impervious soil (Road) | 80.69 |
| Water | 74.87 |
| Building | 68.09 |
| Mean IoU (mIoU) | 69.10 |
Pretrained Weights
Pretrained weights for the best-performing model are available for download directly in this repo.
This checkpoint corresponds to the model trained with the configuration in config.yaml, achieving a mean IoU of 69.10% on the test set.
Usage Examples
Training
python main.py --mode train --config config.yaml
2D Validation
python main.py --mode val --weights_path best_model.pth
2D Inference
python main.py --mode test --weights_path best_model.pth
3D Inference (with LiDAR projection)
python main.py --mode test3d --weights_path best_model.pth
3D Validation
python main.py --mode val3d --weights_path best_model.pth
Export Softmax Probabilities
python main.py --mode export_probs --weights_path best_model.pth
Project Softmax Probabilities onto LiDAR
python main.py --mode project_probs_3d --weights_path best_model.pth
Weights & Biases Integration
To log training and evaluation to Weights & Biases:
wandb login
Set the project and entity fields in your config.yaml file.
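Under the hood, those fields are presumably passed to wandb.init; a minimal sketch, assuming the config is loaded with PyYAML (the actual setup lives in utils/logging_utils.py):

```python
import yaml
import wandb

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# Start a W&B run using the project/entity fields from config.yaml
run = wandb.init(project=cfg["wandb"]["project"],
                 entity=cfg["wandb"]["entity"],
                 config=cfg)
```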
License
This project is open-sourced under the MIT License.
Contact
For questions, issues, or contributions, please open an issue on the repository.
Citation
If you use this repo in research, please cite:
GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure
Masked Authors
Submitted to NeurIPS 2025.