
GridNet-HD Baseline: Image semantic segmentation and LiDAR projection framework

Overview

This repository provides a reproducible implementation of a semantic segmentation pipeline and 3D projection baseline used in our NeurIPS submission introducing the GridNet-HD dataset. The framework includes:

  • A transformer-based semantic segmentation pipeline built on UperNetForSemanticSegmentation (via HuggingFace Transformers).
  • Support for high-resolution aerial imagery, using random crops during training and sliding-window inference at test time.
  • Post-processing that projects 2D semantic predictions onto LiDAR point clouds, with depth-based visibility filtering.
  • JAX-accelerated operations for efficient 3D projection.
  • Logging and experiment tracking with Weights & Biases.

This implementation serves as one of the official baselines for GridNet-HD.


Project Structure

```
project_root/
β”œβ”€β”€ main.py                      # Pipeline entry point
β”œβ”€β”€ config.yaml                  # Main configuration file
β”œβ”€β”€ datasets/
β”‚   └── semantic_dataset.py      # Semantic segmentation dataset class
β”œβ”€β”€ models/
β”‚   └── upernet_wrapper.py       # Model loading utility
β”œβ”€β”€ train/
β”‚   β”œβ”€β”€ train.py                 # Training loop
β”‚   └── eval.py                  # Evaluation loop
β”œβ”€β”€ inference/
β”‚   β”œβ”€β”€ inference.py             # Sliding window inference and output saving
β”‚   β”œβ”€β”€ sliding_window.py        # Core logic for windowed inference
β”‚   └── export_logits.py         # Export of softmax probabilities
β”œβ”€β”€ projection/
β”‚   β”œβ”€β”€ lidar_projection.py      # Projection of predictions to LiDAR space
β”‚   └── fast_proj.py             # Projection utilities (Agisoft conventions), JAX-accelerated
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ logging_utils.py         # Logging setup
β”‚   β”œβ”€β”€ metrics.py               # Evaluation metrics (IoU, F1)
β”‚   └── seed.py                  # Reproducibility utilities
β”œβ”€β”€ best_model.pth               # Weights for best model
└── requirements.txt             # Python dependencies
```

Configuration

All parameters are managed in config.yaml. Key sections include:

  • data: paths, input dimensions, normalization statistics, class remapping.
  • training: optimizer settings, learning rate schedule, checkpoint directory.
  • val: validation/test batch sizes, projection parameters.
  • model: pretrained backbone, number of classes, ignore index.
  • wandb: project and entity names for Weights & Biases tracking.

Adjust these settings to match your dataset and compute environment.

Example config.yaml:

```yaml
data:
  root_dir: "/path/to/GridNet-HD" # Root folder containing t1z4, t2z5, etc.
  split_file: "/path/to/GridNet-HD/split.json" # JSON split file listing train/val/test folders
  resize_size: [1760, 1318] # resize image and mask, PIL style (width, height)
  crop_size: [512, 512] # random-crop (train) or sliding-window (val/test) to this size
  # Image normalization
  mean: [0.5, 0.5, 0.5]
  std:  [0.5, 0.5, 0.5]
  class_map:
    - keys: [0, 1, 2, 3, 4] # original values
      value: 0              # new value (remap value)
    - keys: [5]
      value: 1
    - keys: [6, 7]
      value: 2
    - keys: [8, 9, 10, 11]
      value: 3
    - keys: [14]
      value: 4
    - keys: [15]
      value: 5
    - keys: [16]
      value: 6
    - keys: [17, 18]
      value: 7
    - keys: [19]
      value: 8
    - keys: [20]
      value: 9
    - keys: [21]
      value: 10
    - keys: [12, 13, 255]
      value: 255

model:
  pretrained_model: "openmmlab/upernet-swin-tiny" # swin-small and swin-base variants are also available on HuggingFace
  num_classes: 11 # target classes
  ignore_index: 255 # 'ignore' in loss & metrics

training:
  output_dir: "./outputs/run" # Where to save checkpoints & logs
  seed: 42
  batch_size: 32
  num_workers: 8 # parallel workers for DataLoader
  lr: 0.0001 # Initial learning rate
  sched_step: 10 # Scheduler: step every N epochs
  sched_gamma: 0.5 # multiply LR by this gamma
  epochs: 60
  eval_every: 5 # eval every n epochs
 
val:
  batch_size: 8 # images per batch during validation and test
  num_workers: 8 # parallel workers for DataLoader
  batch_size_proj: 5000000 # points per batch projected onto the images
 
wandb:
  project: "GridNet-HD-ImageOnly" # only used for training and validation
  entity:  "your-team"
```
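
The class_map above collapses the original label values into 11 training classes, with original values 12, 13, and 255 sent to the ignore index. As a minimal sketch of how such a remapping can be applied to a mask (numpy-based and illustrative; the dataset class in datasets/semantic_dataset.py may implement it differently):

```python
import numpy as np

# Lookup table built from the class_map in config.yaml: any original value
# not listed below falls through to the ignore index (255).
CLASS_MAP = {
    0: 0, 1: 0, 2: 0, 3: 0, 4: 0,
    5: 1, 6: 2, 7: 2,
    8: 3, 9: 3, 10: 3, 11: 3,
    14: 4, 15: 5, 16: 6,
    17: 7, 18: 7, 19: 8, 20: 9, 21: 10,
    12: 255, 13: 255, 255: 255,
}

def remap_mask(mask: np.ndarray, ignore_index: int = 255) -> np.ndarray:
    """Remap original label values to training class IDs with a 256-entry LUT."""
    lut = np.full(256, ignore_index, dtype=np.uint8)
    for original, remapped in CLASS_MAP.items():
        lut[original] = remapped
    return lut[mask]
```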

Environment

The following environment was used to train and evaluate the baseline model:

| Component    | Details                      |
| ------------ | ---------------------------- |
| GPU          | NVIDIA A40 (48 GB VRAM)      |
| CUDA         | 12.x                         |
| OS           | Ubuntu 22.04 LTS             |
| Python       | 3.12                         |
| PyTorch      | 2.7+cu126                    |
| Transformers | πŸ€— Transformers 4.51         |
| JAX          | jax==0.6.0                   |
| laspy        | >= 2.0                       |
| RAM          | 256 GB (β‰₯ 64 GB recommended) |

⚠️ Batch sliding-window inference and JAX-based 3D projection on large scenes require substantial VRAM. If you hit a CUDA out-of-memory error, decrease `batch_size` and/or `batch_size_proj` in the `val` section of config.yaml.

Dataset Structure

The input data is organized by geographic zone, with RGB images, semantic masks, LiDAR scans, and camera pose files for each zone. The layout is exactly that of the GridNet-HD dataset (see the GridNet-HD dataset card for details).
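
For orientation only, the root folder looks roughly like the sketch below (zone names and split.json come from the example config above; consult the dataset card for the authoritative per-zone layout):

```
GridNet-HD/
β”œβ”€β”€ split.json        # JSON split listing train/val/test zones (see config.yaml)
β”œβ”€β”€ t1z4/             # one geographic zone: RGB images, semantic masks,
β”‚                     # LiDAR scan(s), and camera pose files
β”œβ”€β”€ t2z5/
└── ...
```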


Setup & Installation

  1. Clone the repository:

    git clone https://huggingface.co/heig-vd-geo/ImageVote_GridNet-HD_baseline
    cd ImageVote_GridNet-HD_baseline
    
  2. Create a conda virtual environment:

    conda create -n gridnet_hd_image python=3.12
    conda activate gridnet_hd_image
    
  3. Install dependencies:

    pip install --upgrade pip
    pip install -r requirements.txt
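
requirements.txt is the authoritative dependency list; for reference, the core packages implied by the Environment table above are roughly the following (version pins indicative; numpy, Pillow, and PyYAML are assumed utility dependencies):

```
torch==2.7.*            # 2.7+cu126 was used for the baseline
transformers==4.51.*
jax==0.6.0
laspy>=2.0
wandb
numpy                   # assumed
pillow                  # assumed
pyyaml                  # assumed
```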
    

Supported Modes

Each mode is selected via the --mode argument in main.py.

| Mode             | Description                                                                                              |
| ---------------- | -------------------------------------------------------------------------------------------------------- |
| train            | Train the image segmentation model                                                                       |
| val              | Evaluate the model on the validation set (2D) and report image-level metrics                             |
| test             | Run inference on the test set (saves predicted masks)                                                    |
| test3d           | Run inference and reproject predictions onto the LiDAR point cloud, saved in the classification field of the LAS file |
| val3d            | Evaluate predictions projected onto the LiDAR point cloud (3D) and report 3D-level metrics               |
| export_probs     | Export softmax probabilities for each input image                                                        |
| project_probs_3d | Project the per-image softmax probabilities onto each LiDAR point cloud (used to train the third baseline) |
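
For intuition, the val/test modes evaluate full-resolution images by tiling them with overlapping crops and averaging per-pixel logits. A minimal sketch of that idea (illustrative only; the actual implementation lives in inference/sliding_window.py and its function names differ):

```python
import torch

@torch.no_grad()
def sliding_window_logits(model, image, crop=512, stride=256, num_classes=11):
    """Average logits over overlapping crops of a normalized (1, 3, H, W) image."""
    _, _, H, W = image.shape
    logits = torch.zeros(num_classes, H, W, device=image.device)
    counts = torch.zeros(1, H, W, device=image.device)

    def starts(size):
        last = max(size - crop, 0)
        s = list(range(0, last + 1, stride))
        if s[-1] != last:          # make the final window touch the border
            s.append(last)
        return s

    for y in starts(H):
        for x in starts(W):
            tile = image[:, :, y:y + crop, x:x + crop]
            out = model(pixel_values=tile).logits  # UperNet upsamples to tile size
            logits[:, y:y + crop, x:x + crop] += out[0]
            counts[:, y:y + crop, x:x + crop] += 1
    return logits / counts  # argmax over dim 0 gives the predicted mask
```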

Results

The following table summarizes the per-class Intersection over Union (IoU) scores on the test set at the 3D level. The model was trained using the configuration specified in config.yaml.

| Class                  | IoU (test set, %) |
| ---------------------- | ----------------- |
| Pylon                  | 85.09             |
| Conductor cable        | 64.82             |
| Structural cable       | 45.06             |
| Insulator              | 71.07             |
| High vegetation        | 83.86             |
| Low vegetation         | 63.43             |
| Herbaceous vegetation  | 84.45             |
| Rock, gravel, soil     | 38.62             |
| Impervious soil (road) | 80.69             |
| Water                  | 74.87             |
| Building               | 68.09             |
| Mean IoU (mIoU)        | 69.10             |
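
The reported numbers are standard confusion-matrix IoU scores, and mIoU is their mean over the 11 classes. A minimal sketch of the computation (illustrative; utils/metrics.py is the reference implementation):

```python
import numpy as np

def per_class_iou(pred, target, num_classes=11, ignore_index=255):
    """Per-class IoU from flattened integer predictions and labels."""
    keep = target != ignore_index                  # drop ignored pixels/points
    pred, target = pred[keep], target[keep]
    cm = np.bincount(target * num_classes + pred,  # confusion matrix (rows = GT)
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / union                           # NaN where a class is absent
    return iou                                     # mIoU: np.nanmean(iou)
```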

Pretrained Weights

πŸ”— Pretrained weights for the best-performing model are available directly in this repository (best_model.pth).

This checkpoint corresponds to the model trained with the configuration in config.yaml, achieving a mean IoU of 69.10% on the test set.


Usage Examples

Training

python main.py --mode train --config config.yaml

2D Validation

python main.py --mode val --weights_path best_model.pth

2D Inference

python main.py --mode test --weights_path best_model.pth

3D Inference (with LiDAR projection)

python main.py --mode test3d --weights_path best_model.pth
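
Under the hood, this mode projects every LiDAR point into the cameras and keeps only points whose depth agrees with the camera's depth map, so occluded points are not labeled. A heavily simplified pinhole sketch of that idea in JAX (illustrative; projection/fast_proj.py follows Agisoft camera conventions and additionally handles distortion and large-scale batching):

```python
import jax.numpy as jnp

def project_points(points_world, R, t, K):
    """Pinhole projection of (N, 3) world points into one camera.

    R (3x3) and t (3,) map world to camera coordinates; K (3x3) is the
    intrinsic matrix. Returns (N, 2) pixel coordinates and (N,) depths.
    """
    cam = points_world @ R.T + t              # world -> camera frame
    depth = cam[:, 2]
    uv = (cam @ K.T)[:, :2] / depth[:, None]  # perspective divide
    return uv, depth

def visible_mask(uv, depth, depth_map, tol=0.1):
    """Depth-based visibility: keep in-image points matching the z-buffer."""
    h, w = depth_map.shape
    u = jnp.clip(jnp.round(uv[:, 0]).astype(jnp.int32), 0, w - 1)
    v = jnp.clip(jnp.round(uv[:, 1]).astype(jnp.int32), 0, h - 1)
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
              (uv[:, 1] >= 0) & (uv[:, 1] < h) & (depth > 0))
    return inside & (jnp.abs(depth - depth_map[v, u]) < tol)
```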

3D Validation

python main.py --mode val3d --weights_path best_model.pth

Export Softmax Probabilities

python main.py --mode export_probs --weights_path best_model.pth

Project Softmax Probabilities onto LiDAR

python main.py --mode project_probs_3d --weights_path best_model.pth

Weights & Biases Integration

To log training and evaluation to Weights & Biases:

wandb login

Set the project and entity fields in your config.yaml file.
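
Internally this boils down to the standard wandb calls; schematically (the metric names here are illustrative):

```python
import wandb

# project/entity come from the wandb section of config.yaml
wandb.init(project="GridNet-HD-ImageOnly", entity="your-team")
wandb.log({"train/loss": 0.42, "val/mIoU": 0.691})  # logged during train/val
wandb.finish()
```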


License

This project is open-sourced under the MIT License.


Contact

For questions, issues, or contributions, please open an issue on the repository.


Citation

If you use this repository in your research, please cite:

GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure
Masked Authors
Submitted to NeurIPS 2025.