MRiabov committed
Commit e05cc17 · 1 Parent(s): d79c41b

README and cleanup

.windsurf/rules/defensive-logic.md CHANGED
@@ -5,4 +5,4 @@ trigger: always_on
  When deciding whether to write defensive logic, e.g. dimensionality handling: `tensor1 = tensor1.unsqueeze(0) if tensor1.ndim == 1 else tensor1`, or None handling: `var1 = var1 if var1 is not None else torch.zeros(...)`, just don't write these things. In my code, shapes are always static, and there is one execution path for all code. I prefer `assert` over defensive logic. If you are writing something like this because it seems necessary to fix the tests, it's likely that the tests are set up incorrectly.
  The reason I don't want it: defensive logic leads to silent failures, and these are bad for debugging.
 
- In addition, writing "int()" type casting is also a piece of defensive logic that slows down the application. Don't write it unless really necessary, e.g. putting an int into a string. In most cases, indexing with a tensor should be better.
+ In addition, writing "int()", "float()" or "bool()" type casting is also a piece of defensive logic that slows down the application. Don't write it unless really necessary, e.g. putting an int into a string. In most cases, indexing with a tensor should be better.
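
To make the rule concrete, here is a minimal illustrative sketch (not code from this repo) of the preferred assert-first style:

```python
import torch

def row_normalize(x: torch.Tensor) -> torch.Tensor:
    # Preferred: one execution path; fail loudly on an unexpected shape.
    assert x.ndim == 2, f"expected (N, C), got shape {tuple(x.shape)}"
    return x / x.sum(dim=1, keepdim=True)

# Avoided: `x = x.unsqueeze(0) if x.ndim == 1 else x` would mask a
# caller-side bug and let it resurface later as a confusing failure.
```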
README.md CHANGED
@@ -1,30 +1,35 @@
  # WireSegHR (Segmentation Only)
 
- This repository contains the segmentation-only implementation plan and code skeleton for the two-stage WireSegHR model (global-to-local, shared encoder).
+ This repository contains the segmentation-only implementation of the two-stage [WireSegHR model](https://arxiv.org/abs/2304.00221), trained on the WireSegHR dataset plus the [TTPLA dataset](https://github.com/R3ab/ttpla_dataset).
 
- - Paper sources live under `paper-tex/`.
- - Long-term navigation plan: `SEGMENTATION_PLAN.md`.
-
- ## Quick Start (skeleton)
+ ## Quick Start
 
- 1) Create a virtual environment and install requirements:
+ 1) Get the secrets needed to fetch the dataset:
+
+ You'll need a Google Drive service account to fetch the WireSegHR dataset using the scripts in this repo. Get a key as described [in this short README](scripts/drive-viewer-key-readme.md) and put it in `secrets/drive-json.json`.
+
+ 2) Run:
 
  ```bash
- python -m venv .venv
- source .venv/bin/activate
- pip install -r requirements.txt
+ scripts/setup.sh
  ```
+
+ This installs dependencies and merges the TTPLA dataset into the WireSegHR dataset format.
 
- 2) Print configuration and verify the skeleton runs:
+ 3) Train and run a quick inference check:
 
  ```bash
- python src/wireseghr/train.py --config configs/default.yaml
- python src/wireseghr/infer.py --config configs/default.yaml --image /path/to/image.png
+ python3 train.py --config configs/default.yaml
+ python3 infer.py --config configs/default.yaml --image /path/to/image.jpg
  ```
 
- 3) Next steps:
- - Implement encoder/decoders/condition/minmax/label downsampling per `SEGMENTATION_PLAN.md`.
- - Implement training and inference logic, then metrics and ablations.
+ The default config `default.yaml` is suitable for a 24 GB VRAM GPU with bf16 support (e.g., an RTX 3090/4090).
+ <!-- For a quick RTX GPU setup, I recommend [vast.ai](https://cloud.vast.ai/?ref_id=162850) -->
+
+ ## Project Overview
+ - Two-stage, global-to-local segmentation with a shared encoder and a fine decoder conditioned on the coarse stage.
+ - Full training loop with optional AMP, a poly LR schedule, periodic evaluation, checkpointing, and test visualizations (`train.py`).
+ - Dataset utilities live under `src/wireseghr/data/`; model components live under `src/wireseghr/model/`.
+ - Paper text and figures live in `paper-tex/` (`paper-tex/sections/` contains the Method, Results, etc.).
 
  ## Notes
  - This is a segmentation-only codebase. Inpainting is out of scope here.
@@ -41,6 +46,101 @@ python src/wireseghr/infer.py --config configs/default.yaml --image /path/to/ima
  - `dataset/val/images/...` and `dataset/val/gts/...`
  - `dataset/test/images/...` and `dataset/test/gts/...`
  - Masks are binary: foreground = white (255), background = black (0).
- - The loader strictly enforces numeric stems and 1:1 pairing and will assert on mismatches.
+ - The loader strictly enforces numeric stems and 1:1 image/mask pairing, and will raise on file-name mismatches.
 
  Update `configs/default.yaml` with your paths under `data.train_images`, `data.train_masks`, etc. Defaults point to `dataset/train/images`, `dataset/train/gts`, and validation to `dataset/val/...`.
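
The pairing rule above can be expressed as a small check (an illustrative sketch, not the actual `src/wireseghr/data/dataset.py` code):

```python
from pathlib import Path

def check_pairing(images_dir: str, masks_dir: str) -> list[str]:
    # Numeric stems only; every image must have exactly one mask: 123.jpg <-> 123.png.
    img_stems = sorted(p.stem for p in Path(images_dir).iterdir()
                       if p.suffix.lower() in (".jpg", ".jpeg"))
    mask_stems = sorted(p.stem for p in Path(masks_dir).glob("*.png"))
    assert all(s.isdigit() for s in img_stems), "non-numeric image stem"
    assert img_stems == mask_stems, "image/mask sets differ"
    return img_stems
```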
+
+ ## Inference
+
+ - Single image (optionally save outputs to a directory):
+
+ ```bash
+ python3 infer.py \
+   --config configs/default.yaml \
+   --ckpt ckpt_5000.pt \
+   --image dataset/test/images/123.jpg \
+   --out outputs/infer
+ ```
+
+ - Compute metrics for a single image (requires a GT mask):
+
+ ```bash
+ python3 infer.py \
+   --config configs/default.yaml \
+   --ckpt ckpt_5000.pt \
+   --image dataset/test/images/123.jpg \
+   --out outputs/infer \
+   --metrics \
+   --mask dataset/test/gts/123.png
+ ```
+
+ - Run inference over an entire directory with metrics (`--images_dir` sets the image directory, `--masks_dir` sets the ground-truth mask directory):
+
+ ```bash
+ python3 infer.py \
+   --config configs/default.yaml \
+   --ckpt ckpt_5000.pt \
+   --images_dir dataset/test/images \
+   --out outputs/infer \
+   --metrics \
+   --masks_dir dataset/test/gts
+ ```
+
+ Notes:
+ - Predictions are saved as 0/255 PNGs. For metrics, predictions are binarized with `> 0` to match the training logic.
+ - Masks are matched by filename stem: `images/123.jpg` ↔ `gts/123.png`.
+
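The two conventions in these notes combine as follows when computing metrics by hand (an illustrative sketch; it assumes predictions were saved as `<stem>_pred.png`, and is not the `infer.py` implementation):

```python
import cv2
import numpy as np

def iou_for_stem(stem: str) -> float:
    pred = cv2.imread(f"outputs/infer/{stem}_pred.png", cv2.IMREAD_GRAYSCALE)
    gt = cv2.imread(f"dataset/test/gts/{stem}.png", cv2.IMREAD_GRAYSCALE)
    # Binarize with > 0, matching the training logic described above.
    p, g = pred > 0, gt > 0
    inter = float(np.logical_and(p, g).sum())
    union = float(np.logical_or(p, g).sum())
    return inter / union
```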
+ ## Benchmarking and Metrics
+
+ Benchmark mode times the model on a directory of images and reports coarse/fine/total latency statistics. When `--metrics` is provided, it also computes IoU/F1/Precision/Recall over the benchmark set (for both the fine and coarse outputs).
+
+ Example (uses `data.test_images` and `data.test_masks` from the config by default):
+
+ ```bash
+ python3 infer.py \
+   --config configs/default.yaml \
+   --benchmark \
+   --ckpt ckpt_5000.pt \
+   --bench_warmup 2 \
+   --bench_limit 0 \
+   --bench_report_json outputs/bench_report.json \
+   --metrics
+ ```
+
+ If your ground-truth directory differs from `data.test_masks`, override it with `--bench_masks_dir`:
+
+ ```bash
+ python3 infer.py \
+   --config configs/default.yaml \
+   --benchmark \
+   --ckpt ckpt_5000.pt \
+   --bench_warmup 2 \
+   --bench_limit 0 \
+   --bench_report_json outputs/bench_report.json \
+   --metrics \
+   --bench_masks_dir /path/to/gts
+ ```
+
+ You will see output like:
+
+ ```
+ [WireSegHR][bench] Results (ms):
+   Coarse avg=50.16 p50=44.48 p95=76.78
+   Fine avg=534.38 p50=419.52 p95=1187.66
+   Total avg=584.54 p50=464.73 p95=1300.07
+   Target < 1000 ms per 3000x4000 image: YES
+ [WireSegHR][bench][Fine] IoU=0.6098 F1=0.7576 P=0.6418 R=0.9244
+ [WireSegHR][bench][Coarse] IoU=0.5315 F1=0.6941 P=0.5467 R=0.9502
+ ```
+ *These metrics were obtained after 5000 iterations.*
+
+ Optionally, save a JSON timing report with `--bench_report_json`. Schema:
+ - `summary`
+   - `avg_ms`, `p50_ms`, `p95_ms`
+   - `avg_coarse_ms`, `avg_fine_ms`
+   - `images`
+ - `per_image`: a list of objects with
+   - `path`, `H`, `W`, `t_coarse_ms`, `t_fine_ms`, `t_total_ms`
+
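As a usage sketch for the schema above (assuming the report was saved to `outputs/bench_report.json`):

```python
import json

with open("outputs/bench_report.json") as f:
    report = json.load(f)

s = report["summary"]
print(f"{s['images']} images: avg {s['avg_ms']:.1f} ms "
      f"(coarse {s['avg_coarse_ms']:.1f} + fine {s['avg_fine_ms']:.1f})")

# Find the slowest image from the per-image timings.
worst = max(report["per_image"], key=lambda r: r["t_total_ms"])
print(f"slowest: {worst['path']} at {worst['H']}x{worst['W']} ({worst['t_total_ms']:.1f} ms)")
```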
+ Utils:
+ - Export your model to inference-only weights with `scripts/strip_checkpoint.py`.
SEGMENTATION_PLAN.md DELETED
@@ -1,136 +0,0 @@
- # WireSegHR Segmentation-Only Implementation Plan
-
- This plan distills the model and pipeline described in the paper sources:
- - `paper-tex/sections/method.tex`
- - `paper-tex/sections/method_yq.tex`
- - `paper-tex/figure_tex/pipeline.tex`
- - `paper-tex/tables/{component,logit,thresholds}.tex`
-
- Focus: segmentation only (no dataset collection or inpainting).
-
- ## Decisions and Defaults (locked)
- - Backbone: SegFormer MiT-B3 via HuggingFace Transformers.
- - Fine/local patch size p: 768.
- - Conditioning: global map + binary location mask by default (Table `tables/logit.tex`).
- - Conditioning map scope: patch-cropped from the global map per `paper-tex/sections/method_yq.tex` (no full-image concatenation variant).
- - MinMax feature augmentation: luminance min and max with a fixed 6×6 window; channels concatenated to inputs (Figure `figure_tex/pipeline.tex`, Sec. “Wire Feature Preservation” in `method_yq.tex`).
- - Loss: CE on both branches, λ = 1 (`method_yq.tex`).
- - α-threshold for refining windows: default 0.01 (Table `tables/thresholds.tex`).
- - Coarse input size: train 512×512; test 1024×1024 (`method.tex`).
- - Optim: AdamW (lr=6e-5, wd=0.01, poly schedule with power=1), ~40k iters, batch size ~8 (`method.tex`).
-
- ## Project Structure
- - `configs/`
-   - `default.yaml` (backbone=mit_b2, p=768, coarse_train=512, coarse_test=1024, alpha=0.01, minmax=true, kernel=6, maxpool_label=true, cond_variant=global)
- - `src/wireseghr/`
-   - `model/`
-     - `encoder.py` (SegFormer MiT-B3, N_in channels expansion)
-     - `decoder.py` (two MLP decoders `D_C`, `D_F` for 2 classes)
-     - `condition.py` (1×1 conv to collapse coarse 2-ch logits → 1-ch cond)
-     - `minmax.py` (6×6 luminance min/max filtering)
-     - `label_downsample.py` (MaxPool-based coarse GT downsampling)
-   - `data/`
-     - `dataset.py` (image/mask loading, full-res to coarse/fine inputs)
-     - `sampler.py` (balanced patch sampling with ≥1% wire pixels)
-     - `transforms.py` (scaling, rotation, flip, photometric distortion)
-   - `train.py` (end-to-end two-branch training)
-   - `infer.py` (coarse-to-fine sliding-window inference + stitching)
-   - `metrics.py` (IoU, F1, Precision, Recall)
-   - `utils.py` (misc: overlap blending, seeding, logging)
- - `tests/` (unit tests for channel wiring, cond alignment, stitching)
- - `README.md` (segmentation-only usage)
-
- ## Model Specification
- - Shared encoder `E`: SegFormer MiT-B3 (HF Transformers preferred).
-   - Input channels (default): 3 (RGB) + 2 (MinMax) + 1 (global cond) + 1 (binary location) = 7.
-   - For the coarse pass, the cond and location channels are zeros to keep the channel count consistent (`method_yq.tex`).
-   - Weight init for the extra channels: copy the mean of the RGB conv weights, or zero-init.
- - Decoders: two SegFormer MLP decoders
-   - `D_C`: coarse logits (2 channels) at coarse resolution.
-   - `D_F`: fine logits (2 channels) at patch resolution p×p.
- - Conditioning of the fine branch (default):
-   - Take the coarse pre-softmax logits (2-ch), apply a 1×1 conv → 1-ch cond map (`method.tex`).
-   - Binary location mask: 1 inside the current patch region (in full-image coordinates), 0 elsewhere.
-   - Pass the patch-aligned cond crop and binary mask as channels to the fine branch input.
- - Notes:
-   - We follow the published version (`paper-tex/sections/method_yq.tex`) and use patch-cropped conditioning exclusively; no full-image conditioning variant will be implemented.
-
- ## Data and Preprocessing
- - MinMax luminance features (both branches):
-   - Y = 0.299R + 0.587G + 0.114B.
-   - Y_min = min filter (6×6), Y_max = max filter (6×6).
-   - Concat [Y_min, Y_max] to the input image channels.
- - Coarse GT label generation (MaxPool):
-   - Downsample the full-res mask to coarse size with max-pooling to prevent wire vanishing (`method_yq.tex`).
- - Normalization: standard mean/std per backbone; apply consistently across channels (new channels can be mean=0, std=1 by convention, or min-max scaled).
-
- ### Dataset Convention (project-specific)
- - Flat directories with numeric filenames; images are `.jpg`/`.jpeg`, masks are `.png`.
- - Example:
-   - `dataset/images/1.jpg, 2.jpg, ..., N.jpg` (or `.jpeg`)
-   - `dataset/gts/1.png, 2.png, ..., N.png`
- - Masks are binary: foreground = white (255), background = black (0).
- - The loader (`data/dataset.py`) strictly enforces numeric stems and 1:1 pairing and will assert on mismatch.
-
- ## Training Pipeline
- - Augment the full-res image (scaling, rotation, horizontal flip, photometric distortion) before constructing coarse/fine inputs (`method.tex`).
- - Coarse input: downsample the augmented full image to 512×512; build channels [RGB + MinMax + zeros(2)] → `E` → `D_C`.
- - Fine input (per iteration, select 1–k patches):
-   - Sample a p×p patch (p=768) with ≥1% wire pixels (`method.tex`, `method_yq.tex`).
-   - Build the cond map from the coarse logits via a 1×1 conv; crop the cond map to the patch region.
-   - Build the binary location mask for the patch region.
-   - Build channels [RGB + MinMax + cond + location] → `E` → `D_F`.
- - Losses:
-   - L_glo = CE(Softmax(`D_C(E(coarse))`), G_glo), where G_glo uses the MaxPool downsample.
-   - L_loc = CE(Softmax(`D_F(E(fine))`), G_loc).
-   - L = L_glo + λ·L_loc, λ = 1 (`method_yq.tex`).
- - Optimization:
-   - AdamW (lr=6e-5, wd=0.01), poly schedule (power=1.0), ~40k iterations, batch ≈8 (tune by memory).
-   - AMP and grad accumulation recommended for stability/memory.
-
- ## Inference Pipeline
- - Coarse pass:
-   - Downsample to 1024×1024; predict coarse probability/logits.
- - Window proposal (sliding window on full-res):
-   - Tile with patch size p=768 and ~128 px overlap (configurable). Compute the wire fraction within each window from the coarse prediction (prob > 0.5).
-   - If the fraction is ≥ α (default 0.01), run fine refinement on that patch; else skip (Table `tables/thresholds.tex`).
- - Fine refinement + stitching:
-   - For selected windows, build the fine input with the cond crop + location mask; predict logits.
-   - Stitch logits into a full-res canvas; average in overlaps; final argmax over classes.
- - Outputs: full-res binary mask, plus an optional probability map.
-
- ## Metrics and Reporting
- - Implement IoU, F1, Precision, Recall (global, and optionally per-size bins if available), matching `tables/component.tex`.
- - Validate α trade-offs following `tables/thresholds.tex`.
-
- ## Configuration Surface (key)
- - Backbone/weights: `mit_b2` (pretrained ImageNet-1K).
- - Sizes: `p=768`, `coarse_train=512`, `coarse_test=1024`, `overlap=128`.
- - Conditioning: `cond_from='coarse_logits_1x1'`, `cond_crop='patch'`.
- - MinMax: `enable=true`, `kernel=6`.
- - Label: `coarse_label_downsample='maxpool'`.
- - Training: `iters=40000`, `batch=8`, `lr=6e-5`, `wd=0.01`, `schedule='poly'`, `power=1.0`.
- - Inference: `alpha=0.01`, `prob_threshold=0.5` for the wire fraction, `stitch='avg_logits'`.
-
- ## Risks / Gotchas
- - Channel expansion requires careful initialization; confirm no NaNs and stable early training.
- - Precise spatial alignment of the cond map and location mask with the patch is critical. Add assertions/tests.
- - The even-sized MinMax window (6×6) requires careful padding to maintain alignment.
- - Memory with p=768 and MiT-B3 may need tuning (AMP, batch size, overlap).
-
- ## Milestones
- 1) Skeleton + configs + metrics.
- 2) Encoder channel expansion + two decoders + 1×1 cond.
- 3) MinMax (6×6) + MaxPool label downsampling.
- 4) Training loop with ≥1% wire patch sampling.
- 5) Inference α-threshold + stitching.
- 6) Ablation toggles + scripts + README.
- 7) Tests (channel wiring, cond/mask alignment, stitching correctness).
-
- ## References (paper sources)
- - `paper-tex/sections/method.tex`: two-stage design, shared encoder, 1×1 cond, training/inference sizes, optimizer/schedule.
- - `paper-tex/sections/method_yq.tex`: CE losses, λ, sliding window with α, MinMax & MaxPool rationale.
- - `paper-tex/figure_tex/pipeline.tex`: system overview; MinMax concatenation.
- - `paper-tex/tables/component.tex`: ablation of MinMax/MaxPool/coarse.
- - `paper-tex/tables/logit.tex`: conditioning variants.
- - `paper-tex/tables/thresholds.tex`: α vs. speed/quality.
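
Although this plan file is deleted in this commit, its MinMax feature definition is worth illustrating. The sketch below mirrors the padding/pooling logic that also appears in the removed `scripts/trt_infer.py` (assuming PyTorch; this is not the repo's `minmax.py`):

```python
import torch
import torch.nn.functional as F

def minmax_features(rgb: torch.Tensor, kernel: int = 6) -> torch.Tensor:
    # rgb: (B, 3, H, W) in [0, 1]; luminance per Y = 0.299R + 0.587G + 0.114B.
    y = 0.299 * rgb[:, 0:1] + 0.587 * rgb[:, 1:2] + 0.114 * rgb[:, 2:3]
    # The even-sized 6x6 window needs asymmetric padding to keep HxW alignment.
    pad = (kernel // 2 - 1, kernel // 2, kernel // 2 - 1, kernel // 2)
    y_pad = F.pad(y, pad, mode="replicate")
    y_max = F.max_pool2d(y_pad, kernel_size=kernel, stride=1)
    y_min = -F.max_pool2d(-y_pad, kernel_size=kernel, stride=1)
    return torch.cat([y_min, y_max], dim=1)  # the two extra input channels
```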
WireSegHR.pdf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:7f6db9a06575398aeb0903c8d19e68f27d983223ca128ff3a3ae12a8aeb8f4a9
- size 17039360
scripts/drive-viewer-key-readme.md ADDED
@@ -0,0 +1,16 @@
+ ## Setting Up Google Drive Access via a Service Account (PyDrive2)
+
+ You'll need a service account to fetch the WireSegHR dataset using a script in this folder. Follow the steps below:
+
+ 1. **Create a service account**
+    - Navigate to the Google Cloud Console. Under **IAM & Admin → Service Accounts**, create a new service account.
+    - Assign it a Viewer role.
+
+ 2. **Generate and download a JSON key**
+    - In the service account details, go to **Keys → Add Key → Create new key**, choose **JSON**, and download the key file.
+    - Save the file in this repo as `secrets/drive-json.json`.
+
+ 3. **Share the Drive folder or files**
+    - Grant the service account access to the target Drive folder (https://drive.google.com/drive/folders/1fgy3wn_yuHEeMNbfiHNVl1-jEdYOfu6p) using its service account email: open the folder in Google Drive, click Share, and add the service account email with Viewer permissions.
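Once the key is in place, authentication looks roughly like this (a minimal sketch of PyDrive2's documented service-account flow, not a script from this repo):

```python
from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive

# Point PyDrive2 at the downloaded service-account key.
settings = {
    "client_config_backend": "service",
    "service_config": {"client_json_file_path": "secrets/drive-json.json"},
}
gauth = GoogleAuth(settings=settings)
gauth.ServiceAuth()  # no browser flow; authenticates as the service account
drive = GoogleDrive(gauth)

# List files in the shared WireSegHR folder (ID taken from the URL above).
folder_id = "1fgy3wn_yuHEeMNbfiHNVl1-jEdYOfu6p"
for f in drive.ListFile({"q": f"'{folder_id}' in parents"}).GetList():
    print(f["title"])
```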
scripts/export_onnx_trt.py DELETED
@@ -1,167 +0,0 @@
- import argparse
- import os
- import pprint
- import shutil
- import subprocess
- from typing import Tuple
-
- import torch
- import tensorrt as trt
-
- from src.wireseghr.model import WireSegHR
- from pathlib import Path
-
-
- class CoarseModule(torch.nn.Module):
-     def __init__(self, core: WireSegHR):
-         super().__init__()
-         self.core = core
-
-     def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
-         logits, cond = self.core.forward_coarse(x)
-         return logits, cond
-
-
- class FineModule(torch.nn.Module):
-     def __init__(self, core: WireSegHR):
-         super().__init__()
-         self.core = core
-
-     def forward(self, x: torch.Tensor) -> torch.Tensor:
-         logits = self.core.forward_fine(x)
-         return logits
-
-
- def build_model(cfg: dict, device: torch.device) -> WireSegHR:
-     pretrained_flag = bool(cfg.get("pretrained", False))
-     model = WireSegHR(backbone=cfg["backbone"], in_channels=6, pretrained=pretrained_flag)
-     model = model.to(device)
-     return model
-
-
- def main():
-     parser = argparse.ArgumentParser(description="Export WireSegHR to ONNX and TensorRT")
-     parser.add_argument("--config", type=str, default="configs/default.yaml")
-     parser.add_argument("--ckpt", type=str, default="", help="Path to checkpoint .pt")
-     parser.add_argument("--out_dir", type=str, default="exports")
-     parser.add_argument("--coarse_size", type=int, default=1024)
-     parser.add_argument("--fine_patch_size", type=int, default=1024)
-     parser.add_argument("--opset", type=int, default=17)
-     parser.add_argument("--trtexec", type=str, default="", help="Optional path to trtexec to build TRT engines")
-     parser.add_argument("--build_trt", action="store_true", help="Build TensorRT engines after ONNX export")
-
-     args = parser.parse_args()
-
-     import yaml
-
-     with open(args.config, "r") as f:
-         cfg = yaml.safe_load(f)
-     print("[export] Loaded config:")
-     pprint.pprint(cfg)
-
-     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-     model = build_model(cfg, device)
-
-     ckpt_path = args.ckpt if args.ckpt else cfg.get("resume", "")
-     if ckpt_path:
-         assert Path(ckpt_path).is_file(), f"Checkpoint not found: {ckpt_path}"
-         print(f"[export] Loading checkpoint: {ckpt_path}")
-         state = torch.load(ckpt_path, map_location=device)
-         model.load_state_dict(state["model"])  # expects dict with key 'model'
-     model.eval()
-
-     Path(args.out_dir).mkdir(parents=True, exist_ok=True)
-
-     # Prepare dummy inputs (static shapes for best TRT performance)
-     coarse_in = torch.randn(1, 6, args.coarse_size, args.coarse_size, device=device)
-     fine_in = torch.randn(1, 6, args.fine_patch_size, args.fine_patch_size, device=device)
-
-     # Coarse export
-     coarse_wrapper = CoarseModule(model).to(device).eval()
-     coarse_onnx = Path(args.out_dir) / f"wireseghr_coarse_{args.coarse_size}.onnx"
-     print(f"[export] Exporting COARSE to {coarse_onnx}")
-     torch.onnx.export(
-         coarse_wrapper,
-         coarse_in,
-         str(coarse_onnx),
-         export_params=True,
-         opset_version=args.opset,
-         do_constant_folding=True,
-         input_names=["x_coarse"],
-         output_names=["logits", "cond"],
-         dynamic_axes=None,
-         dynamo=True,
-     )
-
-     # Fine export
-     fine_wrapper = FineModule(model).to(device).eval()
-     fine_onnx = Path(args.out_dir) / f"wireseghr_fine_{args.fine_patch_size}.onnx"
-     print(f"[export] Exporting FINE to {fine_onnx}")
-     torch.onnx.export(
-         fine_wrapper,
-         fine_in,
-         str(fine_onnx),
-         export_params=True,
-         opset_version=args.opset,
-         do_constant_folding=True,
-         input_names=["x_fine"],
-         output_names=["logits"],
-         dynamic_axes=None,
-     )
-
-     # Optional TensorRT building via trtexec; fall back to the Python API if unavailable
-     if args.build_trt:
-         trtexec_path = args.trtexec if args.trtexec else shutil.which("trtexec")
-         coarse_engine = Path(args.out_dir) / f"wireseghr_coarse_{args.coarse_size}.engine"
-         fine_engine = Path(args.out_dir) / f"wireseghr_fine_{args.fine_patch_size}.engine"
-         if trtexec_path:
-             def build_engine_cli(onnx_path: str, engine_path: str):
-                 print(f"[export] Building TRT engine (trtexec): {engine_path}")
-                 cmd = [
-                     trtexec_path,
-                     f"--onnx={onnx_path}",
-                     f"--saveEngine={engine_path}",
-                     "--explicitBatch",
-                     "--fp16",
-                 ]
-                 subprocess.run(cmd, check=True)
-
-             build_engine_cli(str(coarse_onnx), str(coarse_engine))
-             build_engine_cli(str(fine_onnx), str(fine_engine))
-         else:
-             print("[export] trtexec not found; building engines via TensorRT Python API")
-
-             def build_engine_py(onnx_path: str, engine_path: str):
-                 logger = trt.Logger(trt.Logger.WARNING)
-                 builder = trt.Builder(logger)
-                 network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
-                 parser = trt.OnnxParser(network, logger)
-                 with open(str(onnx_path), "rb") as f:
-                     data = f.read()
-                 ok = parser.parse(data)
-                 if not ok:
-                     for i in range(parser.num_errors):
-                         print(f"[TRT][parser] {parser.get_error(i)}")
-                     raise RuntimeError("ONNX parse failed")
-
-                 config = builder.create_builder_config()
-                 config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
-                 if builder.platform_has_fast_fp16:
-                     config.set_flag(trt.BuilderFlag.FP16)
-
-                 print(f"[export] Building TRT engine (Python): {engine_path}")
-                 serialized = builder.build_serialized_network(network, config)
-                 assert serialized is not None, "Failed to build TensorRT engine"
-                 with open(str(engine_path), "wb") as f:
-                     f.write(serialized)
-
-             build_engine_py(coarse_onnx, coarse_engine)
-             build_engine_py(fine_onnx, fine_engine)
-     else:
-         print("[export] Skipping TensorRT engine build (use --build_trt to enable)")
-
-     print("[export] Done.")
-
-
- if __name__ == "__main__":
-     main()
scripts/trt_infer.py DELETED
@@ -1,488 +0,0 @@
- import argparse
- import os
- import pprint
- import time
- from typing import List, Tuple, Optional, Dict, Any
-
- import numpy as np
- import cv2
-
- # TensorRT + CUDA
- try:
-     import tensorrt as trt
-     import pycuda.autoinit  # noqa: F401  # initializes CUDA driver context
-     import pycuda.driver as cuda
- except Exception as e:  # pragma: no cover
-     raise RuntimeError(
-         "TensorRT runner requires 'tensorrt' and 'pycuda' Python packages and a valid CUDA/TensorRT install"
-     ) from e
-
- import yaml
- from pathlib import Path
-
-
- # ---- Utility: TRT engine wrapper ----
- class TrtEngine:
-     def __init__(self, engine_path: str):
-         assert Path(engine_path).is_file(), f"Engine not found: {engine_path}"
-         logger = trt.Logger(trt.Logger.ERROR)
-         with open(engine_path, 'rb') as f, trt.Runtime(logger) as runtime:
-             self.engine = runtime.deserialize_cuda_engine(f.read())
-         assert self.engine is not None, f"Failed to load engine: {engine_path}"
-         self.context = self.engine.create_execution_context()
-         # Default to profile 0
-         try:
-             self.context.active_optimization_profile = 0
-         except Exception:
-             pass
-         self.stream = cuda.Stream()
-         self.bindings: List[int] = [0] * self.engine.num_bindings
-         self.host_mem: Dict[int, Any] = {}
-         self.device_mem: Dict[int, Any] = {}
-
-     def _allocate_binding(self, idx: int, shape: Tuple[int, ...]):
-         dtype = trt.nptype(self.engine.get_binding_dtype(idx))
-         nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
-         if idx in self.device_mem:
-             # Reuse if same size; else reallocate
-             old = self.device_mem[idx]
-             if old.size >= nbytes:
-                 self.host_mem[idx] = np.empty(shape, dtype=dtype)
-                 return
-             # free old and reallocate
-             del old
-         self.host_mem[idx] = np.empty(shape, dtype=dtype)
-         self.device_mem[idx] = cuda.mem_alloc(nbytes)
-
-     def infer(self, inputs: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
-         # Map names -> indices
-         name_to_idx = {self.engine.get_binding_name(i): i for i in range(self.engine.num_bindings)}
-         # Set input shapes (handle dynamic batch if present)
-         for name, arr in inputs.items():
-             idx = name_to_idx[name]
-             assert self.engine.binding_is_input(idx)
-             # Set shape if dynamic
-             shape = tuple(arr.shape)
-             try:
-                 self.context.set_binding_shape(idx, shape)
-             except Exception:
-                 # Static shape engines won't allow setting; assert it matches
-                 eshape = tuple(self.engine.get_binding_shape(idx))
-                 assert eshape == shape, f"Static engine expects {eshape}, got {shape} for input {name}"
-             self._allocate_binding(idx, shape)
-
-         # Allocate outputs for resolved shapes
-         for i in range(self.engine.num_bindings):
-             if not self.engine.binding_is_input(i):
-                 shape = tuple(self.context.get_binding_shape(i))
-                 assert all(s > 0 for s in shape), f"Unresolved output shape at binding {i}: {shape}"
-                 self._allocate_binding(i, shape)
-
-         # Copy inputs H2D
-         for name, arr in inputs.items():
-             idx = name_to_idx[name]
-             h_arr = self.host_mem[idx]
-             assert h_arr.shape == arr.shape and h_arr.dtype == arr.dtype
-             cuda.memcpy_htod_async(self.device_mem[idx], arr, self.stream)
-             self.bindings[idx] = int(self.device_mem[idx])
-
-         # Set output bindings
-         for i in range(self.engine.num_bindings):
-             if not self.engine.binding_is_input(i):
-                 self.bindings[i] = int(self.device_mem[i])
-
-         # Execute
-         self.context.execute_async_v2(self.bindings, self.stream.handle)
-
-         # D2H outputs
-         outputs: Dict[str, np.ndarray] = {}
-         for i in range(self.engine.num_bindings):
-             if not self.engine.binding_is_input(i):
-                 name = self.engine.get_binding_name(i)
-                 h_arr = self.host_mem[i]
-                 cuda.memcpy_dtoh_async(h_arr, self.device_mem[i], self.stream)
-                 outputs[name] = h_arr
-
-         self.stream.synchronize()
-         # Return copies to detach from internal buffers
-         return {k: np.array(v) for k, v in outputs.items()}
-
-
- # ---- Pre/post processing consistent with infer.py ----
-
- def _pad_for_minmax(kernel: int) -> Tuple[int, int, int, int]:
-     if (kernel % 2) == 0:
-         return (kernel // 2 - 1, kernel // 2, kernel // 2 - 1, kernel // 2)
-     else:
-         return (kernel // 2, kernel // 2, kernel // 2, kernel // 2)
-
-
- def _build_6ch_coarse(rgb: np.ndarray, coarse_size: int, minmax_enable: bool, minmax_kernel: int) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
-     # rgb: HxWx3 float32 [0,1]
-     H, W = int(rgb.shape[0]), int(rgb.shape[1])
-     # To match training/minmax in torch, we replicate the exact pad + pool logic via torch CPU
-     import torch
-     import torch.nn.functional as F
-
-     t_img = torch.from_numpy(rgb.transpose(2, 0, 1)).unsqueeze(0).float()  # 1x3xHxW
-     y_t = 0.299 * t_img[:, 0:1] + 0.587 * t_img[:, 1:2] + 0.114 * t_img[:, 2:3]
-     if minmax_enable:
-         pad = _pad_for_minmax(minmax_kernel)
-         y_p = F.pad(y_t, pad, mode="replicate")
-         y_max_full = F.max_pool2d(y_p, kernel_size=minmax_kernel, stride=1)
-         y_min_full = -F.max_pool2d(-y_p, kernel_size=minmax_kernel, stride=1)
-     else:
-         y_min_full = y_t
-         y_max_full = y_t
-
-     # Resize for coarse
-     rgb_c = cv2.resize(rgb, (coarse_size, coarse_size), interpolation=cv2.INTER_LINEAR)
-     y_min_c = cv2.resize(y_min_full[0, 0].numpy(), (coarse_size, coarse_size), interpolation=cv2.INTER_LINEAR)
-     y_max_c = cv2.resize(y_max_full[0, 0].numpy(), (coarse_size, coarse_size), interpolation=cv2.INTER_LINEAR)
-
-     zeros_c = np.zeros((coarse_size, coarse_size), dtype=np.float32)
-     x6 = np.stack([
-         rgb_c[:, :, 0], rgb_c[:, :, 1], rgb_c[:, :, 2], y_min_c, y_max_c, zeros_c
-     ], axis=0)  # 6 x Hc x Wc
-     return x6.astype(np.float32), y_min_full[0, 0].numpy().astype(np.float32), y_max_full[0, 0].numpy().astype(np.float32), t_img.numpy().astype(np.float32)
-
-
- def _softmax_channel(x: np.ndarray, axis: int = 1) -> np.ndarray:
-     x_max = np.max(x, axis=axis, keepdims=True)
-     e = np.exp(x - x_max)
-     return e / np.sum(e, axis=axis, keepdims=True)
-
-
- def _tiled_fine_trt(
-     fine: TrtEngine,
-     t_img: np.ndarray,  # 1x3xHxW float32
-     cond_map: np.ndarray,  # 1x1xhxw float32
-     y_min_full: np.ndarray,  # HxW float32
-     y_max_full: np.ndarray,  # HxW float32
-     patch_size: int,
-     overlap: int,
-     fine_batch: int,
- ) -> np.ndarray:
-     H, W = int(t_img.shape[2]), int(t_img.shape[3])
-     P = patch_size
-     stride = P - overlap
-     assert stride > 0
-     assert H >= P and W >= P
-
-     prob_sum = np.zeros((H, W), dtype=np.float32)
-     weight = np.zeros((H, W), dtype=np.float32)
-
-     hc4, wc4 = int(cond_map.shape[2]), int(cond_map.shape[3])
-
-     ys = list(range(0, H - P + 1, stride))
-     if ys[-1] != (H - P):
-         ys.append(H - P)
-     xs = list(range(0, W - P + 1, stride))
-     if xs[-1] != (W - P):
-         xs.append(W - P)
-
-     coords: List[Tuple[int, int]] = [(y0, x0) for y0 in ys for x0 in xs]
-
-     # Run with batches supported by the engine if dynamic; otherwise enforce 1
-     input_name = None
-     for i in range(fine.engine.num_bindings):
-         if fine.engine.binding_is_input(i):
-             input_name = fine.engine.get_binding_name(i)
-             shape_decl = fine.engine.get_binding_shape(i)
-             break
-     assert input_name is not None
-
-     dynamic_batch = -1 in list(shape_decl)
-     batch_allowed = fine_batch if dynamic_batch else 1
-
-     for i0 in range(0, len(coords), batch_allowed):
-         batch_coords = coords[i0 : i0 + batch_allowed]
-         B = len(batch_coords)
-         xs_list: List[np.ndarray] = []
-         for (y0, x0) in batch_coords:
-             y1, x1 = y0 + P, x0 + P
-             y0c = (y0 * hc4) // H
-             y1c = ((y1 * hc4) + H - 1) // H
-             x0c = (x0 * wc4) // W
-             x1c = ((x1 * wc4) + W - 1) // W
-             cond_sub = cond_map[:, :, y0c:y1c, x0c:x1c][0, 0]
-             cond_patch = cv2.resize(cond_sub, (P, P), interpolation=cv2.INTER_LINEAR)
-
-             rgb_patch = t_img[0, :, y0:y1, x0:x1]  # 3xPxP
-             ymin_patch = y_min_full[y0:y1, x0:x1][None, ...]  # 1xPxP
-             ymax_patch = y_max_full[y0:y1, x0:x1][None, ...]  # 1xPxP
-             x6 = np.concatenate([rgb_patch, ymin_patch, ymax_patch, cond_patch[None, ...]], axis=0)
-             xs_list.append(x6)
-
-         x_batch = np.stack(xs_list, axis=0).astype(np.float32)  # Bx6xPxP
-         outputs = fine.infer({input_name: x_batch})
-
-         # Assume a single output named 'logits' or similar; take the first one
-         out_name = [n for n in outputs.keys()][0]
-         logits = outputs[out_name]  # Bx2xPxP
-         prob = _softmax_channel(logits, axis=1)[:, 1, :, :]  # BxPxP
-
-         for bi, (y0, x0) in enumerate(batch_coords):
-             y1, x1 = y0 + P, x0 + P
-             prob_sum[y0:y1, x0:x1] += prob[bi]
-             weight[y0:y1, x0:x1] += 1.0
-
-     prob_full = prob_sum / weight
-     return prob_full.astype(np.float32)
-
-
- def _coarse_trt(coarse: TrtEngine, rgb: np.ndarray, coarse_size: int, minmax_enable: bool, minmax_kernel: int) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
-     x6, y_min_full, y_max_full, t_img = _build_6ch_coarse(rgb, coarse_size, minmax_enable, minmax_kernel)
-     # Engine input name
-     input_name = None
-     for i in range(coarse.engine.num_bindings):
-         if coarse.engine.binding_is_input(i):
-             input_name = coarse.engine.get_binding_name(i)
-             break
-     assert input_name is not None
-     x = x6[None, ...].astype(np.float32)  # 1x6xHcxWc
-     outputs = coarse.infer({input_name: x})
-     # Identify outputs: we expect 2 outputs (logits 1x2xHcxWc, cond 1x1xHcxWc)
-     assert len(outputs) == 2, f"Coarse engine must have 2 outputs, got {list(outputs.keys())}"
-     # Determine which is cond by channel dim = 1
-     names = list(outputs.keys())
-     a, b = outputs[names[0]], outputs[names[1]]
-     if a.shape[1] == 1:
-         cond = a
-         logits = b
-     else:
-         cond = b
-         logits = a
-     # Coarse prob upsampled to full HxW (optional)
-     prob_c = _softmax_channel(logits, axis=1)[:, 1:2]
-     H, W = int(t_img.shape[2]), int(t_img.shape[3])
-     prob_up = cv2.resize(prob_c[0, 0], (W, H), interpolation=cv2.INTER_LINEAR)
-     return prob_up.astype(np.float32), cond.astype(np.float32), t_img.astype(np.float32), y_min_full, y_max_full
-
-
- # ---- Inference API ----
-
- def infer_image_trt(
-     coarse: TrtEngine,
-     fine: TrtEngine,
-     img_path: str,
-     cfg: dict,
-     out_dir: Optional[str] = None,
-     save_prob: bool = False,
-     prob_thresh: Optional[float] = None,
- ) -> Tuple[np.ndarray, np.ndarray]:
-     assert Path(img_path).is_file(), f"Image not found: {img_path}"
-     bgr = cv2.imread(img_path, cv2.IMREAD_COLOR)
-     assert bgr is not None, f"Failed to read {img_path}"
-     rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
-
-     coarse_size = int(cfg["coarse"]["test_size"])
-     patch_size = int(cfg["inference"]["fine_patch_size"])  # 1024 for inference
-     overlap = int(cfg["fine"]["overlap"])
-     minmax_enable = bool(cfg["minmax"]["enable"])
-     minmax_kernel = int(cfg["minmax"]["kernel"])
-     if prob_thresh is None:
-         prob_thresh = float(cfg["inference"]["prob_threshold"])
-
-     prob_c, cond_map, t_img, y_min_full, y_max_full = _coarse_trt(
-         coarse, rgb, coarse_size, minmax_enable, minmax_kernel
-     )
-
-     prob_f = _tiled_fine_trt(
-         fine,
-         t_img,
-         cond_map,
-         y_min_full,
-         y_max_full,
-         patch_size,
-         overlap,
-         int(cfg.get("eval", {}).get("fine_batch", 16)),
-     )
-
-     pred = (prob_f > prob_thresh).astype(np.uint8) * 255
-
-     if out_dir is not None:
-         Path(out_dir).mkdir(parents=True, exist_ok=True)
-         stem = Path(img_path).stem
-         out_mask = Path(out_dir) / f"{stem}_pred.png"
-         cv2.imwrite(str(out_mask), pred)
-         if save_prob:
-             out_prob = Path(out_dir) / f"{stem}_prob.npy"
-             np.save(str(out_prob), prob_f.astype(np.float32))
-
-     return pred, prob_f
-
-
- def main():
-     parser = argparse.ArgumentParser(description="WireSegHR TensorRT Inference")
-     parser.add_argument("--config", type=str, default="configs/default.yaml")
-     parser.add_argument("--coarse_engine", type=str, required=True)
-     parser.add_argument("--fine_engine", type=str, required=True)
-     parser.add_argument("--image", type=str, default="", help="Path to single image")
-     parser.add_argument("--images_dir", type=str, default="", help="Directory with images")
-     parser.add_argument("--out", type=str, default="outputs/trt_infer")
-     parser.add_argument("--save_prob", action="store_true")
-     # Benchmarking
-     parser.add_argument("--benchmark", action="store_true")
-     parser.add_argument("--bench_images_dir", type=str, default="")
-     parser.add_argument("--bench_limit", type=int, default=0)
-     parser.add_argument("--bench_warmup", type=int, default=2)
-     parser.add_argument("--bench_size_filter", type=str, default="")
-     parser.add_argument("--bench_report_json", type=str, default="")
-
-     args = parser.parse_args()
-
-     with open(args.config, "r") as f:
-         cfg = yaml.safe_load(f)
-     print("[TRT][infer] Loaded config:")
-     pprint.pprint(cfg)
-
-     coarse = TrtEngine(args.coarse_engine)
-     fine = TrtEngine(args.fine_engine)
-
-     if args.benchmark:
-         bench_dir = args.bench_images_dir or cfg["data"]["test_images"]
-         assert Path(bench_dir).is_dir(), f"Not a directory: {bench_dir}"
-         size_filter: Optional[Tuple[int, int]] = None
-         if args.bench_size_filter:
-             try:
-                 h_str, w_str = args.bench_size_filter.lower().split("x")
-                 size_filter = (int(h_str), int(w_str))
-             except Exception:
-                 raise AssertionError(
-                     f"Invalid --bench_size_filter format: {args.bench_size_filter} (use HxW)"
-                 )
-         img_files = sorted(
-             [
-                 str(Path(bench_dir) / p)
-                 for p in os.listdir(bench_dir)
-                 if p.lower().endswith((".jpg", ".jpeg"))
-             ]
-         )
-         assert len(img_files) > 0, f"No .jpg/.jpeg in {bench_dir}"
-
-         if size_filter is not None:
-             sel: List[str] = []
-             for p in img_files:
-                 im = cv2.imread(p, cv2.IMREAD_COLOR)
-                 assert im is not None
-                 if im.shape[0] == size_filter[0] and im.shape[1] == size_filter[1]:
-                     sel.append(p)
-             img_files = sel
-             assert len(img_files) > 0, (
-                 f"No images matching {size_filter[0]}x{size_filter[1]} in {bench_dir}"
-             )
-
-         if args.bench_limit > 0:
-             img_files = img_files[: args.bench_limit]
-
-         print(f"[TRT][bench] Images: {len(img_files)} from {bench_dir}")
-         print(f"[TRT][bench] Warmup: {args.bench_warmup}")
-
-         timings: List[Dict[str, Any]] = []
-         # Warmup
-         for i in range(min(args.bench_warmup, len(img_files))):
-             infer_image_trt(coarse, fine, img_files[i], cfg, out_dir=None, save_prob=False)
-
-         # Timed runs
-         for p in img_files[args.bench_warmup :]:
-             t0 = time.perf_counter()
-             bgr = cv2.imread(p, cv2.IMREAD_COLOR)
-             assert bgr is not None
-             rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
-
-             coarse_size = int(cfg["coarse"]["test_size"])
-             minmax_enable = bool(cfg["minmax"]["enable"])
-             minmax_kernel = int(cfg["minmax"]["kernel"])
-
-             c0 = time.perf_counter()
-             prob_c, cond_map, t_img, y_min_full, y_max_full = _coarse_trt(
-                 coarse, rgb, coarse_size, minmax_enable, minmax_kernel
-             )
-             c1 = time.perf_counter()
-
-             patch_size = int(cfg["inference"]["fine_patch_size"])  # 1024
-             overlap = int(cfg["fine"]["overlap"])
-
-             prob_f = _tiled_fine_trt(
-                 fine,
-                 t_img,
-                 cond_map,
-                 y_min_full,
-                 y_max_full,
-                 patch_size,
-                 overlap,
-                 int(cfg.get("eval", {}).get("fine_batch", 16)),
-             )
-             c2 = time.perf_counter()
-
-             timings.append(
-                 {
-                     "path": p,
-                     "H": int(t_img.shape[2]),
-                     "W": int(t_img.shape[3]),
-                     "t_coarse_ms": (c1 - c0) * 1000.0,
-                     "t_fine_ms": (c2 - c1) * 1000.0,
-                     "t_total_ms": (c2 - t0) * 1000.0,
-                 }
-             )
-
-         if len(timings) == 0:
-             print("[TRT][bench] Nothing to benchmark after warmup.")
-             return
-
-         def _agg(key: str) -> Tuple[float, float, float]:
-             vals = sorted([t[key] for t in timings])
-             n = len(vals)
-             p50 = vals[n // 2]
-             p95 = vals[min(n - 1, int(0.95 * (n - 1)))]
-             avg = sum(vals) / n
-             return avg, p50, p95
-
-         avg_c, p50_c, p95_c = _agg("t_coarse_ms")
-         avg_f, p50_f, p95_f = _agg("t_fine_ms")
-         avg_t, p50_t, p95_t = _agg("t_total_ms")
-
-         print("[TRT][bench] Results (ms):")
-         print(f"  Coarse avg={avg_c:.2f} p50={p50_c:.2f} p95={p95_c:.2f}")
-         print(f"  Fine avg={avg_f:.2f} p50={p50_f:.2f} p95={p95_f:.2f}")
-         print(f"  Total avg={avg_t:.2f} p50={p50_t:.2f} p95={p95_t:.2f}")
-         print(f"  Target < 1000 ms per 3000x4000 image: {'YES' if p50_t < 1000.0 else 'NO'}")
-
-         if args.bench_report_json:
-             import json
-             report = {
-                 "summary": {
-                     "avg_ms": avg_t,
-                     "p50_ms": p50_t,
-                     "p95_ms": p95_t,
-                     "avg_coarse_ms": avg_c,
-                     "avg_fine_ms": avg_f,
-                     "images": len(timings),
-                 },
-                 "per_image": timings,
-             }
-             with open(args.bench_report_json, "w") as f:
-                 json.dump(report, f, indent=2)
-         return
-
-     # Non-benchmark single/directory
-     assert (args.image != "") ^ (args.images_dir != ""), "Provide exactly one of --image or --images_dir"
-     if args.image:
-         infer_image_trt(coarse, fine, args.image, cfg, out_dir=args.out, save_prob=args.save_prob)
-         print("[TRT][infer] Done.")
-         return
-
-     img_dir = args.images_dir
-     assert Path(img_dir).is_dir()
-     Path(args.out).mkdir(parents=True, exist_ok=True)
-     img_files = sorted([p for p in os.listdir(img_dir) if p.lower().endswith((".jpg", ".jpeg"))])
-     assert len(img_files) > 0
-     for name in img_files:
-         p = str(Path(img_dir) / name)
-         infer_image_trt(coarse, fine, p, cfg, out_dir=args.out, save_prob=args.save_prob)
-     print("[TRT][infer] Done.")
-
-
- if __name__ == "__main__":
-     main()
train.py CHANGED
@@ -401,30 +401,6 @@ def main():
      print("[WireSegHR][train] Done.")
 
 
- def _sample_batch_same_size(
-     dset: WireSegDataset, batch_size: int
- ) -> Tuple[List[np.ndarray], List[np.ndarray]]:
-     # Use precomputed size bins to sample a batch from a single (H, W) bin
-     assert len(dset) > 0
-     bins = dset.size_bins
-     keys = list(bins.keys())
-     random.shuffle(keys)
-     chosen_key = None
-     for hw in keys:
-         if len(bins[hw]) >= batch_size:
-             chosen_key = hw
-             break
-     assert chosen_key is not None, f"No size bin with at least {batch_size} samples"
-     pool = bins[chosen_key]
-     idxs = np.random.choice(pool, size=batch_size, replace=False)
-     imgs: List[np.ndarray] = []
-     masks: List[np.ndarray] = []
-     for idx in idxs:
-         item = dset[int(idx)]
-         imgs.append(item["image"])
-         masks.append(item["mask"])
-     return imgs, masks
-
 
  def _prepare_batch(
      imgs: List[np.ndarray],