MRiabov committed 9 days ago

Commit

8ea2eff

1 Parent(s): 5cab910

Project skeleton generated

Files changed (29)

.gitignore +116 -0
.windsurf/rules/creating-new-class-variables.md +18 -0
.windsurf/rules/defensive-logic.md +8 -0
.windsurf/rules/running-tests.md +6 -0
.windsurf/rules/when-pytest-not-found.md +6 -0
README.md +31 -0
SEGMENTATION_PLAN.md +130 -0
WireSegHR-tex.tar.gz +0 -3
configs/default.yaml +42 -0
pytest.ini +2 -0
requirements.txt +8 -0
src/wireseghr/__init__.py +6 -0
src/wireseghr/data/__init__.py +7 -0
src/wireseghr/data/dataset.py +54 -0
src/wireseghr/data/sampler.py +14 -0
src/wireseghr/data/transforms.py +9 -0
src/wireseghr/infer.py +27 -0
src/wireseghr/metrics.py +9 -0
src/wireseghr/model/__init__.py +16 -0
src/wireseghr/model/condition.py +14 -0
src/wireseghr/model/decoder.py +58 -0
src/wireseghr/model/encoder.py +42 -0
src/wireseghr/model/label_downsample.py +26 -0
src/wireseghr/model/minmax.py +29 -0
src/wireseghr/model/model.py +42 -0
src/wireseghr/train.py +25 -0
src/wireseghr/utils.py +2 -0
tests/test_model_forward.py +19 -0
tests/test_skeleton_imports.py +7 -0

.gitignore ADDED

	@@ -0,0 +1,116 @@

+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+# C extensions
+*.so
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+*.egg-info/
+.installed.cfg
+*.egg
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+pytestdebug.log
+# Translations
+*.mo
+*.pot
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+# Flask stuff:
+instance/
+.webassets-cache
+# Scrapy stuff:
+.scrapy
+# Sphinx documentation
+docs/_build/
+# PyBuilder
+target/
+# Jupyter Notebook
+.ipynb_checkpoints
+# IPython
+profile_default/
+ipython_config.py
+# pyenv
+.python-version
+# pipenv
+Pipfile.lock
+# poetry
+poetry.lock
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+# Pyre type checker
+.pyre/
+# pytype
+.pytype/
+# Cython debug symbols
+cython_debug/
+# VS Code
+.vscode/
+# Mac
+.DS_Store
+# Environment
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/

.windsurf/rules/creating-new-class-variables.md ADDED

	@@ -0,0 +1,18 @@

+---
+trigger: model_decision
+description: When creating new class variable
+---
+Prefer class variable docstrings over comments. E.g.:
+```py
+class Class123:
+    var1: int
+    """Variable description"""
+```
+is preferred over
+```py
+class Class123:
+    # Variable description
+    var1: int
+```

.windsurf/rules/defensive-logic.md ADDED

	@@ -0,0 +1,8 @@

+---
+trigger: always_on
+---
+When deciding whether to write defensive logic, e.g. dimensionality handling: `tensor1=tensor1.unsqueeze(0) if tensor1.ndim==1 else tensor1`, or None handling: `var1=var1 if var1 else torch.zeros(...)`, just don't write these things. In my code, shapes are always static, and there is one execution path for all code. I prefer `assert` over defensive logic. If you are writing something to fix the tests and this seems necessary, it's likely that the tests are set up incorrectly.
+I don't want it because defensive logic leads to silent failures, and these are bad for debugging.
+In addition, `int()` type casting is also a piece of defensive logic that slows down the application. Don't write it unless really necessary, e.g. when formatting an int into a string. In most cases, indexing with a tensor should be better.

.windsurf/rules/running-tests.md ADDED

	@@ -0,0 +1,6 @@

+---
+trigger: model_decision
+description: When deciding which tests to run
+---
+When calling `pytest`, never execute the whole test suite because it takes over 5 minutes to run. So, never run `pytest -q` without a test file or `-k`. Instead, run an individual test function, class, module or a combination of them.

.windsurf/rules/when-pytest-not-found.md ADDED

	@@ -0,0 +1,6 @@

+---
+trigger: model_decision
+description: When pytest is not found
+---
+Sometimes when running tests, `pytest` is not found. This means that `venv` is not activated, and you can activate it with `source ../.venv/bin/activate`. After that, it will certainly work. Do not try to run `./venv/bin/pytest` directly; activate the venv, and then run `pytest` as usual.

README.md ADDED

	@@ -0,0 +1,31 @@

+# WireSegHR (Segmentation Only)
+This repository contains the segmentation-only implementation plan and code skeleton for the two-stage WireSegHR model (global-to-local, shared encoder).
+- Paper sources live under `paper-tex/`.
+- Long-term navigation plan: `SEGMENTATION_PLAN.md`.
+## Quick Start (skeleton)
+1) Create a virtual environment and install requirements:
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+2) Print configuration and verify the skeleton runs:
+```bash
+python src/wireseghr/train.py --config configs/default.yaml
+python src/wireseghr/infer.py --config configs/default.yaml --image /path/to/image.png
+```
+3) Next steps:
+- Implement encoder/decoders/condition/minmax/label downsampling per `SEGMENTATION_PLAN.md`.
+- Implement training and inference logic, then metrics and ablations.
+## Notes
+- This is a segmentation-only codebase. Inpainting is out of scope here.
+- Defaults locked: MiT-B3 encoder, patch size 768, MinMax 6×6, global+binary mask conditioning with patch-cropped global map.

SEGMENTATION_PLAN.md ADDED

	@@ -0,0 +1,130 @@

+# WireSegHR Segmentation-Only Implementation Plan
+This plan distills the model and pipeline described in the paper sources:
+- `paper-tex/sections/method.tex`
+- `paper-tex/sections/method_yq.tex`
+- `paper-tex/figure_tex/pipeline.tex`
+- `paper-tex/tables/{component,logit,thresholds}.tex`
+Focus: segmentation only (no dataset collection or inpainting).
+## Decisions and Defaults (locked)
+- Backbone: SegFormer MiT-B3 (shared encoder `E`).
+- Fine/local patch size p: 768.
+- Conditioning: global map + binary location mask by default (Table `tables/logit.tex`).
+- Conditioning map scope: patch-cropped from the global map per `paper-tex/sections/method_yq.tex` (no full-image concatenation variant).
+- MinMax feature augmentation: luminance min and max with a fixed 6×6 window; channels concatenated to inputs (Figure `figure_tex/pipeline.tex`, Sec. “Wire Feature Preservation” in `method_yq.tex`).
+- Loss: CE on both branches, λ = 1 (`method_yq.tex`).
+- α-threshold for refining windows: default 0.01 (Table `tables/thresholds.tex`).
+- Coarse input size: train 512×512; test 1024×1024 (`method.tex`).
+- Optim: AdamW (lr=6e-5, wd=0.01, poly schedule with power=1), ~40k iters, batch size ~8 (`method.tex`).
+## Project Structure
+- `configs/`
+  - `default.yaml` (backbone=mit_b3, p=768, coarse_train=512, coarse_test=1024, alpha=0.01, minmax=true, kernel=6, maxpool_label=true, cond_variant=global+binary_mask)
+- `src/wireseghr/`
+  - `model/`
+    - `encoder.py` (SegFormer MiT-B3, N_in channels expansion)
+    - `decoder.py` (two MLP decoders `D_C`, `D_F` for 2 classes)
+    - `condition.py` (1×1 conv to collapse coarse 2-ch logits → 1-ch cond)
+    - `minmax.py` (6×6 luminance min/max filtering)
+    - `label_downsample.py` (MaxPool-based coarse GT downsampling)
+  - `data/`
+    - `dataset.py` (image/mask loading, full-res to coarse/fine inputs)
+    - `sampler.py` (balanced patch sampling with ≥1% wire pixels)
+    - `transforms.py` (scaling, rotation, flip, photometric distortion)
+  - `train.py` (end-to-end two-branch training)
+  - `infer.py` (coarse-to-fine sliding-window inference + stitching)
+  - `metrics.py` (IoU, F1, Precision, Recall)
+  - `utils.py` (misc: overlap blending, seeding, logging)
+- `tests/` (unit tests for channel wiring, cond alignment, stitching)
+- `README.md` (segmentation-only usage)
+## Model Specification
+- Shared encoder `E`: SegFormer MiT-B3.
+  - Input channels (default): 3 (RGB) + 2 (MinMax) + 1 (global cond) + 1 (binary location) = 7.
+  - For the coarse pass, the cond and location channels are zeros to keep channel count consistent (`method_yq.tex`).
+  - Weight init for extra channels: copy mean of RGB conv weights or zero-init.
+- Decoders: two SegFormer MLP decoders
+  - `D_C`: coarse logits (2 channels) at coarse resolution.
+  - `D_F`: fine logits (2 channels) at patch resolution p×p.
+- Conditioning to fine branch (default):
+  - Take coarse pre-softmax logits (2-ch), apply 1×1 conv → 1-ch cond map (`method.tex`).
+  - Binary location mask: 1 inside current patch region (in full-image coordinates), 0 elsewhere.
+  - Pass patch-aligned cond crop and binary mask as channels to the fine branch input.
+- Notes:
+  - We expose a config toggle to switch conditioning variant between: `global+binary_mask` (default) and `global_only` (Table `tables/logit.tex`).
+  - We follow the published version (`paper-tex/sections/method_yq.tex`) and use patch-cropped conditioning exclusively; no full-image conditioning variant will be implemented.
+## Data and Preprocessing
+- MinMax luminance features (both branches):
+  - Y = 0.299R + 0.587G + 0.114B.
+  - Y_min = min filter (6×6), Y_max = max filter (6×6).
+  - Concat [Y_min, Y_max] to the input image channels.
+- Coarse GT label generation (MaxPool):
+  - Downsample full-res mask to coarse size with max-pooling to prevent wire vanishing (`method_yq.tex`).
+- Normalization: standard mean/std per backbone; apply consistently across channels (new channels can be mean=0, std=1 by convention, or min-max scaled).
+## Training Pipeline
+- Augment the full-res image (scaling, rotation, horizontal flip, photometric distortion) before constructing coarse/fine inputs (`method.tex`).
+- Coarse input: downsample augmented full image to 512×512; build channels [RGB+MinMax+zeros(2)] → `E` → `D_C`.
+- Fine input (per iteration select 1–k patches):
+  - Sample p×p patch (p=768) with ≥1% wire pixels (`method.tex`, `method_yq.tex`).
+  - Build cond map from coarse logits via 1×1 conv; crop cond to patch region.
+  - Build binary location mask for patch region.
+  - Build channels [RGB + MinMax + cond + location] → `E` → `D_F`.
+- Losses:
+  - L_glo = CE(Softmax(`D_C(E(coarse))`), G_glo), where G_glo uses MaxPool downsample.
+  - L_loc = CE(Softmax(`D_F(E(fine))`), G_loc).
+  - L = L_glo + λ L_loc, λ=1 (`method_yq.tex`).
+- Optimization:
+  - AdamW (lr=6e-5, wd=0.01), poly schedule (power=1.0), ~40k iterations, batch ≈8 (tune by memory).
+  - AMP and grad accumulation recommended for stability/memory.
+## Inference Pipeline
+- Coarse pass:
+  - Downsample to 1024×1024; predict coarse probability/logits.
+- Window proposal (sliding window on full-res):
+  - Tile with patch size p=768. Overlap ~128px (configurable). Compute wire fraction within each window from coarse prediction (prob>0.5).
+  - If fraction ≥ α (default 0.01), run fine refinement on that patch; else skip (Table `tables/thresholds.tex`).
+- Fine refinement + stitching:
+  - For selected windows, build fine input with cond crop + location mask; predict logits.
+  - Stitch logits into full-res canvas; average in overlaps; final argmax over classes.
+- Outputs: full-res binary mask, plus optional probability map.
+## Metrics and Reporting
+- Implement: IoU, F1, Precision, Recall (global, and optionally per-size bins if available) matching `tables/component.tex`.
+- Validate α trade-offs following `tables/thresholds.tex`.
+- Ablations: MinMax on/off, MaxPool on/off, conditioning variant (Table `tables/logit.tex`).
+## Configuration Surface (key)
+- Backbone/weights: `mit_b3` (pretrained ImageNet-1K).
+- Sizes: `p=768`, `coarse_train=512`, `coarse_test=1024`, `overlap=128`.
+- Conditioning: `use_binary_location=true`, `cond_from='coarse_logits_1x1'`, `cond_crop='patch'`.
+- MinMax: `enable=true`, `kernel=6`.
+- Label: `coarse_label_downsample='maxpool'`.
+- Training: `iters=40000`, `batch=8`, `lr=6e-5`, `wd=0.01`, `schedule='poly'`, `power=1.0`.
+- Inference: `alpha=0.01`, `prob_threshold=0.5` for wire fraction, `stitch='avg_logits'`.
+## Risks / Gotchas
+- Channel expansion requires careful initialization; confirm no NaNs and stable early training.
+- Precise spatial alignment of cond and location mask with the patch is critical. Add assertions/tests.
+- Even-sized MinMax window (6×6) requires careful padding to maintain alignment.
+- Memory with p=768 and MiT-B3 may need tuning (AMP, batch size, overlap).
+## Milestones
+1) Skeleton + configs + metrics.
+2) Encoder channel expansion + two decoders + 1×1 cond.
+3) MinMax (6×6) + MaxPool label downsampling.
+4) Training loop with ≥1% wire patch sampling.
+5) Inference α-threshold + stitching.
+6) Ablations toggles + scripts + README.
+7) Tests (channel wiring, cond/mask alignment, stitching correctness).
+## References (paper sources)
+- `paper-tex/sections/method.tex`: Two-stage design, shared encoder, 1×1 cond, training/inference sizes, optimizer/schedule.
+- `paper-tex/sections/method_yq.tex`: CE losses, λ, sliding-window with α, MinMax & MaxPool rationale.
+- `paper-tex/figure_tex/pipeline.tex`: System overview; MinMax concatenation.
+- `paper-tex/tables/component.tex`: Ablation of MinMax/MaxPool/coarse.
+- `paper-tex/tables/logit.tex`: Conditioning variants.
+- `paper-tex/tables/thresholds.tex`: α vs speed/quality.
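
Reviewer sketch (not part of the commit): the window-proposal step from the plan's Inference Pipeline section could look roughly like the following. `window_starts` and `propose_windows` are hypothetical names. The sketch assumes the coarse probability map has already been upsampled to full resolution, and it places the last window flush with the image border, which the plan does not specify.

```python
import numpy as np


def window_starts(length, patch, stride):
    """Offsets that tile [0, length); the last window is flush with the border (assumption)."""
    assert length >= patch
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def propose_windows(coarse_prob_full, patch=768, overlap=128, alpha=0.01, prob_threshold=0.5):
    """Return top-left (y, x) of windows whose coarse wire fraction is >= alpha."""
    assert coarse_prob_full.ndim == 2
    h, w = coarse_prob_full.shape
    stride = patch - overlap
    wire = coarse_prob_full > prob_threshold  # binarize the coarse prediction
    selected = []
    for y in window_starts(h, patch, stride):
        for x in window_starts(w, patch, stride):
            frac = wire[y:y + patch, x:x + patch].mean()
            if frac >= alpha:
                selected.append((y, x))
    return selected
```

Windows that fall below `alpha` are skipped and never reach the fine branch, which is where the speed/quality trade-off in `tables/thresholds.tex` comes from.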

WireSegHR-tex.tar.gz DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:0a17096a2eaad07f51345426465697fcf0ee1a0c5b54aa6742b4ac23406f6bc4
-size 33690376

configs/default.yaml ADDED

	@@ -0,0 +1,42 @@

+# Default configuration for WireSegHR (segmentation-only)
+backbone: mit_b3
+coarse:
+  train_size: 512
+  test_size: 1024
+fine:
+  patch_size: 768
+  overlap: 128
+conditioning:
+  use_binary_location: true
+  cond_from: coarse_logits_1x1
+  cond_crop: patch  # per published method (method_yq)
+minmax:
+  enable: true
+  kernel: 6  # fixed 6x6 luminance min/max
+label:
+  coarse_downsample: maxpool
+inference:
+  alpha: 0.01
+  prob_threshold: 0.5
+  stitch: avg_logits
+optim:
+  iters: 40000
+  batch_size: 8
+  lr: 6e-5
+  weight_decay: 0.01
+  schedule: poly
+  power: 1.0
+# dataset paths (placeholders)
+data:
+  train_images: /path/to/train/images
+  train_masks: /path/to/train/masks
+  val_images: /path/to/val/images
+  val_masks: /path/to/val/masks
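
Reviewer sketch (not part of the commit): the `optim` block above names a `poly` schedule with `power: 1.0`. One common reading of that schedule is shown below. `poly_lr` is a hypothetical helper, and the absence of a warmup phase is an assumption, since the config does not mention one.

```python
def poly_lr(base_lr, it, max_iters, power=1.0):
    """Polynomial decay: base_lr * (1 - it / max_iters) ** power, no warmup (assumption)."""
    assert 0 <= it <= max_iters
    return base_lr * (1.0 - it / max_iters) ** power


# With the defaults above (lr=6e-5, iters=40000, power=1.0) the decay is linear.
lr_mid = poly_lr(6e-5, 20000, 40000)
```

With `power=1.0` the learning rate falls linearly from `6e-5` at iteration 0 to 0 at iteration 40000.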

pytest.ini ADDED

	@@ -0,0 +1,2 @@


+[pytest]
+pythonpath = src

requirements.txt ADDED

	@@ -0,0 +1,8 @@

+torch>=2.1.0
+torchvision>=0.16.0
+timm>=0.9.8
+numpy>=1.24.0
+opencv-python>=4.8.0.76
+Pillow>=9.5.0
+PyYAML>=6.0.1
+tqdm>=4.65.0

src/wireseghr/__init__.py ADDED

	@@ -0,0 +1,6 @@

+__all__ = [
+    "model",
+    "data",
+]
+__version__ = "0.1.0"

src/wireseghr/data/__init__.py ADDED

	@@ -0,0 +1,7 @@

+from .dataset import WireSegDataset
+from .sampler import BalancedPatchSampler
+__all__ = [
+    "WireSegDataset",
+    "BalancedPatchSampler",
+]

src/wireseghr/data/dataset.py ADDED

	@@ -0,0 +1,54 @@

+# Dataset placeholder for wire segmentation
+"""WireSeg dataset indexing and loading.
+Pairs images in `images_dir` with masks in `masks_dir` by matching filename stems.
+Mask is loaded as single-channel 0/1.
+"""
+from typing import Any, Dict, List
+from pathlib import Path
+import numpy as np
+import cv2
+class WireSegDataset:
+    def __init__(self, images_dir: str, masks_dir: str, split: str = "train"):
+        self.images_dir = Path(images_dir)
+        self.masks_dir = Path(masks_dir)
+        self.split = split
+        assert self.images_dir.exists(), f"Missing images_dir: {self.images_dir}"
+        assert self.masks_dir.exists(), f"Missing masks_dir: {self.masks_dir}"
+        self._items: List[tuple[Path, Path]] = self._index_pairs()
+    def __len__(self) -> int:
+        return len(self._items)
+    def __getitem__(self, idx: int) -> Dict[str, Any]:
+        img_path, mask_path = self._items[idx]
+        img_bgr = cv2.imread(str(img_path), cv2.IMREAD_COLOR)
+        assert img_bgr is not None, f"Failed to read image: {img_path}"
+        img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
+        mask = cv2.imread(str(mask_path), cv2.IMREAD_GRAYSCALE)
+        assert mask is not None, f"Failed to read mask: {mask_path}"
+        mask_bin = (mask > 0).astype(np.uint8)
+        return {"image": img, "mask": mask_bin, "image_path": str(img_path), "mask_path": str(mask_path)}
+    def _index_pairs(self) -> List[tuple[Path, Path]]:
+        exts_img = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}
+        exts_mask = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}
+        imgs: Dict[str, Path] = {}
+        for p in sorted(self.images_dir.rglob("*")):
+            if p.is_file() and p.suffix.lower() in exts_img:
+                imgs[p.stem] = p
+        masks: Dict[str, Path] = {}
+        for p in sorted(self.masks_dir.rglob("*")):
+            if p.is_file() and p.suffix.lower() in exts_mask:
+                masks[p.stem] = p
+        pairs: List[tuple[Path, Path]] = []
+        for stem, ip in imgs.items():
+            if stem in masks:
+                pairs.append((ip, masks[stem]))
+        assert len(pairs) > 0, f"No image-mask pairs found in {self.images_dir} and {self.masks_dir}"
+        return pairs

src/wireseghr/data/sampler.py ADDED

	@@ -0,0 +1,14 @@

+# Balanced patch sampler (>=1% wire pixels)
+# TODO: Implement logic over mask to pick patches with wire ratio >= threshold.
+from dataclasses import dataclass
+@dataclass
+class BalancedPatchSampler:
+    patch_size: int = 768
+    min_wire_ratio: float = 0.01
+    def sample(self, image, mask):
+        # TODO: sample and return top-left (y, x) of a valid patch
+        return 0, 0
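
Reviewer sketch (not part of the commit): the `sample` TODO above could be filled in with simple rejection sampling, as below. `sample_patch` and `max_tries` are hypothetical, and rejection sampling is an assumption; the plan only requires that sampled patches contain at least 1% wire pixels.

```python
import numpy as np


def sample_patch(mask, rng, patch_size=768, min_wire_ratio=0.01, max_tries=100):
    """Return top-left (y, x) of a patch whose wire ratio is >= min_wire_ratio.

    mask: HxW binary (0/1) array; rng: a numpy Generator.
    """
    assert mask.ndim == 2
    h, w = mask.shape
    assert h >= patch_size and w >= patch_size
    for _ in range(max_tries):
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        ratio = mask[y:y + patch_size, x:x + patch_size].mean()
        if ratio >= min_wire_ratio:
            return y, x
    # Fail loudly rather than silently returning an invalid patch.
    raise RuntimeError("no patch with enough wire pixels found")
```

Passing the generator in explicitly (for example `np.random.default_rng(seed)`) keeps sampling reproducible across runs.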

src/wireseghr/data/transforms.py ADDED

	@@ -0,0 +1,9 @@

+# Training-time transforms: scaling, rotation, flip, photometric distortion
+# TODO: Implement deterministic transform composition for reproducibility
+class TrainTransforms:
+    def __init__(self):
+        pass
+    def __call__(self, image, mask):
+        return image, mask

src/wireseghr/infer.py ADDED

	@@ -0,0 +1,27 @@

+import argparse
+import os
+import pprint
+import yaml
+def main():
+    parser = argparse.ArgumentParser(description="WireSegHR inference (skeleton)")
+    parser.add_argument("--config", type=str, default="configs/default.yaml", help="Path to YAML config")
+    parser.add_argument("--image", type=str, required=False, help="Path to input image")
+    args = parser.parse_args()
+    cfg_path = args.config
+    if not os.path.isabs(cfg_path):
+        cfg_path = os.path.join(os.getcwd(), cfg_path)
+    with open(cfg_path, "r") as f:
+        cfg = yaml.safe_load(f)
+    print("[WireSegHR][infer] Loaded config from:", cfg_path)
+    pprint.pprint(cfg)
+    print("[WireSegHR][infer] Image:", args.image)
+    print("[WireSegHR][infer] Skeleton OK. Implement inference per SEGMENTATION_PLAN.md.")
+if __name__ == "__main__":
+    main()
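
Reviewer sketch (not part of the commit): the `stitch: avg_logits` option in the config suggests accumulating fine logits on a full-resolution canvas and dividing by a per-pixel hit count. `stitch_average` is a hypothetical name. How pixels outside every refined window are filled is not stated in the plan, so the sketch only returns a coverage mask and leaves that decision to the caller.

```python
import numpy as np


def stitch_average(full_hw, patches):
    """Average overlapping patch logits on a full-resolution canvas.

    patches: list of (y, x, logits) with logits shaped (C, ph, pw).
    Returns (canvas of shape (C, H, W), boolean coverage mask of shape (H, W)).
    """
    h, w = full_hw
    num_classes = patches[0][2].shape[0]
    acc = np.zeros((num_classes, h, w), dtype=np.float32)
    cnt = np.zeros((h, w), dtype=np.float32)
    for y, x, logits in patches:
        ph, pw = logits.shape[1:]
        acc[:, y:y + ph, x:x + pw] += logits
        cnt[y:y + ph, x:x + pw] += 1.0
    covered = cnt > 0
    # Divide only where at least one patch contributed.
    acc[:, covered] /= cnt[covered]
    return acc, covered
```

The final mask would then be an argmax over the class axis wherever `covered` is true.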

src/wireseghr/metrics.py ADDED

	@@ -0,0 +1,9 @@

+# Metrics placeholder: IoU, F1, Precision, Recall
+# TODO: Implement proper metrics matching paper tables.
+from typing import Dict
+def compute_metrics(pred_mask, gt_mask) -> Dict[str, float]:
+    # TODO: implement
+    return {"iou": 0.0, "f1": 0.0, "precision": 0.0, "recall": 0.0}
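
Reviewer sketch (not part of the commit): the placeholder above could be implemented from true/false positive counts as below. `binary_metrics` is a hypothetical name, and the small `eps` in the denominators is an assumption to keep empty masks from dividing by zero; the paper tables may use a different convention.

```python
import numpy as np


def binary_metrics(pred_mask, gt_mask, eps=1e-9):
    """IoU, F1, precision and recall for the positive (wire) class of two binary masks."""
    assert pred_mask.shape == gt_mask.shape
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    return {"iou": float(iou), "f1": float(f1),
            "precision": float(precision), "recall": float(recall)}
```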

src/wireseghr/model/__init__.py ADDED

	@@ -0,0 +1,16 @@

+from .encoder import SegFormerEncoder
+from .decoder import CoarseDecoder, FineDecoder
+from .condition import Conditioning1x1
+from .minmax import MinMaxLuminance
+from .label_downsample import downsample_label_maxpool
+from .model import WireSegHR
+__all__ = [
+    "SegFormerEncoder",
+    "CoarseDecoder",
+    "FineDecoder",
+    "Conditioning1x1",
+    "MinMaxLuminance",
+    "downsample_label_maxpool",
+    "WireSegHR",
+]

src/wireseghr/model/condition.py ADDED

	@@ -0,0 +1,14 @@

+# 1x1 conv to collapse 2-ch coarse logits into 1-ch conditioning map
+# TODO: Wire with coarse decoder outputs and proper resize/cropping.
+import torch
+import torch.nn as nn
+class Conditioning1x1(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv = nn.Conv2d(2, 1, kernel_size=1, bias=True)
+    def forward(self, coarse_logits: torch.Tensor) -> torch.Tensor:
+        return self.conv(coarse_logits)
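
Reviewer sketch (not part of the commit): the cropping TODO above needs a mapping from a full-resolution patch to the coarse conditioning map. One possible version is below. `patch_box_in_coarse` and `crop_cond_and_location` are hypothetical names, the floor/ceil rounding is an assumption, and the location mask is built at coarse resolution here only for illustration. Resizing the crop to the patch size is left to the caller.

```python
import math

import numpy as np


def patch_box_in_coarse(y, x, p, full_hw, coarse_hw):
    """Map a full-res patch (y, x, p) to a (y0, x0, y1, x1) box on the coarse grid."""
    full_h, full_w = full_hw
    coarse_h, coarse_w = coarse_hw
    y0 = math.floor(y * coarse_h / full_h)
    x0 = math.floor(x * coarse_w / full_w)
    y1 = math.ceil((y + p) * coarse_h / full_h)
    x1 = math.ceil((x + p) * coarse_w / full_w)
    return y0, x0, y1, x1


def crop_cond_and_location(cond, y, x, p, full_hw):
    """Return (cond crop, binary location mask), both on the coarse grid."""
    assert cond.ndim == 2
    y0, x0, y1, x1 = patch_box_in_coarse(y, x, p, full_hw, cond.shape)
    loc = np.zeros(cond.shape, dtype=np.float32)
    loc[y0:y1, x0:x1] = 1.0  # 1 inside the patch region, 0 elsewhere
    return cond[y0:y1, x0:x1], loc
```

Since the plan flags cond/mask alignment as a risk, a unit test that checks the box corners for a few known patches seems worthwhile.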

src/wireseghr/model/decoder.py ADDED

	@@ -0,0 +1,58 @@

+"""SegFormer-like multi-scale decoder heads for coarse and fine branches.
+Fuse four feature maps from MiT encoder via 1x1 projections, upsample to the
+highest spatial resolution (stage 0), concatenate, and predict 2-class logits.
+"""
+from typing import List
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+class _ConvBNReLU(nn.Module):
+    def __init__(self, in_ch: int, out_ch: int, k: int, s: int = 1, p: int = 0):
+        super().__init__()
+        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=s, padding=p, bias=False)
+        self.bn = nn.BatchNorm2d(out_ch)
+        self.relu = nn.ReLU(inplace=True)
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        x = self.conv(x)
+        x = self.bn(x)
+        x = self.relu(x)
+        return x
+class _SegFormerHead(nn.Module):
+    def __init__(self, in_chs: List[int], embed_dim: int = 128, num_classes: int = 2):
+        super().__init__()
+        assert len(in_chs) == 4
+        self.proj = nn.ModuleList([nn.Conv2d(c, embed_dim, kernel_size=1) for c in in_chs])
+        self.fuse = _ConvBNReLU(embed_dim * 4, embed_dim, k=3, p=1)
+        self.cls = nn.Conv2d(embed_dim, num_classes, kernel_size=1)
+    def forward(self, feats: List[torch.Tensor]) -> torch.Tensor:
+        assert len(feats) == 4
+        h, w = feats[0].shape[2], feats[0].shape[3]
+        xs = []
+        for f, proj in zip(feats, self.proj):
+            x = proj(f)
+            if x.shape[2] != h or x.shape[3] != w:
+                x = F.interpolate(x, size=(h, w), mode="bilinear", align_corners=False)
+            xs.append(x)
+        x = torch.cat(xs, dim=1)
+        x = self.fuse(x)
+        x = self.cls(x)
+        return x
+class CoarseDecoder(_SegFormerHead):
+    def __init__(self, in_chs: tuple[int, ...] = (64, 128, 320, 512), embed_dim: int = 128, num_classes: int = 2):
+        super().__init__(list(in_chs), embed_dim, num_classes)
+class FineDecoder(_SegFormerHead):
+    def __init__(self, in_chs: tuple[int, ...] = (64, 128, 320, 512), embed_dim: int = 128, num_classes: int = 2):
+        super().__init__(list(in_chs), embed_dim, num_classes)

src/wireseghr/model/encoder.py ADDED

	@@ -0,0 +1,42 @@

+"""SegFormer MiT encoder wrapper with adjustable input channels.
+Uses timm to instantiate MiT (e.g., mit_b3) and returns a list of multi-scale
+features [C1, C2, C3, C4].
+"""
+from typing import List, Tuple
+import torch
+import torch.nn as nn
+import timm
+class SegFormerEncoder(nn.Module):
+    def __init__(
+        self,
+        backbone: str = "mit_b3",
+        in_channels: int = 7,
+        pretrained: bool = True,
+        out_indices: Tuple[int, int, int, int] = (0, 1, 2, 3),
+    ):
+        super().__init__()
+        self.backbone_name = backbone
+        self.in_channels = in_channels
+        self.pretrained = pretrained
+        self.out_indices = out_indices
+        # Create MiT with features_only to obtain multi-scale feature maps.
+        # in_chans allows expanded inputs (RGB + minmax + cond + loc)
+        self.encoder = timm.create_model(
+            backbone,
+            pretrained=pretrained,
+            features_only=True,
+            out_indices=out_indices,
+            in_chans=in_channels,
+        )
+    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
+        feats = self.encoder(x)
+        # Ensure list of tensors is returned
+        assert isinstance(feats, (list, tuple)) and len(feats) == len(self.out_indices)
+        return list(feats)

src/wireseghr/model/label_downsample.py ADDED

	@@ -0,0 +1,26 @@

+# MaxPool-based downsampling for coarse labels
+"""Downsample binary masks preserving thin positives.
+We use area-based resize on float32 masks followed by a >0 threshold.
+This emulates block-wise max pooling: any positive in the source region
+produces a positive in the target pixel.
+"""
+import numpy as np
+def downsample_label_maxpool(mask: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
+    """
+    Args:
+        mask: HxW binary (0/1) numpy array
+        out_h, out_w: target size
+    Returns:
+        H'xW' binary array via max-pooling-like downsample
+    """
+    assert mask.ndim == 2
+    # Convert to float32 so area resize yields fractional averages > 0 if any positive present
+    import cv2
+    m = mask.astype(np.float32)
+    r = cv2.resize(m, (out_w, out_h), interpolation=cv2.INTER_AREA)
+    out = (r > 0.0).astype(np.uint8)
+    return out
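
Reviewer sketch (not part of the commit): the function above emulates max pooling through an area resize followed by a `> 0` threshold. For integer downsampling factors, an exact block-wise max can be written with a reshape, which may serve as a reference in a unit test. `block_maxpool` is a hypothetical helper and only handles sizes divisible by the factor.

```python
import numpy as np


def block_maxpool(mask, factor):
    """Exact block-wise max pooling of an HxW mask by an integer factor."""
    assert mask.ndim == 2
    h, w = mask.shape
    assert h % factor == 0 and w % factor == 0
    # Split each axis into (blocks, within-block) and reduce over the within-block axes.
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    return blocks.max(axis=(1, 3))
```

A single positive pixel anywhere in a block yields a positive output pixel, which is the property that keeps thin wires from vanishing.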

src/wireseghr/model/minmax.py ADDED

	@@ -0,0 +1,29 @@

+# MinMax luminance feature computation (6x6 window)
+# Implemented with OpenCV morphology (erode=min, dilate=max) using 6x6 kernel and replicate border.
+from typing import Tuple
+import numpy as np
+class MinMaxLuminance:
+    def __init__(self, kernel: int = 6):
+        assert kernel == 6, "Per plan, kernel is fixed to 6x6"
+        self.kernel = kernel
+    def __call__(self, img_rgb: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
+        """
+        Args:
+            img_rgb: HxWx3 uint8 or float32 in [0,255] or [0,1]
+        Returns:
+            (Y_min, Y_max): two HxW float32 arrays
+        """
+        assert img_rgb.ndim == 3 and img_rgb.shape[2] == 3
+        r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
+        y = (0.299 * r + 0.587 * g + 0.114 * b).astype(np.float32)
+        import cv2  # lazy import to avoid test-time dependency at module import
+        kernel = np.ones((self.kernel, self.kernel), dtype=np.uint8)
+        y_min = cv2.erode(y, kernel, borderType=cv2.BORDER_REPLICATE)
+        y_max = cv2.dilate(y, kernel, borderType=cv2.BORDER_REPLICATE)
+        return y_min.astype(np.float32), y_max.astype(np.float32)

src/wireseghr/model/model.py ADDED

	@@ -0,0 +1,42 @@

+from typing import Tuple
+import torch
+import torch.nn as nn
+from .encoder import SegFormerEncoder
+from .decoder import CoarseDecoder, FineDecoder
+from .condition import Conditioning1x1
+class WireSegHR(nn.Module):
+    """
+    Two-stage WireSegHR model wrapper with shared encoder.
+    Expects callers to prepare input channel stacks according to the plan:
+    - Coarse input: RGB + MinMax (and any extra channels per config), shape (B, Cc, Hc, Wc)
+    - Fine input: RGB + MinMax + cond_crop + binary_location_mask, shape (B, Cf, p, p)
+    Conditioning 1x1 is applied to coarse logits to produce a single-channel map.
+    """
+    def __init__(self, backbone: str = "mit_b3", in_channels: int = 7, pretrained: bool = True):
+        super().__init__()
+        self.encoder = SegFormerEncoder(backbone=backbone, in_channels=in_channels, pretrained=pretrained)
+        # Default MiT-B3 channel dims for stages
+        in_chs = (64, 128, 320, 512)
+        self.coarse_head = CoarseDecoder(in_chs=in_chs, embed_dim=128, num_classes=2)
+        self.fine_head = FineDecoder(in_chs=in_chs, embed_dim=128, num_classes=2)
+        self.cond1x1 = Conditioning1x1()
+    def forward_coarse(self, x_coarse: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
+        assert x_coarse.dim() == 4
+        feats = self.encoder(x_coarse)
+        logits_coarse = self.coarse_head(feats)
+        cond_map = self.cond1x1(logits_coarse)
+        return logits_coarse, cond_map
+    def forward_fine(self, x_fine: torch.Tensor) -> torch.Tensor:
+        assert x_fine.dim() == 4
+        feats = self.encoder(x_fine)
+        logits_fine = self.fine_head(feats)
+        return logits_fine
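
Reviewer sketch (not part of the commit): the docstring above expects callers to assemble the channel stack. Following the 7-channel layout in `SEGMENTATION_PLAN.md` (RGB, MinMax, cond, location), a caller-side helper might look like this. `stack_channels` is a hypothetical name, and normalization is deliberately left out.

```python
import numpy as np


def stack_channels(rgb, y_min, y_max, cond, loc):
    """Stack HxWx3 RGB and four HxW maps into a 7xHxW float32 array."""
    h, w, c = rgb.shape
    assert c == 3
    assert y_min.shape == y_max.shape == cond.shape == loc.shape == (h, w)
    chans = [rgb[..., 0], rgb[..., 1], rgb[..., 2], y_min, y_max, cond, loc]
    return np.stack(chans, axis=0).astype(np.float32)
```

For the coarse pass, the plan keeps the channel count fixed by passing zero maps for `cond` and `loc`.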

src/wireseghr/train.py ADDED

	@@ -0,0 +1,25 @@

+import argparse
+import os
+import pprint
+import yaml
+def main():
+    parser = argparse.ArgumentParser(description="WireSegHR training (skeleton)")
+    parser.add_argument("--config", type=str, default="configs/default.yaml", help="Path to YAML config")
+    args = parser.parse_args()
+    cfg_path = args.config
+    if not os.path.isabs(cfg_path):
+        cfg_path = os.path.join(os.getcwd(), cfg_path)
+    with open(cfg_path, "r") as f:
+        cfg = yaml.safe_load(f)
+    print("[WireSegHR][train] Loaded config from:", cfg_path)
+    pprint.pprint(cfg)
+    print("[WireSegHR][train] Skeleton OK. Implement training per SEGMENTATION_PLAN.md.")
+if __name__ == "__main__":
+    main()

src/wireseghr/utils.py ADDED

	@@ -0,0 +1,2 @@


+def log(msg: str):
+    print(f"[WireSegHR] {msg}")

tests/test_model_forward.py ADDED

	@@ -0,0 +1,19 @@

+import torch
+from wireseghr.model import WireSegHR
+def test_wireseghr_forward_shapes():
+    # Use small input to keep test light and avoid downloading weights
+    model = WireSegHR(backbone="mit_b3", in_channels=3, pretrained=False)
+    x = torch.randn(1, 3, 64, 64)
+    logits_coarse, cond = model.forward_coarse(x)
+    assert logits_coarse.shape[0] == 1 and logits_coarse.shape[1] == 2
+    assert cond.shape[0] == 1 and cond.shape[1] == 1
+    # Expect stage 0 resolution ~ 1/4 of input for MiT
+    assert logits_coarse.shape[2] == 16 and logits_coarse.shape[3] == 16
+    assert cond.shape[2] == 16 and cond.shape[3] == 16
+    logits_fine = model.forward_fine(x)
+    assert logits_fine.shape == logits_coarse.shape

tests/test_skeleton_imports.py ADDED

	@@ -0,0 +1,7 @@

+def test_imports():
+    import wireseghr
+    import wireseghr.model as m
+    import wireseghr.data as d
+    assert hasattr(m, "SegFormerEncoder")
+    assert hasattr(d, "WireSegDataset")