|
|
|
|
|
|
|
|
|
\documentclass[10pt,twocolumn,letterpaper]{article} |
|
|
|
|
|
|
|
\usepackage{cvpr} |
|
|
|
|
|
\usepackage[accsupp]{axessibility} |
|
|
|
|
|
\usepackage[normalem]{ulem} |
|
\usepackage{graphicx} |
|
\usepackage{amsmath} |
|
\usepackage{amssymb} |
|
\usepackage{booktabs} |
|
\usepackage{xcolor} |
|
\usepackage{comment} |
|
\usepackage{enumitem} |
|
\usepackage{multirow} |
|
\usepackage{footnote} |
|
\newcommand{\todo}[1]{{\color{red}#1}} |
|
|
|
\newcommand{\benchmark}{WireSegHR} |
|
|
|
|
|
\makeatletter |
|
\newcommand\footnoteref[1]{\protected@xdef\@thefnmark{\ref{#1}}\@footnotemark} |
|
\makeatother |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\usepackage[pagebackref,breaklinks,colorlinks]{hyperref} |
|
|
|
|
|
|
|
\usepackage[capitalize]{cleveref} |
|
\crefname{section}{Sec.}{Secs.} |
|
\Crefname{section}{Section}{Sections} |
|
\Crefname{table}{Table}{Tables} |
|
\crefname{table}{Tab.}{Tabs.} |
|
|
|
|
|
|
|
\def\cvprPaperID{699} |
|
\def\confName{CVPR} |
|
\def\confYear{2023} |
|
|
|
|
|
\makeatletter |
|
\def\@maketitle |
|
{ |
|
\newpage |
|
\null |
|
\iftoggle{cvprrebuttal}{\vspace*{-.3in}}{\vskip .375in} |
|
\begin{center} |
|
|
|
\iftoggle{cvprrebuttal}{{\large \bf \@title \par}}{{\Large \bf \@title \par}} |
|
|
|
\iftoggle{cvprrebuttal}{\vspace*{-22pt}}{\vspace*{24pt}} |
|
{ |
|
\large |
|
\lineskip .5em |
|
\begin{tabular}[t]{c} |
|
\iftoggle{cvprfinal}{ |
|
\@author |
|
}{ |
|
\iftoggle{cvprrebuttal}{}{ |
|
Anonymous \confName~submission\\ |
|
\vspace*{1pt}\\ |
|
Paper ID \cvprPaperID |
|
} |
|
} |
|
\end{tabular} |
|
\par |
|
} |
|
|
|
|
|
\vspace*{-10mm} |
|
\vspace{-5mm} |
|
\end{center} |
|
} |
|
\makeatother |
|
|
|
\title{Supplementary Material: Automatic High Resolution Wire Segmentation and Removal} |
|
|
|
\begin{document} |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\maketitle |
|
\thispagestyle{empty} |
|
|
|
|
|
\vspace{-1mm} |
|
\section{Comparison with Pixel 6} |
|
\vspace{-1mm} |
|
We show a visual comparison between our model and the Pixel 6 ``Magic Eraser'' feature in Figure~\ref{fig:pixel6}. Without manual intervention, Google Pixel 6's ``Magic Eraser'' performs well on wires against clean backgrounds, but struggles with thin wires that are barely visible ((A), top) and with wires against complex backgrounds ((A), bottom). We also pass our segmentation mask to our wire inpainting model to obtain the wire removal result, shown in the lower image of (B).
|
|
|
\vspace{-1mm} |
|
\section{Failure cases} |
|
\vspace{-1mm} |
|
We show some challenging cases where our model fails to predict accurate wire masks in Figure~\ref{fig:new_failures}. These include regions that closely resemble wires (top row), severe blending with the background (middle row), and extreme lighting conditions (bottom row).
|
|
|
\vspace{-1mm} |
|
\section{Panorama} |
|
\vspace{-1mm} |
|
Our two-stage model leverages the sparsity of wires in natural images and generalizes efficiently to ultra-high-resolution images such as panoramas. We show one panoramic image of $11$K by $1.5$K resolution in Figure~\ref{fig:new_panorama}. Note that our method produces a high-quality wire segmentation that covers even wires that are almost invisible. As a result, our proposed wire removal step can effectively remove these regions.
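For clarity, the sketch below illustrates how such sparse two-stage inference can be organized (illustrative Python; \texttt{g\_model} and \texttt{l\_model} stand in for our global and local networks, and the patch size and skip threshold are assumptions rather than our exact settings):

{\footnotesize
\begin{verbatim}
import torch
import torch.nn.functional as F

def two_stage(image, g_model, l_model,
              coarse=1024, patch=1024, thr=0.01):
    # Coarse pass on a downsampled copy, then a
    # fine pass only on patches that the coarse
    # mask flags as containing wires.
    _, _, H, W = image.shape
    small = F.interpolate(
        image, size=(coarse, coarse),
        mode='bilinear', align_corners=False)
    prob = g_model(small).sigmoid()
    prob = F.interpolate(
        prob, size=(H, W),
        mode='bilinear', align_corners=False)
    mask = torch.zeros(1, 1, H, W,
                       device=image.device)
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            ys = slice(y, min(y + patch, H))
            xs = slice(x, min(x + patch, W))
            if prob[:, :, ys, xs].mean() < thr:
                continue  # skip wire-free areas
            mask[:, :, ys, xs] = l_model(
                image[:, :, ys, xs]).sigmoid()
    return mask
\end{verbatim}
}

Since wires are sparse, most patches are skipped after the coarse pass, which keeps full-resolution inference tractable on panoramas.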
|
|
|
|
|
\input{figure_tex/pixel6.tex} |
|
\input{figure_tex/new_failure_cases.tex} |
|
\input{figure_tex/new_panorama.tex} |
|
|
|
|
|
\section{Segmentation and inpainting visualizations} |
|
\vspace{-1mm} |
|
We provide additional visualizations of our wire segmentation and subsequent inpainting results, covering several common photography scenes as well as some challenging cases, in Figure~\ref{fig:additional_visualizations}. Our model successfully handles numerous challenging scenarios, including strong backlighting (top row), complex background texture (2nd row), low light (3rd row), and barely visible wires (4th row). A typical use case is shown in the last row.
|
|
|
|
|
\section{Experiments on other datasets} |
|
\vspace{-1mm} |
|
Most existing wire-like datasets are either low-resolution or built for specific purposes (e.g., aerial imaging), and thus lack the scene diversity of WireSegHR. The suggested TTPLA~[2] dataset shares the Power Lines class with ours, although it contains only aerial images. Table~\ref{ttpla_exp} reports cross-dataset results: the TTPLA test set evaluated with our model, and our WireSegHR test set evaluated with the TTPLA model.
|
|
|
|
|
|
|
|
|
|
|
\begin{table}[h!]
\centering
\resizebox{\linewidth}{!}{
\begin{tabular}{c|c|c}
Dataset & Model & IoU (\%) \\\hline\hline
\multirow{3}{*}{TTPLA (Power Line only)} & TTPLA (ResNet-50, $700\times700$) & 18.9 \\
 & Ours (ResNet-50) & 33.1 \\
 & Ours (MiT-b2) & 42.7 \\\hline
\multirow{3}{*}{WireSegHR} & TTPLA (ResNet-50, $700\times700$) & 3.5 \\
 & Ours (ResNet-50) & 47.8 \\
 & Ours (MiT-b2) & 60.8 \\\hline
\end{tabular}}
|
\caption{Cross-dataset comparison with TTPLA.}
|
\vspace{-5mm} |
|
\label{ttpla_exp} |
|
\end{table} |
|
|
|
|
|
TTPLA is trained at a fixed resolution ($700\times700$) and takes the entire image as input at inference time, which requires significant downsampling of our test set.
|
|
|
As a result, thin wires deteriorate in both the image and the label. Our model's performance drops on the TTPLA dataset due to differing annotation definitions: we annotate all wire-like objects, whereas TTPLA annotates only power lines.
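To make the downsampling effect concrete, the toy computation below (illustrative, not part of our pipeline) shows how a 2-pixel-wide wire in a $5000\times5000$ mask all but vanishes after a TTPLA-style resize to $700\times700$:

{\footnotesize
\begin{verbatim}
import numpy as np
import cv2

# A 2 px-wide wire in a 5000 x 5000 binary mask.
mask = np.zeros((5000, 5000), dtype=np.float32)
mask[2500:2502, :] = 1.0

# Resize the whole mask, as TTPLA's whole-image
# inference requires (a ~7.1x reduction).
small = cv2.resize(mask, (700, 700),
                   interpolation=cv2.INTER_AREA)
print(small.max())          # at most ~0.28
print((small > 0.5).sum())  # 0: the wire is gone
\end{verbatim}
}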
|
|
|
\vspace{-1mm} |
|
\section{Additional training details} |
|
\vspace{-1mm} |
|
\paragraph{CascadePSP~\cite{cascadepsp}} |
|
We follow the default training steps provided by the CascadePSP code\footnote{\label{note1}\href{https://github.com/hkchengrex/CascadePSP}{https://github.com/hkchengrex/CascadePSP}}. During training, we sample patches that contain at least 1\% wire pixels. During inference, we feed the predictions of the global DeepLabv3+ model to the pretrained/retrained CascadePSP model to produce the refined wire mask. In both cases, we follow the default inference code\footnoteref{note1} to obtain the final mask.
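For reference, the 1\% sampling criterion can be implemented as a simple rejection loop (a minimal sketch; \texttt{image} and \texttt{label} are assumed to be NumPy arrays with a binary $\{0,1\}$ wire mask, and the patch size and retry budget are illustrative):

{\footnotesize
\begin{verbatim}
import random

def sample_wire_patch(image, label, size=1024,
                      min_frac=0.01, tries=50):
    # Redraw random crops until the wire mask
    # covers at least min_frac of the patch;
    # fall back to the last crop drawn.
    H, W = label.shape
    for _ in range(tries):
        y = random.randint(0, H - size)
        x = random.randint(0, W - size)
        lbl = label[y:y + size, x:x + size]
        if lbl.mean() >= min_frac:
            break
    return image[y:y + size, x:x + size], lbl
\end{verbatim}
}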
|
\vspace{-3mm} |
|
\paragraph{MagNet~\cite{magnet}} MagNet\footnote{\href{https://github.com/VinAIResearch/MagNet}{https://github.com/VinAIResearch/MagNet}} obtains its initial mask predictions from a single backbone trained on all refinement scales. For a fair comparison, we adopt a 2-scale setting of MagNet, similar to our two-stage model, where the image is downsampled to $1024\times 1024$ at the global scale and kept at the original resolution at the local scale. To this end, we train a single DeepLabv3+ model by either downsampling each training image to $1024\times 1024$ or randomly cropping $1024\times 1024$ patches at the original resolution. The sampled patches contain at least 1\% wire pixels. We then train the refinement module on the predictions of this DeepLabv3+ model, following the default setting. Inference is kept the same as in the original MagNet model.
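The corresponding two-scale sample construction can be sketched as follows (illustrative Python reusing \texttt{sample\_wire\_patch} from the CascadePSP sketch above; the 50/50 split between global and local samples is an assumption, not a reported setting):

{\footnotesize
\begin{verbatim}
import random
import cv2

def two_scale_sample(image, label, size=1024):
    # Global sample: whole image resized to
    # size x size.  Local sample: random crop
    # at the original resolution.
    if random.random() < 0.5:
        img = cv2.resize(
            image, (size, size),
            interpolation=cv2.INTER_AREA)
        lbl = cv2.resize(
            label, (size, size),
            interpolation=cv2.INTER_NEAREST)
        return img, lbl
    return sample_wire_patch(image, label, size)
\end{verbatim}
}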
|
|
|
\vspace{-2mm} |
|
\paragraph{ISDNet~\cite{isdnet}} |
|
ISDNet\footnote{\href{https://github.com/cedricgsh/ISDNet}{https://github.com/cedricgsh/ISDNet}} performs inference on the entire image without a sliding window. During training, we therefore resize all images to $5000\times 5000$ and randomly crop $2500\times 2500$ windows so that the inputs fit in GPU memory. Sampled patches must contain at least 1\% wire pixels. During inference, all images are resized to $5000\times 5000$; we observe that this yields better results than keeping images smaller than $5000\times 5000$ at their original sizes.
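The resulting input preparation for ISDNet amounts to the following (a sketch under the same assumptions as above, again reusing \texttt{sample\_wire\_patch}):

{\footnotesize
\begin{verbatim}
import cv2

def isdnet_input(image, label, train=True,
                 full=5000, crop=2500):
    # Resize everything to 5000 x 5000; at
    # training time also take a random crop of
    # 2500 x 2500 to fit in GPU memory.
    image = cv2.resize(
        image, (full, full),
        interpolation=cv2.INTER_AREA)
    label = cv2.resize(
        label, (full, full),
        interpolation=cv2.INTER_NEAREST)
    if train:
        return sample_wire_patch(
            image, label, size=crop)
    return image, label
\end{verbatim}
}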
|
|
|
\input{figure_tex/additional_visualizations.tex} |
|
|
|
{\small |
|
\bibliographystyle{ieee_fullname} |
|
\bibliography{egbib} |
|
} |
|
|
|
\end{document} |
|
|