% CVPR 2023 Paper Template
% based on the CVPR template provided by Ming-Ming Cheng (https://github.com/MCG-NKU/CVPR_Template)
% modified and extended by Stefan Roth ([email protected])
\documentclass[10pt,twocolumn,letterpaper]{article}
%%%%%%%%% PAPER TYPE - PLEASE UPDATE FOR FINAL VERSION
% \usepackage[review]{cvpr} % To produce the REVIEW version
\usepackage{cvpr} % To produce the CAMERA-READY version
%\usepackage[pagenumbers]{cvpr} % To force page numbers, e.g. for an arXiv version
\usepackage[accsupp]{axessibility} % Improves PDF readability for those with disabilities.
% Include other packages here, before hyperref.
\usepackage[normalem]{ulem}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{booktabs}
\usepackage{xcolor}
\usepackage{comment}
\usepackage{enumitem}
\usepackage{multirow}
\usepackage{footnote}
\newcommand{\todo}[1]{{\color{red}#1}}
\newcommand{\benchmark}{WireSegHR}
\makeatletter
\newcommand\footnoteref[1]{\protected@xdef\@thefnmark{\ref{#1}}\@footnotemark}
\makeatother
% It is strongly recommended to use hyperref, especially for the review version.
% hyperref with option pagebackref eases the reviewers' job.
% Please disable hyperref *only* if you encounter grave issues, e.g. with the
% file validation for the camera-ready version.
%
% If you comment hyperref and then uncomment it, you should delete
% ReviewTemplate.aux before re-running LaTeX.
% (Or just hit 'q' on the first LaTeX run, let it finish, and you
% should be clear).
\usepackage[pagebackref,breaklinks,colorlinks]{hyperref}
% Support for easy cross-referencing
\usepackage[capitalize]{cleveref}
\crefname{section}{Sec.}{Secs.}
\Crefname{section}{Section}{Sections}
\Crefname{table}{Table}{Tables}
\crefname{table}{Tab.}{Tabs.}
%%%%%%%%% PAPER ID - PLEASE UPDATE
\def\cvprPaperID{699} % *** Enter the CVPR Paper ID here
\def\confName{CVPR}
\def\confYear{2023}
\makeatletter
\def\@maketitle
{
\newpage
\null
\iftoggle{cvprrebuttal}{\vspace*{-.3in}}{\vskip .375in}
\begin{center}
% smaller title font only for rebuttal
\iftoggle{cvprrebuttal}{{\large \bf \@title \par}}{{\Large \bf \@title \par}}
% additional two empty lines at the end of the title
\iftoggle{cvprrebuttal}{\vspace*{-22pt}}{\vspace*{24pt}}
{
\large
\lineskip .5em
\begin{tabular}[t]{c}
\iftoggle{cvprfinal}{
\@author
}{
\iftoggle{cvprrebuttal}{}{
Anonymous \confName~submission\\
\vspace*{1pt}\\
Paper ID \cvprPaperID
}
}
\end{tabular}
\par
}
% additional small space at the end of the author name
% additional empty line at the end of the title block
\vspace*{-10mm}
\vspace{-5mm}
\end{center}
}
\makeatother
\title{Supplementary Material: Automatic High Resolution Wire Segmentation and Removal}
\begin{document}
%%%%%%%%% TITLE - PLEASE UPDATE
% \title{Automatic Wire Segmentation for Inpainting}
% \title{Segmentation to the Extreme: \\A Large-Scale Wire Segmentation Dataset and a Pilot Study}
%\title{Segmentation to the Extreme: \\A Large-Scale High-Resolution Wire Segmentation Dataset and a Pilot Study}
%\title{WINE: WIre NEver Appeared in Your Photos}
\maketitle
\thispagestyle{empty}
\vspace{-1mm}
\section{Comparison with Pixel 6}
\vspace{-1mm}
We show a visual comparison between our model and the Google Pixel 6's ``Magic Eraser'' feature in Figure~\ref{fig:pixel6}. Without manual intervention, ``Magic Eraser'' performs well on wires against clean backgrounds, but struggles with thin wires that are barely visible ((A), upper) and with wires over complex backgrounds ((A), lower). We also pass our segmentation mask to our wire inpainting model to obtain the wire removal result, shown in the lower image of (B).
\vspace{-1mm}
\section{Failure cases}
\vspace{-1mm}
We show challenging cases where our model fails to predict accurate wire masks in Figure~\ref{fig:new_failures}. These include regions that closely resemble wires (top row), severe background blending (middle row), and extreme lighting conditions (bottom row).
\vspace{-1mm}
\section{Panorama}
\vspace{-1mm}
Our two-stage model leverages the sparsity of wires in natural images and generalizes efficiently to ultra-high-resolution images such as panoramas. We show a panoramic image at 11K$\times$1.5K resolution in Figure~\ref{fig:new_panorama}. Note that our method produces a high-quality wire segmentation that covers even wires that are almost invisible. As a result, our proposed wire removal step can effectively remove these regions.
\input{figure_tex/pixel6.tex}
\input{figure_tex/new_failure_cases.tex}
\input{figure_tex/new_panorama.tex}
\section{Segmentation and inpainting visualizations}
\vspace{-1mm}
We show additional wire segmentation and subsequent inpainting results in several common photography scenes, as well as in some challenging cases, in Figure~\ref{fig:additional_visualizations}. Our model successfully handles numerous challenging scenarios, including strong backlighting (top row), complex background textures (2nd row), low light (3rd row), and barely visible wires (4th row). A typical use case is shown in the last row.
%We also provide 20 additional samples in the attached HTML \mbox{(\textit{additional\_visualizations.html})}.
\section{Experiments on other datasets}
\vspace{-1mm}
Most existing wire-like datasets are either at low resolution or built for specific purposes (e.g., aerial imaging), and thus do not offer the scene diversity of WireSegHR. The TTPLA~[2] dataset shares the power line class with our dataset, although it contains only aerial images. Table~\ref{ttpla_exp} reports cross-dataset results: our model evaluated on the TTPLA test set, and the TTPLA model evaluated on our WireSegHR test set.
% We provide a model performance comparison on this dataset using two experiments.
% To abide by the request to refrain from significant additional experiments,
% We first test our trained model on the TTPLA test set. We then test the trained TTPLA model on our WireSegHR-500 test set:
\begin{table}[h!]
\resizebox{\linewidth}{!}{
\centering
\begin{tabular}{c|c|c}
Test set & Model & Wire IoU (\%) \\\hline\hline
\multirow{3}{*}{TTPLA (Power Line only)} & TTPLA (ResNet-50, 700$\times$700) & 18.9 \\
& Ours (ResNet-50) & 33.1 \\
& Ours (MiT-b2) & 42.7 \\\hline
\multirow{3}{*}{WireSegHR} & TTPLA (ResNet-50, 700$\times$700) & 3.5 \\
& Ours (ResNet-50) & 47.8 \\
& Ours (MiT-b2) & 60.8 \\\hline
\end{tabular}}
\caption{Cross-dataset comparison with TTPLA: each model is evaluated on both test sets.}
\vspace{-5mm}
\label{ttpla_exp}
\end{table}
% With our ResNet-50 model, MiT-b2 model, we obtain 33.1\% and 42.7\% wire IoU respectively, against 18.9\% from their original model.
The TTPLA model is trained at a fixed resolution ($700\times 700$) and takes the entire image as input at inference time, which requires significant downsampling of our test set.
% may be fine for large structures such as transmission towers, but
As a result, thin wires deteriorate in both the image and the label. Conversely, our model's performance drops on the TTPLA dataset because of different annotation definitions: we annotate all wire-like objects, whereas TTPLA annotates only power lines.
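For reference, the numbers in Table~\ref{ttpla_exp} are wire-class IoU. A minimal sketch of the metric is given below; it is illustrative only and is not the official TTPLA or WireSegHR evaluation script.
{\footnotesize
\begin{verbatim}
import numpy as np

def wire_iou(pred, gt):
    # pred, gt: boolean HxW masks for the wire class
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return float("nan")
    return inter / union
\end{verbatim}
}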
\vspace{-1mm}
\section{Additional training details}
\vspace{-1mm}
\paragraph{CascadePSP~\cite{cascadepsp}}
We follow the default training steps provided by the official CascadePSP code\footnote{\label{note1}\href{https://github.com/hkchengrex/CascadePSP}{https://github.com/hkchengrex/CascadePSP}}. During training, we sample image patches that contain at least 1\% wire pixels. During inference, we feed the predictions of the global DeepLabv3+ model to the pretrained or retrained CascadePSP model to obtain the refined wire mask. In both cases, we follow the default inference code\footnoteref{note1} to produce the final mask.
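As a concrete illustration of this patch-sampling criterion (a crop is kept only if at least 1\% of its pixels are wire), a minimal sketch is given below; the function and variable names, crop size, and retry count are placeholders rather than the actual training code.
{\footnotesize
\begin{verbatim}
import numpy as np

def sample_wire_patch(image, mask, size=512,
                      min_ratio=0.01, tries=50):
    # mask: HxW binary wire mask (1 = wire)
    h, w = mask.shape
    for _ in range(tries):
        y = np.random.randint(0, h - size + 1)
        x = np.random.randint(0, w - size + 1)
        m = mask[y:y + size, x:x + size]
        if m.mean() >= min_ratio:  # >= 1% wire
            return image[y:y + size, x:x + size], m
    return None  # caller resamples another image
\end{verbatim}
}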
\vspace{-3mm}
\paragraph{MagNet~\cite{magnet}} MagNet\footnote{\href{https://github.com/VinAIResearch/MagNet}{https://github.com/VinAIResearch/MagNet}} obtains its initial mask predictions from a single backbone trained on all refinement scales. For a fair comparison, we adopt a 2-scale setting of MagNet, similar to our two-stage model, in which the image is downsampled to $1024\times 1024$ at the global scale and kept at the original resolution at the local scale. To this end, we train a single DeepLabv3+ model by either downsampling the sampled image to $1024\times 1024$ or randomly cropping $1024\times 1024$ patches at the original resolution. Sampled patches must contain at least 1\% wire pixels. We then train the refinement module on the predictions of this DeepLabv3+ model, following the default setting. Inference is identical to the original MagNet model.
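The 2-scale sampling described above can be sketched as follows; the 50/50 choice between the global and local scale per sample is an assumption, and the snippet reuses the hypothetical \texttt{sample\_wire\_patch} helper from the CascadePSP sketch.
{\footnotesize
\begin{verbatim}
import random
import cv2

def two_scale_sample(image, mask, size=1024):
    # 50/50 global/local split is an assumption
    if random.random() < 0.5:
        # global scale: downsample the whole image
        img = cv2.resize(image, (size, size),
                interpolation=cv2.INTER_AREA)
        msk = cv2.resize(mask, (size, size),
                interpolation=cv2.INTER_NEAREST)
        return img, msk
    # local scale: full-resolution crop containing
    # wires (sample_wire_patch defined above)
    return sample_wire_patch(image, mask, size=size)
\end{verbatim}
}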
\vspace{-2mm}
\paragraph{ISDNet~\cite{isdnet}}
ISDNet\footnote{\href{https://github.com/cedricgsh/ISDNet}{https://github.com/cedricgsh/ISDNet}} performs inference on the entire image without a sliding window. During training, we therefore resize all images to $5000\times 5000$ and randomly crop $2500\times 2500$ windows so that the inputs fit in GPU memory. Sampled crops must contain at least 1\% wire pixels. During inference, all images are resized to $5000\times 5000$; we observe that this yields better results than keeping images smaller than $5000\times 5000$ at their original sizes.
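A sketch of this resize-then-crop procedure is given below; the interpolation choices are assumptions, and \texttt{sample\_wire\_patch} again refers to the hypothetical helper from the CascadePSP sketch.
{\footnotesize
\begin{verbatim}
import cv2

def isdnet_train_sample(image, mask,
                        full=5000, crop=2500):
    # resize to full x full, then random-crop a
    # crop x crop patch with >= 1% wire pixels
    image = cv2.resize(image, (full, full),
                interpolation=cv2.INTER_AREA)
    mask = cv2.resize(mask, (full, full),
                interpolation=cv2.INTER_NEAREST)
    return sample_wire_patch(image, mask, size=crop)

def isdnet_test_input(image, full=5000):
    # at inference, images are resized to full x full
    return cv2.resize(image, (full, full),
                interpolation=cv2.INTER_AREA)
\end{verbatim}
}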
\input{figure_tex/additional_visualizations.tex}
{\small
\bibliographystyle{ieee_fullname}
\bibliography{egbib}
}
\end{document}