DETERMINING PLANE-SWEEP SAMPLING POINTS IN IMAGE SPACE USING THE CROSS-RATIO FOR IMAGE-BASED DEPTH ESTIMATION
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W6, 2017
International Conference on Unmanned Aerial Vehicles in Geomatics, 4–7 September 2017, Bonn, Germany
B. Ruf a,b , B. Erdnuess a,b , M. Weinmann b
a Fraunhofer IOSB, Video Exploitation Systems, 76131 Karlsruhe, Germany - (boitumelo.ruf, bastian.erdnuess)@iosb.fraunhofer.de
b Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany - (boitumelo.ruf, bastian.erdnuess, martin.weinmann)@kit.edu
Commission II, WG II/4
KEY WORDS: depth estimation, image sequence, plane-sweep, cross-ratio, online processing
ABSTRACT:
With the emergence of small consumer Unmanned Aerial Vehicles (UAVs), the importance of and interest in image-based
depth estimation and model generation from aerial images have greatly increased in the photogrammetric community. In
our work, we focus on algorithms that allow an online image-based dense depth estimation from video sequences, which
enables the direct and live structural analysis of the depicted scene. To this end, we use a multi-view plane-sweep algorithm
with a semi-global matching (SGM) optimization which is parallelized for general purpose computation on a GPU
(GPGPU), reaching sufficient performance to keep up with the key-frames of input sequences. One important aspect
of reaching good performance is the way the scene space is sampled when creating plane hypotheses. A small step size between
consecutive planes, which is needed to reconstruct details in the near vicinity of the camera, may lead to ambiguities in
distant regions due to the perspective projection of the camera. Furthermore, an equidistant sampling with a small step
size produces a large number of plane hypotheses, leading to high computational effort. To overcome these problems,
we present a novel methodology to directly determine the sampling points of plane-sweep algorithms in image space.
The use of the perspective invariant cross-ratio allows us to derive the location of the sampling planes directly from the
image data. With this, we efficiently sample the scene space, achieving higher sampling density in areas which are close
to the camera and a lower density in distant regions. We evaluate our approach on a synthetic benchmark dataset for
quantitative evaluation and on a real-image dataset consisting of aerial imagery. The experiments reveal that an inverse
sampling achieves equal or better results than a linear sampling, with fewer sampling points and thus less runtime. Our
algorithm allows an online computation of depth maps for subsequences of five frames, provided that the relative poses
between all frames are given.
1. INTRODUCTION
In recent years, the importance of and interest in image-based depth estimation and model generation from aerial
images have greatly increased in the photogrammetric community. This trend is especially due to the emergence of
small consumer Unmanned Aerial Vehicles (UAVs), which
allow images to be captured easily and cost-effectively
from an aerial viewpoint. These images are used to generate three-dimensional models depicting our surroundings
and, in turn, such models facilitate various applications such as urban reconstruction (Blaha et al., 2016;
Musialski et al., 2013; Rothermel et al., 2014), urban navigation (Serna and Marcotegui, 2013), scene interpretation (Weinmann, 2016), security surveillance (Pollok and
Monari, 2016) and change detection (Taneja et al., 2013).
An important step in the process of model generation from
imagery is the image-based depth estimation, commonly
known as Structure-from-Motion (SfM). While the accuracy achieved by state-of-the-art SfM algorithms is quite
impressive, such results come at the cost of performance
and runtime, in particular when it comes to high resolution
dense depth estimation.
In our work, we focus on algorithms that allow an online image-based dense depth estimation from video sequences.
Online processing does not necessarily aim to
estimate depth maps for each input frame of the video
sequence, but rather for the key-frames, which are typically generated at 1 Hz to 2 Hz. This enables the direct and
live structural analysis of the depicted scene. As video
sequences allow the use of multiple images for reconstruction, we employ a plane-sweep algorithm for image matching. Apart from its ability to perform true multi-image matching
(Collins, 1995), the plane-sweep algorithm can efficiently
be optimized for general purpose computation on a GPU
(GPGPU), which is necessary in order to achieve sufficient performance for online processing. Furthermore, urban surroundings are well-suited for an approximation by planar
structures. The accuracy achieved and the runtime needed
by SfM algorithms mainly depend on two factors: One is
the optimization step employed after the image matching,
which determines the per-pixel depth value of the resulting depth map. The second factor is the sampling of the
scene space.
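To make the plane-sweep matching mentioned above more concrete, the following minimal sketch shows how a single plane hypothesis is commonly used to relate pixels of a reference image to pixels of a second image via a plane-induced homography. The conventions are assumptions made for this illustration and are not taken from this paper: points are transferred between camera frames by X_src = R X_ref + t, and the plane is given in the reference camera frame by n^T X_ref = d. The function names plane_homography and project, as well as all numeric values, are hypothetical.

```python
import numpy as np

def plane_homography(K_ref, K_src, R, t, n, d):
    """Homography mapping reference pixels to source pixels via the plane n^T X = d.

    Assumed conventions (illustrative, not taken from the paper):
    X_src = R @ X_ref + t, and the plane is expressed in the reference
    camera frame as n @ X_ref = d.
    """
    return K_src @ (R + np.outer(t, n) / d) @ np.linalg.inv(K_ref)

def project(K, X):
    """Pinhole projection of a 3D point X (camera frame) to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]

# Hypothetical camera setup: identical intrinsics, pure sideways translation.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([-0.5, 0.0, 0.0])
n = np.array([0.0, 0.0, 1.0])   # fronto-parallel plane hypothesis
d = 10.0                        # plane distance from the reference camera

X = np.array([1.0, -2.0, 10.0])  # a scene point lying on the plane
H = plane_homography(K, K, R, t, n, d)

# Transfer the reference pixel through the homography ...
x_ref_h = np.append(project(K, X), 1.0)
x_src_h = H @ x_ref_h
x_src_via_H = x_src_h[:2] / x_src_h[2]

# ... and compare with the direct projection into the source camera.
x_src_direct = project(K, R @ X + t)
print(x_src_via_H, x_src_direct)
```

For a point that actually lies on the hypothesized plane, both pixel positions agree; for points off the plane they differ, which is what the image matching step exploits when comparing plane hypotheses.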
In general, the plane-sweep algorithm is parametrized by
the sweeping direction and the step size at which the planes
are swept through space, i.e. the location of the planes in
scene space. As we aim to achieve an online dense depth
estimation, it is important that the scene space is sampled efficiently, especially for oblique aerial
images, which exhibit a large scene depth. Moreover, a typical characteristic of perspective cameras is that sizes and
lengths of objects appear smaller in the image as these objects move
away from the camera. Since the input data are images, it is vital that the sampling points are selected in
image space rather than in scene space.
This contribution has been peer-reviewed.
https://doi.org/10.5194/isprs-archives-XLII-2-W6-325-2017 | © Authors 2017. CC BY 4.0 License.
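The effect of perspective on the sampling can be illustrated with a small numerical sketch. It compares planes spaced equidistantly in scene space with planes spaced equidistantly in inverse depth, and reports the resulting step sizes in image space. This is not the cross-ratio method of this paper; it assumes a simplified fronto-parallel two-view setup in which the image-space shift of a plane is f * b / depth, and the focal length, baseline, depth range and function names are hypothetical values chosen for illustration only.

```python
import numpy as np

def linear_depths(d_min, d_max, num):
    """Plane depths spaced equidistantly in scene space."""
    return np.linspace(d_min, d_max, num)

def inverse_depths(d_min, d_max, num):
    """Plane depths spaced equidistantly in inverse depth (1 / depth)."""
    return 1.0 / np.linspace(1.0 / d_min, 1.0 / d_max, num)

def disparity(depth, focal_px, baseline):
    """Image-space shift in pixels for a fronto-parallel two-view setup (assumption)."""
    return focal_px * baseline / depth

# Hypothetical setup: focal length in pixels, baseline and depth range in metres.
f, b = 800.0, 0.5
lin = linear_depths(5.0, 100.0, 20)
inv = inverse_depths(5.0, 100.0, 20)

# Step between consecutive plane hypotheses, measured in image space.
lin_steps = np.abs(np.diff(disparity(lin, f, b)))
inv_steps = np.abs(np.diff(disparity(inv, f, b)))

print("linear sampling : min/max pixel step =", lin_steps.min(), lin_steps.max())
print("inverse sampling: min/max pixel step =", inv_steps.min(), inv_steps.max())
```

Under these assumptions, the linear sampling yields large image-space steps close to the camera and sub-pixel steps far away, whereas the inverse sampling yields a constant image-space step over the whole depth range.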
In this paper, we present a methodology to derive the sampling points of plane-sweep algorithms directly from points
in image space. Our contributions are:
• The use of the perspective invariant cross-ratio to derive the sampling points of plane-sweep algorithms
from correspondences in image space.
• The employment of a true multi-image plane-sweep
algorithm for the image-based depth estimation of urban environments from aerial imagery.
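The perspective invariance of the cross-ratio named in the first contribution can be checked numerically. The sketch below uses one common convention for the cross-ratio of four collinear points, (a, b; c, d) = ((c - a)(d - b)) / ((c - b)(d - a)), which may differ from the convention used later in this paper. The projection of a line in scene space onto a line in the image is modelled here as a one-dimensional projective map given by an arbitrary invertible 2x2 matrix; the matrix entries, point coordinates and function names are hypothetical.

```python
import numpy as np

def cross_ratio(a, b, c, d):
    """Cross-ratio (a, b; c, d) of four collinear points given by 1D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def projective_map_1d(x, H):
    """Apply a 2x2 projective transformation H to coordinates x on a line."""
    num = H[0, 0] * x + H[0, 1]
    den = H[1, 0] * x + H[1, 1]
    return num / den

# Four collinear points, parametrized by their position along a line.
pts = np.array([0.0, 1.0, 3.0, 7.0])

# Arbitrary invertible 1D projective map standing in for a perspective projection.
H = np.array([[2.0, 1.0],
              [0.3, 1.0]])
mapped = projective_map_1d(pts, H)

cr_before = cross_ratio(*pts)
cr_after = cross_ratio(*mapped)
print(cr_before, cr_after)
```

The distances between the mapped points change, but the cross-ratio does not, which is the property that allows positions along a viewing ray in scene space to be related to positions measured in image space.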
This paper is structured as follows: In Section 2, we
briefly summarize related work. There, we give a short
overview of different sampling strategies in image-based
depth estimation, and we also explain how the plane-sweep
algorithm differs from tensor-based strategies. Furthermore, we introduce different work that has been done in reconstruction of urban environments, especially from aer (...truncated)