DETERMINING PLANE-SWEEP SAMPLING POINTS IN IMAGE SPACE USING THE CROSS-RATIO FOR IMAGE-BASED DEPTH ESTIMATION

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W6, 2017
International Conference on Unmanned Aerial Vehicles in Geomatics, 4–7 September 2017, Bonn, Germany

B. Ruf a,b, B. Erdnuess a,b, M. Weinmann b
a Fraunhofer IOSB, Video Exploitation Systems, 76131 Karlsruhe, Germany (boitumelo.ruf, bastian.erdnuess)@iosb.fraunhofer.de
b Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany - (boitumelo.ruf, bastian.erdnuess, martin.weinmann)@kit.edu

Commission II, WG II/4

KEY WORDS: depth estimation, image sequence, plane-sweep, cross-ratio, online processing

ABSTRACT:

With the emergence of small consumer Unmanned Aerial Vehicles (UAVs), the importance of, and interest in, image-based depth estimation and model generation from aerial images have greatly increased in the photogrammetric community. In our work, we focus on algorithms that allow online image-based dense depth estimation from video sequences, which enables the direct and live structural analysis of the depicted scene. To this end, we use a multi-view plane-sweep algorithm with a semi-global matching (SGM) optimization. The algorithm is parallelized for general purpose computation on a GPU (GPGPU) and reaches sufficient performance to keep up with the key-frames of the input sequences. One important aspect in reaching good performance is the way the scene space is sampled to create plane hypotheses. A small step size between consecutive planes, which is needed to reconstruct details in the near vicinity of the camera, may lead to ambiguities in distant regions due to the perspective projection of the camera. Furthermore, an equidistant sampling with a small step size produces a large number of plane hypotheses, leading to high computational effort.
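The perspective effect described above can be quantified with a simplified model. The sketch below makes several illustrative assumptions that are not values or code from the paper:

- a rectified two-view pair with focal length F = 1000 px and baseline B = 1 m;
- fronto-parallel planes;
- a depth range of 10 m to 200 m;
- hypothetical function names chosen for the example.

Under this model, a point at depth z has disparity F·B/z, so the image-space shift between two planes separated by a fixed depth step shrinks roughly with 1/z². The sketch contrasts equidistant planes with planes placed at a fixed pixel step. Planes spaced that way are uniform in inverse depth. This is only a simple stand-in for non-linear sampling and is not the cross-ratio construction the paper proposes.

```python
import math

def disparity(z, focal_px, baseline_m):
    """Disparity (pixels) of a point at depth z (metres) in an assumed rectified stereo pair."""
    return focal_px * baseline_m / z

def linear_planes(z_near, z_far, step_m):
    """Equidistant plane depths from z_near to z_far (metres)."""
    n = int(round((z_far - z_near) / step_m)) + 1
    return [z_near + i * step_m for i in range(n)]

def inverse_planes(z_near, z_far, focal_px, baseline_m, max_pixel_step):
    """Plane depths whose disparities are evenly spaced, i.e. uniform in inverse depth."""
    d_near = disparity(z_near, focal_px, baseline_m)
    d_far = disparity(z_far, focal_px, baseline_m)
    n = math.ceil((d_near - d_far) / max_pixel_step)  # number of intervals
    return [focal_px * baseline_m / (d_near + (d_far - d_near) * i / n)
            for i in range(n + 1)]

def pixel_gaps(depths, focal_px, baseline_m):
    """Image-space shift (pixels) between consecutive planes."""
    d = [disparity(z, focal_px, baseline_m) for z in depths]
    return [d[i] - d[i + 1] for i in range(len(d) - 1)]

F, B = 1000.0, 1.0  # assumed focal length (px) and baseline (m)

lin = linear_planes(10.0, 200.0, 0.5)
lin_gaps = pixel_gaps(lin, F, B)
print(len(lin), round(lin_gaps[0], 2), round(lin_gaps[-1], 4))  # 381 4.76 0.0125
print(sum(g < 1.0 for g in lin_gaps))  # 355 (of 380 gaps are sub-pixel)

inv = inverse_planes(10.0, 200.0, F, B, max_pixel_step=1.0)
print(len(inv), round(inv[1] - inv[0], 3), round(inv[-1] - inv[-2], 2))  # 96 0.101 33.33
```

Under these assumptions, 355 of the 380 gaps of the 0.5 m linear sampling fall below one pixel, so distant planes become practically indistinguishable in the images. The fixed-pixel-step sampling covers the same range with 96 planes. These planes lie about 0.1 m apart near the camera and over 30 m apart at the far end.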
To overcome these problems, we present a novel methodology to directly determine the sampling points of plane-sweep algorithms in image space. The use of the perspective-invariant cross-ratio allows us to derive the location of the sampling planes directly from the image data. With this, we sample the scene space efficiently, achieving a higher sampling density in areas close to the camera and a lower density in distant regions. We evaluate our approach on a synthetic benchmark dataset for quantitative evaluation and on a real-image dataset consisting of aerial imagery. The experiments reveal that an inverse sampling achieves equal or better results than a linear sampling, with fewer sampling points and thus less runtime. Our algorithm allows an online computation of depth maps for subsequences of five frames, provided that the relative poses between all frames are given.

1. INTRODUCTION

In recent years, the importance of, and interest in, image-based depth estimation and model generation from aerial images have greatly increased in the photogrammetric community. This trend is especially due to the emergence of small consumer Unmanned Aerial Vehicles (UAVs), which allow images to be captured easily and cost-effectively from an aerial viewpoint. These images are used to generate three-dimensional models of our surroundings. Such models, in turn, facilitate various applications such as urban reconstruction (Blaha et al., 2016; Musialski et al., 2013; Rothermel et al., 2014), urban navigation (Serna and Marcotegui, 2013), scene interpretation (Weinmann, 2016), security surveillance (Pollok and Monari, 2016) and change detection (Taneja et al., 2013). An important step in the process of model generation from imagery is image-based depth estimation, commonly known as Structure-from-Motion (SfM).
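The cross-ratio property that the abstract relies on can be checked numerically. The following sketch illustrates the invariance only; it is not the authors' sampling procedure. It works through four steps:

1. It places four points on a hypothetical 3D line and projects them through an assumed pinhole camera (focal length 1000 px, principal point (640, 360)).
2. It computes the cross-ratio from the positions along the 3D line.
3. It computes the cross-ratio again from the image x-coordinates.
4. It uses the image cross-ratio, together with three known points, to recover the position of the fourth point along the line.

Function names, camera parameters and the line are assumptions chosen for the example.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio (a, b; c, d) of four collinear points given as 1D coordinates on their line."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def fourth_point(a, b, c, cr):
    """Solve cross_ratio(a, b, c, d) == cr for d, given the other three coordinates."""
    return (cr * (c - b) * a - (c - a) * b) / (cr * (c - b) - (c - a))

def project(point, focal_px=1000.0, cx=640.0, cy=360.0):
    """Pinhole projection of a camera-frame point (X, Y, Z) to pixel coordinates (u, v)."""
    x, y, z = point
    return (focal_px * x / z + cx, focal_px * y / z + cy)

# Hypothetical 3D line X(t) = A + t * D in camera coordinates (metres).
A = (-5.0, 2.0, 10.0)
D = (0.3, -0.1, 1.0)
ts = [0.0, 5.0, 20.0, 80.0]  # line parameters of four points, increasingly distant
points = [tuple(p + t * v for p, v in zip(A, D)) for t in ts]
pixels = [project(pt) for pt in points]

cr_scene = cross_ratio(*ts)                      # measured along the 3D line
cr_image = cross_ratio(*(u for u, _ in pixels))  # measured along the image line
print(round(cr_scene, 6), round(cr_image, 6))    # 1.25 1.25

# Given the image cross-ratio and three known points, the fourth is determined.
t_recovered = fourth_point(ts[0], ts[1], ts[2], cr_image)
print(round(t_recovered, 6))                     # 80.0
```

The image x-coordinate is a valid 1D coordinate here because the projected line is not vertical in the image. For a vertical image line, the y-coordinate or the arc length along the line would be used instead.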
While the accuracy achieved by state-of-the-art SfM algorithms is quite impressive, such results come at the cost of performance and runtime, in particular when it comes to high-resolution dense depth estimation. In our work, we focus on algorithms that allow online image-based dense depth estimation from video sequences. Online processing does not necessarily aim to estimate a depth map for each input frame of the video sequence, but rather for each key-frame; key-frames are typically generated at 1 Hz to 2 Hz. This enables the direct and live structural analysis of the depicted scene. As video sequences allow the use of multiple images for reconstruction, we employ a plane-sweep algorithm for image matching. Apart from its ability to perform true multi-image matching (Collins, 1995), the plane-sweep algorithm can be efficiently optimized for general purpose computation on a GPU (GPGPU). This is necessary to achieve sufficient performance for online processing. Furthermore, urban surroundings are well suited for approximation by planar structures.

The accuracy achieved and the runtime needed by SfM algorithms mainly depend on two factors. The first is the optimization step employed after the image matching, which determines the per-pixel depth value of the resulting depth map. The second is the sampling of the scene space. In general, the plane-sweep algorithm is parametrized by the sweeping direction and the step size at which the planes are swept through space, i.e. the location of the planes in scene space. As we aim to achieve online dense depth estimation, it is important that the scene space is sampled efficiently, especially for oblique aerial images, which have a large scene depth. Moreover, a typical characteristic of perspective cameras is that the imaged sizes and lengths of objects become smaller as these objects move away from the camera. Since the input data are images, it is therefore vital that the sampling points are selected in image space instead of in scene space.

This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-W6-325-2017 | © Authors 2017. CC BY 4.0 License.

In this paper, we present a methodology to derive the sampling points of plane-sweep algorithms directly from points in image space. Our contributions are:
• The use of the perspective-invariant cross-ratio to derive the sampling points of plane-sweep algorithms from correspondences in image space.
• The employment of a true multi-image plane-sweep algorithm for the image-based depth estimation of urban environments from aerial imagery.

This paper is structured as follows: In Section 2, we briefly summarize related work. In doing so, we give a short overview of different sampling strategies in image-based depth estimation and explain how the plane-sweep algorithm differs from tensor-based strategies. Furthermore, we review previous work on the reconstruction of urban environments, especially from aer (...truncated)