REAL-TIME SLAM FOR THE OFF-ROAD AUTONOMOUS DRIVING (pdf)

Article PDF cannot be displayed. You can download it here:

https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLIII-B2-2020/631/2020/isprs-archives-XLIII-B2-2020-631-2020.pdf

REAL-TIME SLAM FOR THE OFF-ROAD AUTONOMOUS DRIVING

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition) REAL-TIME SLAM FOR THE OFF-ROAD AUTONOMOUS DRIVING B. Vishnyakov *, V. Sheverdin FGUP «State Research Institute of Aviation Systems», Russia, 125319, Moscow, Viktorenko street, 7 (vishnyakov, sheverdin)@gosniias.ru Commission II, WG II/4 KEY WORDS: SLAM, off-road, autonomous driving, DAS, scene reconstruction, SfM, odometry. ABSTRACT: In this paper we propose a new SLAM algorithm that is robust to the changing environment of the countryside. The hardware part consists of two separate machine vision cameras, joined in stereo, and can be supplemented with LiDAR, IMU and GPS. We introduce a method that can be used to reliably calculate the position of a vehicle in natural environments. To estimate the pose and produce a three-dimensional reconstruction we use a stereo camera rig, inertial measurement unit and the global positioning system. While solving the problem of visual odometry in outdoor scenes we faced a number of difficulties, arising from high dynamic range, as well as the presence of a large number of "similar elements", such as leaves, grass, trees and etc. Under these conditions, it becomes difficult to match feature points in image sequences. HOG-based methods, such as SIFT, SURF and others often do not obtain good matching due to noise, lack of a sufficient number of gradients, and the presence of identical domains. Using neighborhood-based detectors such as DAISY often allows to identify the correct matches, but using them is worth it too expensive. These methods are very demanding on the computational resources and prone to drift. We needed a method that is less expensive, but at the same time provides sufficient accuracy in the trajectory estimation. Direct methods, such as optical flow calculating or direct image matching allow us to map pointto-point in these conditions with high reliability. They also have disadvantages that can be eliminated by using an IMU and modern algorithms. To improve the quality of the algorithms, we solve the reconstruction problem for several frames using the LevenbergMarquardt optimization method for bundle adjustment. Each pass optimizes frames that are directly related to the last one, we use two threads that perform partial and full optimization of the entire trajectory using graphs to significantly increase the performance of the method. 1. INTRODUCTION Methods of simultaneous localization and mapping use sensors of a very different nature to improve their quality. In particular, visual sensors, LiDARs, inertial measuring devices, GLONASS and GPS systems, and radars are used. In this paper, we consider using two machine vision cameras combined into stereo, IMU and GPS to solve the problem of SLAM. Simultaneous localization and three-dimensional scene reconstruction involve the joint solution of two subtasks: determining own position (odometry) and reconstructing a three-dimensional scene (building a spatial model). Figure 1. Sample of ORB-SLAM. There are basically two classes of methods for solving the SLAM problem using visual sensors: feature-based and direct. The first one is based on matching feature-based image tracking and performs sparse reconstruction of three-dimensional space (Sparse reconstruction). The second set of methods, called direct methods, compares the images themselves or some of their parts connected by topology, allowing you to get a dense and semidense reconstruction of space (Dense and Semi-dense reconstruction). Feature-based class of methods includes such implementations as PTAM (Klein, Murray, 2007), monoSLAM (Davison et al., 2007), ORB-SLAM (Mur-Artal, Tardós, 2017), ProSLAM (Schlegel et al., 2018), OpenVSLAM (Sumikura et al., 2019) e.g. in Figure 1. Direct class includes, for example, DSO (Engel et al., 2018), SVO (Forster et al., 2014), DTAM (Newcombe et al., 2011), LSD-SLAM (Engel, 2017) can be seen in Figure 2. * Figure 2. Sample of LSD-SLAM. These approaches have both advantages and disadvantages. For example, direct methods show good stability, although they require some pre-processing. They require fewer computing resources, but at the moment the class of these odometry methods is still undergoing a stage of active development and has some problems. Good results can be achieved by combining the solution of direct and indirect methods using the Kalman filter and other Bayesian filters (Lee et al., 2019). Corresponding author This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLIII-B2-2020-631-2020 | © Authors 2020. CC BY 4.0 License. 631 The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition) Feature point detection and matching techniques, based on histograms of oriented gradients, such as SIFT (Lowe, 2004), SURF (Bay et al., 2008) and others often do not obtain good matching due to noise and insufficient number of gradients, presence of identical domains. Neighborhood-based feature point detectors such as DAISY (Tola et al., 2008) often allow to identify the correct matches, but using them reduces the algorithm speed drastically. Reliable solution of the SLAM problem requires joint modelling using a variety of methods, sensors, and probabilistic filtering algorithms – sensor fusion. Moreover, the science society is now actively researching algorithms that work with event cameras, which are able to increase the stability of reconstruction in conditions of fast movement and insufficient lighting. In this paper we propose a complex solution for off-road and countryside scene reconstruction. Our method is based on the following key steps: • • • • • • 2. Gradient-based camera auto exposure. Adaptive filtering of captured images to reduce noise. Fast stereo correspondence calculation Using IMU measurements for sensor initial extrinsic Using a semi-direct method for calculating visual odometry Obtaining semi-dense reconstruction by using DAISY features descriptor or block matching algorithm gradients, and lower values on strong ones, 𝐼 is the pixel intensity value. Undoubtedly, the quality of three-dimensional reconstruction directly depends on the quality of images received from the camera. Factors that influence the result are the length of the exposure time, the sensitivity of the sensor, lens distortion, vignetting. In order to improve the quality of the algorithm, we implement automatic exposure and sensitivity control algorithms. Unlike everyday cameras, where auto exposure is designed to improve the quality of human visual perception, in machine vision, the main quality parameter is the number of features in the image (Znang et al., 2017). The more differences (gradients) of brightness can be distinguished, the more accurate and high-quality the result will be achieved. There (...truncated)