REAL-TIME SLAM FOR THE OFF-ROAD AUTONOMOUS DRIVING
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020
XXIV ISPRS Congress (2020 edition)
REAL-TIME SLAM FOR THE OFF-ROAD AUTONOMOUS DRIVING
B. Vishnyakov *, V. Sheverdin
FGUP «State Research Institute of Aviation Systems», Russia, 125319, Moscow, Viktorenko street, 7
(vishnyakov, sheverdin)@gosniias.ru
Commission II, WG II/4
KEY WORDS: SLAM, off-road, autonomous driving, DAS, scene reconstruction, SfM, odometry.
ABSTRACT:
In this paper we propose a new SLAM algorithm that is robust to the changing environment of the countryside. The hardware part
consists of two separate machine vision cameras, joined in stereo, and can be supplemented with LiDAR, IMU and GPS. We introduce
a method that can be used to reliably calculate the position of a vehicle in natural environments. To estimate the pose and produce a
three-dimensional reconstruction we use a stereo camera rig, inertial measurement unit and the global positioning system. While solving
the problem of visual odometry in outdoor scenes we faced a number of difficulties, arising from high dynamic range, as well as the
presence of a large number of "similar elements", such as leaves, grass, trees and etc. Under these conditions, it becomes difficult to
match feature points in image sequences. HOG-based methods, such as SIFT, SURF and others often do not obtain good matching due
to noise, lack of a sufficient number of gradients, and the presence of identical domains. Using neighborhood-based detectors such as
DAISY often allows to identify the correct matches, but using them is worth it too expensive. These methods are very demanding on
the computational resources and prone to drift. We needed a method that is less expensive, but at the same time provides sufficient
accuracy in the trajectory estimation. Direct methods, such as optical flow calculating or direct image matching allow us to map pointto-point in these conditions with high reliability. They also have disadvantages that can be eliminated by using an IMU and modern
algorithms. To improve the quality of the algorithms, we solve the reconstruction problem for several frames using the LevenbergMarquardt optimization method for bundle adjustment. Each pass optimizes frames that are directly related to the last one, we use two
threads that perform partial and full optimization of the entire trajectory using graphs to significantly increase the performance of the
method.
1. INTRODUCTION
Methods of simultaneous localization and mapping use sensors
of a very different nature to improve their quality. In particular,
visual sensors, LiDARs, inertial measuring devices, GLONASS
and GPS systems, and radars are used. In this paper, we consider
using two machine vision cameras combined into stereo, IMU
and GPS to solve the problem of SLAM. Simultaneous
localization and three-dimensional scene reconstruction involve
the joint solution of two subtasks: determining own position
(odometry) and reconstructing a three-dimensional scene
(building a spatial model).
Figure 1. Sample of ORB-SLAM.
There are basically two classes of methods for solving the SLAM
problem using visual sensors: feature-based and direct.
The first one is based on matching feature-based image tracking
and performs sparse reconstruction of three-dimensional space
(Sparse reconstruction). The second set of methods, called direct
methods, compares the images themselves or some of their parts
connected by topology, allowing you to get a dense and semidense reconstruction of space (Dense and Semi-dense
reconstruction).
Feature-based class of methods includes such implementations as
PTAM (Klein, Murray, 2007), monoSLAM (Davison et al.,
2007), ORB-SLAM (Mur-Artal, Tardós, 2017), ProSLAM
(Schlegel et al., 2018), OpenVSLAM (Sumikura et al., 2019) e.g.
in Figure 1. Direct class includes, for example, DSO (Engel et al.,
2018), SVO (Forster et al., 2014), DTAM (Newcombe et al.,
2011), LSD-SLAM (Engel, 2017) can be seen in Figure 2.
*
Figure 2. Sample of LSD-SLAM.
These approaches have both advantages and disadvantages. For
example, direct methods show good stability, although they
require some pre-processing. They require fewer computing
resources, but at the moment the class of these odometry methods
is still undergoing a stage of active development and has some
problems. Good results can be achieved by combining the
solution of direct and indirect methods using the Kalman filter
and other Bayesian filters (Lee et al., 2019).
Corresponding author
This contribution has been peer-reviewed.
https://doi.org/10.5194/isprs-archives-XLIII-B2-2020-631-2020 | © Authors 2020. CC BY 4.0 License.
631
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020
XXIV ISPRS Congress (2020 edition)
Feature point detection and matching techniques, based on
histograms of oriented gradients, such as SIFT (Lowe, 2004),
SURF (Bay et al., 2008) and others often do not obtain good
matching due to noise and insufficient number of gradients,
presence of identical domains. Neighborhood-based feature point
detectors such as DAISY (Tola et al., 2008) often allow to
identify the correct matches, but using them reduces the
algorithm speed drastically.
Reliable solution of the SLAM problem requires joint modelling
using a variety of methods, sensors, and probabilistic filtering
algorithms – sensor fusion. Moreover, the science society is now
actively researching algorithms that work with event cameras,
which are able to increase the stability of reconstruction in
conditions of fast movement and insufficient lighting.
In this paper we propose a complex solution for off-road and
countryside scene reconstruction. Our method is based on the
following key steps:
•
•
•
•
•
•
2.
Gradient-based camera auto exposure.
Adaptive filtering of captured images to reduce noise.
Fast stereo correspondence calculation
Using IMU measurements for sensor initial extrinsic
Using a semi-direct method for calculating visual
odometry
Obtaining semi-dense reconstruction by using DAISY
features descriptor or block matching algorithm
gradients, and lower values on strong ones, 𝐼 is the pixel intensity
value.
Undoubtedly, the quality of three-dimensional reconstruction
directly depends on the quality of images received from the
camera. Factors that influence the result are the length of the
exposure time, the sensitivity of the sensor, lens distortion,
vignetting. In order to improve the quality of the algorithm, we
implement automatic exposure and sensitivity control
algorithms. Unlike everyday cameras, where auto exposure is
designed to improve the quality of human visual perception, in
machine vision, the main quality parameter is the number of
features in the image (Znang et al., 2017). The more differences
(gradients) of brightness can be distinguished, the more accurate
and high-quality the result will be achieved.
There (...truncated)