A ROBUST METHOD FOR STEREO VISUAL ODOMETRY BASED ON MULTIPLE EUCLIDEAN DISTANCE CONSTRAINT AND RANSAC ALGORITHM
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W1, 2017
2017 International Symposium on Planetary Remote Sensing and Mapping, 13–16 August 2017, Hong Kong
A ROBUST METHOD FOR STEREO VISUAL ODOMETRY BASED ON MULTIPLE
EUCLIDEAN DISTANCE CONSTRAINT AND RANSAC ALGORITHM
Qi Zhoua, Xiaohua Tong a, Shijie Liu a, Xiaojun Lu b, Sicong Liu a, Peng Chena,Yanming Jina, Huan Xiea
a
College of Surveying and Geo-Informatics, Tongji University, Shanghai, China –
b China International Engineering Consulting Corporation, Beijing, China
Commission III, WG III/2
KEY WORDS: Visual Odometry, Stereo Vision, Robot Navigation, RANSAC Algorithm
ABSTRACT:
Visual Odometry (VO) is a critical component for planetary robot navigation and safety. It estimates the ego-motion using stereo
images frame by frame. Feature points extraction and matching is one of the key steps for robotic motion estimation which largely
influences the precision and robustness. In this work, we choose the Oriented FAST and Rotated BRIEF (ORB) features by
considering both accuracy and speed issues. For more robustness in challenging environment e.g., rough terrain or planetary surface,
this paper presents a robust outliers elimination method based on Euclidean Distance Constraint (EDC) and Random Sample
Consensus (RANSAC) algorithm. In the matching process, a set of ORB feature points are extracted from the current left and right
synchronous images and the Brute Force (BF) matcher is used to find the correspondences between the two images for the Space
Intersection. Then the EDC and RANSAC algorithms are carried out to eliminate mismatches whose distances are beyond a
predefined threshold. Similarly, when the left image of the next time matches the feature points with the current left images, the EDC
and RANSAC are iteratively performed. After the above mentioned, there are exceptional remaining mismatched points in some
cases, for which the third time RANSAC is applied to eliminate the effects of those outliers in the estimation of the ego-motion
parameters (Interior Orientation and Exterior Orientation). The proposed approach has been tested on a real-world vehicle dataset
and the result benefits from its high robustness.
1. INTRODUCTION
Autonomous navigation is quite significant for many robotic
applications such as planetary exploration and auto drive. For
these robotic applications, Visual Odometry is the critical
method for relative locating, especially in GPS-denied
environments. VO estimates the ego-motion of robot using a
single or stereo cameras, which is more accurate than the
conventional wheel odometry according to Maimone et al.
(2007a).
VO is a specific application of Structure From Motion (SFM),
which contains the camera pose estimation and 3D scene point
reconstruction according to Scaramuzza et al. (2011).
Simultaneously, VO differs from the SLAM (Simultaneous
Localization And Mapping), which contains the mapping
process and loop closure (Engel et al., 2015; Pire et al., 2015).
In the last few decades, VO has been divided into two kinds of
method, monocular and stereo cameras. For monocular VO, the
main issue is solving the scale ambiguity problem. Some
researchers set the translation scale between the two consecutive
frames to a predefined value. Ke and Kanade (2003) virtually
rotate the camera to the bottom-view pose, which eliminates the
ambiguity between the rotation and translation and improves the
motion estimation process. On the other side, some researchers
assume that the environment around the monocular camera is
flat ground and the monocular camera is equipped on a fixed
height with fixed depression angle, like the situation in Bajpai
et al. (2016a). According to Bajpai et al. (2016a), the advantage
of monocular VO method is smaller computational cost
compared to the stereo VO, which is quite important for those
real-time embedded applications.
For large robotic platforms with strong computational ability
like automatic drive platforms and future planetary exploration
robots, the stereo cameras perform superior to the monocular
one. Because of the certain baseline between the left and right
camera, the ambiguity scale problem does not exist in stereo
VO. And the Stereo VO can estimate the 6-Degree of Freedom
(DOF) ego-motion no matter what kinds of environment the
system works in. Currently, there are two kinds of stereo VO
methods, 3D-3D method and 3D-2D method.
In both kinds of stereo VO method, feature detection and
matching great influence both the accuracy and speed issues. In
feature point matching field, there are many feature point
detectors and descriptors having been presented in last twenty
years. Scale-Invariant Feature Transform (SIFT) invented by
Lowe (1999) is the most famous one because of its excellent
detecting accuracy and robustness. Bay et al. (2006) presents
the Speeded Up Robust Feature (SURF), which is an improved
version of SIFT. It uses Haar wavelet to approximate the
gradient method in SIFT, using integral image technology at the
same time to calculate fast. In most cases, its performance can
reach the same level precision compared to SIFT, with 3-7
times faster. For those cases with very fast speed issue, Oriented
FAST and Rotated BRIEF (ORB) is employed by Rublee et al.
(2011).
Corresponding author
This contribution has been peer-reviewed.
https://doi.org/10.5194/isprs-archives-XLII-3-W1-219-2017 | © Authors 2017. CC BY 4.0 License.
219
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W1, 2017
2017 International Symposium on Planetary Remote Sensing and Mapping, 13–16 August 2017, Hong Kong
3D-3D method treats the stereo cameras as the point cloud
generator, which make use the stereo cameras to generate 3D
point cloud and estimates the rotation and translation between
two consecutive frames using 3D point cloud registration
method like Iterative Closest Point (ICP) algorithm in
Balazadegan et al. (2016a). ICP algorithm can only converge to
the local minimum, which differs from VO propose. Therefore,
we must obtain a good initial value for the VO motion
estimation parameters according to Hong et al. (2015a).
On the other side, the aim of the 3D-2D VO method is to solve
the Perspective-n-Point (PnP) problem. According
to
Scaramuzza et al. (2011), 3D-2D method is more accurate than
the 3D-3D method, therefore 3D-2D method has received
attention in both Photogrammetry like McGlove et al. (2004a)
and Computer Vision like Hartley and Zisserman (2000a). The
least feature points needed in PnP problem are 3, which called
P3P problem. Gao et al. (2003a) presents a solution to the
3point algorithm for P3P problem. For n>3 points, some more
accurate but slower methods based on iteration exist presented
by Quan and Lan (1999a) and Ansar and Daniilidis (2003a).
In 3D-2D method, the outlier elimination is quite important
because the preci (...truncated)