A ROBUST METHOD FOR STEREO VISUAL ODOMETRY BASED ON MULTIPLE EUCLIDEAN DISTANCE CONSTRAINT AND RANSAC ALGORITHM (pdf)

Article PDF cannot be displayed. You can download it here:

https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLII-3-W1/219/2017/isprs-archives-XLII-3-W1-219-2017.pdf

A ROBUST METHOD FOR STEREO VISUAL ODOMETRY BASED ON MULTIPLE EUCLIDEAN DISTANCE CONSTRAINT AND RANSAC ALGORITHM

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W1, 2017 2017 International Symposium on Planetary Remote Sensing and Mapping, 13–16 August 2017, Hong Kong A ROBUST METHOD FOR STEREO VISUAL ODOMETRY BASED ON MULTIPLE EUCLIDEAN DISTANCE CONSTRAINT AND RANSAC ALGORITHM Qi Zhoua, Xiaohua Tong a, Shijie Liu a, Xiaojun Lu b, Sicong Liu a, Peng Chena,Yanming Jina, Huan Xiea a College of Surveying and Geo-Informatics, Tongji University, Shanghai, China – b China International Engineering Consulting Corporation, Beijing, China Commission III, WG III/2 KEY WORDS: Visual Odometry, Stereo Vision, Robot Navigation, RANSAC Algorithm ABSTRACT: Visual Odometry (VO) is a critical component for planetary robot navigation and safety. It estimates the ego-motion using stereo images frame by frame. Feature points extraction and matching is one of the key steps for robotic motion estimation which largely influences the precision and robustness. In this work, we choose the Oriented FAST and Rotated BRIEF (ORB) features by considering both accuracy and speed issues. For more robustness in challenging environment e.g., rough terrain or planetary surface, this paper presents a robust outliers elimination method based on Euclidean Distance Constraint (EDC) and Random Sample Consensus (RANSAC) algorithm. In the matching process, a set of ORB feature points are extracted from the current left and right synchronous images and the Brute Force (BF) matcher is used to find the correspondences between the two images for the Space Intersection. Then the EDC and RANSAC algorithms are carried out to eliminate mismatches whose distances are beyond a predefined threshold. Similarly, when the left image of the next time matches the feature points with the current left images, the EDC and RANSAC are iteratively performed. After the above mentioned, there are exceptional remaining mismatched points in some cases, for which the third time RANSAC is applied to eliminate the effects of those outliers in the estimation of the ego-motion parameters (Interior Orientation and Exterior Orientation). The proposed approach has been tested on a real-world vehicle dataset and the result benefits from its high robustness. 1. INTRODUCTION Autonomous navigation is quite significant for many robotic applications such as planetary exploration and auto drive. For these robotic applications, Visual Odometry is the critical method for relative locating, especially in GPS-denied environments. VO estimates the ego-motion of robot using a single or stereo cameras, which is more accurate than the conventional wheel odometry according to Maimone et al. (2007a). VO is a specific application of Structure From Motion (SFM), which contains the camera pose estimation and 3D scene point reconstruction according to Scaramuzza et al. (2011). Simultaneously, VO differs from the SLAM (Simultaneous Localization And Mapping), which contains the mapping process and loop closure (Engel et al., 2015; Pire et al., 2015). In the last few decades, VO has been divided into two kinds of method, monocular and stereo cameras. For monocular VO, the main issue is solving the scale ambiguity problem. Some researchers set the translation scale between the two consecutive frames to a predefined value. Ke and Kanade (2003) virtually rotate the camera to the bottom-view pose, which eliminates the ambiguity between the rotation and translation and improves the motion estimation process. On the other side, some researchers assume that the environment around the monocular camera is flat ground and the monocular camera is equipped on a fixed height with fixed depression angle, like the situation in Bajpai et al. (2016a). According to Bajpai et al. (2016a), the advantage  of monocular VO method is smaller computational cost compared to the stereo VO, which is quite important for those real-time embedded applications. For large robotic platforms with strong computational ability like automatic drive platforms and future planetary exploration robots, the stereo cameras perform superior to the monocular one. Because of the certain baseline between the left and right camera, the ambiguity scale problem does not exist in stereo VO. And the Stereo VO can estimate the 6-Degree of Freedom (DOF) ego-motion no matter what kinds of environment the system works in. Currently, there are two kinds of stereo VO methods, 3D-3D method and 3D-2D method. In both kinds of stereo VO method, feature detection and matching great influence both the accuracy and speed issues. In feature point matching field, there are many feature point detectors and descriptors having been presented in last twenty years. Scale-Invariant Feature Transform (SIFT) invented by Lowe (1999) is the most famous one because of its excellent detecting accuracy and robustness. Bay et al. (2006) presents the Speeded Up Robust Feature (SURF), which is an improved version of SIFT. It uses Haar wavelet to approximate the gradient method in SIFT, using integral image technology at the same time to calculate fast. In most cases, its performance can reach the same level precision compared to SIFT, with 3-7 times faster. For those cases with very fast speed issue, Oriented FAST and Rotated BRIEF (ORB) is employed by Rublee et al. (2011). Corresponding author This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-3-W1-219-2017 | © Authors 2017. CC BY 4.0 License. 219 The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W1, 2017 2017 International Symposium on Planetary Remote Sensing and Mapping, 13–16 August 2017, Hong Kong 3D-3D method treats the stereo cameras as the point cloud generator, which make use the stereo cameras to generate 3D point cloud and estimates the rotation and translation between two consecutive frames using 3D point cloud registration method like Iterative Closest Point (ICP) algorithm in Balazadegan et al. (2016a). ICP algorithm can only converge to the local minimum, which differs from VO propose. Therefore, we must obtain a good initial value for the VO motion estimation parameters according to Hong et al. (2015a). On the other side, the aim of the 3D-2D VO method is to solve the Perspective-n-Point (PnP) problem. According to Scaramuzza et al. (2011), 3D-2D method is more accurate than the 3D-3D method, therefore 3D-2D method has received attention in both Photogrammetry like McGlove et al. (2004a) and Computer Vision like Hartley and Zisserman (2000a). The least feature points needed in PnP problem are 3, which called P3P problem. Gao et al. (2003a) presents a solution to the 3point algorithm for P3P problem. For n>3 points, some more accurate but slower methods based on iteration exist presented by Quan and Lan (1999a) and Ansar and Daniilidis (2003a). In 3D-2D method, the outlier elimination is quite important because the preci (...truncated)