ROBUST FEATURE MATCHING IN TERRESTRIAL IMAGE SEQUENCES
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3, 2018
ISPRS TC III Mid-term Symposium “Developments, Technologies and Applications in Remote Sensing”, 7–10 May, Beijing, China
A. Abbas1, S. Ghuffar2 ∗
1, 2 Geospatial Research and Education Lab (GREL)
Dept. of Space Science, Institute of Space Technology, Islamabad, Pakistan
(ahsan, sajid.ghuffar)@grel.ist.edu.pk
Commission III, Urban Sensing and Mobility
KEY WORDS: Feature Detection, Feature Matching, SIFT, SURF, RANSAC, 3D Reconstruction
ABSTRACT:
Over the last decade, feature detection, description and matching techniques have been widely exploited in various photogrammetric and computer vision applications, including 3D scene reconstruction, image stitching for panorama creation, image classification and object recognition. However, terrestrial imagery of urban scenes poses various issues, including duplicate and identical structures (e.g. repeated windows and doors), which cause problems in the feature matching phase and ultimately lead to failures, especially in camera pose and scene structure estimation. In this paper, we address the issue of ambiguous feature matching in urban environments caused by repeating patterns.
1. INTRODUCTION
Many photogrammetric and computer vision applications rely on more than one image of the same scene or object. To relate the images to one another, corresponding points of the same scene (3D features) need to be matched across those images.
Over the last few years, image feature detectors and descriptors have become the most widely used techniques for such applications, which include 3D scene reconstruction, panoramic mosaicking/stitching, image classification, object recognition and robot localization. All of these depend on the presence of stable and representative features in image space. Thus, image feature detection and extraction are important steps for these applications
(Hassaballah et al., 2016).
Nowadays a number of feature detection and description algorithms are available, which detect regions of interest, edges or corners (Remondino, n.d.). The most common of them are Speeded Up Robust Features (SURF) (Bay et al., 2006), Scale Invariant Feature Transform (SIFT) (Lowe, 2004), Features from Accelerated Segment Test (FAST) (Rosten and Drummond, 2005) and Binary Robust Invariant Scalable Keypoints (BRISK) (Leutenegger et al., 2011). According to Haralick and Shapiro (1992), the ideal characteristics of features for matching are: invariance (independence from geometric and radiometric distortions), stability (robustness against image noise), distinctness (clear distinction from the background) and uniqueness (distinguishability from other points).
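To illustrate the invariance property for radiometric distortions, the sketch below builds a toy gradient-orientation histogram for an image patch and normalizes it in the way Lowe (2004) describes for SIFT: scale to unit length, clip entries at 0.2, and renormalize. The helpers `orientation_histogram` and `normalize_descriptor` are hypothetical and are not the SIFT or SURF implementations used in this paper; a real SIFT descriptor additionally uses a grid of spatial sub-regions and Gaussian weighting. Under these assumptions, a patch and a copy with higher contrast and brightness (gain 2, offset 10) yield the same descriptor.

```python
import numpy as np


def orientation_histogram(patch, n_bins=8):
    """Toy descriptor: orientation histogram weighted by gradient magnitude.

    Hypothetical helper for illustration only: no spatial sub-regions and
    no Gaussian weighting, so this is NOT the full SIFT descriptor.
    """
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)
    # Bins are centred on 0, 45, 90, ... degrees (shift by half a bin width).
    shifted = np.mod(angle + np.pi / n_bins, 2.0 * np.pi)
    bins = np.minimum((shifted / (2.0 * np.pi) * n_bins).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)


def normalize_descriptor(vec, clip=0.2):
    """SIFT-style normalization: unit length, clip large entries, renormalize."""
    v = vec / np.linalg.norm(vec)  # removes a global gain (contrast) factor
    v = np.minimum(v, clip)        # damps the influence of a few strong gradients
    return v / np.linalg.norm(v)


rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(16, 16)).astype(float)
d_orig = normalize_descriptor(orientation_histogram(patch))
# Affine intensity change: the offset vanishes in the gradient and the
# gain cancels in the normalization.
d_bright = normalize_descriptor(orientation_histogram(2.0 * patch + 10.0))
print(np.allclose(d_orig, d_bright))
```

Note that this only covers photometric invariance; geometric invariance (scale, rotation) comes from the detector's scale selection and orientation assignment, which the sketch does not model.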
∗ Corresponding author
The feature detection and matching process can be split into three steps:
1) Detection: find the keypoints in each image.
2) Description: ideally, the local appearance around each feature point should be described in a way that is invariant to scale, rotation, noise, illumination changes and affine transformations. A distinctive descriptor is computed from the neighborhood region around every keypoint, so that normally each keypoint ends up with one descriptor vector.
3) Matching: to identify similar features, descriptors are compared across the images. Successfully matched features yield pairs (xi, yi) ↔ (x′i, y′i), where (xi, yi) is a feature in the first image and (x′i, y′i) is the matched feature in the other image.
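Step 3 can be sketched as a nearest-neighbour search in descriptor space. The sketch below assumes Euclidean distance, Lowe's (2004) nearest/second-nearest distance ratio test and a mutual (cross-check) consistency test; these are common choices, not necessarily the matching strategy used in this paper, and the helper name `match_descriptors` and the 0.8 threshold are illustrative. Toy 2D vectors stand in for the 128-dimensional SIFT or 64-dimensional SURF descriptors; the third feature has two almost identical candidates, as a repeated window would, and is therefore rejected.

```python
import numpy as np


def match_descriptors(desc1, desc2, ratio=0.8):
    """Match rows of desc1 to rows of desc2 (hypothetical helper).

    Returns (i, j) index pairs that pass the nearest/second-nearest
    distance ratio test and are mutual nearest neighbours.
    desc2 must contain at least two descriptors.
    """
    # Pairwise Euclidean distances, shape (n1, n2).
    dist = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    rows = np.arange(len(desc1))
    best, second = order[:, 0], order[:, 1]
    # Ratio test: reject a match if the runner-up is almost as close,
    # which is typical for repeated structures such as identical windows.
    unambiguous = dist[rows, best] < ratio * dist[rows, second]
    # Cross-check: the best match of j in desc1 must be i again.
    best_back = np.argmin(dist, axis=0)
    return [(int(i), int(j)) for i, j in zip(rows, best)
            if unambiguous[i] and best_back[j] == i]


desc1 = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]])
# The last two rows of desc2 are near-identical candidates for the third
# feature, as produced by a repeated facade element.
desc2 = np.array([[0.1, 0.0], [10.0, 0.1], [5.0, 5.1], [5.1, 5.0]])
print(match_descriptors(desc1, desc2))  # [(0, 0), (1, 1)]
```

The ratio test removes ambiguous pairs but, on facades with many identical elements, it also discards true matches, and a repeated element that is only partly visible in one image can still produce a confident false match.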
However, terrestrial imagery of urban scenes contains many repeated feature patterns and nearly identical or duplicate structures with similar texture patterns, which cause problems in feature matching and subsequently lead to failures in applications such as sparse 3D scene reconstruction. Removing these incorrect matches is therefore a necessary step, especially in urban scenes where accurate recovery of camera pose and scene structure is required. Typical feature matching strategies produce a high number of outliers. Moreover, because of the inherent scene geometry and camera motion, ambiguous matches are often displaced parallel to the epipolar lines, so robust estimators such as RANSAC (used to reject incorrect matches) sometimes converge to a wrong solution for the correspondences and camera poses.
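Why RANSAC may accept such matches can be seen from the epipolar constraint x′ᵀ E x = 0 for normalized image coordinates, where E = [t]× R is the essential matrix. In the hypothetical configuration below (calibrated cameras, R = I, camera translating along the facade in the x direction), epipolar lines are horizontal image rows, so a false match to the next identical window in the same row has the same zero residual as the true match; only matches off the row are penalized. The point coordinates are invented for illustration, and the algebraic residual is used instead of the geometric error (e.g. point-to-epipolar-line or Sampson distance) that a RANSAC inlier test would typically threshold.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def epipolar_residual(E, x1, x2):
    """Algebraic epipolar error x2^T E x1 (zero for a consistent pair)."""
    return float(x2 @ E @ x1)


# Hypothetical calibrated setup: no rotation, camera moves along the x axis.
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])
E = skew(t) @ R  # essential matrix E = [t]x R

# A window corner in image 1 (normalized homogeneous coordinates).
x1 = np.array([0.20, 0.10, 1.0])
x2_true = np.array([0.05, 0.10, 1.0])    # correct correspondence
x2_repeat = np.array([0.35, 0.10, 1.0])  # identical window in the same row
x2_off = np.array([0.05, 0.30, 1.0])     # wrong match off the epipolar line

for name, x2 in [("true", x2_true), ("repeat", x2_repeat), ("off", x2_off)]:
    print(name, epipolar_residual(E, x1, x2))
```

The true and the repeated match both give a residual of essentially zero, while the off-row match gives about -0.2, so an inlier threshold on epipolar error cannot separate the first two; with enough such pairs, a wrong model can gather the largest consensus set.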
In this paper, we investigate and discuss the issues of ambiguous feature matching caused by repeating patterns in urban environments, using the SIFT (Vedaldi and Fulkerson, 2008) and SURF (MATLAB-based implementation) algorithms. These ambiguities ultimately lead to false camera pose estimation for scene reconstruction. We also provide suggestions for resolving these known issues. SIFT and SURF descriptors were chosen because they perform well and are widely used in many applications.
2. RELATED WORK
In urban architecture, symmetry and repetition are very common design elements. Buildings contain a hierarchy of symmetries and repetitions on their facades, for example windows and doors, which frequently repeat along the horizontal direction. Changchang Wu et al. (Wu et al., 2010) presented a technique to detect repeated features on architectural frontal planes with precise recovery of the repetition boundaries. Their method works well for repetitions along the horizontal direction with a low repetition count.
This contribution has been peer-reviewed.
https://doi.org/10.5194/isprs-archives-XLII-3-3-2018 | © Authors 2018. CC BY 4.0 License.
Kyle Wilson and Noah Snavely (Wilson and Snavely, 2013) presented a new approach for urban scenes containing repeated features, based on the local visibility graph. Their model leads to a highly scalable, fast and simple technique for disambiguating repeated elements without relying solely on geometric reasoning. They demonstrated their method on large datasets drawn from internet photo collections and compared it with other geometry-based disambiguation techniques.
Richard Roberts et al. (Roberts et al., 2011) examined the geometric ambiguities caused by duplicate and repeated structures when different instances are matched on the basis of visual similarity. They proposed an algorithm that recovers the true data association (the problem of determining correspondences, either between whole images or between feature points) even when a large number of false pairwise matches exist.
Similarly, Nianjuan Jiang et al. (Jiang et al., 2012) worked on repetitive scene structures, which cause issues
in epipolar geometry (EG) due to (...truncated)