AUTOMATIC DETECTION AND RECOGNITION OF MAN-MADE OBJECTS IN HIGH RESOLUTION REMOTE SENSING IMAGES USING HIERARCHICAL SEMANTIC GRAPH MODEL
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences,
Volume XL-1/W1, ISPRS Hannover Workshop 2013, 21 – 24 May 2013, Hannover, Germany
X. Sun (a,b,c,*), A. Thiele (a), S. Hinz (a), K. Fu (b,c)
(a) Institute of Photogrammetry and Remote Sensing (IPF), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
(b) Institute of Electronics, Chinese Academy of Sciences, Beijing, China
(c) Key Laboratory of Spatial Information Processing and Application System Technology, Chinese Academy of Sciences, Beijing, China
Email: , (antje.thiele, stefan.hinz)@kit.edu,
KEY WORDS: Object detection, Object recognition, High-resolution remote sensing images, Semantic graph model
ABSTRACT:
In this paper, we propose a hierarchical semantic graph model to automatically detect and recognize man-made objects in high-resolution remote sensing images. Following the idea of part-based methods, our model builds a hierarchical probabilistic framework that exploits both the appearance information and the semantic relationships between objects and background. This multi-level structure promises a more comprehensive understanding of natural scenes. After training local classifiers to compute part properties, we use belief propagation to pass messages quantitatively, which enhances the use of the spatial constraints present in images. In addition, discriminative learning and generative learning are interleaved in the inference procedure to reduce the training error and improve recognition efficiency. The experimental results demonstrate that this method is able to detect man-made objects in complicated surroundings with satisfactory precision and robustness.
* Corresponding author
1. INTRODUCTION
With the development of remote sensing technology, a large number of high-resolution remote sensing images have become available, providing detailed geo-spatial information. Interpreting the various types of man-made objects in these images has become a key problem in remote sensing image analysis.
Many approaches have been proposed for object detection and recognition, using textural features, wavelet filters, and so on. Since most man-made objects have complex structures and are surrounded by cluttered background, such low-level methods cannot detect objects as accurately as expected. Besides holistic approaches, some parts-based models have been introduced, following the theory that man-made objects can be regarded as a composition of features or sub-objects arranged according to certain spatial rules.
Initially, those works used simple primitives, such as structured lines or curves, to describe parts, and defined the relationships between adjacent primitives by their number or ratio. Such descriptors are too simple to exploit the useful information in images. Later, Weber et al. (2000) represented objects as constellations of rigid parts and recognized them by similarity matching with a joint probability density function over the shape of the rigid parts. Fergus et al. (2003) and Opelt et al. (2004) proposed category models composed of more flexible parts and estimated the part parameters using the expectation-maximization algorithm. Leibe et al. (2004) introduced an implicit shape model that organizes different contour fragments to extract objects from cluttered scenes. Vijayanarasimhan & Grauman (2008) also presented an unsupervised learning method that analyzes objects by computing the relationships between their parts. However, the parts in those methods are mostly pre-defined, which makes it difficult for them to reflect the variations between different appearances and sizes accurately.
Kannan et al. (2007) therefore proposed a 'jigsaw' model, in which the shapes and sizes of parts are learned from the repeated structures in a set of training images. By learning such irregularly shaped pieces, both the shape and the scale of parts can be discovered without supervision. Ni et al. (2009) made further improvements by constructing a generative model that captures the appearance and geometric structure of whole scenes. These models suffer from errors in scenes with complicated content because they rely only on single-level processing. Furthermore, their descriptions do not make full use of the spatial relations present in images, particularly in images with varied background clutter.
In this paper, we propose a specific hierarchical semantic graph model. Unlike traditional parts-based approaches, this model can yield a more comprehensive understanding of images. It not only builds the semantic constraints between objects and background at the high level but also reinforces the geometric relations between different components at the low level. Our model also uses belief propagation to make better use of the spatial information present in scenes: local classifiers are trained to compute part properties, and messages transmit the semantic relationships between parts quantitatively. In addition, discriminative learning and generative learning are interleaved in the inference procedure to improve training and recognition efficiency. Experiments on our dataset demonstrate that the model can detect and recognize man-made objects in high-resolution remote sensing images with satisfactory precision and robustness.
In the following, Section 2 explains the hierarchical semantic graph model, Section 3 introduces the message propagation procedure, and Section 4 illustrates the flow of hybrid
inference. Sections 5 and 6 present the experimental results and conclusions.
The node B in M is associated with an offset vector $l_i = (l_{ix}, l_{iy}, l_{iz})$ that describes its spatial information, where $l_{ix}$ and $l_{iy}$ are the offsets of the node coordinates and $l_{iz}$ is the offset of the node layer. We can then build a mapping function between the segments in a training image and the nodes in the semantic graph as:
$$ l_i = (t_i - r_i) \bmod G \tag{2} $$
where
$t_i$ = original vector of segments in I
$r_i$ = semantic vector of nodes in G
$G$ = dimension of graph G
Figure 1. Multi-segmentation results: (a) Level 1, (b) Level 2, (c) Level 3
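To make Eq. (2) concrete, here is a minimal Python sketch. It assumes that segment and node vectors are integer (x, y, layer) triples and that "mod G" is applied component-wise using the extent of graph G along each axis; the function name `map_offset`, the tuple `graph_dims` and the sample values are hypothetical illustrations, not taken from the paper.

```python
from typing import Tuple

Vec3 = Tuple[int, int, int]  # (x, y, layer), assumed integer coordinates


def map_offset(t: Vec3, r: Vec3, graph_dims: Vec3) -> Vec3:
    """Hypothetical sketch of Eq. (2): l_i = (t_i - r_i) mod G.

    Assumes the modulo is taken component-wise, with graph_dims holding
    the (assumed) extent of graph G along x, y and the layer axis.
    """
    # Python's % with a positive modulus always returns a non-negative value,
    # so negative differences wrap into the range [0, dim).
    return tuple((ti - ri) % g for ti, ri, g in zip(t, r, graph_dims))


# Usage: a segment at (10, 5) on layer 2 mapped against a node at (3, 7)
# on layer 1, in an assumed 8 x 8 graph with 3 layers.
print(map_offset((10, 5, 2), (3, 7, 1), (8, 8, 3)))  # (7, 6, 1)
```

The component-wise reading keeps every offset inside the graph's index range, which is one plausible interpretation of the modulo in Eq. (2); Eq. (3) below gives the plain differences without wrapping.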
The offset vector is calculated as follows:
$$ \begin{cases} l_{ix} = t_{ix} - r_{ix} \\ l_{iy} = t_{iy} - r_{iy} \\ l_{iz} = t_{iz} - r_{iz} \end{cases} \tag{3} $$
where
$t_{ix}, t_{iy}, t_{iz}$ = center coordinates and layer of $t_i$
$r_{ix}, r_{iy}, r_{iz}$ = center coordinates and layer of $r_i$
Figure 2. Hierarchical semantic graph model (diagram labels: $I_1 \ldots I_n$, $M_1 \ldots M_n$, $G$)
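Eq. (3) and its inverse can be sketched in Python as follows. The sketch assumes integer segment and node centres on a common grid and uses a hypothetical 4-neighbour adjacency test (`is_adjacent`); all names and sample values are illustrative and not taken from the paper. It also checks the adjacency property discussed next: two segments sharing one offset keep their relative displacement in the graph.

```python
from typing import Tuple

Vec3 = Tuple[int, int, int]  # (x_center, y_center, layer), assumed integers


def offset_vector(t: Vec3, r: Vec3) -> Vec3:
    """Eq. (3): component-wise difference between segment vector t_i
    and the vector r_i of the node it is matched to."""
    return (t[0] - r[0], t[1] - r[1], t[2] - r[2])


def node_position(t: Vec3, l: Vec3) -> Vec3:
    """Inverse of Eq. (3): r_i = t_i - l_i."""
    return (t[0] - l[0], t[1] - l[1], t[2] - l[2])


def is_adjacent(a: Vec3, b: Vec3) -> bool:
    """Hypothetical adjacency test: same layer and 4-neighbour centres."""
    return a[2] == b[2] and abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


# Two adjacent segments in the same layer of a training image (made-up values).
seg_a, seg_b = (12, 7, 1), (13, 7, 1)
shared_offset = offset_vector(seg_a, (4, 2, 0))  # (8, 5, 1)

# With the same offset, both nodes are shifted by the same vector,
# so the adjacency of the segments carries over to the nodes.
node_a = node_position(seg_a, shared_offset)  # (4, 2, 0)
node_b = node_position(seg_b, shared_offset)  # (5, 2, 0)
print(is_adjacent(seg_a, seg_b), is_adjacent(node_a, node_b))  # True True
```

Because subtracting a shared offset is a pure translation, any relative displacement between segments is preserved exactly; the grid-based adjacency test is only a simplification chosen for the sketch.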
It follows that if two adjacent segments in an image have the same offset values, they should also be adjacent in the mapped graph. We design the following criterion to evaluate this consistency:
2. HIERARCHICAL SEMANTIC GRAPH MODEL
Though remote sensing images (...truncated)