Article
Modeling Bottom-Up Visual Attention Using Dihedral Group D4 †
Puneet Sharma
Department of Engineering & Safety (IIS-IVT), UiT-The Arctic University of Norway, Tromsø-9037, Norway
Tel.: +47-776-60391
† This paper is an extended version of my paper published in the 11th International Symposium on Visual
Computing (ISVC 2015).
Academic Editors: Marco Bertamini and Lewis Griffin
Received: 27 April 2016; Accepted: 9 August 2016; Published: 15 August 2016
Abstract: In this paper, we first briefly describe the dihedral group D4, which serves as the basis for
calculating saliency in our proposed model. Our saliency model then makes two major changes to
a recent state-of-the-art model known as group-based asymmetry. First, based on the properties of
the dihedral group D4, we simplify the asymmetry calculations associated with the measurement
of saliency. This results in an algorithm that reduces the number of calculations by at least half,
making it the fastest among the six best algorithms used in this research article. Second, in
order to maximize the information across different chromatic and multi-resolution features, the color
image space is de-correlated. We evaluate our algorithm against 10 state-of-the-art saliency models.
Our results show that, by using optimal parameters for a given dataset, our proposed model can
outperform the best saliency algorithm in the literature. However, as the differences among the (few)
best saliency models are small, we suggest that our proposed model is among the best and the
fastest among the best. Finally, as part of future work, we suggest that our proposed approach to
saliency can be extended to three-dimensional image data.
Keywords: image analysis; saliency
1. Introduction
While searching for a person on a busy street, we look at people while neglecting other aspects
of the scene, such as road signs, buildings and cars. In the absence of this task, however, we
would attend to different features of the same scene. In the literature [1], visual attention is described
as a combination of two different mechanisms: top-down and bottom-up.
Top-down attention pertains to how a target object is defined or described in the scene; for instance,
while searching for a person, we would start by selecting all people in the scene as likely candidates
and disregard the candidates that do not match the features of the target person until the correct
person is found. To model this, we need a description of the scene in terms of all of its objects and
the unique features associated with each object, such that the uniqueness of the features can be used
to distinguish similar objects from one another. The sheer number of man-made and natural objects
in our daily lives, together with the ambiguity in the definition of an object itself, makes the modeling
of top-down mechanisms difficult. Recent attempts to address this problem have used machine
learning-based methods [2,3].
Bottom-up mechanisms (also known as visual saliency) are associated with the attributes of a scene
that draw our attention to a particular location. These low-level image attributes include motion, color,
contrast and brightness [4]. Bottom-up mechanisms are involuntary and faster than top-down
ones [1]. For instance, a red object among green objects, or an object placed horizontally among
vertical objects, automatically captures our attention.
Symmetry 2016, 8, 79; doi:10.3390/sym8080079
www.mdpi.com/journal/symmetry
Owing to the limited number of low-level image attributes, modeling visual saliency is relatively
less complex than modeling top-down attention.
Over the past two decades, modeling visual saliency has generated much interest in the research
community. In addition to contributing to the understanding of human vision, it has paved
the way for a number of computer and machine vision applications. These include
image and video compression [5–8], robot localization [9,10], image retrieval [11], image and video
quality assessment [12,13], dynamic lighting [14], advertisement [15], artistic image rendering [16]
and human-robot interaction [17,18]. Applications of salient object detection include target
detection [19], image segmentation [20,21] and image resizing [22,23].
Alsam et al. [24,25] recently proposed that asymmetry can be used as a measure of saliency.
To calculate the asymmetry of an image region, they used the dihedral group D4, which is the
symmetry group of the square. D4 consists of eight group elements, namely rotation by 0, 90, 180
and 270 degrees and reflection about the horizontal, vertical and two diagonal axes. The saliency
maps obtained from their algorithm show good correspondence with those calculated by the classic
visual saliency model of Itti et al. [26].
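The eight elements of D4 can be applied to a square image block with plain NumPy array operations. The sketch below is illustrative only: the function names are hypothetical, and the asymmetry score (the summed absolute difference between a block and each of its transformed copies) is an assumed simplification, not necessarily the exact formulation used in [24,25].

```python
import numpy as np


def d4_elements(block):
    """Return the eight D4 transforms of a square 2-D block as a dict."""
    # Rotations by 0, 90, 180 and 270 degrees (np.rot90 rotates counterclockwise).
    rotations = {f"rot{90 * k}": np.rot90(block, k) for k in range(4)}
    # Reflections about the horizontal, vertical, main-diagonal and anti-diagonal axes.
    reflections = {
        "flip_horizontal_axis": np.flipud(block),    # mirror top/bottom
        "flip_vertical_axis": np.fliplr(block),      # mirror left/right
        "flip_main_diagonal": block.T,               # transpose
        "flip_anti_diagonal": np.rot90(block, 2).T,  # anti-transpose
    }
    return {**rotations, **reflections}


def asymmetry_score(block):
    """Hypothetical asymmetry: summed absolute difference to every D4 transform."""
    block = np.asarray(block, dtype=float)
    if block.ndim != 2 or block.shape[0] != block.shape[1]:
        raise ValueError("expected a square 2-D block")
    return sum(np.abs(block - g).sum() for g in d4_elements(block).values())


# A perfectly symmetric block has zero asymmetry; a single bright corner does not.
symmetric = np.ones((4, 4))
corner = np.zeros((4, 4))
corner[0, 0] = 1.0
print(len(d4_elements(corner)))    # 8
print(asymmetry_score(symmetric))  # 0.0
print(asymmetry_score(corner))     # 12.0
```

The corner block scores 12.0 because six of the eight elements move the bright pixel to a different position (each contributing 2), while the identity and the main-diagonal reflection leave it in place.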
Motivated by the fact that bottom-up processing is fast, in this paper we use the symmetries
present in the dihedral group D4 to make the calculations associated with its group elements
simpler and faster to implement. In doing so, we modify the saliency model proposed by
Alsam et al. [24,25]; for details, see Section 3.
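Section 3 gives the exact simplification. To illustrate in general terms how symmetries inside D4 can reduce the work, assume the cost of each group element g is the sum of absolute differences between a block X and its transform gX. Two generic properties then hold. First, g and its inverse give the same sum, so the 270-degree term equals the 90-degree term. Second, for a self-inverse element (the 180-degree rotation or any reflection), pixels form mirrored pairs that are each counted twice, so half the block suffices. These properties are offered as an illustration, not as the model's specification, and the function names below are hypothetical.

```python
import numpy as np


def full_cost(block, transformed):
    """Sum of absolute differences over the whole block."""
    return np.abs(block - transformed).sum()


def half_cost_fliplr(block):
    """Reflection about the vertical axis pairs column j with column N-1-j,
    so each pair is counted twice in the full sum; visit the left half only."""
    n = block.shape[1]
    left = block[:, : n // 2]
    mirrored = np.fliplr(block)[:, : n // 2]
    return 2 * np.abs(left - mirrored).sum()


def half_cost_transpose(block):
    """Reflection about the main diagonal pairs (i, j) with (j, i);
    diagonal pixels are fixed, so only the strict upper triangle is needed."""
    i, j = np.triu_indices(block.shape[0], k=1)
    return 2 * np.abs(block[i, j] - block[j, i]).sum()


rng = np.random.default_rng(0)
X = rng.random((8, 8))

# g and its inverse give the same cost: rot270 is the inverse of rot90.
print(np.isclose(full_cost(X, np.rot90(X, 1)), full_cost(X, np.rot90(X, 3))))  # True
# Self-inverse elements need only half of the pixel comparisons.
print(np.isclose(full_cost(X, np.fliplr(X)), half_cost_fliplr(X)))  # True
print(np.isclose(full_cost(X, X.T), half_cost_transpose(X)))  # True
```

Both identities hold for any permutation of pixel positions, so they do not depend on the particular block contents.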
Next, we draw on the study by Garcia-Diaz et al. [27], which suggests that, to quantify distinct
information in a scene, our visual system de-correlates its chromatic and multi-resolution features.
Accordingly, we de-correlate the input color image by calculating its principal components
(details in Section 3.3).
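De-correlating a color image by principal components can be sketched by treating each pixel's three channel values as a three-dimensional sample and projecting it onto the eigenvectors of the channel covariance matrix. This is a generic PCA sketch, not the exact procedure of Section 3.3. The function name and the choice to mean-center without rescaling are assumptions.

```python
import numpy as np


def decorrelate_channels(image):
    """Project an H x W x C image onto the principal components of its channels.

    Returns an H x W x C array whose channels are mutually uncorrelated,
    ordered by decreasing variance.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)  # one row per pixel
    centered = pixels - pixels.mean(axis=0)      # mean-center each channel
    cov = np.cov(centered, rowvar=False)         # C x C channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric matrix; ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # largest variance first
    components = centered @ eigvecs[:, order]
    return components.reshape(h, w, c)


rng = np.random.default_rng(1)
base = rng.random((32, 32, 1))
# Strongly correlated channels: scaled copies of one signal plus a little noise.
img = np.concatenate([base, 0.8 * base, 0.5 * base], axis=2) + 0.05 * rng.random((32, 32, 3))
pcs = decorrelate_channels(img)
cov_after = np.cov(pcs.reshape(-1, 3), rowvar=False)
# Off-diagonal covariances of the projected channels vanish up to rounding.
print(np.allclose(cov_after - np.diag(np.diag(cov_after)), 0, atol=1e-10))  # True
```

`np.linalg.eigh` is used instead of `np.linalg.eig` because a covariance matrix is symmetric, so its eigenvalues are real and returned in ascending order.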
2. Theory
A dihedral group Dn is the group of symmetries of an n-sided regular polygon, i.e., a polygon whose
sides all have the same length and whose angles are all equal. Dn has n rotational symmetries and
n reflection symmetries; in other words, it has n axes of symmetry and 2n different symmetries [28].
For instance, the polygons for n = 3, 4, 5 and 6 and the associated reflection symmetries are shown
in Figure 1. When n is odd, each axis of symmetry connects a vertex with the midpoint of the
opposite side. When n is even, there are n/2 symmetry axes connecting the midpoints of opposite
sides and n/2 symmetry axes connecting opposite vertices.
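The counting above can be checked by representing Dn as permutations of the polygon's vertices, labeled 0 to n−1 (a labeling convention assumed here for illustration). Rotations send vertex i to (i + k) mod n, and reflections send i to (k − i) mod n. The number of vertices a reflection fixes identifies its axis: one fixed vertex means the axis runs through a vertex and the midpoint of the opposite side, two means it joins opposite vertices, and none means it joins the midpoints of opposite sides.

```python
from collections import Counter


def dihedral_group(n):
    """Elements of D_n as tuples: element[i] is the image of vertex i."""
    rotations = [tuple((i + k) % n for i in range(n)) for k in range(n)]
    reflections = [tuple((k - i) % n for i in range(n)) for k in range(n)]
    return rotations, reflections


def axis_types(n):
    """Classify each reflection axis by how many vertices it leaves fixed."""
    _, reflections = dihedral_group(n)
    labels = {1: "vertex-to-midpoint", 2: "vertex-to-vertex", 0: "midpoint-to-midpoint"}
    return Counter(labels[sum(r[i] == i for i in range(n))] for r in reflections)


def compose(a, b):
    """Apply permutation b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))


for n in (3, 4, 5, 6):
    rotations, reflections = dihedral_group(n)
    elements = set(rotations + reflections)
    # 2n distinct symmetries, and the set is closed under composition.
    assert len(elements) == 2 * n
    assert all(compose(a, b) in elements for a in elements for b in elements)
    print(n, dict(axis_types(n)))

# Expected output:
# 3 {'vertex-to-midpoint': 3}
# 4 {'vertex-to-vertex': 2, 'midpoint-to-midpoint': 2}
# 5 {'vertex-to-midpoint': 5}
# 6 {'vertex-to-vertex': 3, 'midpoint-to-midpoint': 3}
```

A reflection fixes vertex i exactly when 2i ≡ k (mod n). For odd n this has one solution, and for even n it has two solutions or none depending on the parity of k. This is why the axis types split evenly only when n is even.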
Figure 1. Polygons for n = 3, 4, 5 and 6 and the associated reflection symmetries. Here, we can see
that when n is odd, each axis of symmetry connects a vertex with the midpoint of the opposite side.
When n is even, there are n/2 symmetry axes connecting the midpoints of opposite sides and n/2
symmetry axes connecting opposite vertices.
A group is a set G together with a binary operation ∗ on its elements. This operation ∗ must
behave such that:
(i) G must be closed under ∗, that is, for every pair of elements g1, g2 in G, we must have that
g1 ∗ g2 is again an element in G.
(ii) The operation ∗ must be associative, that is, for all elements g1, g2, g3 in G, we (...truncated)