Predicting Eye Fixations on Complex Visual Stimuli Using Local Symmetry
Gert Kootstra, Bart de Boer, Lambert R. B. Schomaker
G. Kootstra (corresponding author): CAS/CVAP, Royal Institute of Technology (KTH), 100 44 Stockholm, Sweden
B. de Boer: University of Amsterdam, Amsterdam, The Netherlands
L. R. B. Schomaker: University of Groningen, Groningen, The Netherlands
Most bottom-up models that predict human eye fixations are based on contrast features. The saliency model of Itti, Koch and Niebur is an example of such contrast-saliency models. Although the model has been successfully compared to human eye fixations, we show that it lacks precision in the prediction of fixations on mirror-symmetrical forms. The contrast model gives a high response at the borders, whereas human observers consistently look at the symmetrical center of these forms. We propose a saliency model that predicts eye fixations using local mirror symmetry. To test the model, we performed an eye-tracking experiment with participants viewing complex photographic images and compared the data with our symmetry model and the contrast model. The results show that our symmetry model predicts human eye fixations significantly better on a wide variety of images, including many that were not selected for their symmetrical content. Moreover, our results show that early fixations in particular fall on highly symmetrical areas of the images. We conclude that symmetry is a strong predictor of human eye fixations and that it can be used as a predictor of the order of fixation.
Humans continuously make eye movements to investigate
the visual environment in an efficient manner. Interesting
parts of the visual field are focused on and inspected with
high acuity. Eye movements are influenced both top-down,
for instance by the task at hand or past experiences, and
bottom-up, based on properties of the stimulus. Although
both influences play a role, we are only interested in the role
of the stimulus in guiding eye fixations. The questions that
we address in this paper are the following: what are
properties of the stimulus that attract overt visual attention and
can we predict human eye fixations with bottom-up models?
More specifically, we will investigate the role of local
symmetry as an alternative to contrast for the prediction of
eye fixations. We propose saliency models that calculate
the conspicuousness in an image on the basis of mirror
symmetry and discuss the results of comparing these
models to human eye fixations recorded in an eye-tracking
experiment. The main result shows that mirror symmetry is
a better predictor of human gaze than contrast.
The paper is organized as follows. We first discuss the
background of the presented research. Then, the symmetry-saliency
models are presented, along with the eye-tracking experiment
and the methods used to compare the models
with the human data. Next, the experiments and results are
presented, and we end with a discussion of these results.
When we use the word symmetry in the paper, we refer to
mirror symmetry, unless explicitly stated differently.
In this section, we discuss the background of the control of
eye movements and the prediction of eye fixations using
saliency models. We furthermore introduce the role of
symmetry in natural vision and computer vision.
Bottom-Up Control of Eye Movements
There are well-documented top-down influences on the control of
eye movements [1–11]. However, in this paper, we focus
on bottom-up visual attention. The role of the stimulus in
the guidance of eye movements has been pointed out in
many studies. Theeuwes [12, 13], for instance, showed that
in a search task, a salient distractor could capture attention.
Even after extended practice, the irrelevant stimulus
influenced the eye movements, and complete top-down
guidance was not possible [14]. Also for more complex
photographic stimuli, overt attention is attracted toward
contrast-manipulated parts of the images [15]. Since the
contrast enhancement did not change the meaning of the
stimulus, this is a clear bottom-up effect on attention.
Mannan et al. [16] concluded that eye movements made
during brief presentation of photographic images are a
response to the spatial features of the image.
We are interested in the role of the stimulus in the
guidance of eye movements. We are specifically interested
in the visual features that can be used to predict human eye
fixations. This gives us insight into the inherent properties
of the stimulus that attract attention. To investigate this, we
propose a saliency model that determines the salient
regions in an image and compare the model to human eye
fixations on the same images. Whereas most existing
saliency models focus on contrast features to determine parts
of the image that stand out from their local environment,
we use local symmetry to predict the eye movements.
Although saliency models exist that combine bottom-up
and top-down factors [17–21], in this paper we will focus
on saliency models that base their prediction on the
stimulus. Most existing bottom-up saliency models use contrast
features to determine the saliency in an image. The
influential saliency model of Itti, Koch and Niebur, for instance,
calculates the saliency of an image on the basis of contrast
in three different feature channels: intensity, color and
orientation [22, 23]. The model is based on a biologically
plausible architecture for visual attention [24] and is an
implementation of the feature-integration theory of human
visual search [25]. It can correctly predict human behavior
in visual pop-out experiments [26]. Parkhurst et al. [27]
compared the model to human eye fixations on complex
photographic images. They showed that the saliency at the
points of human fixation, as measured by the model, is
significantly higher than expected by chance. Similarly,
Ouerhani et al. [28] found a positive correlation between
the resulting saliency maps and human fixations.
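
The comparison to chance described above can be illustrated with a minimal sketch. It is not the procedure of Parkhurst et al. [27]: the toy saliency map, the fixation points, and the function names `saliency_at_fixations` and `chance_level` are all assumptions made for illustration, and no significance test is performed. The sketch only shows the idea of comparing the mean saliency at fixated locations with the mean saliency at uniformly random locations.

```python
import random
from statistics import mean


def saliency_at_fixations(saliency_map, fixations):
    """Mean saliency value at the given (row, col) points."""
    return mean(saliency_map[r][c] for r, c in fixations)


def chance_level(saliency_map, n_points, n_trials=1000, seed=0):
    """Mean saliency at uniformly random points, averaged over many trials."""
    rng = random.Random(seed)
    height, width = len(saliency_map), len(saliency_map[0])
    trial_means = []
    for _ in range(n_trials):
        points = [(rng.randrange(height), rng.randrange(width))
                  for _ in range(n_points)]
        trial_means.append(saliency_at_fixations(saliency_map, points))
    return mean(trial_means)


# Toy 5x5 saliency map with one salient blob in the center (made-up values).
toy_map = [
    [0.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.2, 0.5, 0.2, 0.0],
    [0.1, 0.5, 1.0, 0.5, 0.1],
    [0.0, 0.2, 0.5, 0.2, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.0],
]
# Hypothetical fixation points, given as (row, col).
toy_fixations = [(2, 2), (1, 2), (2, 3)]

observed = saliency_at_fixations(toy_map, toy_fixations)
expected = chance_level(toy_map, n_points=len(toy_fixations))
print(f"at fixations: {observed:.2f}, chance: {expected:.2f}")
```

A model that carries information about where people look should yield an observed value above the chance value. Whether the difference is significant would have to be established with a proper statistical test over many images and participants.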
Other saliency models, like the model of Le Meur et al.
[32], are also based on contrast calculations. They found a
positive correlation between their model and human data,
which was slightly higher than the performance of Itti and
Koch's model. The saliency model of Bruce and Tsotsos
[33] compares the distribution of features in the center to
the surround and defines the saliency based on the contrast
between the two. A center-surround structure also
emerged as the most representative receptive field when
fitting a non-parametric model to human eye-fixation data
[34]. However, the model used was limited in that
it could not give rise to the concept of symmetry, as we
propose in this paper. Privitera and Stark [35] investigated
a set of simpler contrast-saliency operators. These
operators were also found to predict human fixation points to
some extent.
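
The center-surround idea shared by the contrast models discussed above can be sketched in a few lines. This is a simplified illustration, not the model of Itti, Koch and Niebur, nor that of Bruce and Tsotsos, nor one of the operators of Privitera and Stark: it works on a single intensity channel at a single scale, uses box averages instead of Gaussian pyramids, and the names `box_blur` and `center_surround_contrast` and the radii are assumptions chosen for the example.

```python
def box_blur(image, radius):
    """Mean over a (2*radius+1)^2 window, clipped at the image border."""
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            rows = range(max(0, r - radius), min(height, r + radius + 1))
            cols = range(max(0, c - radius), min(width, c + radius + 1))
            window = [image[i][j] for i in rows for j in cols]
            blurred[r][c] = sum(window) / len(window)
    return blurred


def center_surround_contrast(image, center_radius=1, surround_radius=3):
    """Absolute difference between a fine (center) and a coarse (surround) average."""
    center = box_blur(image, center_radius)
    surround = box_blur(image, surround_radius)
    return [[abs(a - b) for a, b in zip(row_c, row_s)]
            for row_c, row_s in zip(center, surround)]


# Toy 9x9 intensity image: a bright 3x3 square on a dark background.
size = 9
toy_image = [[1.0 if 3 <= r <= 5 and 3 <= c <= 5 else 0.0
              for c in range(size)]
             for r in range(size)]

contrast = center_surround_contrast(toy_image)
for row in contrast:
    print(" ".join(f"{value:.2f}" for value in row))
```

Regions that differ from their local surround receive a high value, while uniform regions receive a value of zero. A full contrast-saliency model would repeat such a computation over several feature channels and spatial scales and combine the resulting maps.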
Although contrast has been the dominant feature for
saliency models, we can see a clear deficiency in the
current visual attention models when we look at Fig. 1. For the
images that are shown in the first column, ou (...truncated)