An Image Statistics–Based Model for Fixation Prediction
Victoria Yanulevskaya
Jan Bernard Marsman
Frans Cornelissen
Jan-Mark Geusebroek
J. B. Marsman, F. Cornelissen
Laboratory for Experimental Ophthalmology, School of Behavioural and Cognitive Neurosciences, University Medical Center Groningen, PO Box 30.001, 9700 RB Groningen, The Netherlands
V. Yanulevskaya (corresponding author), J.-M. Geusebroek
Intelligent Systems Lab Amsterdam, Informatics Institute, University of Amsterdam, Postbus 94323, 1090 GH Amsterdam, The Netherlands
The problem of predicting where people look, or equivalently salient region detection, has been related to the statistics of several types of low-level image features. Among these features, contrast and edge information seem to have the highest correlation with the fixation locations. The contrast distribution of natural images can be adequately characterized using a two-parameter Weibull distribution. This distribution captures the structure of local contrast and edge frequency in a highly meaningful way. We exploit these observations and investigate whether the parameters of the Weibull distribution constitute a simple model for predicting where people fixate when viewing natural images. Using a set of images with associated eye movements, we assess the joint distribution of the Weibull parameters at fixated and non-fixated regions. Then, we build a simple classifier based on the log-likelihood ratio between these two joint distributions. Our results show that as few as two values per image region are enough to achieve a performance comparable with the state-of-the-art in bottom-up saliency prediction.
While observing the world around us, we constantly shift
our gaze from point to point to visually sample our
surroundings. These shifts are not random but are driven by
visual stimuli, such as simple variations in contrast or colour
[1, 19, 26, 30], or the presence of faces [5]. The visual
projection of the world on our eye is not random either, but
highly organized and structured. The latter is reflected in
the spatial statistics of the perceived scene, whose
regularities are captured by the statistical laws of natural
images [11]. Therefore, one would expect eye-fixations to be
closely connected with the laws of natural image statistics.
In this work, we study to what extent a direct connection can
be established between image statistics and locations of
eye-fixations.
Low-level visual features are the basis from which many
saliency indicators have been derived. Itti et al. [19],
followed by others [15, 22, 31], construct a biologically
inspired saliency map by considering colour, contrast, and
orientation features at various scales. The model combines
a total of 42 feature maps into a single saliency map,
resulting in the labelling of regions that deviate from the
average for these features. Their influential approach has
set a standard in saliency prediction. However, it is unclear
how much these 42 features contribute to the fixation
prediction and whether it is necessary to consider all of
them.
Reinagel and Zador [30] take the fixation locations as a
starting point for analysis. They consider the difference
between the image statistics of fixated and non-fixated
image locations. The issue here is how to choose plausible
image features from which to derive eye movements.
A number of image regularities have been considered; see
[1] for an overview. Most researchers [29, 30, 38] confirm
that the statistics of contrast and edges differ significantly
between fixated and non-fixated locations.
In the field of natural image statistics, Geusebroek and
Smeulders [14] have shown the two-parameter Weibull
distribution to describe the local contrast statistics
adequately. They show that both contrast and edge frequency
are simultaneously captured by the Weibull distribution,
conjecturing that its parameters might be relevant in
fixation prediction. Scholte et al. [34] examined to what
degree the brain is sensitive to these parameters and found
correlations of 84% and 93%, respectively, between the two
Weibull parameters and a simple model of the parvo- and
magnocellular system. Given these results, one would
expect image contrasts around fixation locations to reflect
these Weibull statistics.
The central issue addressed in this paper is the
following: Do the parameters of the Weibull distribution predict
locations of eye-fixations? If so, the Weibull distribution
can be used as, or might even form the basis of, a simple
predictor of fixation locations.
Our approach elaborates on the work of Zhang et al.
[41]. They infer bottom-up saliency from the information
gain between the local contrast in a given image when
compared against the average statistics over a larger image
collection, as parameterized by a Generalized Gaussian
distribution, a cousin of the Weibull family [14]. Our
approach aims at learning the parameters of local statistics
as parameterized by the Weibull distribution at fixated and
non-fixated locations. As such, saliency is expressed by the
likelihood of the parameters of the distribution to occur in
scenes, the parameters being tuned to the statistics of local
scene content. We show that, using as few as two
parameters of such a simple Weibull model, we predict
fixation locations with a performance comparable to the
state-of-the-art in bottom-up saliency [4].
We treat eye-fixation prediction as a two-class
classification problem. The salient class consists of fovea-sized (1°,
which is 30 pixels in our experiments) regions around
fixated locations, and the rest of the image is considered as
the non-salient class. Our approach is based on the
assessment of local image statistics which are learned for
salient and non-salient classes. In particular, we model the
distribution of the regional colour gradient magnitude
responses with the Weibull distribution as discussed below.
The classification decision is based on the log-likelihood
ratio, with the null hypothesis that the Weibull parameters
describe a salient region and the alternative hypothesis that
the Weibull parameters describe a non-salient region.
The proposed method is summarized in Fig. 1.
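The decision rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not specify here how the two joint distributions are estimated, so the sketch assumes a smoothed two-dimensional histogram over (scale, shape) pairs. The class `HistogramDensity`, the function `log_likelihood_ratio`, and the parameters `bin_width` and `n_bins` are hypothetical names and values chosen only for illustration.

```python
import math
from collections import Counter


class HistogramDensity:
    """Smoothed 2-D histogram over (scale, shape) pairs.

    Assumption: a stand-in for whatever density model is fitted to the
    Weibull parameters of one class (salient or non-salient).
    """

    def __init__(self, points, bin_width=0.25):
        self.bin_width = bin_width
        self.counts = Counter(self._bin(p) for p in points)
        self.total = len(points)

    def _bin(self, point):
        # Map a parameter pair to integer bin indices.
        return tuple(int(math.floor(v / self.bin_width)) for v in point)

    def log_prob(self, point, n_bins=1000):
        # Add-one smoothing keeps unseen bins at a finite log-probability;
        # n_bins is an assumed nominal number of bins.
        count = self.counts[self._bin(point)]
        return math.log((count + 1) / (self.total + n_bins))


def log_likelihood_ratio(point, salient, non_salient):
    """Positive values favour the salient class, negative the non-salient."""
    return salient.log_prob(point) - non_salient.log_prob(point)
```

In this sketch, a region whose fitted parameter pair yields a ratio above a chosen threshold (zero being the natural default) would be labelled salient; sweeping the threshold trades hits against false alarms.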
To determine the non-fixated locations for an image, we
follow [1] and randomly select fixated locations from
other images that are at least 1° (i.e. fovea size) away
from the fixations on the current image. As a result, we
have the same number of fixated and non-fixated regions
per image. This way of selecting non-fixated locations
ensures similar distributions of fixated and non-fixated
regions [1].
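The selection of non-fixated locations can be sketched as below. This is an illustrative reading of the procedure, under the assumptions that all images share one pixel coordinate frame, that fixations are stored as (x, y) tuples per image, and that 1° corresponds to 30 pixels as stated earlier. The function name `sample_non_fixated` and its arguments are hypothetical.

```python
import math
import random


def sample_non_fixated(fixations_by_image, image_id, min_dist=30.0, rng=random):
    """Draw non-fixated points for `image_id` from fixations on other images.

    Returns as many points as the image has fixations, each at least
    `min_dist` pixels away from every fixation on the current image.
    """
    own = fixations_by_image[image_id]
    pool = [
        p
        for other_id, points in fixations_by_image.items()
        if other_id != image_id
        for p in points
        if all(math.hypot(p[0] - q[0], p[1] - q[1]) >= min_dist for q in own)
    ]
    if len(pool) < len(own):
        raise ValueError("not enough distant fixations on other images")
    # Sampling without replacement gives equal class sizes per image.
    return rng.sample(pool, len(own))
```

Because the negatives are themselves fixations made on other images, they inherit the same spatial bias (for instance towards the image centre) as the positives, which is the property the text attributes to this sampling scheme.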
Feature Extraction
In our approach, we model local colour contrast statistics
with the Weibull distribution. After that, we estimate the
joint distribution of the Weibull parameters at the fixated
and non-fixated regions.
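As an illustration of the first step, the sketch below fits a two-parameter Weibull distribution to a list of positive responses by maximum likelihood. It assumes the standard density f(x) = (shape/scale) (x/scale)^(shape-1) exp(-(x/scale)^shape); the paper's own parameterization and fitting procedure may differ. The function name `fit_weibull` is hypothetical, and the synthetic samples at the end merely stand in for the gradient magnitudes of one image region.

```python
import math
import random


def fit_weibull(samples, iterations=200):
    """Maximum-likelihood fit of a two-parameter Weibull; returns (scale, shape)."""
    logs = [math.log(x) for x in samples if x > 0]  # zero responses are dropped
    if len(logs) < 2 or max(logs) == min(logs):
        raise ValueError("need at least two distinct positive samples")
    mean_log = sum(logs) / len(logs)
    log_max = max(logs)

    def weights(shape):
        # x**shape rescaled by max(x)**shape, which avoids overflow.
        return [math.exp(shape * (l - log_max)) for l in logs]

    def g(shape):
        # Root of g is the ML estimate of the shape; g increases with shape.
        w = weights(shape)
        weighted_mean = sum(wi * l for wi, l in zip(w, logs)) / sum(w)
        return weighted_mean - 1.0 / shape - mean_log

    lo, hi = 1e-3, 1.0
    while g(hi) < 0:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(iterations):  # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    # scale**shape equals the mean of x**shape; evaluated in log space.
    w = weights(shape)
    scale = math.exp(log_max + math.log(sum(w) / len(w)) / shape)
    return scale, shape


# Synthetic check: random.weibullvariate(alpha, beta) takes scale, then shape.
rng = random.Random(0)
region = [rng.weibullvariate(2.0, 1.5) for _ in range(5000)]
print(fit_weibull(region))
```

Applied to every fixated and non-fixated region, such a fit would yield one (scale, shape) pair per region, which is the two-value description whose joint distribution is estimated next.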
Colour Contrast
Colour contrast of an image is determined by the gradient
magnitude, calculated using Gaussian derivative filters,
We follow [13] and convert RGB values to an opponent
colour space with (...truncated)