An Image Statistics–Based Model for Fixation Prediction

http://link.springer.com/content/pdf/10.1007%2Fs12559-010-9087-7.pdf

Victoria Yanulevskaya, Jan Bernard Marsman, Frans Cornelissen, Jan-Mark Geusebroek

J. B. Marsman, F. Cornelissen: Laboratory for Experimental Ophthalmology, School of Behavioural and Cognitive Neurosciences, University Medical Center Groningen, PO Box 30.001, 9700 RB Groningen, The Netherlands

V. Yanulevskaya (corresponding author), J.-M. Geusebroek: Intelligent Systems Lab Amsterdam, Informatics Institute, University of Amsterdam, Postbus 94323, 1090 GH Amsterdam, The Netherlands

The problem of predicting where people look, or equivalently salient region detection, has been related to the statistics of several types of low-level image features. Among these features, contrast and edge information seem to have the highest correlation with the fixation locations. The contrast distribution of natural images can be adequately characterized using a two-parameter Weibull distribution. This distribution captures the structure of local contrast and edge frequency in a highly meaningful way. We exploit these observations and investigate whether the parameters of the Weibull distribution constitute a simple model for predicting where people fixate when viewing natural images. Using a set of images with associated eye movements, we assess the joint distribution of the Weibull parameters at fixated and non-fixated regions. Then, we build a simple classifier based on the log-likelihood ratio between these two joint distributions. Our results show that as few as two values per image region are already enough to achieve a performance comparable with the state-of-the-art in bottom-up saliency prediction.

While observing the world around us, we constantly shift our gaze from point to point to visually sample our surroundings. These shifts are not random but are driven by visual stimuli, such as simple variations in contrast or colour [1, 19, 26, 30], or the presence of faces [5]. The visual projection of the world on our eye is not random either, but highly organized and structured.
The latter is reflected in the spatial statistics of the perceived scene, whose regularities are captured by the statistical laws of natural images [11]. Therefore, one would expect eye-fixations to be closely connected with the laws of natural image statistics. In this work, we study to what extent a direct connection can be established between image statistics and locations of eye-fixations.

Low-level visual features are the basis from which many saliency indicators have been derived. Itti et al. [19], followed by others [15, 22, 31], construct a biologically inspired saliency map by considering colour, contrast, and orientation features at various scales. The model combines a total of 42 feature maps into a single saliency map, resulting in the labelling of regions that deviate from the average for these features. Their influential approach has set a standard in saliency prediction. However, it is unclear how much each of these 42 features contributes to the fixation prediction and whether it is necessary to consider all of them.

Reinagel and Zador [30] take the fixation locations as a starting point for analysis. They consider the difference between the image statistics of fixated and non-fixated image locations. The issue here is how to choose plausible image features from which to derive eye movements. A number of image regularities have been considered; see [1] for an overview. Most researchers [29, 30, 38] confirm that the statistics of contrast and edges differ significantly between fixated and non-fixated locations.

In the field of natural image statistics, Geusebroek and Smeulders [14] have shown that the two-parameter Weibull distribution describes the local contrast statistics adequately. They show that both contrast and edge frequency are simultaneously captured by the Weibull distribution, and conjecture that its parameters might be relevant in fixation prediction. Scholte et al.
[34] examined to what degree the brain is sensitive to these parameters and found correlations of 84% and 93%, respectively, between the two Weibull parameters and a simple model of the parvo- and magnocellular system. Given these results, one would expect image contrasts around fixation locations to reflect these Weibull statistics.

The central issue addressed in this paper is the following: do the parameters of the Weibull distribution predict locations of eye-fixations? If so, the Weibull distribution can be used as, or might even be the basis for, a simple predictor of fixation locations.

Our approach elaborates on the work of Zhang et al. [41]. They infer bottom-up saliency from the information gain between the local contrast in a given image and the average statistics over a larger image collection, as parameterized by a Generalized Gaussian distribution, a cousin of the Weibull family [14]. Our approach aims at learning the parameters of local statistics, as parameterized by the Weibull distribution, at fixated and non-fixated locations. As such, saliency is expressed by the likelihood of the distribution parameters occurring in scenes, the parameters being tuned to the statistics of local scene content. We show that, using as few as two parameters of such a simple Weibull model, we obtain prediction of fixation locations comparable with the state-of-the-art in bottom-up saliency [4].

We treat eye-fixation prediction as a two-class classification problem. The salient class consists of fovea-sized (1°, which is 30 pixels in our experiments) regions around fixated locations, and the rest of the image is considered as the non-salient class. Our approach is based on the assessment of local image statistics, which are learned for the salient and non-salient classes. In particular, we model the distribution of the regional colour gradient magnitude responses with the Weibull distribution as discussed below.
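To make the modelling step concrete, the following is a minimal sketch of fitting a two-parameter Weibull distribution to the gradient magnitudes of one image region. Several things here are assumptions rather than the authors' procedure: the function name `fit_weibull` is hypothetical, the fit uses maximum likelihood with a simple bisection search, the density is taken in its standard form f(x) = (k/s)(x/s)^(k-1) exp(-(x/s)^k) with scale s and shape k (the paper's own parametrization may differ), and synthetic samples stand in for the gradient magnitudes of a fovea-sized patch.

```python
import math
import random


def fit_weibull(samples, iterations=60):
    """Maximum-likelihood fit of a two-parameter Weibull distribution.

    Returns (scale, shape). Non-positive samples are ignored.
    Illustrative only: the estimator used in the paper is not given here.
    """
    xs = [x for x in samples if x > 0]
    if len(xs) < 2 or max(xs) == min(xs):
        raise ValueError("need at least two distinct positive samples")

    # Rescale to (0, 1] so that z ** k cannot overflow; the shape
    # parameter is unaffected by this rescaling.
    top = max(xs)
    zs = [x / top for x in xs]
    logs = [math.log(z) for z in zs]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Likelihood equation for the shape k; its root is the ML estimate.
        powers = [z ** k for z in zs]
        weighted = sum(p * l for p, l in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    # score(k) increases with k: negative near zero, positive for large k.
    lo, hi = 1e-3, 1.0
    while score(hi) < 0:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)

    # Given the shape, the ML scale has a closed form.
    scale = top * (sum(z ** shape for z in zs) / len(zs)) ** (1.0 / shape)
    return scale, shape


# Synthetic stand-in for the gradient magnitudes of one patch
# (assumed values: scale 2.0, shape 1.5).
random.seed(0)
patch_magnitudes = [random.weibullvariate(2.0, 1.5) for _ in range(5000)]
scale, shape = fit_weibull(patch_magnitudes)
print(f"scale={scale:.3f} shape={shape:.3f}")
```

In a pipeline of the kind described in the text, such a fit would be repeated for every region, and the resulting (scale, shape) pairs collected separately for fixated and non-fixated regions.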
The classification decision is based on the log-likelihood ratio, with the null hypothesis that the Weibull parameters describe a salient region and the alternative hypothesis that they describe a non-salient region. The proposed method is summarized in Fig. 1.

To determine the non-fixated locations for an image, we follow [1] and randomly select fixated locations from different images that are at least 1°, i.e. fovea size, apart from the fixations on the current image. As a result, we have the same number of fixated and non-fixated regions per image. This way of selecting non-fixated locations ensures similar distributions of fixated and non-fixated regions [1].

Feature Extraction

In our approach, we model local colour contrast statistics with the Weibull distribution. After that, we estimate the joint distribution of the Weibull parameters at the fixated and non-fixated regions.

Colour Contrast

Colour contrast of an image is determined by the gradient magnitude, calculated using Gaussian derivative filters. We follow [13] and convert RGB values to an opponent colour space with (...truncated)