Predicting Eye Fixations on Complex Visual Stimuli Using Local Symmetry

https://link.springer.com/content/pdf/10.1007%2Fs12559-010-9089-5.pdf

Gert Kootstra, Bart de Boer, Lambert R. B. Schomaker

G. Kootstra (corresponding author): CAS/CVAP, Royal Institute of Technology (KTH), 100 44 Stockholm, Sweden
B. de Boer: University of Amsterdam, Amsterdam, The Netherlands
L. R. B. Schomaker: University of Groningen, Groningen, The Netherlands

Most bottom-up models that predict human eye fixations are based on contrast features. The saliency model of Itti, Koch and Niebur is an example of such contrast-saliency models. Although the model has been successfully compared to human eye fixations, we show that it lacks precision in predicting fixations on mirror-symmetrical forms. The contrast model gives a high response at the borders, whereas human observers consistently look at the symmetrical center of these forms. We propose a saliency model that predicts eye fixations using local mirror symmetry. To test the model, we performed an eye-tracking experiment with participants viewing complex photographic images and compared the data with our symmetry model and the contrast model. The results show that our symmetry model predicts human eye fixations significantly better than the contrast model on a wide variety of images, including many that were not selected for their symmetrical content. Moreover, our results show that early fixations in particular fall on highly symmetrical areas of the images. We conclude that symmetry is a strong predictor of human eye fixations and that it can be used as a predictor of the order of fixation.

Humans continuously make eye movements to investigate the visual environment in an efficient manner. Interesting parts of the visual field are brought into focus and inspected with high acuity. Eye movements are influenced both top-down, for instance by the task at hand or by past experience, and bottom-up, by properties of the stimulus. Although both influences play a role, we are only interested in the role of the stimulus in guiding eye fixations.
The questions that we address in this paper are the following: which properties of the stimulus attract overt visual attention, and can we predict human eye fixations with bottom-up models? More specifically, we investigate the role of local symmetry as an alternative to contrast for the prediction of eye fixations. We propose saliency models that calculate the conspicuousness in an image on the basis of mirror symmetry and discuss the results of comparing these models to human eye fixations recorded in an eye-tracking experiment. The main result shows that mirror symmetry is a better predictor of human gaze than contrast.

The paper is organized as follows. We first discuss the background of the presented research. Then, the symmetry-saliency models are presented, along with the eye-tracking experiment and the methods used to compare the models with the human data. Next, the experiments and results are presented, and we end with a discussion of these results. When we use the word symmetry in the paper, we refer to mirror symmetry, unless explicitly stated otherwise.

In this section, we discuss the background of the control of eye movements and the prediction of eye fixations using saliency models. We furthermore introduce the role of symmetry in natural vision and computer vision.

Bottom-Up Control of Eye Movements

There are definitely top-down influences on the control of eye movements [1–11]. However, in this paper, we focus on bottom-up visual attention. The role of the stimulus in the guidance of eye movements has been pointed out in many studies. Theeuwes [12, 13], for instance, showed that in a search task, a salient distractor could capture attention. Even after extended practice, the irrelevant stimulus influenced the eye movements, and complete top-down guidance was not possible [14]. For more complex photographic stimuli as well, overt attention is attracted toward contrast-manipulated parts of the images [15].
Since the contrast enhancement did not change the meaning of the stimulus, this is a clear bottom-up effect on attention. Mannan et al. [16] concluded that eye movements made during brief presentation of photographic images are a response to the spatial features of the image.

We are interested in the role of the stimulus in the guidance of eye movements, and specifically in the visual features that can be used to predict human eye fixations. This gives us insight into the inherent properties of the stimulus that attract attention. To investigate this, we propose a saliency model that determines the salient regions in an image and compare the model to human eye fixations on the same images. Whereas most existing saliency models focus on contrast features to determine parts of the image that stand out from their local environment, we use local symmetry to predict the eye movements.

Although saliency models exist that combine bottom-up and top-down factors [17–21], in this paper we focus on saliency models that base their prediction on the stimulus. Most existing bottom-up saliency models use contrast features to determine the saliency in an image. The influential saliency model of Itti, Koch and Niebur, for instance, calculates the saliency of an image on the basis of contrast in three different feature channels: intensity, color and orientation [22, 23]. The model is based on a biologically plausible architecture for visual attention [24] and is an implementation of the feature-integration theory of human visual search [25]. It can correctly predict human behavior in visual pop-out experiments [26]. Parkhurst et al. [27] compared the model to human eye fixations on complex photographic images. They showed that the saliency at the points of human fixation, as measured by the model, is significantly higher than expected by chance. Similarly, Ouerhani et al. [28] found a positive correlation between the resulting saliency maps and human fixations.
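As a rough illustration of the two ideas in the preceding paragraph, the sketch below computes a center-surround contrast map and compares the mean saliency at a set of fixation points with the mean saliency at random points. It is not the model of Itti, Koch and Niebur: it uses a single scale, the intensity channel only, and box filters. The function names, the filter radii and the synthetic square image are assumptions made for this example, and the "fixations" are hand-placed pixel coordinates rather than recorded data.

```python
import numpy as np


def box_blur(img, radius):
    """Mean filter over a (2*radius+1)^2 window, with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    # Summed-area table with a leading row and column of zeros.
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = (sat[k:k + h, k:k + w] - sat[:h, k:k + w]
             - sat[k:k + h, :w] + sat[:h, :w])
    return total / (k * k)


def contrast_saliency(intensity, center=1, surround=6):
    """Absolute difference between a fine and a coarse local mean.

    Hypothetical single-scale stand-in for a center-surround operator.
    """
    return np.abs(box_blur(intensity, center) - box_blur(intensity, surround))


def saliency_at_fixations_vs_chance(saliency, fixations, n_random=1000, seed=0):
    """Mean saliency at fixated pixels and at uniformly random pixels."""
    rows, cols = np.asarray(fixations).T
    at_fix = saliency[rows, cols].mean()
    rng = np.random.default_rng(seed)
    rand_rows = rng.integers(0, saliency.shape[0], n_random)
    rand_cols = rng.integers(0, saliency.shape[1], n_random)
    chance = saliency[rand_rows, rand_cols].mean()
    return at_fix, chance


# Synthetic stimulus: a bright square (a mirror-symmetrical form) on black.
image = np.zeros((64, 64))
image[16:48, 16:48] = 1.0
saliency = contrast_saliency(image)

# Hand-placed (row, col) points on the border of the square, for illustration.
border_fixations = [(16, 32), (32, 16), (47, 32), (32, 47)]
at_fix, chance = saliency_at_fixations_vs_chance(saliency, border_fixations)

print("saliency at square center:", saliency[32, 32])
print("saliency at square border:", saliency[16, 32])
print("mean at border points vs. chance:", at_fix, chance)
```

On this synthetic square the contrast response is concentrated at the border and vanishes at the center of the form, so a comparison against chance of this kind rewards fixations near edges. With real data, the fixation list would come from the eye tracker and the chance level would be estimated over many images.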
Other saliency models, like the model of Le Meur et al. [32], are also based on contrast calculations. They found a positive correlation between their model and human data, which was slightly higher than the performance of Itti and Koch's model. The saliency model of Bruce and Tsotsos [33] compares the distribution of features in the center to that in the surround and defines the saliency based on the contrast between the two. The center-surround structure also emerged as the most representative receptive field when fitting a non-parametric model to human eye-fixation data [34]. However, the model used was limited in that it could not give rise to the concept of symmetry, as we propose in this paper. Privitera and Stark [35] investigated a set of simpler contrast-saliency operators. These operators were also found to predict human fixation points to some extent.

Although contrast has been the dominant feature for saliency models, we can see a clear deficiency in the current visual attention models when we look at Fig. 1. For the images that are shown in the first column, ou (...truncated)