Separability and Commonality of Auditory and Visual Bistable Perception
Cerebral Cortex August 2012;22:1915–1922
doi:10.1093/cercor/bhr266
Advance Access publication September 30, 2011
Hirohito M. Kondo1, Norimichi Kitagawa1, Miho S. Kitamura1,2, Ai Koizumi1,3, Michio Nomura1,4,7 and Makio Kashino1,5,6
1NTT Communication Science Laboratories, NTT Corporation, Atsugi, Kanagawa 243-0198, Japan, 2Research Center for Advanced
Science and Technology, The University of Tokyo, Tokyo 153-8904, Japan, 3Department of Psychology, The University of Tokyo,
Tokyo 113-0033, Japan, 4Division of Human Sciences, Graduate School of Integrated Arts and Sciences, Hiroshima University,
Higashi-Hiroshima, Hiroshima 739-8521, Japan, 5ERATO Shimojo Implicit Brain Function Project, Japan Science and Technology
Agency, Atsugi, Kanagawa 243-0198, Japan, and 6Department of Information Processing, Interdisciplinary Graduate School of
Science and Engineering, Tokyo Institute of Technology, Yokohama, Kanagawa 226-8503, Japan
7Current address: Department of Cognitive Psychology in Education, Graduate School of Education, Kyoto University,
Kyoto 606-8501, Japan
It is unclear what neural processes induce individual differences in
perceptual organization in different modalities. To examine this
issue, the present study used different forms of bistable perception:
auditory streaming, verbal transformations, visual plaids, and
reversible figures. We performed factor analyses on the number
of perceptual switches in the tasks. A 3-factor model provided
a better fit to the data than the other possible models. These
factors, namely the “auditory,” “shape,” and “motion” factors, were
separable but correlated with each other. We compared the number
of perceptual switches among genotype groups to identify the
effects of neurotransmitter functions on the factors. We focused on
polymorphisms of catechol-O-methyltransferase (COMT) Val158Met
and serotonin 2A receptor (HTR2A) -1438G/A genes, which are
involved in the modulation of dopamine and serotonin, respectively.
The number of perceptual switches in auditory streaming and
verbal transformations differed among COMT genotype groups,
whereas that in reversible figures differed among HTR2A genotype
groups. The results indicate that the auditory and shape factors
reflect the functions of the dopamine and serotonin systems,
respectively. Our findings suggest that the formation and selection
of percepts involve neural processes in cortical and subcortical
areas.
Keywords: awareness, brainstem, consciousness, human, illusion
Introduction
We perceive the world as stable, although sensory inputs are
often ambiguous due to spatial and temporal occluders. This
raises an important question regarding how stable percepts are
formed in the brain. Bistable perception phenomena offer a way
to investigate this issue because
constant physical stimulation leads to spontaneous switching
between different stable percepts. The formation and selection
of percepts have long been investigated with
binocular rivalry, reversible figures, and visual plaids in the visual
domain (Kleinschmidt et al. 1998; Tong et al. 1998; Castelo-Branco et al. 2002; Haynes et al. 2005; Wunderlich et al. 2005);
more recently, they have also been studied with auditory streaming
and verbal transformations in the auditory domain (Gutschalk
et al. 2005; Kondo and Kashino 2007, 2009; Schadwinkel and
Gutschalk 2011). However, individual variation in perceptual
switching has been overlooked, with differences averaged out
to focus on stimulus-specific commonalities.
© The Author 2011. Published by Oxford University Press. All rights reserved.
An early study found a wide range of individual differences in
the rate of binocular rivalry (Pettigrew and Miller 1998), where
monocular images are presented to different eyes. The authors
also pointed out that the rate of binocular rivalry is slow in
patients with bipolar disorder, which is strongly heritable. This
suggests that genetic factors influence the formation and
selection of visual percepts. A large-sample twin heritability study
has demonstrated that approximately 50% of the variance in
binocular rivalry rate is accounted for by additive genetic factors
(Miller et al. 2010). In addition, a recent twin study confirmed
that genetic factors affect the switching rate of reversible
figures as well as that of binocular rivalry (Shannon et al. 2011).
These findings indicate that there is a substantial genetic
contribution to bistable perception, particularly in the visual
domain. However, it is unclear what neural processes are involved
in individual differences in bistable perception and whether
auditory bistability is functionally linked with visual bistability.
The present study elucidated the above issues using different
forms of ambiguous stimuli. Perceptual switches in auditory
streaming and verbal transformations are caused by prolonged
listening to a sound sequence consisting of triplet tones (van
Noorden 1975; Bregman 1990) and of a word (Warren and Gregory
1958; Warren 1961), respectively. Perceptual switches in visual plaids and
reversible figures are produced by observing moving gratings
(Wallach 1935; Adelson and Movshon 1982; Hupé and Rubin
2003) and static figures (Long and Toppino 2004), respectively. There is still
some controversy as to whether spontaneous perceptual
switching is modulated by distributed processes within the
sensory cortices (Pressnitzer and Hupé 2006) or a central
oscillator within the subcortical areas (Pettigrew and Miller
1998). The present study employed 2 different approaches to
clarify the relationship between auditory and visual bistability.
First, we performed factor analyses of the number of
perceptual switches in the tasks and compared the fit indices
of 1-factor and multifactor models. A factor analysis estimates
the degree to which the variances of the observed variables can
be explained by a small number of latent variables called
factors. Thus, the analysis allows us to specify the underlying
structure among observed and latent variables: The observed
variables are modeled as linear combinations of the common
factors and error terms. If perceptual switching in different
modalities is governed by a single rhythm generator, the 1-factor
model should provide a better fit to the data. Conversely, if
different forms of bistable perception are implemented in
Address correspondence to Hirohito M. Kondo, NTT Communication Science Laboratories, NTT Corporation, 3-1 Morinosato Wakamiya, Atsugi,
Kanagawa 243-0198, Japan. Email: .
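The common factor model described in the Introduction, in which observed variables are linear combinations of common factors and error terms, can be sketched in conventional notation as follows. The symbols used here (x, mu, Lambda, f, epsilon, Phi, Psi) are standard textbook choices assumed for illustration; they are not notation taken from the study, and the sketch assumes that factors and error terms are uncorrelated.

```latex
% Requires amsmath. Assumed notation (not from the original text):
%   x       : p x 1 vector of observed switch counts, one entry per task
%   mu      : p x 1 vector of means
%   Lambda  : p x m matrix of factor loadings
%   f       : m x 1 vector of latent common factors, Cov(f) = Phi
%   epsilon : p x 1 vector of error terms, Cov(epsilon) = Psi (diagonal)
\begin{align}
  \mathbf{x} &= \boldsymbol{\mu} + \boldsymbol{\Lambda}\,\mathbf{f}
                + \boldsymbol{\varepsilon}, \\
  \operatorname{Cov}(\mathbf{x}) &= \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,
                \boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi}.
\end{align}
```

Under this sketch, a 1-factor model corresponds to m = 1, so that every task loads on a single latent variable. A multifactor model with nonzero off-diagonal entries in Phi corresponds to factors that are separable but correlated.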
Materials and Methods
Participants
One hundred college students participated in the experiment. They
were right-handed Japanese people with normal or corrected-to-normal vision and with normal hearing. None had any history of
neurological or psychiatric illness. All participants gave written
informed consent, which was approved by the ethics committee of
NTT Co (...truncated)