Temporal mechanisms of multimodal binding

Proceedings of the Royal Society B: Biological Sciences, May 2009

The simultaneity of signals from different senses—such as vision and audition—is a useful cue for determining whether those signals arose from one environmental source or from more than one. To understand better the sensory mechanisms for assessing simultaneity, we measured the discrimination thresholds for time intervals marked by auditory, visual or auditory–visual stimuli, as a function of the base interval. For all conditions, both unimodal and cross-modal, the thresholds followed a characteristic ‘dipper function’ in which the lowest thresholds occurred when discriminating against a non-zero interval. The base interval yielding the lowest threshold was roughly equal to the threshold for discriminating asynchronous from synchronous presentations. Those lowest thresholds occurred at approximately 5, 15 and 75 ms for auditory, visual and auditory–visual stimuli, respectively. Thus, the mechanisms mediating performance with cross-modal stimuli are considerably slower than the mechanisms mediating performance within a particular sense. We developed a simple model with temporal filters of different time constants and showed that the model produces discrimination functions similar to the ones we observed in humans. Both for processing within a single sense, and for processing across senses, temporal perception is affected by the properties of temporal filters, the outputs of which are used to estimate time offsets, correlations between signals, and more.

Full-text PDF: https://rspb.royalsocietypublishing.org/content/276/1663/1761.full.pdf


David Burr (1, 2, 3), Ottavia Silva (0, 2), Guido Marco Cicchini (0, 2), Martin S. Banks (2, 6), Maria Concetta Morrone (2, 4, 5)

0 Faculty of Psychology, Università Vita-Salute San Raffaele, via Olgettina 58, Milan 20132, Italy
1 School of Psychology, University of Western Australia, Nedlands, Western Australia 6009, Australia
2 Università degli Studi di Firenze, via S. Nicolò 89, Florence 50125, Italy
3 Department of Psychology, Università degli Studi di Firenze, via S. Nicolò 89, Florence 50125, Italy
4 Scientific Institute Stella Maris, Calambrone, Pisa 56018, Italy
5 Department of Physiological Sciences, University of Pisa, Via S. Zeno 36, Pisa 56100, Italy
6 Department of Psychology, School of Optometry, Vision Science Program, University of California, Berkeley, CA 94720, USA

1. INTRODUCTION

One of the most complex tasks for the brain is to combine the information from the five senses into a single perceptual experience. Several studies have shown that integrating information across senses increases perceptual precision and accuracy (Ernst & Banks 2002; Gepshtein & Banks 2003; Alais & Burr 2004). However, it is crucial that only appropriate information be integrated, because integrating information from different environmental sources would generally be detrimental. One cue for when to integrate across modalities could be temporal coincidence: if sensory events (such as a flash and a sound) occur at the same time, there is a good probability that they originated from the same source. But determining the simultaneity of external events is not easy for the brain, because the arrival time of neural signals depends on many factors, including variable latencies in sensory transduction and neural transmission and, for sound, significant physical delays in transmission. Any coincidence detector therefore has to be flexible and adaptable. A good deal of evidence suggests that humans perceive brief auditory and visual events as simultaneous over a moderately wide range of asynchronies. In particular, the system regards auditory–visual events falling within 50–60 ms of one another as simultaneous (Hirsh & Sherrick 1961; Stein & Meredith 1993; Zampini et al. 2003; Arrighi et al. 2006).
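To make the physical delay of sound concrete, here is a minimal sketch that converts source distance into the extra time sound needs to arrive compared with light. It assumes a speed of sound of about 343 m/s (air at roughly 20 °C), treats light as arriving instantly, and ignores the differing neural latencies mentioned above; the function and constant names are hypothetical. Under these assumptions the acoustic lag alone reaches the 50–60 ms simultaneity window at a distance of roughly 17 to 21 m.

```python
# Approximate speed of sound in air at about 20 °C (assumption), in metres per second.
SPEED_OF_SOUND_M_PER_S = 343.0

# Edges of the auditory–visual simultaneity window quoted in the text, in ms.
WINDOW_MS = (50.0, 60.0)


def acoustic_delay_ms(distance_m: float) -> float:
    """Time in ms for sound to travel distance_m metres (light's travel time is neglected)."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_PER_S


def distance_for_delay_m(delay_ms: float) -> float:
    """Distance in metres at which the acoustic delay alone equals delay_ms."""
    return SPEED_OF_SOUND_M_PER_S * delay_ms / 1000.0


if __name__ == "__main__":
    for d in (1, 5, 10, 20, 30):
        print(f"{d:>3} m: sound lags light by {acoustic_delay_ms(d):5.1f} ms")
    lo, hi = WINDOW_MS
    print(f"window edges reached at {distance_for_delay_m(lo):.1f} to {distance_for_delay_m(hi):.1f} m")
```

A brain that compensates for this delay, as the studies cited next suggest, would need some estimate of source distance; the sketch only shows how quickly the uncompensated lag grows.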
That window is flexible: the nervous system takes into account the time that sound takes to travel from its source (Kopinska & Harris 2004; Alais & Carlile 2005). The simultaneity window is also adaptable. Systematic training with asynchronous audio-visual stimuli shifts the time delay at which sounds and flashes are perceived to be simultaneous (Fujisaki et al. 2004; see also Vroomen et al. 2004). Indeed, artificially delayed visual feedback during a tapping task can distort perceived simultaneity to the extent that, when the delay is removed, subjects believe that their actions are anticipating their intentions (Stetson et al. 2006).

While the extent of the window of simultaneity for vision and audition (and other senses) has been examined extensively, little is known about the nature of the mechanisms responsible for these tasks. Fujisaki & Nishida (2005) examined synchrony/asynchrony discrimination with periodic stimuli and found that the discrimination is not possible at frequencies higher than 4 Hz (confirmed by Arrighi et al. 2006). They suggested that this limit may reflect a cross-correlation mechanism that computes similarities between the auditory and visual streams. Their work further suggested that this cross-correlator does not operate on raw inputs, but correlates salient features extracted by the auditory and visual systems.

Considering early sensory processing as a cascade of spatial and temporal filters has led to many useful insights, particularly into vision and audition. Here, we use this approach to investigate the filtering properties of auditory–visual synchrony mechanisms. We measured interval discrimination thresholds where the intervals were marked by visual, auditory and auditory–visual stimuli. Duration thresholds usually follow Weber's law: the required increment in duration is proportional to the base duration (Fraisse 1984; Mauk & Buonomano 2004).
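The cross-correlation idea attributed to Fujisaki & Nishida (2005) can be illustrated with a toy example. The sketch below assumes 1 ms time steps, first-order low-pass filters with hypothetical time constants (5 ms for the auditory stream, 20 ms for the visual stream) and arbitrarily chosen event times. It filters two impulse trains and returns the lag at which their cross-correlation peaks. Because it correlates filtered event markers rather than raw waveforms, it is loosely in the spirit of the feature-based correlator described above, but it is an illustration under these assumptions, not the authors' model.

```python
import math


def pulse_train(n_samples, onsets_ms, dt_ms=1.0):
    """Unit impulses at the given onset times (ms) on a grid of n_samples samples."""
    sig = [0.0] * n_samples
    for t in onsets_ms:
        idx = int(round(t / dt_ms))
        if 0 <= idx < n_samples:
            sig[idx] = 1.0
    return sig


def lowpass(signal, tau_ms, dt_ms=1.0):
    """Causal first-order low-pass filter (leaky integrator) with time constant tau_ms."""
    alpha = 1.0 - math.exp(-dt_ms / tau_ms)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def peak_lag(a, b, max_lag):
    """Lag in samples maximising sum_i a[i] * b[i + lag]; positive means b lags a."""
    n = len(a)
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        total = sum(a[i] * b[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        if total > best_val:
            best_lag, best_val = lag, total
    return best_lag


if __name__ == "__main__":
    onsets = [100, 430, 610, 1020, 1300]  # irregular event times in ms (arbitrary)
    offset = 40                           # visual events trail auditory ones by 40 ms
    n = 1600
    audio = lowpass(pulse_train(n, onsets), tau_ms=5.0)  # hypothetical fast auditory filter
    visual = lowpass(pulse_train(n, [t + offset for t in onsets]), tau_ms=20.0)  # slower visual filter
    print("estimated visual lag:", peak_lag(audio, visual, max_lag=200), "ms")
    # prints: estimated visual lag: 40 ms
```

Because both kernels here are causal exponentials that peak at zero delay, their different time constants broaden the correlation peak without shifting it; filters with a rising phase and different latencies (gamma-shaped kernels, for example) could bias the estimated lag.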
While Weber's law is frequently observed in sensory discrimination, there are in fact many important deviations from that behaviour. For example, luminance discrimination departs from Weber's law at low luminances, where thresholds become independent of luminance (Barlow 1957). More interestingly, many discrimination functions take the form of a 'dipper function', including those for contrast (Nachmias & Kocher 1970; Pelli 1985), blur (Watt & Morgan 1983; Burr & Morgan 1997) and motion (Simpson & Finsten 1995; Gori et al. 2008). Starting from small base values, the increment threshold first decreases as the base value increases, reaches its minimum at the bottom of the dipper, and then increases monotonically. For large base values, the threshold rises following Weber's law (Nachmias & Kocher 1970; Nachmias & Sansbury 1974; Legge & Foley 1980; Pelli 1985; Foley 1994). Dipper functions have also been observed in visuo-tactile discrimination, where pedestal effects occur between modalities (Arabzadeh et al. 2008; Burr et al. in press). The generally accepted explanation for the dipper is that it results from a transducer function with an early, threshold-like accelerating nonlinearity (Legge & Foley 1980). Spatio-temporal uncertainty has also been implicated (Pelli 1985), but not strongly supported by the evid (...truncated)
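The transducer account of the dipper (Legge & Foley 1980) can be sketched numerically. The example below assumes a hypothetical transducer r(x) = ln(1 + (x/a)^n), which accelerates for x well below a and grows logarithmically well above it, and defines the increment threshold as the smallest increment that raises r by a fixed criterion. The values a = 10, n = 2 and criterion = 0.1 are illustrative choices, not fitted parameters, and this is not the temporal-filter model developed in the paper.

```python
import math


def transducer(x, a=10.0, n=2.0):
    """Hypothetical transducer: accelerating (about (x/a)**n) below a, logarithmic above a."""
    return math.log1p((x / a) ** n)


def increment_threshold(base, criterion=0.1, hi=1e4, iters=100):
    """Smallest increment d with transducer(base + d) - transducer(base) >= criterion.

    Found by bisection on [0, hi]; valid because the transducer is monotonically increasing.
    """
    target = transducer(base) + criterion
    lo_d, hi_d = 0.0, hi
    for _ in range(iters):
        mid = 0.5 * (lo_d + hi_d)
        if transducer(base + mid) < target:
            lo_d = mid
        else:
            hi_d = mid
    return hi_d


if __name__ == "__main__":
    for base in (0, 2, 5, 10, 20, 50, 100, 200):
        d = increment_threshold(base)
        print(f"base {base:>3}: threshold {d:6.2f}, ratio {d / base if base else float('nan'):.3f}")
```

With these settings the threshold falls from about 3.2 at a zero base to about 1.0 near x = a, then rises. For large bases the transducer behaves like n ln(x/a), so the ratio of threshold to base approaches e^(c/n) - 1 (about 0.051 for c = 0.1, n = 2): the Weber-law limb of the dipper.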


Article home page: http://rspb.royalsocietypublishing.org/content/276/1663/1761.abstract

David Burr, Ottavia Silva, Guido Marco Cicchini, Martin S. Banks, Maria Concetta Morrone. Temporal mechanisms of multimodal binding. Proceedings of the Royal Society B: Biological Sciences 276(1663), 1761–1769, 2009. DOI: 10.1098/rspb.2008.1899