Synesthetic congruency modulates the temporal ventriloquism effect




Neuroscience Letters 442 (2008) 257–261

Contents lists available at ScienceDirect

Neuroscience Letters journal homepage: www.elsevier.com/locate/neulet

Cesare Parise ∗, Charles Spence
Crossmodal Research Laboratory, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford OX1 3UD, UK

Article info

Article history: Received 10 February 2008; received in revised form 20 June 2008; accepted 4 July 2008
Keywords: Temporal ventriloquism; Synesthetic associations; Temporal order judgement; Multisensory integration

Abstract

People sometimes find it easier to judge the temporal order in which two visual stimuli have been presented if one tone is presented before the first visual stimulus and a second tone is presented after the second visual stimulus. This enhancement of people’s visual temporal sensitivity has been attributed to the temporal ventriloquism of the visual stimuli toward the temporally proximate sounds, resulting in an expansion of the perceived interval between the two visual events. In the present study, we demonstrate that the synesthetic congruency between the auditory and visual stimuli (in particular, between the relative pitch of the sounds and the relative size of the visual stimuli) can modulate the magnitude of this multisensory integration effect: the auditory capture of vision is larger for pairs of auditory and visual stimuli that are synesthetically congruent than for pairs of stimuli that are synesthetically incongruent, as reflected by participants’ increased sensitivity in discriminating the temporal order of the visual stimuli. These results provide the first evidence that multisensory temporal integration can be affected by the synesthetic congruency between the auditory and visual stimuli that happen to be presented. © 2008 Elsevier Ireland Ltd. All rights reserved.

It is now well documented that people are typically unable to direct their behaviour on the basis of the information provided by a single sensory channel without also potentially being influenced, often without their awareness, by whatever stimuli happen to be presented to the other modalities at the same time. A classic example of this phenomenon is the spatial ventriloquism effect (see [22], for an exhaustive review), in which the source of a sound is mislocalized toward the position of a concurrent, task-irrelevant visual stimulus (e.g., [1,4]; see also [8]). More recently, a similar phenomenon has been demonstrated in the temporal domain, whereby the perceived time of occurrence of a visual stimulus can be biased by the presentation of an irrelevant and slightly asynchronous auditory stimulus ([23]; though see [9]). This phenomenon has been labelled the temporal ventriloquism effect (see [20]). The claim is that the sensitivity of participants’ judgments concerning the temporal order in which a pair of visual stimuli is presented is enhanced (i.e., the just noticeable difference, JND, is lower) when two auditory stimuli are presented, one shortly before the first visual stimulus and the other shortly after the second visual stimulus. Researchers have interpreted this phenomenon in terms of the auditory capture of vision: the second visual stimulus is shifted temporally toward the time of occurrence of the second auditory stimulus, thus expanding the perceived temporal gap separating the two visual events [20].¹

A PET study by Bushara et al. linked the neural basis of audiovisual asynchrony perception to the tecto-thalamo-insular pathways ([7]; see also [3]), and a number of subsequent studies have examined the spatial and temporal constraints on the temporal ventriloquism effect (see [2,5,11,13,20,29]). For example, Morein-Zamir et al. [20] found that auditory stimuli could shift the perceived time of occurrence of visual stimuli that had been presented 200 ms earlier. More recently, using a somewhat different paradigm, Jaekl and Harris [11] observed the temporal cross-capture of audiovisual stimuli that were separated by 125 ms. Interestingly, however, Vroomen and Keetels [29] have shown that the relative spatial positions from which the auditory and visual stimuli are presented do not seem to modulate the size of this particular multisensory (temporal) effect. To date, no one has investigated whether the qualitative aspects of the stimuli used might influence the strength of the temporal ventriloquism effect. That is, in all of the studies published thus far, just one type of stimulus was presented in each sensory modality (though see [7]). As a consequence, we still do not know whether the temporal ventriloquism effect would be modulated by other multisensory integration phenomena triggered by the intrinsic (or relative) features of the stimuli presented.

In particular, no one has yet investigated the role of synesthetic links in audiovisual temporal capture. Researchers have shown that stimuli presented in different sensory modalities can share a number of phenomenological attributes (e.g., [16]). These similarities, such as the one between bright visual stimuli and loud sounds [17,18], are often referred to as synesthetic associations (or correspondences) and can automatically influence our behaviour in a variety of settings (see [19], for a recent review). One such case is the association between auditory pitch and the visual (or haptic) size of objects, whereby high-pitched tones are synesthetically linked to small objects, whereas low-pitched tones tend to be linked to large objects [10,27,30,31]. In the speeded classification of visual size, for example, the audiovisual association between pitch and size has been shown to result in significantly faster reaction times (RTs) on congruent trials, where small (large) visual stimuli are presented together with high- (low-) pitched sounds, than on incongruent trials ([10]; see also [15]). Given such results, we thought it possible that synesthetic associations might also modulate the strength of the crossmodal temporal capture effect. In particular, it seemed plausible that auditory stimuli that are synesthetically congruent with the visual stimuli might give rise to a stronger temporal ventriloquism effect than sounds that are synesthetically incongruent with them. To test this prediction, we capitalized on the previously documented synesthetic association between auditory pitch and visual size.

∗ Corresponding author. Tel.: +44 1865 271307; fax: +44 1865 310447. E-mail address: [email protected] (C. Parise). 0304-3940/$ – see front matter © 2008 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.neulet.2008.07.010

¹ The temporal ventriloquism paradigm, as proposed by Morein-Zamir et al. [20], only allows one to measure the auditory capture of vision, but other studies [2,9] have reported that visual stimuli may also give rise to a modest capture of audition, thus demonstrating that the perceived times of occurrence of asynchronously presented auditory and visual stimuli are both shifted toward each other.
We used the temporal ventriloquism task to systematically manipulate the order of presentation of synesthetically linked audiovisual stimuli: the participants in our study had to judge the temporal order in which two asynchronous visual stimuli (one presented on each side of a computer monitor) were presented, while ignoring two task-irrelevant auditory stimuli, one presented slightly before the first visual stimulus and the other slightly after the second. Given that temporal ventriloquism has been interpreted in terms of the auditory capture of vision in the temporal domain [20], we hypothesized that the strength of any such crossmodal capture might be modulated by the synesthetic congruency between the stimuli in the two modalities, with stronger capture taking place between synesthetically congruent than between synesthetically incongruent stimulus pairs. According to this hypothesis, participants’ temporal order judgments (TOJs) should be more sensitive, as measured by the difference in JNDs, on congruent trials (where the first sound was synesthetically associated with the first visual stimulus and the second sound with the second visual stimulus) than on incongruent trials (where the first sound was synesthetically associated with the second visual stimulus and the second sound with the first visual stimulus). The participants were instructed to make a “Which came second?” visual TOJ, because previous research suggests that only the second auditory stimulus plays a role in the temporal capture of vision ([20], Experiments 2 and 3; cf. [24]).

Nine paid volunteers (four male and five female) with a mean age of 24 years (range 18–38 years) took part in this study in return for a £5 (UK Sterling) gift voucher or course credit. 
The study was conducted in accordance with the Declaration of Helsinki and had ethical approval from the Department of Experimental Psychology at the University of Oxford. The participants sat in front of a 21-inch CRT screen (75 Hz refresh rate) flanked by a pair of loudspeaker cones and responded by pressing one of two buttons on a computer keyboard. A personal computer running Matlab v.7.2 with Psychtoolbox v.2.54 [6,21] was used to control the presentation of the stimuli and to record the participants’ responses. The visual stimuli consisted of two light grey circles, one measuring 5 cm and the other 2 cm (subtending 5.2° and 2.1° of visual angle, respectively), presented 5 cm (5.2°) to the left or right of a central red fixation point against a white background. The auditory stimuli consisted of two sine-wave tones (300 and 4500 Hz) presented for 5 ms each² (the stimuli used in this study were identical to those used by Gallace and Spence [10] in their speeded classification study). The stimuli were presented approximately 60 cm from the participant’s head.

Each trial began with the presentation of the central fixation point. The first auditory stimulus was presented after a random interval of between 520 and 910 ms. The first visual stimulus was presented to the left or right of the fixation point 150 ms later and remained on the screen until the end of the trial. The second visual stimulus was then presented on the other side of fixation, after the stimulus onset asynchrony (SOA), and also remained visible until the end of the trial. The onset of the second visual stimulus was followed, 150 ms later, by the second auditory stimulus. The participants had to indicate whether the second visual stimulus had been presented on the left or right by pressing the left or right arrow key, while trying to ignore the task-irrelevant auditory stimuli (see Fig. 1a and b). We compared participants’ performance in synesthetically congruent and synesthetically incongruent conditions, where the synesthetic association investigated was that between auditory pitch and visual size.
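The trial timeline described above can be sketched as follows. This is a minimal illustration, not the authors’ Matlab/Psychtoolbox code: it assumes that the 520–910 ms interval before the first tone was drawn from a uniform distribution (the text says only that it was random), takes fixation onset as time zero, and ignores frame quantization on the 75 Hz display. The function name `trial_timeline` is hypothetical.

```python
import random

def trial_timeline(soa_ms, rng=random):
    """Return hypothetical event onsets (ms after fixation onset) for one trial.

    Assumption: the 520-910 ms interval before the first tone is uniform.
    The sign of the SOA only says which circle comes second, so its
    magnitude sets the gap between the two visual onsets.
    """
    a1 = rng.uniform(520.0, 910.0)  # first 5 ms tone (A1)
    v1 = a1 + 150.0                 # first circle (V1), 150 ms after A1
    v2 = v1 + abs(soa_ms)           # second circle (V2), one SOA after V1
    a2 = v2 + 150.0                 # second tone (A2), 150 ms after V2
    return {"A1": a1, "V1": v1, "V2": v2, "A2": a2}

if __name__ == "__main__":
    events = trial_timeline(78, rng=random.Random(1))
    for name, onset in sorted(events.items(), key=lambda kv: kv[1]):
        print(f"{name}: {onset:7.1f} ms")
```

With an SOA of 0 ms, V1 and V2 coincide and the two tones fall 150 ms before and after them, as in the 0 ms condition.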
Since previous research using a speeded classification task has shown a correspondence between higher-pitched tones (H) and smaller visual images (S), and between lower-pitched tones (L) and larger images (B; see [10]), we presented two types of congruent trial (L–B–S–H and H–S–B–L) and two types of incongruent trial (L–S–B–H and H–B–S–L; see Fig. 1c). The SOA between the two visual stimuli in each condition was varied using the method of constant stimuli (see [26]). SOAs of ±117, ±78, ±39, ±26, ±13, and 0 ms were used. Negative values indicate that the smaller of the two visual stimuli was presented second, while positive values indicate that the larger of the two was presented second. The interval between the first auditory stimulus and the first visual stimulus, and between the second visual stimulus and the second auditory stimulus, was 150 ms. In the 0 ms SOA condition, the two visual stimuli appeared simultaneously, while the auditory stimuli preceded and trailed their presentation by 150 ms; on half of these trials the first auditory stimulus was high-pitched and the second low-pitched, and on the other half the order was reversed. The congruent and incongruent trials were presented equiprobably and in a random order in each participant’s experimental session, which consisted of 480 trials overall.
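The mapping from SOA sign and congruency to the four trial types listed above can be made explicit in a few lines. This sketch simply encodes the stated pitch–size correspondences (H with S, L with B) and the SOA sign convention (positive means the larger circle came second); the names `trial_sequence`, `is_congruent`, and `PITCH_FOR_SIZE` are hypothetical.

```python
# Pitch-size correspondences reported in [10]: high pitch <-> small, low pitch <-> big.
PITCH_FOR_SIZE = {"S": "H", "B": "L"}

def trial_sequence(soa_ms, congruent):
    """Return the (A1, V1, V2, A2) labels for a trial with a non-zero SOA.

    Positive SOAs mean the larger circle (B) came second; negative SOAs
    mean the smaller circle (S) came second.
    """
    if soa_ms == 0:
        # Simultaneous circles have no order, so congruency is undefined.
        raise ValueError("0 ms SOA trials are neither congruent nor incongruent")
    v1, v2 = ("S", "B") if soa_ms > 0 else ("B", "S")
    if congruent:
        # Each tone matches the circle it is temporally adjacent to.
        a1, a2 = PITCH_FOR_SIZE[v1], PITCH_FOR_SIZE[v2]
    else:
        # Each tone matches the circle at the opposite end of the trial.
        a1, a2 = PITCH_FOR_SIZE[v2], PITCH_FOR_SIZE[v1]
    return (a1, v1, v2, a2)

def is_congruent(sequence):
    """True if A1 matches V1 and A2 matches V2 under the pitch-size mapping."""
    a1, v1, v2, a2 = sequence
    return PITCH_FOR_SIZE[v1] == a1 and PITCH_FOR_SIZE[v2] == a2

SOAS_MS = [-117, -78, -39, -26, -13, 13, 26, 39, 78, 117]  # non-zero SOAs only

if __name__ == "__main__":
    for soa in (39, -39):
        for congruent in (True, False):
            print(soa, congruent, "-".join(trial_sequence(soa, congruent)))
```

Running it lists H-S-B-L and L-S-B-H for +39 ms, and L-B-S-H and H-B-S-L for −39 ms, which are the congruent and incongruent types given in the text.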

² To test whether participants could readily perceive the pitch of the auditory stimuli, we ran a control pitch discrimination study. Twelve participants (4 male, 8 female; mean age 23 years) were asked to rate the pitch of the two 5 ms tones (300 and 4500 Hz) by drawing a mark on a 10 cm line representing a scale running from “low pitch” (left end) to “high pitch” (right end). The two tones were presented in succession with an interstimulus interval of 500 ms, and the participants listened to them three times before rating their pitch. The order of presentation was constant for each participant but was counterbalanced across participants. Each participant made a single rating; a value of zero was assigned to the left end (corresponding to “low pitch”) and a value of ten to the right end (corresponding to “high pitch”). The average rating was 2.08 (S.E. 0.26) for the 300 Hz tone and 7.49 (S.E. 0.37) for the 4500 Hz tone. A two-tailed paired-samples t-test revealed that participants consistently rated the 300 Hz tone as lower pitched than the 4500 Hz tone (t = −10.372, d.f. = 11, p < .001). These results demonstrate that participants could discriminate between the two tones used in the present study.
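As a consistency check on the reported statistics, the two-tailed p-value for a t statistic can be recomputed from its degrees of freedom. The sketch below integrates Student’s t density numerically with Simpson’s rule so that it needs only the Python standard library; in practice a statistics package (e.g., SciPy) would normally be used instead. The helper names `t_pdf` and `two_tailed_p` are our own.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * integral of t_pdf from 0 to |t| (Simpson's rule)."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_half)

if __name__ == "__main__":
    print(f"{two_tailed_p(-10.372, 11):.2e}")  # pitch-rating control study
```

Applied to the statistics reported in the results, the same function gives values consistent with the stated p-values; for example, `two_tailed_p(-2.367, 8)` falls between .04 and .05.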


Fig. 2. Fitted curves of the cumulative data from the nine participants, plotted as the proportion of ‘larger visual stimulus second’ responses as a function of SOA. The dashed line represents the data from the synesthetically congruent condition, and the dotted light grey line represents the data from the synesthetically incongruent condition. The squares and circles represent the proportion of ‘larger visual stimulus second’ responses in the congruent and incongruent conditions, respectively; error bars represent the standard error. The top-left and bottom-right boxes show the mean PSS and JND data, respectively, together with their standard errors. The line with the asterisk in the JND box indicates a statistically significant difference between the synesthetically congruent and synesthetically incongruent conditions (p < .05).

Fig. 1. (a) Schematic illustration of the events presented in each trial (A1 = first auditory stimulus; V1 = first visual stimulus; V2 = second visual stimulus; A2 = second auditory stimulus); (b) representation of the temporal succession of events in each trial; (c) symbolic representation of the stimuli presented in each condition (H = high-pitched tone; L = low-pitched tone; S = small visual stimulus; B = large visual stimulus) and their synesthetic links (dashed arrows).

Psychometric functions were calculated for each participant and condition by fitting a cumulative Gaussian function to the percentage of “larger circle second” responses at all non-zero SOAs.³ The JND was then calculated from each psychometric function by subtracting the asynchrony at which participants made 25% “larger second” responses from the asynchrony at which they made 75% “larger second” responses, and dividing the result by two. The point of subjective simultaneity (PSS), which indicates the asynchrony at which the participants were maximally uncertain concerning the temporal order of the stimuli, was also calculated from each function as the asynchrony at which 50% “larger second” responses were made. A two-tailed paired-samples t-test conducted on the JND data revealed that the JND was significantly smaller for the congruent trials (M = 21 ms) than for the incongruent trials (25 ms; t = −2.367, d.f. = 8, p = .045). No such difference was observed when a similar comparison was conducted on the PSS data (−13 vs. −14 ms, respectively; two-tailed t = 0.881, d.f. = 8, p = .404). Therefore, as predicted, the synesthetic congruency between the auditory and visual stimuli modulated the sensitivity of participants’ TOJ responses while having no effect on the point of subjective simultaneity (see Fig. 2). A single-sample t-test on the overall (congruent and incongruent) PSS value (−17 ms) showed that it differed significantly from zero, indicating a bias toward “larger second” responses (t = −2.983, d.f. = 8, p = .018; see Appendix A for further analysis).

³ Given that congruency is defined in terms of the relative order of presentation of the visual stimuli, the 0 ms SOA trials, in which the two visual stimuli were presented simultaneously, cannot be coded as either congruent or incongruent and hence were excluded from the data analysis.

The critical result to emerge from the present study was the significant effect of synesthetic congruency on participants’ sensitivity (as measured by the change in their JNDs). Participants were significantly better at resolving the temporal order in which the two visual stimuli had been presented on the synesthetically congruent trials than on the synesthetically incongruent trials. This result therefore provides the first empirical evidence that the crossmodal temporal capture of vision by audition can be modulated by the crossmodal congruency between the stimuli, with more pronounced temporal ventriloquism taking place when the auditory and visual stimuli were synesthetically congruent than when they were incongruent (as reflected by the enhanced auditory capture of vision leading to more sensitive visual TOJs). Presumably, the auditory stimuli on the synesthetically congruent trials exerted a stronger attraction on the temporally adjacent visual stimuli and hence produced a larger temporal shift of the visual stimuli toward the onsets of the associated auditory stimuli. At a perceptual level, this implies an illusory expansion of the delay between the two visual stimuli, making it easier for participants to judge their temporal order correctly and thus resulting in more sensitive performance on the TOJ task (as highlighted by the lower JNDs on the synesthetically congruent than on the synesthetically incongruent trials). 
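The psychometric analysis described at the start of the results (cumulative Gaussian fit, PSS at the 50% point, JND as half the distance between the 25% and 75% points) can be sketched as follows. The paper does not state the fitting criterion, so this minimal version assumes a probit least-squares fit (a linear regression of z-transformed proportions on SOA) using only the standard library; maximum-likelihood fitting is a common alternative. For a cumulative Gaussian with mean μ and standard deviation σ, the PSS equals μ and the JND equals σ·z(0.75) ≈ 0.674σ. The function names are hypothetical.

```python
from statistics import NormalDist

def fit_cumulative_gaussian(soas_ms, p_larger_second, eps=0.01):
    """Probit least-squares fit of P('larger second') = Phi((soa - mu) / sigma).

    Returns (mu, sigma). The 0 ms SOA is skipped because it cannot be coded
    as congruent or incongruent; proportions are clipped to [eps, 1 - eps]
    so that the inverse normal CDF stays finite.
    """
    std = NormalDist()
    xs, zs = [], []
    for soa, p in zip(soas_ms, p_larger_second):
        if soa == 0:
            continue
        xs.append(soa)
        zs.append(std.inv_cdf(min(max(p, eps), 1 - eps)))
    n = len(xs)
    mean_x, mean_z = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx                    # slope = 1 / sigma
    intercept = mean_z - slope * mean_x  # intercept = -mu / sigma
    return -intercept / slope, 1 / slope

def pss_and_jnd(mu, sigma):
    """PSS = 50% point; JND = half the distance between the 25% and 75% points."""
    curve = NormalDist(mu, sigma)
    pss = curve.inv_cdf(0.50)
    jnd = (curve.inv_cdf(0.75) - curve.inv_cdf(0.25)) / 2
    return pss, jnd

if __name__ == "__main__":
    soas = [-117, -78, -39, -26, -13, 13, 26, 39, 78, 117]
    # Hypothetical noise-free proportions from a known curve, to check recovery.
    true_curve = NormalDist(-13, 60)
    props = [true_curve.cdf(s) for s in soas]
    mu, sigma = fit_cumulative_gaussian(soas, props)
    pss, jnd = pss_and_jnd(mu, sigma)
    print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
```

With these illustrative parameters (not taken from the data) the script prints a PSS of −13.0 ms and a JND of 40.5 ms. Under the same model, the reported mean JNDs of 21 and 25 ms correspond to σ values of roughly 31 and 37 ms.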
The present findings also show that the strength of the attraction between (even relatively simple) auditory and visual stimuli in the temporal domain is not fixed: the qualitative features of the stimuli presented within each sensory modality (such as the pitch and size of the stimuli in the present study), and the relation between the stimuli presented in each modality (i.e., the synesthetic association between pitch and size), can modulate the auditory capture of vision, as measured by the difference in the JND between the synesthetically congruent and incongruent conditions. The fact that synesthetic congruency affected the JND but not the PSS rules out any interpretation of the current results in terms of a criterion change or response bias (see [26]), suggesting instead that the modulation of the precision of participants’ TOJs reflects a genuine perceptual effect (cf. [10,15]). Our results also provide additional evidence for the existence of synesthetic associations between the size of visual stimuli and the pitch of auditory stimuli. Notably, this evidence emerged even though the synesthetic congruency between the auditory and visual stimuli was completely irrelevant to the participants’ task. The different speeds at which sound and light propagate through air, as well as the different neural processing latencies of the visual and auditory pathways [14,25], introduce asynchronies in the time of arrival of visual and auditory information originating from a common event. The ability of our perceptual systems to compensate for such asynchronies is fundamental to many kinds of multisensory integration and is a necessary condition for the construction of a coherent representation of the external world (see [25]). 
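The physical part of this asynchrony is straightforward to quantify. Assuming a speed of sound of about 343 m/s (dry air at roughly 20 °C) and computing the light travel time explicitly (it is negligible at these distances), the acoustic lag grows by roughly 2.9 ms per metre; at the approximately 60 cm distance used in this study it is under 2 ms. Neural latency differences [14,25] are not modelled here, and the function name `acoustic_lag_ms` is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def acoustic_lag_ms(distance_m, speed_of_sound=SPEED_OF_SOUND_M_PER_S):
    """Arrival delay (ms) of sound relative to light from a common source."""
    sound_s = distance_m / speed_of_sound
    light_s = distance_m / SPEED_OF_LIGHT_M_PER_S
    return (sound_s - light_s) * 1000.0

if __name__ == "__main__":
    for d in (0.6, 10.0, 30.0):
        print(f"{d:5.1f} m -> {acoustic_lag_ms(d):5.1f} ms")
```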
In the audiovisual domain, the outcome of such compensation is the temporal auditory capture of vision, a phenomenon that has been shown to depend on temporal constraints [20] and, under certain conditions, on the “unity assumption” [28]. The results of the present study go beyond previous research by showing that interactions between the individual features of audiovisual stimuli, namely synesthetic associations, also seem to modulate the temporal ventriloquism effect. Kanai et al. [12] have recently shown that the crossmodal binding of visual and auditory stimuli is a key factor in reducing the perceived asynchrony between auditory and visual signals. In light of this claim, we believe that our results are consistent with the suggestion that more pronounced crossmodal binding takes place between synesthetically congruent than between synesthetically incongruent audiovisual stimuli. It will be interesting in future research to investigate the role of synesthetic associations in modulating the spatial ventriloquism effect [22]. It would seem reasonable to expect that synesthetically congruent stimuli would also exert a stronger attraction in the spatial domain (thus resulting in a larger capture of audition by vision; see also [1]), and not only in the temporal domain tested here. More generally, it is currently an open question how many other phenomena in multisensory perception research might also be modulated by the degree of synesthetic correspondence between the unimodal stimuli that experimenters happen to incorporate in their studies.

Appendix A

To investigate whether the order of presentation of the visual stimuli had any significant effect on TOJ performance, we conducted ANOVAs on both the JND and PSS data that included the order of presentation of the visual stimuli as an additional factor alongside congruency. The relevant data and the SOAs were recoded in terms of “left-second/right-second” (in our original analyses, the data were coded as “large-second/small-second”). A two-way repeated measures ANOVA on the JND data once again revealed a significant main effect of congruency (F(1,8) = 6.089, p = .039, η² = .432) as well as of order of presentation (F(1,8) = 6.153, p = .038, η² = .435). The interaction term did not reach statistical significance (F(1,8) < 1, ns, η² < .001), indicating that the effect of congruency on the JND data was not affected by the order of presentation of the stimuli. A similar analysis of the PSS data revealed no significant main effects (congruency: F(1,8) = 1.537, p = .245, η² = .164; order: F(1,8) = 1.402, p = .270, η² = .149) and no interaction between these two factors (F(1,8) = 3.431, p = .101, η² = .300). 
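Assuming the η² values in this appendix are partial η² (the usual effect size for repeated measures ANOVA, although the text does not say so), each can be recovered from its F ratio and degrees of freedom as F·df₁/(F·df₁ + df₂). For example, F(1,8) = 6.089 gives 6.089/14.089 ≈ .432, the value reported for congruency. The helper name below is our own.

```python
def partial_eta_squared(f_ratio, df_effect, df_error):
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    numerator = f_ratio * df_effect
    return numerator / (numerator + df_error)

if __name__ == "__main__":
    # F(1,8) values reported for the JND analysis in this appendix
    for label, f in (("congruency", 6.089), ("order", 6.153)):
        print(label, round(partial_eta_squared(f, 1, 8), 3))
```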
These results are in line with those obtained with the t-tests, showing a significant effect of synesthetic congruency on the JND data but not on the PSS data. An additional result to emerge from this analysis was the significant effect of the order of presentation of the visual stimuli on the JND. In particular, the JNDs were somewhat lower when the first visual stimulus was the smaller of the two stimuli than when it was the larger of the two (18 vs. 38 ms, respectively). The absence of any interaction between congruency and order in the JND data excludes the possibility that the JND congruency effect observed with the t-test depended upon the order of presentation of the stimuli.

References

[1] D. Alais, D. Burr, The ventriloquist effect results from near-optimal bimodal integration, Current Biology 14 (2004) 257–262.
[2] G. Aschersleben, P. Bertelson, Temporal ventriloquism: crossmodal interaction on the time dimension. 2. Evidence from sensorimotor synchronization, International Journal of Psychophysiology 50 (2003) 157–163.
[3] D. Bergmann, C. Spence, H.-J. Heinze, T. Noesselt, Is there a timeline in the brain? The spatial coding of audiovisual timing information in the human brain, in: Poster presented at the 8th Annual Meeting of the International Multisensory Research Forum, Sydney, Australia, July 4–7, 2007.
[4] P. Bertelson, G. Aschersleben, Automatic visual bias of perceived auditory location, Psychonomic Bulletin & Review 5 (1998) 482–489.
[5] P. Bertelson, G. Aschersleben, Temporal ventriloquism: crossmodal interaction on the time dimension. 1. Evidence from auditory-visual temporal order judgment, International Journal of Psychophysiology 50 (2003) 147–155.
[6] D.H. Brainard, The psychophysics toolbox, Spatial Vision 10 (1997) 433–436.
[7] K. Bushara, J. Grafman, M. Hallet, Neural correlates of auditory-visual onset asynchrony detection, Journal of Neuroscience 21 (2001) 300–304.
[8] A. Caclin, S. Soto-Faraco, A. Kingstone, C. Spence, Tactile ‘capture’ of audition, Perception & Psychophysics 64 (2002) 616–630.
[9] R. Fendrich, P.M. Corballis, The temporal cross-capture of audition and vision, Perception & Psychophysics 63 (2001) 719–725.
[10] A. Gallace, C. Spence, Multisensory synesthetic interactions in the speeded classification of visual size, Perception & Psychophysics 68 (2006) 1191–1203.
[11] P.M. Jaekl, L.R. Harris, Auditory-visual temporal integration measured by shifts in perceived temporal location, Neuroscience Letters 417 (2007) 219–224.
[12] R. Kanai, B.R. Sheth, F.A.J. Verstraten, S. Shimojo, Dynamic perceptual changes in audiovisual simultaneity, PLoS One (2007) e1253.
[13] M. Keetels, J. Stekelenburg, J. Vroomen, Auditory grouping occurs prior to intersensory pairing: evidence from temporal ventriloquism, Experimental Brain Research 180 (2007) 449–456.
[14] A.J. King, A.R. Palmer, Integration of visual and auditory information in bimodal neurones in the guinea-pig superior colliculus, Experimental Brain Research 60 (1985) 492–500.
[15] J. Long, Contextual assimilation and its effect on the division of attention between nonverbal signals, Quarterly Journal of Experimental Psychology 29 (1977) 397–414.
[16] L. Marks, The Unity of the Senses: Interrelations among the Modalities, Academic Press, New York, 1978.
[17] L.E. Marks, On cross-modal similarity: auditory-visual interaction in speeded discrimination, Journal of Experimental Psychology: Human Perception and Performance 13 (1987) 384–394.
[18] L.E. Marks, On cross-modal similarity: the perceptual structure of pitch, loudness, and brightness, Journal of Experimental Psychology: Human Perception and Performance 15 (1989) 586–602.
[19] L.E. Marks, Cross-modal interactions in speeded classification, in: G.A. Calvert, C. Spence, B.E. Stein (Eds.), Handbook of Multisensory Processes, MIT Press, Cambridge, MA, 2004, pp. 85–105.
[20] S. Morein-Zamir, S. Soto-Faraco, A. Kingstone, Auditory capture of vision: examining temporal ventriloquism, Cognitive Brain Research 17 (2003) 154–163.
[21] D.G. Pelli, The VideoToolbox software for visual psychophysics: transforming numbers into movies, Spatial Vision 10 (1997) 437–442.
[22] M. Radeau, Auditory-visual spatial interaction and modularity, Current Psychology of Cognition 13 (1994) 3–51.
[23] C.R. Scheier, R. Nijhawan, S. Shimojo, Sound alters visual temporal resolution, Investigative Ophthalmology and Visual Science 40 (1999) S792.
[24] D.I. Shore, C. Spence, R.M. Klein, Visual prior entry, Psychological Science 12 (2001) 205–212.
[25] C. Spence, S.B. Squire, Multisensory integration: maintaining the perception of synchrony, Current Biology 13 (2003) R519–R521.
[26] C. Spence, D.I. Shore, R.M. Klein, Multisensory prior entry, Journal of Experimental Psychology: General 130 (2001) 799–832.
[27] K. Stumpf, Tonpsychologie I [Psychology of the tone], Hirzel, Leipzig, 1883.
[28] A. Vatakis, C. Spence, Crossmodal binding: evaluating the “unity assumption” using audiovisual speech stimuli, Perception & Psychophysics 69 (2007) 744–756.
[29] J. Vroomen, M. Keetels, The spatial constraint in intersensory pairing: no role in temporal ventriloquism, Journal of Experimental Psychology: Human Perception and Performance 32 (2006) 1063–1071.
[30] P. Walker, S. Smith, Stroop interference based on the synaesthetic qualities of auditory pitch, Perception 13 (1984) 75–81.
[31] P. Walker, S. Smith, Stroop interference based on the multimodal correlates of haptic size and auditory pitch, Perception 14 (1985) 729–736.
