Cortical speech processing unplugged: a timely subcortico-cortical framework


Opinion

Cortical speech processing unplugged: a timely subcortico-cortical framework
Sonja A. Kotz and Michael Schwartze
Max Planck Institute for Human Cognitive and Brain Sciences, IRG "Neurocognition of Rhythm in Communication", Stephanstrasse 1a, 04103 Leipzig, Germany

Speech is inherently tied to time. This fundamental quality has long been deemed secondary and has consequently not received appropriate recognition in speech processing models. We develop an integrative speech processing framework by synthesizing evolutionary, anatomical and neurofunctional concepts of auditory, temporal and speech processing. These processes converge in a network that extends cortical speech processing systems with cortical and subcortical systems associated with motor control. This subcortico-cortical multifunctional network is based on temporal processing and predictive coding of events to optimize interactions between the organism and the environment. The framework we outline provides a novel perspective on speech processing and has implications for future studies on learning, proficient use, and developmental and acquired disorders of speech production and perception.

The temporal nature of speech

Speech essentially conveys patterns of energy distributed over time. Yet the temporal nature of speech more or less vanished from linguistics in the wake of structuralist and generative theories of language. This separation renders language a phenomenon independent of temporal and contextual variation, a view that neurofunctional data do not consistently support. In the following, we argue that the temporal nature of speech needs to be reappraised to develop a naturalistic model of brain–language function [1]. The speech signal constitutes a rich source of information that is mirrored by sensitive mechanisms for temporal and spectral integration in hearing [2]. The auditory periphery ensures that central processing systems have access to a detailed representation of the acoustic signal.
To achieve the main purpose of speech perception (the inference of meaning) and in view of contextual, physiological and temporal variability, it is plausible that speech perception makes immediate and opportunistic use of all information sources available: from sound characteristics to syntax and pragmatics. This perspective implies that nonlinguistic and linguistic processes interact to facilitate this objective. For example, temporal processing mechanisms (i.e. mechanisms underlying the explicit encoding, decoding and evaluation of

Corresponding author: Kotz, S.A. ([email protected]).


temporal information) need to be involved in the interpretation of the temporal structure of speech. Timing is fundamental to efficient behavior and originates in evolutionarily primitive brain structures such as the cerebellum (CE) and the basal ganglia (BG) [3]. During speech acquisition their capacity can be used to establish basic routines that advance more sophisticated behavior. Once these routines are acquired, the BG contribution can be reduced to a supplementary and corrective function, whereas the CE remains actively engaged in the computation of sensory information [4]. In this article we argue that temporal structure is used to support well-studied fronto-temporal speech processing networks, which therefore need to be extended by temporal processing systems. We propose that a subcortico-cortical network that includes the CE and BG is engaged in constant attempts to detect temporal regularities in sensory input and to predict the future course of events to optimize cognitive and behavioral performance, including speech processing.

Speech constitutes events in time

Glossary
Oscillation: The unfolding of repeating events in terms of frequency, i.e. the number of events repeated in a specific amount of time.
Serial order: Succession of events in the temporal dimension. Serial order precludes simultaneity and is thus not identical to temporal structure.
Speech event: Speech events manifest as a set of linguistic and paralinguistic categories (e.g. phoneme, syllable, word, voice, stress or phrase) with partly overlapping borders that can be combined or decomposed into other speech events.
Speech processing: Analogous to temporal processing, the term speech processing comprises all mechanisms involved in the perception and production of verbal expressions.
Synchronization: Temporal alignment of two or more oscillations. Synchronization is achieved when at least one oscillation adjusts its phase and/or period to match that of another.
Temporal processing: The neurocognitive mechanisms that underlie the encoding, decoding and evaluation of temporal structure in perception and production. Temporal processing refers exclusively to duration and temporal relations, not to the formal aspect of information.
Temporal structure: Arrangement of events in the temporal dimension. An event in the temporal dimension originates from a contrast between static and dynamic changes that results in a subdivision of time. Temporal structure can be characterized in terms of categories such as duration, order, tempo or regularity of events. Duration describes an implicit property of the signal, whereas the latter categories describe temporal relations and can be considered explicit temporal information.

Unlike spatial information, acoustic information is entirely dependent on time. Time and temporal structure are coupled to change. In speech, changes generate events – such as vowels, stressed or unstressed syllables, and

1364-6613/$ – see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2010.06.005 Trends in Cognitive Sciences 14 (2010) 392–399



Figure 1. Visualizations of the utterance "Time must flow for sound to exist" [5] in the form of a waveform (top) and a spectrogram (bottom), created using the software PRAAT (developed by Paul Boersma and David Weenink of the Institute of Phonetic Sciences of the University of Amsterdam, http://www.praat.org/). This utterance can be decomposed into smaller speech events and categories such as syllables, phonemes or vowels. However, as both visualizations illustrate, neither events nor categories are discrete, due to coarticulation, i.e. the anterograde or retrograde effect of continuously moving articulators on the acoustic signal.
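The two views in Figure 1 can be reproduced computationally. As an illustrative sketch (our own, not from the article), the code below builds a synthetic vowel-like signal and computes its waveform, amplitude envelope and spectrogram with NumPy and SciPy; the fundamental, formant frequencies and window sizes are arbitrary assumptions chosen only for the demonstration.

```python
import numpy as np
from scipy.signal import spectrogram, hilbert

fs = 16000
t = np.arange(0, 0.5, 1 / fs)

# Synthetic vowel-like signal: a 120 Hz fundamental plus two formant-like
# components, amplitude-modulated to mimic a syllable's rise and fall in energy.
f0, f1, f2 = 120.0, 700.0, 1200.0
envelope = np.sin(np.pi * t / t[-1]) ** 2          # one "energy arc"
x = envelope * (np.sin(2 * np.pi * f0 * t)
                + 0.5 * np.sin(2 * np.pi * f1 * t)
                + 0.3 * np.sin(2 * np.pi * f2 * t))

# Amplitude envelope via the analytic signal (Hilbert transform).
amp_env = np.abs(hilbert(x))

# Spectrogram: short-time Fourier analysis, analogous to PRAAT's display.
freqs, times, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=384)

peak_time = times[Sxx.sum(axis=0).argmax()]
print(f"energy peaks near {peak_time:.2f} s of a {t[-1]:.2f} s signal")
```

Because the signal's energy arc is symmetric, the summed spectral energy peaks near the middle of the signal, just as a vocalic nucleus protrudes in a real syllable's spectro-temporal profile.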

phrases – that evolve in time. Depending on the level of analysis, speech events comprise different categories and hierarchies (Figure 1). Speech events can be combined or decomposed into shorter and longer events (e.g. the onset of a vowel, consonants within a syllable, or words in a phrase). However, the arrangement of these events is not incidental. Concatenated events follow an order that converges into specific speech patterns. The order of events in a pattern can be strictly sequential, with one event determining an immediately following event, and/or hierarchical, with one event determining another in the presence of intervening events. However, patterns in speech also imply serial order. Lashley [6] remarked that serial order in behavior relies on the transition of spatially distributed memory representations into temporal sequence. This operation can be attributed to syntax. In the broadest sense, syntax ‘‘denotes the organization of any combinatorial system in the mind’’ ([7] p. 276). Syntax can thus be defined as a ‘‘set of principles governing the combination of discrete structural elements into sequences’’ [8].
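The glossary defines synchronization as the temporal alignment of oscillations, achieved when at least one oscillation adjusts its phase and/or period to match another. A minimal numerical sketch of that definition (our illustration, not a model from the article; the correction gains `alpha` and `beta` are arbitrary assumptions):

```python
# An internal oscillation entrains to a periodic event stream by correcting
# both its phase and its period after each external event -- the two
# adjustments named in the glossary's definition of synchronization.
stimulus_period = 0.40            # seconds between external events
period, phase = 0.55, 0.10        # internal oscillator starts out of step
alpha, beta = 0.5, 0.3            # phase- and period-correction gains (assumed)

for n in range(40):
    expected = phase + period                       # predicted next event time
    error = (n + 1) * stimulus_period - expected    # asynchrony at that event
    phase = expected + alpha * error                # phase correction
    period += beta * error                          # period correction

print(f"adapted period: {period:.3f} s (stimulus period: {stimulus_period} s)")
```

With both corrections active, the internal period converges on the stimulus period within a few dozen events; with phase correction alone, a mismatched period would produce a persistent asynchrony at every event.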

Another important ordering principle is recurrence. Similar to serial order, recurrence allows the generation of predictions about when specific events are likely to occur; temporal regularity can thereby facilitate speech and cognitive processing. In its simplest form, recurrence is periodic and can be depicted as an oscillation whose period reflects the temporal relation of successive events. Notably, this form of oscillation can complement the gamma and theta band oscillations that constitute an important component of speech processing [9,10]. However, recognition of relational information, such as temporal regularity, necessitates an explicit internal representation of temporal structure, generated by the CE and BG temporal processing systems. We propose that speech processing exploits both temporal structure, even when it is not regular, and temporal regularity, to optimize comprehension. This requires an early and fast interaction of auditory and temporal processing in a neurofunctional network that comprises overlapping neural correlates of auditory and temporal processing (Box 1). This approach also provides a more general perspective on the interaction between an

Box 1. Early auditory input to the temporal processing network

A recent activation-likelihood-estimation meta-analysis [4] showed that several regions of the CE respond to auditory stimulation. Earlier tracer studies and unit recordings in animals [11–13] identified a neural pathway for rapid auditory transmission [14,15] between the CE and the cochlear nuclei, where the fibers of the auditory nerve terminate. This pathway can transmit auditory information to the cerebellar temporal processing system. In turn, the nonmotor part of the cerebellar dentate [16], one of the primary output nuclei to the thalamus, projects to the frontal cortex [17] (preSMA in monkeys), which then connects to the BG. Huang and Liu [18] assume that the CE serves as an interface between the auditory and motor systems, possibly initiating tracking behavior. Although cerebellar neurons seem unfit to process detailed frequency, intensity or duration information, they display special sensitivity to temporal and intensity differences [19], functions that are important both to signal when an event occurs and to track temporal structure. Click-evoked responses in the frontal monkey cortex after complete removal or destruction of the temporal lobe, the CE and the medial thalamus raise the question of whether there are direct projections to the

ever-changing external environment and an equally dynamic cognitive environment.

A subcortico-cortical framework of speech perception and production

If auditory processing is indeed coupled with temporal processing, then auditory and temporal processing systems need to interface to create a representation of auditory temporal structure, the backbone of speech. There is some consensus that classical motor systems are involved in temporal processing. Hence, in speech perception we distinguish two parallel auditory pathways: (i) the preattentive encoding of event-based temporal structure in the CE, which forwards temporal information to the frontal cortex via the thalamus and (ii) the retrieval of memory representations in temporal cortex, which are projected to frontal cortex. The presupplementary motor area (preSMA) binds temporal structure [30] and receives information from the CE [17]. It further transmits information to the dorsolateral prefrontal cortex (DLPFC), which then integrates memory representations and temporal information to optimize comprehension. Additionally, the attention-dependent BG temporal processing system and its thalamocortical connections continuously evaluate temporal relations and support the extraction of temporal regularity. It also engages in reanalysis and resequencing of information when the temporal structure of a stimulus is unfamiliar or incongruent (Figure 2A).

In the planning of speech production, the preSMA and the BG in concert with the CE serve as a pacemaker that provides basic temporal structure. An interplay of SMAproper, premotor and primary motor cortex then utilizes this temporal structure to guide articulation (Figure 2B). This structural dissociation of the SMA is in line with evidence for preSMA involvement in word selection, encoding of word form and the control of syllable sequencing, whereas the SMAproper supports overt articulation [31]. The preSMA connects to the rostral striatum and to the superior/inferior frontal gyrus, whereas the SMAproper is connected to the caudal putamen, the precentral gyrus and the corticospinal tract [32–34]. Similar circuitry forms the

frontal cortex from the thalamus [20]. The suprageniculate nucleus (SG) is one possible candidate for such transmission. The SG is responsive to auditory stimulation [21] and displays fine temporal tuning characteristics [22]. After injections into the prefrontal cortex, Kobler et al. [23] found labeled cells in the SG of bats, whereas Kurokawa et al. [24] found labeled terminals in the frontal cortex and auditory areas of the temporal cortex after injections into the SG. Reciprocal connections were identified with injections into both the SG and the fastigial nucleus of the CE [25] as well as the superior colliculus [26]. Connections from the SG to the frontal cortex consist of separate neuronal groups of different sizes and shapes [27] with Fr2, the target location in the frontal cortex, corresponding to monkey SMA. Thus, the SG could constitute a relay between the frontal and the cerebellar cortex that is connected to several cortical auditory fields [28]. These are connected to the pontine nuclei, indicating cerebellar involvement in a circular architecture. The CE projects via dentato-thalamo-cortical pathways to areas from which it receives input via the cortico-ponto-cerebellar pathways [29]. These pathways form a link between cerebellar, temporal and fronto-striatal circuitry in which the thalamus plays a pivotal role in earlier and later processing stages.

basis of temporal processing in the BG that depends on ensembles of cortical oscillations conveying a signature of temporal structure to the BG [3]. The CE in turn is functionally connected to the dorsolateral, medial and anterior prefrontal cortex [35], and via the thalamus to the SMA. Evidently, subregions of these classical motor areas (e.g. the nonmotor part of the dentate [16], the rostral striatum or the preSMA) primarily engage in perception, whereas other subregions (e.g. the motor part of the dentate, the caudal putamen and the SMAproper) are involved in production. Crucially, the thalamus mediates information flow in this framework (Box 2). The most basic function of the motor system (to modify body posture to produce proactive and reactive movements) improves with precise timing. Moreover, the mutual influence of motor and cognitive processes such as temporal processing could represent one of the driving forces in the development of sophisticated motor and cognitive skills such as speech processing.

On the origin and development of speech processing capacities

Ultimately, functional differentiation goes hand in hand with structural change. Converging morphological evidence supports the view that the motor system has reconfigured to meet the challenges posed by developing communicative and cognitive skills [41]. Simultaneous enlargement of the lateral CE and the frontal cortex, as well as the formation of a cerebello-cortical loop, reflect developing speech processing capacity [42]. The lateral CE engages in cognitive tasks, and its increased size in hominids is most likely to be accompanied by a similar increase in the brain's information processing capacity [43,44]. Contralateral connections from the CE to the cortex and from the cortex to the CE substantiate speculations about lateralization at the subcortical and cortical level, as evidenced by functional temporal asymmetry in both hemispheres (Box 3).
These reciprocal connections establish a cerebello-thalamo-cortical circuit that is comparable to cortico-striato-thalamo-cortical circuits. Together they provide a powerful



Figure 2. (a) A framework for speech perception. In speech perception, auditory information is transmitted to the auditory cortex via the thalamus (a, blue) and to the CE (b), where the temporal relationship between successive events is encoded (1) and transmitted to the frontal cortex (red). The seminal AST model on speech perception [50] accounts for differences in temporal sensitivity (L = left; R = right; orange letters reflect short temporal windows of integration; blue letters reflect longer windows of integration). Auditory information is mapped onto memory representations (3a) that are transmitted to the frontal cortex (c) to be integrated with temporal event structure (4) that is conveyed via a cerebello-thalamo-preSMA (3b) pathway (red). Temporal information is transmitted to the BG (5) via connections from the preSMA and from the frontal cortex (d). The BG evaluate temporal relations and transmit this information back to the cortex (6) via the thalamus. The CE and the thalamus also provide direct input to the BG (f), thereby possibly modulating BG processing. The descending auditory pathway (g) could modulate processing in the whole network. (b) A framework for speech production. Memory representations are transmitted from the temporal (1) to the frontal cortex (a, blue), where they are mapped onto temporal event structure (b) generated by the preSMA (2) in concert with the CE and the BG (4). Furthermore, the CE (5) is involved in the temporal shaping of syllables (d). The integrated information is then used in motor control of articulation (e, green) interfacing the SMAproper (6), premotor and primary motor cortex. The CE and the thalamus also provide direct input to the BG (f), thereby possibly modulating BG processing.

computational basis because information in these circuits can be processed rapidly and repeatedly. Moreover, progression from simple to increasingly more complex behavior necessitates additional sequencing and patterning capacity, a quality that has been attributed to the BG [55]. Although each of these brain structures has probably contributed to the evolution of speech on its own, it is their combination and resulting functional differentiation that provides a novel perspective on speech processing. Most importantly, this differentiation extends findings on the ontogenetic and phylogenetic development and maturation of white matter fiber tracts responsible for information flow between gray matter cortical areas [56]. A prominent example is the left-hemisphere accentuated arcuate fasciculus that connects

Wernicke's and Broca's regions. This white matter fiber bundle is fully developed only in humans; in chimpanzees and macaques it is nonexistent [57] or at most rudimentary and less specified [58]. The function of these white matter tracts is to convey memory representations to the DLPFC, where this formal information can be integrated with explicit temporal structure to either comprehend or produce speech sequences. In speech perception, temporal structure complements formal predictions about upcoming events in a sequence. In speech production, the preSMA/BG pacemaker generates a grid for the temporal alignment of memory representations. In a similar vein, MacNeilage and Davis [59] propose that speech evolved from a simple biomechanical mechanism (i.e. biphasic opening and closing mouth movements).


Box 2. Two thalamic firing modes

Sherman and Guillery [36] emphasize that understanding of cortical function depends on knowledge about the nature of thalamic input to the cortex. As auditory information passes through the thalamus, the question arises as to how it treats this information. Thalamic cells respond to input in either 'tonic' or 'burst' mode [36]. Tonic mode preserves input linearity, whereas burst mode is more efficient in detecting input. Consequently, thalamic cells in burst mode send a 'wake-up call' to the cortex that is evoked by sudden novel or unexpected input. Importantly, bursts follow the temporal properties of stimulation and enhance sensory event detection [37]. For example, in the visual domain, bursts occur approximately at phase zero of the oscillation underlying a periodic stimulation [36] (Figure I). Furthermore, they can signal salient input because bursts affect the postsynaptic neuron more strongly than single spikes [38]. We speculate that burst-firing marks input events characterized by salient changes at the energy level (e.g. onsets, offsets or more intense parts of an acoustic signal). For instance, in speech, these events might correspond to pauses, vowels or stress correlates. In analogy to visual processing, we consider that bursts preferentially occur at vowel onsets. Hence, thalamic bursts could transmit the temporal relation between events for subsequent cortical processing


and also amplify the neural representation of these events. Computational simulations of a bursting neuron [39] show that the biophysical mechanisms of spike generation enable individual neurons to encode different stimulus features into distinct spike patterns. However, burst timing is more precise than the timing of single spikes. Accordingly, the driving input from the cerebellar, event-based temporal processing system to the thalamus could be encoded via burst firing to forward precise temporal markers of events. In parallel, a more linear and continuous stimulus representation, delivered via the auditory pathway, is primarily encoded via tonic firing to preserve detailed spectro-temporal structure. Burst firing is characterized by inter-spike intervals of approximately 100 ms, whereas tonic firing features intervals of around 10–30 ms. Pöppel [40] hypothesized that temporal processing proceeds in an oscillatory fashion in which sensory input registered within 30 ms is treated as co-temporal. One can speculate that at a sampling rate of approximately 30–40 Hz, perception of temporal order is constrained by thalamic 'packaging' of information in tonic mode and cortical sensitivity to these packages (e.g. the phonemes of a syllable). However, a better understanding of thalamic function is clearly necessary to model speech and temporal processing.

Figure I. Responses of lateral geniculate nucleus relay cells of cats during sinusoidal grating in either tonic (a) or burst (b) mode. Adapted from Sherman and Guillery [36]. Tonic firing preserves linearity of the input, whereas burst firing selectively encodes parts of the stimulation that correspond to changes in the energy level of a stimulus.
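The inter-spike interval ranges cited in Box 2 (~100 ms for burst firing, ~10–30 ms for tonic firing) suggest that the two modes are separable from spike timing alone. A toy sketch of that idea (our illustration, not an analysis from the article; the 65 ms threshold and the synthetic spike trains are assumptions):

```python
import numpy as np

def firing_mode(spike_times_s, threshold_s=0.065):
    """Label a spike train 'burst' or 'tonic' by its median inter-spike
    interval, using the approximate ranges given in Box 2."""
    isis = np.diff(np.sort(spike_times_s))
    return "burst" if np.median(isis) > threshold_s else "tonic"

tonic_train = np.cumsum(np.full(50, 0.02))   # regular 20 ms intervals
burst_train = np.cumsum(np.full(20, 0.10))   # events roughly every 100 ms

print(firing_mode(tonic_train))  # tonic
print(firing_mode(burst_train))  # burst
```

On the framework's reading, the long-interval train would carry sparse, precise event markers (the 'when'), while the short-interval train would carry the quasi-continuous stimulus representation.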

At a conceptual level, the biphasic movements of the mouth correspond to a blank 'syllabic frame' [60]. Consonant and vowel sounds (e.g. /ba/) constitute content elements that are inserted into the syllabic frame during articulation. Furthermore, the temporal structure of concatenated syllables (e.g. /bababa/) rests on input from the SMA. At this stage, sequencing capacity and precise temporal processing should be coupled to establish the serial order of frames and content elements.

Temporal structure and temporal alignment in speech processing

Which speech events convey suitable temporal structure? One way to classify the temporal structure of speech is to distinguish envelope, periodicity and fine structure levels [61]. In speech production these levels converge into a unitary acoustic signal, whereas in speech perception two aspects are fundamental: (i) the envelope conveys information about duration, rhythm, tempo and stress and (ii) speech can be understood when all other cues besides the slowly varying temporal envelope are degraded [62]. This implies that important speech events are captured by these categories. However, the question remains

which part of the signal is used to assess them. Greenberg [63] developed a syllable-centric framework that describes the syllable as an 'energy arc' (spectro-temporal profile). Typically, the syllabic nucleus protrudes as it correlates with maximum 'oral aperture' in the articulatory gesture. The prominence of the vocalic nucleus is accompanied by a steep rise and a subsequent peak in acoustic energy because it is typically more intense than a consonant. In addition to duration, a relative increase in intensity also distinguishes rhythmically prominent from nonprominent syllables [64]. Greenberg proposes that the nucleus sets the 'spectro-temporal register' on which the rest of the syllable is mapped. This notion seems comparable to MacNeilage's term "the general-purpose carrier, which we know today as the syllable" ([60] p. 105). Importantly, there is a related concept in speech perception. Following the notion of perceptual centres or 'p-centres' [65], Port [66] describes the vocalic nucleus as the carrier of a perceptual beat that renders this event particularly salient. Port refers to Dynamic Attending Theory (DAT) [67] by linking the saliency of the vowel to the pulse of a 'neurocognitive attentional oscillation'. DAT proposes that the allocation of attention depends

Box 3. Lateralized sensitivity to temporal structure

It is well known that the left cortical hemisphere specializes in fine temporal discrimination, whereas the right hemisphere holds information over longer periods of time [45]. In a similar vein, the language-dominant hemisphere differentiates fine temporal input, whereas the complementary hemisphere integrates information across longer time spans [46]. More specifically, at the tonal level, the core bilateral auditory cortices are sensitive to temporal and spectral variation. Spectral variation is weighted towards the right hemisphere; the left hemisphere specializes in rapid temporal processing [47]. In speech (i.e. syllables, words), both auditory cortices engage in a general temporal processing mechanism. Rapid variations in temporal sound structure are preferably processed in the left hemisphere [48], whereas the contour of the speech envelope concurs with right hemisphere areas [49]. In speech perception, 'Asymmetric Sampling in Time' (AST) [50] ascribes a short window of integration (20–50 ms) to the left hemisphere, and a long window (150–250 ms) to the right hemisphere. For example, spontaneous power fluctuations of intrinsic oscillations in right-hemisphere regions of Heschl's gyrus correspond to the dominant syllabic rate, between 3 and 6 Hz, and to rates between 28 and 40 Hz in the left hemisphere [10]. Contributions of preceding stages of auditory processing, in addition to those at the cortical level, need to be considered. A similar temporal dissociation is proposed for the CE. The right CE responds more strongly to high frequency information and speech, whereas the left CE is more sensitive to low frequency information and singing [51]. Lateralization also impacts on auditory information processing at all stages of the auditory pathway, including the thalamus and the brain stem [52].
The bilateral auditory cortices use implicit discharge rate codes and explicit temporal codes to represent the temporal structure of auditory signals [53]. Discharge rate codes integrate stimulus features in discrete 30 ms windows that could reflect cortical sensitivity to thalamic ‘packaging’ in tonic mode. Animal evidence [54] indicates a slowdown of temporal response rates along the ascending auditory pathway from the thalamus (10 ms) to the auditory cortex (30 ms). Thus, co-temporal information [40] could correspond to input sampled in a short window of integration within the left hemisphere.
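The consequence of AST's two window lengths can be illustrated numerically. In this sketch (our construction, not from the article; the test signal and exact window sizes are assumptions), the same amplitude-modulated signal is sampled with ~25 ms and ~200 ms windows: the short windows preserve the 4 Hz, syllable-rate modulation in their RMS track, whereas the long windows span most of a modulation cycle and average it away.

```python
import numpy as np

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
# Signal with a slow (4 Hz, syllable-rate) amplitude modulation of a fast
# (40 Hz) carrier -- rates taken from the oscillation frequencies cited in Box 3.
x = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 40 * t)

def windowed_rms(signal, window_s, fs):
    """RMS energy in consecutive non-overlapping windows of fixed duration."""
    n = int(window_s * fs)
    trimmed = signal[: len(signal) // n * n]
    return np.sqrt((trimmed.reshape(-1, n) ** 2).mean(axis=1))

short = windowed_rms(x, 0.025, fs)   # ~25 ms: AST's short (left) window
long_ = windowed_rms(x, 0.200, fs)   # ~200 ms: AST's long (right) window

# The short-window RMS track follows the 4 Hz envelope, so it varies far
# more across windows than the long-window track does.
print(f"short-window RMS spread: {short.std():.3f}, long-window: {long_.std():.3f}")
```

The asymmetry is a direct trade-off: short windows resolve fast modulation at the cost of integrating little context, while long windows smooth toward envelope-scale structure.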

on synchronization between self-sustained, adaptive internal oscillations and external temporal structure. This synchronization results in stimulus-driven allocation of attention. With respect to speech perception, this implies that we can attempt to synchronize an internal attention oscillation with the temporal structure of external speech events, such as the rhythmic succession of vocalic nuclei. Thus, the rise in acoustic energy and the resulting energy maximum in sound structure, and the perceived prominence of the p-centre, can constitute complementary phenomena. This event can then be used to guide attention to points in time when important information appears, to set a spectro-temporal register [63], and to open a frame [59] for the subsequent integration of content elements from memory. Information about temporal structure is then used to align memory representations (temporal cortex) with the point in time at which an event is maximally salient, to ensure temporal coherence and to optimize speech processing. If temporal structure conveys periodicity and allows the extraction of a regular pattern, then one can conceive of such an alignment as an incidence of synchronization between an external stimulus-inherent oscillation and an internal stimulus-driven oscillation. In line with DAT, information about successive events can provide attractors for an attention oscillation that are used to

synchronize attentional resources, and to generate frames for content integration. In other words, speech perception can involve integration of spectro-temporal content or formal information ('what') and temporal information delivered via the event-based temporal processing system ('when'). The formal aspect involves mapping from sound to meaning in the temporal cortex and the transmission of memory content via white matter fiber tracts [68]. This goes hand in hand with the functional differentiation of a ventral and a dorsal stream in speech processing. The dorsal stream maps acoustic or phonological representations to articulatory or motor representations, whereas the ventral stream maps sensory or phonological representations onto lexical conceptual representations [50,69]. Formal information transfer via ventral and dorsal pathways is further subdivided into 'what', 'how' and 'where' streams [70]. Furthermore, the cortico-striato-thalamo-cortical attention-dependent temporal processing system adds higher-order processing routines, such as interval estimation and comparison, as well as the extraction of temporal regularity. This information can then be used to generate predictions concerning the temporal locus of important speech events. Once regularity is perceived, the system can tolerate small perturbations, whereas strong regularity-based predictions would allow the maintenance of a pattern even in the presence of displaced or omitted events. This function differentiates the proposed role of the motor system in processing temporal aspects such as tracking rhythm, speech rate and turn taking in communication [71,72].

Concluding remarks

In this article we have outlined a neurofunctional framework of speech processing that emphasizes two elementary aspects of speech. First, information conveyed in an acoustic signal is entirely time-dependent.
Temporal characteristics of the signal and related temporal processing should therefore play a significant role in both speech production and perception. Second, the evolution of speech as a complex motor behavior originates in subcortico-cortical motor systems and their capacity to temporally structure behavioral sequences. We propose that speech production and perception have retained characteristics of this primordial interaction between motor timing and sequencing Box 4. Questions for future research  What are the anatomical and functional commonalities and differences regarding human and animal temporal processing networks?  What is the function of direct and reciprocal subcortico-subcortical pathways besides subcortico-cortical connections and do these circuits interact?  How do temporal processing and predictions generated on the basis of temporal structure relate to other formal and contextual aspects of speech (e.g. phonotactic rules or semantic priming)?  Is the proposed temporal aspect of the framework the same in speech and music? Moreover, is it restricted to the auditory domain or is it involved across sensory and cognitive domains?  Are there therapeutic applications (e.g. overemphasizing the predictive value in the context of pathological speech processing, or to combine specific temporal patterns with movement)?


capacities, and a developing cognitive competence. Neuroanatomically, this fundamental interaction can be retraced in ontogenetic and phylogenetic development, in which primitive subcortical structures set in motion basic computational mechanisms in support of refined neocortical functions. In line with a recent proposal on speech production [73], we highlight the necessary contributions of cortical and subcortical brain structures to speech processing. We offer a framework within which to: (i) further investigate how different aspects of uni- and multimodal information converge in time to form unitary percepts [74]; (ii) explain how developmental and compensatory mechanisms in speech disorders impact on speech processing (e.g. [75]); and (iii) elucidate how the underlying perspective transfers to other domains such as music (Box 4).

Acknowledgments
The authors would like to thank D. Yves von Cramon for expert neuroanatomical input and discussion, and Kathrin Rothermich, Iris N. Knierim, Maren Schmidt-Kassow and Anna S. Hasting for continual feedback. Special thanks to Robert F. Port and Angela D. Friederici as well as to the anonymous reviewers for constructive comments on an earlier version of the manuscript, and to Richard Ivry for valuable discussion of the concept. Lastly, thanks to Kerstin Flake for graphics support.

References
1 Poeppel, D. and Hickok, G. (2004) Towards a new functional anatomy of language. Cognition 92, 1–12
2 Moore, B.C.J. (2003) Temporal integration and context effects in hearing. J. Phonetics 31, 563–574
3 Buhusi, C.V. and Meck, W.H. (2005) What makes us tick? Functional and neural mechanisms of interval timing. Nat. Rev. Neurosci. 6, 755–765
4 Petacchi, A. et al. (2005) Cerebellum and auditory function: an ALE meta-analysis of functional neuroimaging studies. Hum. Brain Mapp. 25, 118–128
5 de Cheveigné, A. (2003) Time-domain auditory processing of speech. J. Phonetics 31, 547–561
6 Lashley, K.S. (1951) The problem of serial order in behavior. In Cerebral Mechanisms in Behavior (Jeffress, L.A., ed.), pp. 112–136, Wiley
7 Jackendoff, R. (2002) Foundations of Language, Oxford University Press
8 Patel, A.D. (2003) Language, music, syntax and the brain. Nat. Neurosci. 6, 674–681
9 Ghitza, O. and Greenberg, S. (2009) On the possible role of brain rhythms in speech perception: intelligibility of time-compressed speech with periodic and aperiodic insertions of silence. Phonetica 66, 113–126
10 Giraud, A. et al. (2007) Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron 56, 1127–1134
11 Huang, C.M. et al. (1982) Projections from the cochlear nucleus to the cerebellum. Brain Res. 244, 1–8
12 Woody, C.D. et al. (1998) Acoustic transmission in the dentate nucleus I. Changes in activity and excitability after conditioning. Brain Res. 789, 74–83
13 Morest, D.K. et al. (1997) Neuronal and transneuronal degeneration of auditory axons in the brainstem after cochlear lesions in the chinchilla: cochleotopic and non-cochleotopic patterns. Hearing Res. 103, 151–168
14 Wang, X.F. et al. (1991) The dentate nucleus is a short-latency relay of a primary auditory transmission pathway. Neuroreport 2, 361–364
15 Xi, M.C. et al. (1994) Identification of short latency auditory responsive neurons in the cat dentate nucleus. Neuroreport 5, 1567–1570
16 Dum, R.P. and Strick, P.L. (2003) An unfolded map of the cerebellar dentate nucleus and its projections to the cerebral cortex. J. Neurophysiol. 89, 634–639
17 Akkal, D. et al. (2007) Supplementary motor area and presupplementary motor area: targets of basal ganglia and cerebellar output. J. Neurosci. 27, 10659–10673

Trends in Cognitive Sciences Vol.14 No.9
18 Huang, C. and Liu, G. (1990) Organization of the auditory area in the posterior cerebellar vermis of the cat. Exp. Brain Res. 81, 377–383
19 Altman, J.A. et al. (1976) Electrical responses of the auditory area of the cerebellar cortex to acoustic stimulation. Exp. Brain Res. 26, 285–298
20 Bignall, K.E. (1970) Auditory input to frontal polysensory cortex of the squirrel monkey: possible pathways. Brain Res. 19, 77–86
21 Benedek, G. et al. (1997) Visual, somatosensory, auditory and nociceptive modality properties in the feline suprageniculate nucleus. Neuroscience 78, 179–189
22 Paróczy, Z. et al. (2006) Spatial and temporal visual properties of single neurons in the suprageniculate nucleus of the thalamus. Neuroscience 137, 1397–1404
23 Kobler, J.B. et al. (1987) Auditory pathways to the frontal cortex of the mustache bat, Pteronotus parnellii. Science 236, 824–826
24 Kurokawa, T. et al. (1990) Frontal cortical projections from the suprageniculate nucleus in the rat, as demonstrated with the PHAL method. Neurosci. Lett. 120, 259–262
25 Katoh, Y. and Deura, S. (1993) Direct projections from the cerebellar fastigial nucleus to the thalamic suprageniculate nucleus in the cat studied with the anterograde and retrograde axonal transport of wheat germ agglutinin-horseradish peroxidase. Brain Res. 617, 155–158
26 Katoh, Y. et al. (1994) Bilateral projections from the superior colliculus to the suprageniculate nucleus in the cat: a WGA-HRP/double fluorescent tracing study. Brain Res. 669, 298–302
27 Kurokawa, T. and Saito, H. (1995) Retrograde axonal transport of different fluorescent tracers from the neocortex to the suprageniculate nucleus in the rat. Hearing Res. 85, 103–108
28 Budinger, E. (2000) Functional organization of auditory cortex in the Mongolian gerbil (Meriones unguiculatus). IV. Connections with anatomically characterized subcortical structures. Eur. J. Neurosci. 12, 2452–2474
29 Pastor, M.A. et al. (2008) Frequency-specific coupling in the cortico-cerebellar auditory system. J. Neurophysiol. 100, 1699–1705
30 Pastor, M.A. et al. (2006) The neural basis of temporal auditory discrimination. NeuroImage 30, 512–520
31 Alario, F. et al. (2006) The role of the supplementary motor area (SMA) in word production. Brain Res. 1076, 129–143
32 Lehéricy, S. et al. (2004) 3-D diffusion tensor axonal tracking shows distinct SMA and pre-SMA projections to the human striatum. Cereb. Cortex 14, 1302–1309
33 Postuma, R.B. and Dagher, A. (2006) Basal ganglia functional connectivity based on a meta-analysis of 126 positron emission tomography and functional magnetic resonance imaging publications. Cereb. Cortex 16, 1508–1521
34 Middleton, F.A. and Strick, P.L. (2000) Basal ganglia and cerebellar loops: motor and cognitive circuits. Brain Res. Rev. 31, 236–250
35 Krienen, F.M. and Buckner, R.L. (2009) Segregated fronto-cerebellar circuits revealed by intrinsic functional connectivity. Cereb. Cortex 19, 2485–2497
36 Sherman, S.M. and Guillery, R.W. (2005) Exploring the Thalamus and Its Role in Cortical Function, MIT Press
37 He, J. and Hu, B. (2002) Differential distribution of burst and single-spike responses in auditory thalamus. J. Neurophysiol. 88, 2152–2156
38 Izhikevich, E.M. (2004) Which model to use for cortical spiking neurons? IEEE Trans. Neural Netw. 15, 1063–1070
39 Kepecs, A. and Lisman, J. (2003) Information encoding and computation with spikes and bursts. Netw. Comput. Neural Syst. 14, 103–118
40 Pöppel, E. (1997) A hierarchical model of temporal perception. Trends Cogn. Sci. 1, 56–61
41 Lieberman, P. (2002) On the nature and evolution of the neural bases of human language. Yearb. Phys. Anthropol. 45, 36–62
42 Leiner, H.C. et al. (1993) Cognitive and language functions of the human cerebellum. Trends Neurosci. 16, 444–447
43 MacLeod, C.E. et al. (2003) Expansion of the neocerebellum in Hominoidea. J. Hum. Evol. 44, 401–429
44 Weaver, A.H. (2005) Reciprocal evolution of the cerebellum and neocortex in fossil humans. Proc. Natl. Acad. Sci. U. S. A. 102, 3576–3580
45 Allard, F. and Scott, B.L. (1975) Burst cues, transition cues, and hemispheric specialization with real speech sounds. Q. J. Exp. Psychol. 27, 487–497

46 Hammond, G.R. (1982) Hemispheric differences in temporal resolution. Brain Cogn. 1, 95–118
47 Zatorre, R.J. and Belin, P. (2001) Spectral and temporal processing in human auditory cortex. Cereb. Cortex 11, 946–953
48 Liégeois-Chauvel, C. et al. (1999) Specialization of left auditory cortex for speech perception in man depends on temporal coding. Cereb. Cortex 9, 484–496
49 Abrams, D.A. et al. (2008) Right-hemisphere auditory cortex is dominant for coding syllable patterns in speech. J. Neurosci. 28, 3958–3965
50 Hickok, G. and Poeppel, D. (2007) The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402
51 Callan, D.E. et al. (2007) Speech and song: the role of the cerebellum. Cerebellum 6, 321–327
52 Schönwiesner, M. et al. (2007) Hemispheric asymmetry for auditory processing in the human auditory brain stem, thalamus, and cortex. Cereb. Cortex 17, 492–499
53 Wang, X. et al. (2003) Cortical processing of temporal modulations. Speech Commun. 41, 107–121
54 Wang, X. et al. (2008) Neural coding of temporal information in auditory thalamus and cortex. Neuroscience 154, 294–303
55 Graybiel, A.M. (1997) The basal ganglia and cognitive pattern generators. Schizophr. Bull. 23, 459–469
56 Friederici, A.D. (2009) Pathways to language: fiber tracts in the human brain. Trends Cogn. Sci. 13, 175–181
57 Rilling, J.K. et al. (2008) The evolution of the arcuate fasciculus revealed with comparative DTI. Nat. Neurosci. 11, 426–428
58 Catani, M. et al. (2005) Perisylvian language networks of the human brain. Ann. Neurol. 57, 8–16
59 MacNeilage, P.F. and Davis, B.L. (2001) Motor mechanisms in speech ontogeny: phylogenetic, neurobiological and linguistic implications. Curr. Opin. Neurobiol. 11, 696–700
60 MacNeilage, P.F. (2008) The Origin of Speech, Oxford University Press


61 Rosen, S. (1992) Temporal information in speech: acoustic, auditory and linguistic aspects. Philos. Trans. R. Soc. B 336, 367–373
62 Shannon, R.V. et al. (1995) Speech recognition with primarily temporal cues. Science 270, 303–304
63 Greenberg, S. et al. (2003) Temporal properties of spontaneous speech – a syllable-centric perspective. J. Phonetics 31, 465–485
64 Kochanski, G. and Orphanidou, C. (2008) What marks the beat of speech? J. Acoust. Soc. Am. 123, 2780–2791
65 Morton, J. et al. (1976) Perceptual centres (P-centres). Psychol. Rev. 83, 405–408
66 Port, R.F. (2003) Meter and speech. J. Phonetics 31, 599–611
67 Large, E.W. and Jones, M.R. (1999) The dynamics of attending: how people track time-varying events. Psychol. Rev. 106, 119–159
68 Glasser, M.F. and Rilling, J.K. (2008) DTI tractography of the human brain's language pathways. Cereb. Cortex 18, 2471–2482
69 Saur, D. et al. (2010) Combining functional and anatomical connectivity reveals brain networks for auditory language comprehension. NeuroImage 49, 3187–3197
70 Rauschecker, J.P. and Scott, S.K. (2009) Maps and streams in the auditory cortex: non-human primates illuminate human speech processing. Nat. Neurosci. 12, 718–724
71 Kotz, S.A. et al. (2009) Non-motor basal ganglia functions: a review and proposal for a model of sensory predictability in auditory language perception. Cortex 45, 982–990
72 Scott, S.K. et al. (2009) A little more conversation, a little less action – candidate roles for the motor cortex in speech perception. Nat. Rev. Neurosci. 10, 295–302
73 Guenther, F.H. (2006) Cortical interactions underlying the production of speech sounds. J. Commun. Disord. 39, 350–365
74 Schroeder, C.E. et al. (2008) Neuronal oscillations and visual amplification of speech. Trends Cogn. Sci. 12, 106–113
75 Corriveau, K.H. and Goswami, U. (2009) Rhythmic motor entrainment in children with speech and language impairments: tapping to the beat. Cortex 45, 119–130

