The Handbook of Speech Perception


The Handbook of Speech Perception: summary and description


A wide-ranging and authoritative volume exploring contemporary perceptual research on speech, updated with new original essays by leading researchers.

Speech perception is a dynamic area of study that encompasses a wide variety of disciplines, including cognitive neuroscience, phonetics, linguistics, physiology and biophysics, auditory and speech science, and experimental psychology.
The Handbook of Speech Perception, Second Edition, is a comprehensive and up-to-date survey of technical and theoretical developments in perceptual research on human speech. Offering a variety of perspectives on the perception of spoken language, this volume provides original essays by leading researchers on the major issues and most recent findings in the field. Each chapter provides an informed and critical survey, including a summary of current research and debate, clear examples and research findings, and discussion of anticipated advances and potential research directions. The timely second edition of this valuable resource:
- Discusses a uniquely broad range of both foundational and emerging issues in the field
- Surveys the major areas of the field of human speech perception
- Features newly commissioned essays on the relation between speech perception and reading, features in speech perception and lexical access, perceptual identification of individual talkers, and perceptual learning of accented speech
- Includes essential revisions of many chapters original to the first edition
- Offers critical introductions to recent research literature and leading field developments
- Encourages the development of multidisciplinary research on speech perception
- Provides readers with a clear understanding of the aims, methods, challenges, and prospects for advances in the field
The Handbook of Speech Perception, Second Edition, is ideal for both specialists and non-specialists throughout the research community looking for a comprehensive view of the latest technical and theoretical accomplishments in the field.

The Handbook of Speech Perception — excerpt



For all of these reasons, a number of authors, including ourselves, have suggested that less weight be placed on the McGurk effect in evaluating multisensory integration. Evaluation of integration may be better served by measures of the perceptual super‐additivity of the visual and audio (e.g. in noise) streams (e.g. Alsius, Paré, & Munhall, 2017; Irwin & DiBlasi, 2017; Remez, Beltrone, & Willimetz, 2017); influences on speech‐production responses (Gentilucci & Cattaneo, 2005; and see Sato et al., 2010); and neurophysiological responses (e.g. Skipper et al., 2007). Such methods may well prove more stable, valid, and representative indexes of integration than the McGurk effect.
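
To make the super‐additivity notion concrete, here is a minimal sketch of how such measures are often computed from identification accuracy in noise. The function names are ours, and the normalized‐gain formula follows the classic Sumby and Pollack (1954)‐style measure of visual enhancement; neither is the specific measure used in the studies cited above.

def raw_superadditivity(av: float, a: float, v: float) -> float:
    """How much audiovisual accuracy exceeds the sum of the unimodal
    accuracies (all values are proportions correct, 0-1)."""
    return av - (a + v)

def visual_enhancement(av: float, a: float) -> float:
    """Normalized audiovisual gain relative to auditory-alone accuracy:
    the proportion of the possible improvement actually realized."""
    return (av - a) / (1.0 - a) if a < 1.0 else 0.0

# Hypothetical proportions correct for words in noise:
a, v, av = 0.40, 0.10, 0.75
print(raw_superadditivity(av, a, v))  # 0.25: AV exceeds A + V combined
print(visual_enhancement(av, a))      # ~0.58 of the possible gain realized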

Multimodal speech is integrated at the earliest observable stage

The question of where in the speech function the modal streams integrate (merge) continues to be one of the most studied in the multisensory literature. Since 2005, much of this research has used neurophysiological methods. After the aforementioned fMRI report by Calvert and her colleagues (1997; see also Pekkola et al., 2005), numerous studies have shown visual speech activation of the auditory cortex using other technologies, for example, functional near‐infrared spectroscopy (fNIR; van de Rijt et al., 2016), electroencephalography (EEG; Callan et al., 2001; Besle et al., 2004), intracranial EEG (ECoG; e.g. Besle et al., 2008), and magnetoencephalography (MEG; Arnal et al., 2009; for a review, see Rosenblum, Dorsi, & Dias, 2016). More recent evidence shows that visual speech can modulate neurophysiological areas considered to be further upstream, including the auditory brainstem (Musacchia et al., 2006), one of the earliest locations at which direct visual modulation could occur. There is even evidence of visual speech modulation of cochlear functioning (otoacoustic emissions; Namasivayam et al., 2015). While visual influences on such peripheral auditory mechanisms are likely based on feedback from downstream areas, the fact that they can occur at all indicates the importance of visual input to the speech function.

Other neurophysiological findings suggest that the integration of the streams also happens early. A recent EEG study revealed that N1 auditory‐evoked potentials (known to reflect primary auditory cortex activity) for visually induced (McGurk) fa and ba syllables (auditory ba + visual fa; auditory fa + visual ba, respectively) resemble the N1 responses for the corresponding auditory‐alone syllables (Shahin et al., 2018; and see van Wassenhove, Grant, & Poeppel, 2005). The degree of resemblance was greater for individuals whose identification responses showed stronger visual influences, suggesting that this modulated auditory cortex activity (reflected in N1) corresponds to an integrated perceived segment. This finding is less consistent with the alternative model in which separate unimodal analyses are first conducted at the primary cortices, with their outcomes then combined at a multisensory integrator such as the posterior STS (pSTS; e.g. Beauchamp et al., 2004).
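
As an illustration of the individual‐differences logic in the Shahin et al. (2018) finding, the sketch below indexes N1 resemblance by a Pearson correlation over an assumed 80-150 ms window. The window, the variable names, and the use of a single channel are our assumptions, not the study's actual analysis pipeline.

import numpy as np

def n1_window(erp: np.ndarray, times: np.ndarray,
              start: float = 0.080, end: float = 0.150) -> np.ndarray:
    """Extract an assumed N1 time window (in seconds) from a
    single-channel ERP sampled at the time points in `times`."""
    mask = (times >= start) & (times <= end)
    return erp[mask]

def n1_resemblance(mcgurk_erp: np.ndarray, audio_alone_erp: np.ndarray,
                   times: np.ndarray) -> float:
    """Pearson correlation between the N1 response to a McGurk stimulus
    (e.g. auditory ba + visual fa) and the response to the matching
    auditory-alone syllable (fa); higher r = greater resemblance."""
    x = n1_window(mcgurk_erp, times)
    y = n1_window(audio_alone_erp, times)
    return float(np.corrcoef(x, y)[0, 1])

# Per-subject resemblance scores could then be correlated with each
# subject's proportion of visually influenced ("fa") identifications.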

Other findings suggest that visual modulation of the auditory cortex (as it responds to sound) happens too quickly for an additional integrative step to be part of the process (for a review, see Besle et al., 2004). In fact, there is evidence that adding congruent visual speech to auditory speech input speeds up ERP and MEG responses in the auditory cortex (van Wassenhove, Grant, & Poeppel, 2005; Hertrich et al., 2009). This facilitation could result from visible articulatory information for a segment often being available before the auditory information (for a review, see Venezia, Thurman, et al., 2016). Visual speech could thus serve a sort of priming function, a cortical preparedness, that speeds the auditory function for speech (e.g. Campbell, 2011; Hertrich et al., 2009). Regardless, it is clear that, as neuroscientific technology improves, it continues to show crossmodal influences as early as can be observed. This pattern of results is analogous to recent nonspeech findings that similarly demonstrate early audiovisual integration (e.g. Shams et al., 2005; Watkins et al., 2006; for a review, see Rosenblum et al., 2016).

The behavioral research also continues to show evidence of early crossmodal influences (for a review, see Rosenblum, Dorsi, & Dias, 2016). Evidence suggests that visual influences likely occur before auditory feature extraction (e.g. Brancazio, Miller, & Paré, 2003; Fowler, Brown, & Mann, 2000; Green & Gerdeman, 1995; Green & Kuhl, 1989; Green & Miller, 1985; Green & Norrix, 2001; Schwartz, Berthommier, & Savariaux, 2004). Other research shows that information in one modality can facilitate perception in the other even before that information is usable, and sometimes even detectable, on its own (e.g. Plass et al., 2014). For example, Plass and his colleagues (2014) used flash suppression to render visually presented articulating faces (consciously) undetectable. Still, if these undetected faces were presented with auditory speech that was consistent and synchronized with the visible articulation, subjects were faster at recognizing that auditory speech. This suggests that useful crossmodal influences can occur even without awareness of the information in one of the modalities.

Other examples of the extreme super‐additive nature of speech integration have been shown in the context of auditory speech detection (Grant & Seitz, 2000; Grant, 2001; Kim & Davis, 2004; Palmer & Ramsey, 2012) and identification (Schwartz, Berthommier, & Savariaux, 2004), as well as audiovisual speech identification (Eskelund, Tuomainen, & Andersen, 2011; Rosen, Fourcin, & Moore, 1981). Much of this research has been interpreted to suggest that, even without its own (conscious) clear phonetic determination, each modality can help the perceiver attend to critical information in the other modality through analogous patterns of temporal change in the two signals. These crossmodal correspondences are thought to be influential at an especially early stage (before feature extraction), serving as a "bimodal coherence‐masking protection" against everyday signal degradation (e.g. Grant & Seitz, 2000; Kim & Davis, 2004; Schwartz, Berthommier, & Savariaux, 2004; see also Gordon, 1997). The impressive utility of these crossmodal correspondences will also help motivate the theoretical position proposed later in this chapter.
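
The "analogous patterns of temporal change" at issue can be pictured as a simple correlation between the acoustic amplitude envelope and visible mouth movement, roughly in the spirit of Grant and Seitz (2000). The Hilbert‐envelope extraction and the lip‐area signal below are illustrative assumptions, not those authors' exact method.

import numpy as np
from scipy.signal import hilbert, resample

def amplitude_envelope(audio: np.ndarray) -> np.ndarray:
    """Acoustic amplitude envelope via the Hilbert analytic signal."""
    return np.abs(hilbert(audio))

def av_coherence(audio: np.ndarray, lip_area: np.ndarray) -> float:
    """Correlate the acoustic envelope with a frame-by-frame measure of
    mouth opening (e.g. lip area tracked from video), after coarsely
    resampling the envelope to the video frame count."""
    env = amplitude_envelope(audio)
    env = resample(env, len(lip_area))  # match the video's sampling
    return float(np.corrcoef(env, lip_area)[0, 1])

A high coherence of this sort would let the visual stream flag when and where in the acoustic signal the critical information is likely to be, even in the absence of clear phonetic content in either stream alone.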

However, other recent results have been interpreted as suggesting that additional linguistic analyses are conducted on the individual streams before, or concurrent with, integration. For example, a literature has emerged showing that the McGurk effect can be influenced by lexicality and semantic (sentence) context (e.g. Brancazio, 2004; Barutchu et al., 2008; but see Sams et al., 1998; Windmann, 2004, 2007). In one example, audio /ba/ paired with visual /va/ is perceived more often as va when presented in the context of the word valve than in the nonword vatch (Brancazio, 2004). This could mean that the analysis of each individual stream proceeds for some time before influencing the likelihood of audiovisual integration.

However, other interpretations of these results have been offered which are consistent with early integration (Brancazio, 2004; Rosenblum, 2008). It may be that lexicality and sentence context do not bear on the likelihood of integration but instead on how the post‐integrated segment is categorized. As stated, syllables perceived from conflicting audiovisual information are likely less canonical than those based on congruent (or audio‐alone) information, and thus less robust, even when they are identified as visually influenced segments. This could mean that, despite incongruent segments being fully integrated, the resulting perceived segment is more susceptible to contextual (e.g. lexical) influences than audiovisually congruent (and auditory‐alone) segments. This is certainly known to be the case for less canonical, more ambiguous audio‐alone segments, as demonstrated in the Ganong effect: an ambiguous segment heard equally as k or g in isolation will be heard as the former when placed in front of iss (yielding the word kiss) but as the latter in front of ift (yielding gift) (Connine & Clifton, 1987; Ganong, 1980). If the same is true of incongruent audiovisual segments, then lexical context may not bear on audiovisual integration as such, but on the categorization of the post‐integrated (and less canonical) segment (e.g. Brancazio, 2004).
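
The logic of this account can be made concrete with a toy Bayesian categorization model (our illustration, not a model proposed in this literature): integration first yields a segment estimate of some strength, and lexical context acts only as a prior at the categorization stage. All numbers below are invented.

def p_report_va(p_va_signal: float, p_va_lexical: float) -> float:
    """Posterior probability of reporting 'va', combining post-integration
    perceptual evidence with a lexical prior (a Ganong-style bias)."""
    num = p_va_signal * p_va_lexical
    return num / (num + (1 - p_va_signal) * (1 - p_va_lexical))

# Congruent AV segment: canonical, near-unambiguous evidence, so the
# lexical manipulation (valve-like vs. vatch-like context) barely matters.
print(p_report_va(0.95, 0.8), p_report_va(0.95, 0.2))  # ~0.99 vs ~0.83

# Incongruent (McGurk) segment: fully integrated but less canonical,
# so the same lexical manipulation shifts the report substantially.
print(p_report_va(0.55, 0.8), p_report_va(0.55, 0.2))  # ~0.83 vs ~0.23

On this picture, lexical context changes only the categorization of the already‐integrated (and less canonical) segment, which is all the Brancazio (2004) pattern requires.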


