The Handbook of Speech Perception

A wide-ranging and authoritative volume exploring contemporary perceptual research on speech, updated with new original essays by leading researchers.

Speech perception is a dynamic area of study that encompasses a wide variety of disciplines, including cognitive neuroscience, phonetics, linguistics, physiology and biophysics, auditory and speech science, and experimental psychology. The Handbook of Speech Perception, Second Edition, is a comprehensive and up-to-date survey of technical and theoretical developments in perceptual research on human speech. Offering a variety of perspectives on the perception of spoken language, this volume provides original essays by leading researchers on the major issues and most recent findings in the field. Each chapter provides an informed and critical survey, including a summary of current research and debate, clear examples and research findings, and discussion of anticipated advances and potential research directions. The timely second edition of this valuable resource:
- Discusses a uniquely broad range of both foundational and emerging issues in the field
- Surveys the major areas of the field of human speech perception
- Features newly commissioned essays on the relation between speech perception and reading, features in speech perception and lexical access, perceptual identification of individual talkers, and perceptual learning of accented speech
- Includes essential revisions of many chapters original to the first edition
- Offers critical introductions to recent research literature and leading field developments
- Encourages the development of multidisciplinary research on speech perception
- Provides readers with clear understanding of the aims, methods, challenges, and prospects for advances in the field
The Handbook of Speech Perception, Second Edition, is ideal for both specialists and non-specialists throughout the research community looking for a comprehensive view of the latest technical and theoretical accomplishments in the field.

The Handbook of Speech Perception — excerpt

Other recent research has determined that some of the strongest correlations across audible and visible signals lie in the acoustic range of 2–3 kHz (Chandrasekaran et al., 2009). This may seem unintuitive because it is within this range that the presumably less visible articulatory movements of the tongue and pharynx play their largest role in sculpting the sound. However, the configurations of these articulators were shown to systematically influence subtle visible mouth movements. This fact suggests that there is a class of visible information that strongly correlates with the acoustic information formed by internal articulators. In fact, visual speech research has shown that the presumably “hidden” articulatory dimensions (e.g. lexical tone, intraoral pressure) are actually visible in corresponding face surface changes, and can be used as speech information (Burnham et al., 2000; Han et al., 2018; Munhall & Vatikiotis‐Bateson, 2004). That visible mouth movements can inform about internal articulation may explain a striking recent finding: when observers are shown cross‐sectional ultrasound displays of internal tongue movements, they can readily integrate these novel displays with synchronized auditory speech information (D’Ausilio et al., 2014; see also Katz & Mehta, 2015).
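
To make this kind of analysis concrete, the following is a minimal sketch (not the cited authors' pipeline) of how such audio-visual correlations can be computed: band-limit the acoustic signal, take its amplitude envelope, and correlate it with a tracked mouth-area trajectory. The input names, sampling rate, and band choices are illustrative assumptions.

```python
# Minimal sketch: correlating band-limited acoustic energy with visible
# mouth movement. The inputs (audio array, sampling rate fs, per-video-frame
# mouth_area track) are assumed to exist and to be time-aligned.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample

def band_envelope(audio, fs, lo_hz, hi_hz, n_frames):
    """Amplitude envelope of one acoustic band, resampled to video rate."""
    b, a = butter(4, [lo_hz / (fs / 2), hi_hz / (fs / 2)], btype="band")
    env = np.abs(hilbert(filtfilt(b, a, audio)))
    return resample(env, n_frames)  # align envelope to the video frames

def band_correlations(audio, fs, mouth_area, bands):
    """Pearson r between each band's envelope and the mouth-area track."""
    n_frames = len(mouth_area)
    return {(lo, hi): np.corrcoef(
                band_envelope(audio, fs, lo, hi, n_frames), mouth_area)[0, 1]
            for lo, hi in bands}

# Hypothetical usage: a correlation peak in the (2000, 3000) band would
# mirror the 2-3 kHz finding described above.
# rs = band_correlations(audio, 16000, mouth_area,
#                        [(100, 1000), (1000, 2000), (2000, 3000)])
```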

The strong correspondences between auditory and visual speech information have allowed auditory speech to be synthesized from tracked kinematic dimensions available on the face (e.g. Barker & Berthommier, 1999; Yehia, Kuratate, & Vatikiotis‐Bateson, 2002). Conversely, the correspondences have allowed facial animation to be created directly from acoustic signal parameters (e.g. Yamamoto, Nakamura, & Shikano, 1998). There is also evidence for surprisingly close correspondences between audible and visible macaque calls, which macaques can easily perceive as corresponding (Ghazanfar et al., 2005). This finding may suggest a traceable phylogeny of the supramodal basis for multisensory communication.
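
As a rough illustration of how such cross-modal estimation can work, the sketch below fits a simple linear least-squares map from facial marker trajectories to frame-wise acoustic features. The feature choices and array shapes are assumptions for illustration, not the cited authors' implementations.

```python
# Minimal sketch: a linear map from facial kinematics to acoustic features,
# in the spirit of the estimation studies cited above. Feature definitions
# and shapes are illustrative assumptions.
import numpy as np

def fit_linear_map(face_feats, acoustic_feats):
    """Fit W so that acoustic_feats ~= [face_feats, 1] @ W.

    face_feats:     (n_frames, n_face_dims), e.g. stacked marker coordinates
    acoustic_feats: (n_frames, n_spec_dims), e.g. spectral envelope features
    """
    X = np.hstack([face_feats, np.ones((len(face_feats), 1))])  # bias column
    W, *_ = np.linalg.lstsq(X, acoustic_feats, rcond=None)
    return W

def predict_acoustics(face_feats, W):
    """Predict acoustic features for new frames of facial kinematics."""
    X = np.hstack([face_feats, np.ones((len(face_feats), 1))])
    return X @ W
```

The reverse direction (driving facial animation from acoustic parameters) is the same regression with predictors and targets swapped.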

Importantly, there is evidence that perceivers make use of these crossmodal informational correspondences. While the supramodal thesis proposes that the relevant speech information takes a supramodal, higher‐order form, the degree to which this information is simultaneously available in both modalities depends on a number of factors (e.g. visibility, audibility). The evidence shows that, in contexts in which the information is simultaneously available, perceivers take advantage of this correspondence (e.g. Grant & Seitz, 2000; Grant, 2001; Kim & Davis, 2004; Palmer & Ramsey, 2012; Schwartz, Berthommier, & Savariaux, 2004; Eskelund, Tuomainen, & Andersen, 2011; Rosen, Fourcin, & Moore, 1981). Research shows that the availability of segment‐to‐segment correspondence across the modalities’ information strongly predicts how well one modality will enhance the other (Grant & Seitz, 2000, 2001; Kim & Davis, 2004). Functionally, this finding supports the aforementioned “bimodal coherence‐masking protection” in that the informational correspondence across modalities allows one modality to boost the usability of the other (e.g. in the face of everyday masking degradation). In this sense, the supramodal thesis is consistent with the evidence supporting the bimodal coherence‐masking protection concept discussed earlier (Grant & Seitz, 2000; Grant, 2001; Kim & Davis, 2004). However, the supramodal thesis goes further by proposing that: (1) crossmodal correspondences are much more common and complex; and (2) the abstract form of information that supports these correspondences is the primary type of information used by the speech mechanism (regardless of the degree of moment‐to‐moment correspondence, or of the specific availability of information in a modality).
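
One simple way to quantify segment-to-segment correspondence, sketched below under assumed inputs, is to correlate an acoustic envelope with a lip kinematic track within each segment and summarize across segments; a higher summary score would be expected to predict larger crossmodal enhancement. The boundary format and the mean-r summary are illustrative assumptions, not the cited authors' exact measures.

```python
# Minimal sketch: a per-segment correspondence score between an acoustic
# envelope and a lip kinematic track, both sampled at the same frame rate.
# Segment boundaries (frame indices) and the mean-r summary are assumptions.
import numpy as np

def segment_correspondence(audio_env, lip_track, boundaries):
    """Mean Pearson r computed segment by segment."""
    rs = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        a = np.asarray(audio_env[start:end], dtype=float)
        v = np.asarray(lip_track[start:end], dtype=float)
        if len(a) > 2 and a.std() > 0 and v.std() > 0:  # skip flat segments
            rs.append(np.corrcoef(a, v)[0, 1])
    return float(np.mean(rs)) if rs else 0.0
```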

General examples of supramodal information

While some progress has been made in identifying the detailed ways in which information takes the same specific form across modalities, more progress has been made in establishing the general ways in which the informational forms are similar. In the previous version of this chapter, it was argued that both auditory and visual speech show an important primacy of time‐varying information (Rosenblum, 2005; see also Rosenblum, 2008). At the time that chapter was written, many descriptions of visual speech information were based on static facial features, and still images were often used as stimuli. Since then, almost all methodological and conceptual interpretations of visual speech information have incorporated a critical dynamic component (e.g. Jesse & Bartoli, 2018; Jiang et al., 2007).

This contemporary emphasis on time‐varying information exists in both the behavioral and the neurophysiological research. A number of studies have examined how dynamic facial dimensions are extracted and stored for purposes of both phonetic and indexical perception (for a review, see Jesse & Bartoli, 2018). Other studies have shown that moment‐to‐moment visibility of articulator movements (as conveyed through discrete facial points) is highly predictive of lip‐reading performance, suggesting that kinematic dimensions provide highly salient information for lip‐reading (Jiang et al., 2007). Other research has examined the neural mechanisms activated when perceiving dynamic speech information. For example, there is evidence that the mechanisms involved during perception of speech from isolated kinematic (point‐light) displays differ from those involved in recognizing speech from static faces (e.g. Santi et al., 2003). At the same time, brain reactivity to the isolated motion of point‐light speech does not qualitatively differ from reactivity to normal (fully illuminated) speaking faces (Bernstein et al., 2011). These neurophysiological findings are consistent with the primacy of time‐varying visible speech dimensions, which, in turn, is analogous to the same primacy in audible speech (Rosenblum, 2005).

A second general way in which auditory and visual speech information takes a similar form is in how it interacts with – and informs about – indexical properties. As discussed in the previous version of this chapter, there is substantial research showing that both auditory and visual speech functions make use of talker information to facilitate phonetic perception (for reviews, see Nygaard, 2005; Rosenblum, 2005). It is easier to understand speech from familiar speakers (e.g. Borrie et al., 2013; Nygaard, 2005), and easier to lip‐read from familiar faces, even for observers who have no formal lip‐reading experience (e.g. Lander & Davies, 2008; Schweinberger & Soukup, 1998; Yakel, Rosenblum, & Fortier, 2000).

In these talker‐facilitation effects, it could be that an observer’s phonetic perception is facilitated by familiarity with the separate vocal and facial characteristics provided by each modality. However, research conducted in our lab suggests that perceivers may also gain experience with the deeper, supramodal talker dimensions available across modalities (Rosenblum, Miller, & Sanchez, 2007; Sanchez, Dias, & Rosenblum, 2013). Our research shows that the talker experience gained through one modality can be shared across modalities to facilitate phonetic perception in the other. For example, becoming familiar with a talker by lip‐reading them (without sound) for one hour allows a perceiver to then better understand that talker’s auditory speech (Rosenblum, Miller, & Sanchez, 2007). Conversely, listening to the speech of a talker for one hour allows a perceiver to better lip‐read from that talker (Sanchez, Dias, & Rosenblum, 2013). Interestingly, this crossmodal talker facilitation works for both old words (perceived during familiarization) and new words, suggesting that the familiarity is not contained in specific lexical representations (Sanchez, Dias, & Rosenblum, 2013). Instead, the learned supramodal dimensions may be based on talker‐specific phonetic information contained in the idiolect of the perceived talker (e.g. Remez, Fellowes, & Rubin, 1997; Rosenblum et al., 2002).
