The second development that cannot go without mention as this updated handbook goes to the printing press in 2020 is the startling spread of the Covid‐19 virus through human communities across the globe. Speech sounds, words, ideas, and (unfortunately) viruses are all transmitted from person to person through the air that we breathe during social interactions. With the covering of visible speech gestures by virus‐blocking masks, the awkward turn‐taking of video conferencing tools with single‐track audio channels, and the social distancing that protects us from the Covid‐19 virus, this pandemic is a constant reminder of the multimodal nature of speech perception and of the centrality of in‐person social interaction for seamless speech communication. The arrangement of the first three major sections of this handbook – I: Sensing Speech, II: Perception of Linguistic Properties, and III: Perception of Indexical Properties – provides the scaffold for an understanding of speech perception as far more than perception of a particular auditory signal. Instead, the chapters in these sections, along with the applications and theories covered in the remaining two sections – IV: Speech Perception by Special Listeners, and V: Theoretical Perspectives – develop the overarching argument that the observation, measurement, and modeling of speech perception must be conducted from a vantage point that encompasses its broad cognitive and social context. This central point is brought home in the final chapter by David Pisoni, one of the founders of the field and editors of this handbook:
“… hearing and speech perception do not function as independent autonomous streams of information or discrete processing operations that take place in isolation from the structure and functioning of the whole information‐processing system. While it is clear that the early stages of speech recognition in listeners with normal hearing are heavily dependent on the initial encoding and registration of highly detailed sensory information, audibility and the sensory processing of speech is only half of the story”.
The chapters in this handbook provide a superbly sign‐posted map of the full story.
Any compendium of knowledge on a particular topic represents a body of knowledge that developed in a specific time and place. The contributors to this handbook cover several generations of researchers spread over many academic disciplines working primarily on both sides of the North Atlantic Ocean. Yet, the scientific study of speech perception as presented in this outstanding handbook is still relatively young and localized. Perhaps one of the lasting lessons of the current pandemic is that we are all even more connected than we thought. New ideas and new ways of knowing can circulate as extensively, though maybe not quite as quickly, as a virus. This bodes well for the future of speech perception research.
Ann R. Bradlow
Northwestern University
Foreword to the First Edition
Historically, the study of audition has lagged behind the study of vision, partly, no doubt, because seeing is our first sense, hearing our second. But beyond this, and perhaps more importantly, instruments for acoustic control and analysis demand a more advanced technology than their optic counterparts: having a sustained natural source of light, but not of sound, we had lenses and prisms long before we had sound generators and oscilloscopes. For speech, moreover, early work revealed that its key perceptual dimensions are not those of the waveform as it impinges on the ear (amplitude, time), but those of its time‐varying Fourier transform, as it might appear at the output of the cochlea (frequency, amplitude, time). So it was only with the invention of instruments for analysis and synthesis of running speech that the systematic study of speech perception could begin: the sound spectrograph of R. K. Potter and his colleagues at Bell Telephone Laboratories in New Jersey during World War II, the Pattern Playback of Franklin Cooper at Haskins Laboratories in New York, a few years later. With these devices and their successors, speech research could finally address the first task of all perceptual study: definition of the stimulus, that is, of the physical conditions under which perception occurs.
Yet, a reader unfamiliar with the byways of modern cognitive psychology who chances on this volume may be surprised that speech perception, as a distinct field of study, even exists. Is the topic not subsumed under general auditory perception? Is speech not one of many complex acoustic signals to which we are exposed, and do we not, after all, simply hear it? It is, of course, and we do. But due partly to the peculiar structure of the speech signal and the way it is produced, partly to the peculiar equivalence relation between speaker and hearer, we also do very much more.
To get a sense of how odd speech is, consider writing and reading. Speech is unique among systems of animal communication in being amenable to transduction into an alternative perceptuomotor modality. The more or less continuously varying acoustic signal of an utterance in any spoken language can be transcribed as a visual string of discrete alphabetic symbols, and can then be reproduced from that string by a reader. How we effect the transforms from analog signal to discrete message, and back again, and the nature of the percept that mediates these transforms are central problems of speech research.
Notice that without the alphabet as a means of notation, linguistics itself, as a field of study, would not exist. But the alphabet is not merely a convenient means of representing language; it is also the primary objective evidence for our intuition that we speak (and language achieves its productivity) by combining a few dozen discrete phonetic elements to form an infinite variety of words and sentences. Thus, the alphabet, recent though it is in human history, is not a secondary, purely cultural aspect of language. The inventors of the alphabet brought into consciousness previously unexploited segmental properties of speech and language, much as, say, the inventors of the bicycle discovered previously unexploited cyclic properties of human locomotion. The biological nature and evolutionary origins of the discrete phonetic categories represented by the alphabet are among many questions on which the study of speech perception may throw light.
To perceive speech is not merely to recognize the holistic auditory patterns of isolated words or phrases, as a bonobo or some other clever animal might do; it is to parse words from a spoken stream, and segments from a spoken word, at a rate of several scores of words per minute. Notice that this is not a matter of picking up information about an objective environment, about banging doors, passing cars, or even crying infants; it is a matter of hearers recognizing sound patterns coded by a conspecific speaker into an acoustic signal according to the rules of a natural language. Speech perception, unlike general auditory perception, is intrinsically and ineradicably intersubjective, mediated by the shared code of speaker and hearer.
Curiously, however, the discrete linguistic events that we hear (segments, syllables, words) cannot be reliably traced in either an oscillogram or a spectrogram. In a general way, their absence has been understood for many years as due to their manner of production: extensive temporal and spectral overlap, even across word boundaries, among the gestures that form neighboring phonetic segments. Yet, how a hearer separates the more or less continuous flow into discrete elements is still far from understood. The lack of an adequate perceptual model of the process may be one reason why automatic speech recognition, despite half a century of research, is still well below human levels of performance.