The scope of the topics encompassed in the Handbook of Speech Perception reflects the wide‐ranging research community that studies speech perception. This includes neighboring fields: audiology, speech and hearing sciences, behavioral neuroscience, cognitive science, computer science and electrical engineering, linguistics, physiology and biophysics, otology, and experimental psychology. The chapters are accessible to nonspecialists while also engaging to specialists. While the Handbook of Speech Perception takes a place among the many excellent companion volumes in the Wiley Blackwell series on language and linguistics, the collection is unique in its emphasis on the specific concerns of the perception of spoken language.
If the advent of a handbook can be viewed as a sign of growth and maturity of a discipline, the appearance of this Second Edition is evidence of the longevity of research interest in spoken language. This new edition of the Handbook of Speech Perception brings the diverse field together for the researcher who, while focusing on a specific aspect of speech perception, might desire a clearer understanding of the aims, methods, and prospects for advances across the field. In addition to the critical survey of developments across a wide range of research on human speech perception, we also anticipate the Handbook facilitating the development of multidisciplinary research on speech perception.
We cannot conclude without acknowledging the many individuals on whose creativity, knowledge, and cooperation this endeavor depended, namely, the authors whose essays compose the Handbook of Speech Perception. A venture of this scope cannot succeed without the conscientious care of a publisher to protect the project, and we have received the benefit of this attention from many people at Wiley, beginning with Tanya McMullin, who was instrumental at the start of the project, and including Angela Cohen, Rachel Greenberg, and Clelia Petracca.
With our sincere thanks,
Jennifer S. Pardo
Bedford, New York
Lynne C. Nygaard
Atlanta, Georgia
Robert E. Remez
New York, New York
David B. Pisoni
Bloomington, Indiana
Part I Sensing Speech
1 Perceptual Organization of Speech
ROBERT E. REMEZ
Barnard College, Columbia University, United States
How does a perceiver resolve the linguistic properties of an utterance? This question has motivated many investigations within the study of speech perception and a great variety of explanations. In a retrospective summary over 30 years ago, Klatt (1989) reviewed a large sample of theoretical descriptions of the perceiver’s ability to project the sensory effects of speech, exhibiting inexhaustible variety, into a finite and small number of linguistically defined attributes, whether features, phones, phonemes, syllables, or words. While he noted many distinctions between the accounts, with few exceptions they exhibited a common feature. Each presumed that perception begins with a speech signal, well composed and fit to analyze. This common premise shared by otherwise divergent explanations of perception obliges the models to admit severe and unintended constraints on their applicability. To exist within the limits set by this simplifying assumption, the models apply implicitly to a world in which speech is the only sound; moreover, only a single talker ever speaks at once. Although this designation is easily met in laboratory samples, it is safe to say that it is rare in vivo. Moreover, in their exclusive devotion to the perception of speech the models are tacitly modular (Fodor, 1983), even those that deny it.
Despite the consequences of this dedication of perceptual models to speech and speech alone, there has been a plausible and convenient way to persist in invoking the simplifying assumption. This fundamental premise survives intact if a preliminary process of perceptual organization finds a speech signal, follows its patterned variation amid the effects of other sound sources, and delivers it whole and ready to analyze for linguistic properties. The indifference to the conditions imposed by the common perspective reflects an apparent consensus at the time that the perceptual organization of speech is simple, automatic, and accomplished by generic means. However, despite the rapidly established perceptual coherence of the constituents of a speech signal, the perceptual organization of speech cannot be reduced to the available and well‐established principles of auditory perceptual organization.
Perceptual organization and the gestalt legacy
A generic auditory model of organization
The dominant contemporary account of auditory perceptual organization has been auditory scene analysis (Bregman, 1990). This theory of the resolution of auditory sensation into streams, each issuing from a distinct source, developed empirically in the cognitive era, though its intellectual roots run deep. The gestalt psychologist Wertheimer (1923/1938) established the basic premises of the account in a legendary article, the contents of which are roughly known to all students of introductory psychology. In visible and audible examples, Wertheimer described the coalescence of elementary figures into groups and contours, arguing that sensory experience is organized in patterns, and is not registered as a mere spatter of individual receptor states. By considering a series of hypothetical cases, and without knowing the sensory physiology that would not be described for decades (Mountcastle, 1998), he justified organizing principles of similarity, proximity, closure, symmetry, common fate, continuity, set, and habit. Hindsight suggests that Wertheimer framed the problem astutely, or so it now seems given our contemporary understanding of the functions of the sensory periphery that integrate the action of visual and auditory receptors (Hochberg, 1974).
Setting the indefinitely elastic principle of habit aside, the simple gestalt‐derived criteria of grouping are arguably reducible to two functions: (1) to compose an inventory of sensory elements; and (2) to create contours or groups on the principle that like binds to like. Whether groups occur due to the spectral composition of auditory elements, their common onset or offset, proximity in frequency, symmetry of rate of change in an auditory dimension, harmonic relationship, the interpolation of brief gaps, and so on, each is readily understood as a case in which similarity between a set of auditory sensory elements promotes grouping automatically. A group composed according to these functions forms a sensory contour or perceptual stream. It is a small but necessary extrapolation to assert that an auditory contour consists of elements originating from a single source of sound, and therefore that perceptual organization parses sensory experience into concurrent streams, each issuing from a different sound‐producing event (Bregman & Pinker, 1978).
In a series of ongoing experiments, researchers adopted Wertheimer’s auditory conjectures, and calibrated the resolution of auditory streams by virtue of the historic principles and their derived corollaries. For example, Bregman and Campbell (1971) reported that auditory streams formed when a sequence of 100 ms tones differing in frequency was presented to listeners. According to a procedure that has become standard, the series of brief tones was presented repetitively to listeners, who were asked to report the order of tones in the series. Instead of hearing a sequence of high and low pitches, though, listeners grouped tones into two streams each composed of similar elements, one of high pitch and the other of low pitch (see Figure 1.1). Critically, the perception of the order of elements was veridical within streams, but perception of the intercalation order across the streams was erroneous. In another example, Bregman, Ahad, and Van Loon (2001) reported that 65 ms bursts of band‐limited noise presented in sequence were grouped together or split into separate perceptual streams as a function of the similarity in center frequency of the noise bursts. A sizable literature of empirical tests of this kind spans 50 years, and calibrates the sensory conditions of grouping by one or another variant of similarity. A compilation of the literature is offered by Bregman (1990), and the theoretical yield of this research is summarized by Darwin (2008).
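For readers who wish to hear the effect, the tone-sequence procedure described above can be approximated in a few lines of code. The following is a minimal sketch and not the stimulus specification of any cited study: only the 100 ms tone duration comes from the text, while the sample rate, the six frequencies, the 1000 Hz boundary, and all function names are hypothetical choices made for illustration. The grouping function is a toy stand-in for grouping by frequency similarity, not a model of auditory scene analysis.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed, not from the text)
TONE_MS = 100        # tone duration in ms, as reported in the text


def tone(freq_hz, dur_ms=TONE_MS, rate=SAMPLE_RATE):
    """Return one sine tone as a list of float samples in [-1, 1]."""
    n = int(rate * dur_ms / 1000)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def sequence(freqs, repetitions=1):
    """Concatenate tones in the given order, repeating the whole cycle."""
    samples = []
    for _ in range(repetitions):
        for f in freqs:
            samples.extend(tone(f))
    return samples


def split_streams(freqs, boundary_hz):
    """Toy grouping by similarity: tones at or above vs. below a boundary.

    Order is preserved within each group, but the interleaving of the
    two groups is discarded.
    """
    high = [f for f in freqs if f >= boundary_hz]
    low = [f for f in freqs if f < boundary_hz]
    return high, low


# Hypothetical cycle alternating high and low tones (frequencies in Hz).
CYCLE = [2000, 400, 2500, 500, 1600, 350]

if __name__ == "__main__":
    signal = sequence(CYCLE, repetitions=10)
    high, low = split_streams(CYCLE, boundary_hz=1000)
    print(len(signal), high, low)
```

The two lists returned by `split_streams` retain the order of tones within the high group and within the low group but carry no record of how the groups were interleaved, which loosely parallels the pattern of listener reports described above.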