Ecological Psychoacoustics 6 PDF
Table 1. The theoretical hierarchy of importance for a major and minor tonality. Semitones are numbered 0–11.
[Rows: hierarchy level (tonic tone, tonic triad, diatonic set, nondiatonic set). Columns: major hierarchy, minor hierarchy.]
makes use of all of these tonalities, albeit not typically within a single piece of
Thus far what has been described is a music-theoretic hierarchy of importance
of the 12 chromatic tones within a particular tonal context; the existence of this
theoretical hierarchy raises the question of whether or not listeners are sensitive
to this organization. Krumhansl, in some classic tests of this question (Krumhansl
& Shepard, 1979; Krumhansl & Kessler, 1982), provided evidence of the psychological reality of this pitch hierarchy and its importance in musical processing. To examine this question, Krumhansl and Shepard (1979) employed the
probe-tone method (see Krumhansl & Shepard, 1979 or Krumhansl, 1990 for
thorough descriptions of this procedure), in which listeners heard a musical
context designed to instantiate a specific tonality, followed by a probe event. The
listeners then rated how well this probe fit with the preceding context in a musical
sense. Using this procedure, Krumhansl and colleagues (Krumhansl & Shepard,
1979; Krumhansl & Kessler, 1982) demonstrated that listeners' perceived hierarchy of stability matched the theoretic hierarchy described earlier. Figure 3
shows the averaged ratings for the chromatic notes relative to a major and a minor
context; these ratings are called the tonal hierarchy (Krumhansl, 1990), with
the tonic functioning as a psychological reference point (e.g., Rosch, 1975) by
which the remaining tones of the chromatic set are judged.
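The correspondence between the rated and the theoretic hierarchy can be checked with a short sketch. The rating values below are the Krumhansl and Kessler (1982) probe-tone profiles as they are commonly tabulated. The assignment of semitones to hierarchy levels is an assumption taken from standard music theory (natural minor for the minor diatonic set), and the function name `level_means` is purely illustrative.

```python
from statistics import mean

# Probe-tone ratings indexed by semitone above the tonic (0-11),
# as commonly tabulated from Krumhansl and Kessler (1982).
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

# Assumed theoretic levels; each level lists only the semitones
# not already counted at a higher level.
MAJOR_LEVELS = {
    "tonic tone": [0],
    "tonic triad": [4, 7],
    "diatonic set": [2, 5, 9, 11],
    "nondiatonic set": [1, 3, 6, 8, 10],
}
MINOR_LEVELS = {
    "tonic tone": [0],
    "tonic triad": [3, 7],
    "diatonic set": [2, 5, 8, 10],
    "nondiatonic set": [1, 4, 6, 9, 11],
}


def level_means(profile, levels):
    """Average rating of the semitones belonging to each hierarchy level."""
    return {name: mean(profile[s] for s in semis) for name, semis in levels.items()}


for label, profile, levels in (("major", KK_MAJOR, MAJOR_LEVELS),
                               ("minor", KK_MINOR, MINOR_LEVELS)):
    for name, value in level_means(profile, levels).items():
        print(f"{label:5s} {name:16s} {value:.2f}")
```

Under these assumptions the mean rating falls from level to level in both modes, which is the pattern the probe-tone studies report.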
Subsequent work on the tonal hierarchy extended these findings in different
directions. Some research demonstrated that these ratings were robust across
different musical tonalities (Krumhansl & Kessler, 1982) with, for example,
the hierarchy for F# major a transposition of the C major hierarchy. Krumhansl
and Kessler used these hierarchy ratings to derive a four-dimensional map of
psychological musical key space; a two-dimensional representation of this map
appears in Fig. 4. This map is intriguing in that it incorporates different important musical relations, such as fifths, and parallel and relative keys. Other work
generalized this approach outside the realm of tonal music, looking at the
hierarchies of stability in non-Western music such as traditional Indian music
(Castellano, Bharucha, & Krumhansl, 1984) and Balinese gamelan music
(Kessler, Hansen, & Shepard, 1984), or exploring extensions and alternatives to
the Western tonal system (Krumhansl, Sandell, & Sargeant, 1987; Krumhansl &
EBSCO Publishing : eBook Academic Collection (EBSCOhost) - printed on 8/11/2014 8:21
AN: 117193 ; Neuhoff, John G..; Ecological Psychoacoustics
Account: rug
[Figure 3 axes: average rating (vertical) by semitone, 0–11 (horizontal).]
Copyright 2004. Elsevier Academic Press. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted
under U.S. or applicable copyright law.
Figure 3. The idealized tonal hierarchy ratings for a major and minor context. (From
Krumhansl and Kessler, 1982.)
[Figure 4 legend: perfect fifth; parallel major/minor; relative major/minor.]
Figure 4. Krumhansl and Kessler's (1982) map of musical key space, with major tonalities
indicated by capital letters and minor tonalities by lowercase letters. Because this figure represents a
four-dimensional torus, the top and bottom edges, and the left and right edges, designate the same
place in key space. Note that this map incorporates a variety of important musical relations, with
neighboring keys on the circle of fifths close to one another, as well as parallel major/minor (major
and minor keys sharing the same tonic) and relative major/minor (major and minor keys sharing the
same diatonic set but different tonics) near one another.
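The key relations captured by the map can be illustrated by correlating transposed hierarchies with one another. This is a minimal sketch, not the scaling analysis used to derive the map itself: it assumes the commonly tabulated Krumhansl and Kessler (1982) profile values, and the helper names `pearson` and `key_profile` are illustrative only.

```python
from math import sqrt

# Probe-tone ratings indexed by semitone above the tonic,
# as commonly tabulated from Krumhansl and Kessler (1982).
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def key_profile(tonic, mode):
    """Hierarchy for any key, obtained by rotating the C-based profile.

    `tonic` is a pitch class (0 = C, 6 = F#, ...); `mode` is "major" or "minor".
    """
    base = KK_MAJOR if mode == "major" else KK_MINOR
    return [base[(pc - tonic) % 12] for pc in range(12)]


c_major = key_profile(0, "major")
neighbors = {
    "G major (a fifth above)": key_profile(7, "major"),
    "A minor (relative minor)": key_profile(9, "minor"),
    "C minor (parallel minor)": key_profile(0, "minor"),
    "F# major (a tritone away)": key_profile(6, "major"),
}
for name, profile in neighbors.items():
    print(f"C major vs {name}: r = {pearson(c_major, profile):.3f}")
```

Keys related by fifth, relative, or parallel relations correlate positively with C major, whereas the tritone-related key correlates negatively, consistent with their placement near to or far from one another in key space.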
(they do, after all, perceive musical keys), it is still reasonable to wonder how
much information is needed for apprehending tonality. And relatedly, what
happens in musical contexts when the tonal material changes over time? Such
change, called modulation, is a typical, indeed expected, component of Western
tonal music. Are listeners sensitive to such tonal modulations and, if so, how
quickly is the tonal sense reoriented? Finally, is it possible to model listeners'
developing percepts of tonality, and of tonal modulation, based on the musical
surface information available? These questions have been explored in a variety
of experimental contexts.
Sensitivity to Pitch Distributional Information
The issue of how much information is necessary for listeners to apprehend a
sense of musical key has been explored by Smith and Schmuckler (2000, 2004)
in their studies of pitch-distributional influences on perceived tonality. Specifically, this work examined the impact of varying pitch durations on the perception of tonality by manipulating the absolute durations of the chromatic pitches
within a musical sequence while at the same time maintaining the relative durational pattern across time. Thus, tones of longer duration (relative to shorter duration) remained long, despite variation in their actual absolute duration. This
manipulation, called tonal magnitude, produces equivalent duration profiles (in a correlational
sense). It appears schematically in Fig. 5 and is produced by raising the Krumhansl and Kessler (1982) profiles to exponents ranging
from 0 (producing a flat profile) through 1 (reproducing the original profile) to
4.5 (producing an exaggerated profile). Smith and Schmuckler also varied the
hierarchical organization of the pitches by presenting them either in the typical
hierarchical arrangement (as represented by the tonal hierarchy) or in a nonhierarchical arrangement produced by randomizing the assignment of durations to
individual pitches; examples of randomized profiles are also seen in Fig. 5.
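The tonal magnitude manipulation can be sketched in a few lines. The sketch assumes the commonly tabulated Krumhansl and Kessler (1982) major profile and assumes that the exponentiated values are normalized so that they sum to 1, so that each value can be read as a share of total duration. That normalization, the function names, and the seeded shuffle are illustrative choices rather than details given in the text.

```python
import random

# Major-key probe-tone ratings indexed by semitone above the tonic,
# as commonly tabulated from Krumhansl and Kessler (1982).
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]


def duration_shares(profile, magnitude):
    """Raise each rating to `magnitude`, then normalize the results to sum to 1."""
    powered = [value ** magnitude for value in profile]
    total = sum(powered)
    return [p / total for p in powered]


def randomized_shares(shares, seed=0):
    """Nonhierarchical version: the same durations, reassigned to pitches at random."""
    shuffled = list(shares)
    random.Random(seed).shuffle(shuffled)
    return shuffled


for magnitude in (0, 1, 4.5):
    shares = duration_shares(KK_MAJOR, magnitude)
    print(f"magnitude {magnitude}: tonic share = {100 * shares[0]:.1f}%")
```

An exponent of 0 gives every pitch the same share, an exponent of 1 reproduces the proportions of the original ratings, and larger exponents concentrate duration on the most stable tones while leaving their rank order unchanged.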
In a series of experiments, Smith and Schmuckler created random melodies in
which note durations were based on the various profiles shown in Fig. 5. Using
the probe tone procedure, listeners' percepts of tonality in response to these
melodies were assessed by correlating stability ratings with Krumhansl and
Kessler's idealized tonal hierarchy values. A sample set of results from this
series appears in Fig. 6 and demonstrates that increasing tonal magnitude led to
increasingly stronger percepts of tonality, but only when pitches were organized
hierarchically. Later studies revealed that varying frequency of occurrence while
holding duration constant failed to instantiate tonality (a result also found by
Lantz, 2002; Lantz & Cuddy, 1998; Oram & Cuddy, 1995) and that the co-occurrence of duration and frequency of occurrence led to the most robust tonal
percepts. Interestingly, tonality was not heard until the note duration pattern
exceeded what would occur based on a direct translation of Krumhansl and
Kessler's (1982) ratings. These studies revealed that listeners are not uniformly
sensitive to relative differences in note duration but instead require a divergent
[Figure 5 axes: percent duration (vertical) by semitone (horizontal), at each tonal magnitude.]
Figure 5. Graphs of the relative durations for the 12 notes of the chromatic scale as a function of changing tonal magnitude. The top figure shows the relative durations when organized hierarchically, based on Krumhansl and Kessler (1982); the bottom figure shows a nonhierarchical (randomized) organization.
[Figure 6 horizontal axis: tonal magnitude.]
Figure 6. Findings from Smith and Schmuckler's investigations of the impact of tonal magnitude and hierarchical organization manipulations on perceived tonality. The .05 significance level for the correlation with the tonal hierarchy is notated.
This model, called the intervallic rivalry model (Brown, 1988; Brown & Butler,
1981; Brown et al., 1994; Browne, 1981; Butler, 1989, 1990; Butler & Brown,
1984, 1994), proposes that listeners determine a musical key by recognizing the
presence of rare intervals that unambiguously delineate a tonality. Results from
studies based on this approach (Brown, 1988; Brown & Butler, 1981) have
demonstrated that listeners can, in fact, use rare interval information (when combined with a third, disambiguating tone) to determine tonality when such information is presented both in isolation and in short melodic passages.
In contrast, the Krumhansl–Schmuckler key-finding algorithm (Krumhansl,
1990; Krumhansl & Schmuckler, 1986a) focuses on the pitch content of a musical
passage and not on the local temporal ordering of pitches (but see Krumhansl,
2000b, and Krumhansl & Toiviainen, 2001, for an innovation using temporal
ordering). This algorithm operates by matching the major and minor hierarchy
values with tone duration and/or frequency of occurrence profiles for the chromatic set, based on any particular musical sequence. The result of this comparison is an array of values representing the fit between the relative duration values
of a musical passage and the idealized tonal hierarchy values, with the tonal implications of the passage indicated by the strength of the relations between the
passage and the idealized tonal hierarchies.
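The matching step described above can be sketched as a set of 24 correlations. This is a minimal illustration under stated assumptions, not the published implementation: the profile values are the commonly tabulated Krumhansl and Kessler (1982) ratings, the input is assumed to be a 12-element list of total durations per pitch class (C = 0), and the names `key_strengths`, `best_key`, and `passage` are hypothetical.

```python
from math import sqrt

# Probe-tone ratings indexed by semitone above the tonic,
# as commonly tabulated from Krumhansl and Kessler (1982).
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def key_strengths(durations):
    """Correlate a 12-element duration profile with all 24 key hierarchies."""
    strengths = {}
    for tonic in range(12):
        for mode, base in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            # Rotate the C-based hierarchy so that its tonic falls on `tonic`.
            hierarchy = [base[(pc - tonic) % 12] for pc in range(12)]
            strengths[f"{PITCH_NAMES[tonic]} {mode}"] = pearson(durations, hierarchy)
    return strengths


def best_key(durations):
    """Name of the key whose hierarchy correlates most strongly with the input."""
    strengths = key_strengths(durations)
    return max(strengths, key=strengths.get)


# Hypothetical passage: total beats sounded on each pitch class, C through B.
passage = [4, 0, 1, 0, 2, 1, 0, 3, 0, 1, 0, 1]
print(best_key(passage))
```

Returning the full dictionary of correlations, rather than only the winning key, keeps the graded "array of values" that the text describes, so that weak or ambiguous tonal implications remain visible.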
Initial tests of the Krumhansl–Schmuckler algorithm (see Krumhansl, 1990)
explored the robustness of this approach for key determination in three different
contexts. In its first application, the key-finding algorithm predicted the tonalities of preludes written by Bach, Shostakovich, and Chopin, based on only the
first few notes of each piece. The second application extended this analysis by
determining the tonality of the fugue subjects of Bach and Shostakovich on a
note-by-note basis. Finally, the third application assessed the key-finding algorithm's ability to trace key modulation through Bach's C minor prelude (Well-Tempered Clavier, Book II) and compared these key determinations to analyses
of key strengths provided by two expert music theorists. Overall, the algorithm
performed quite well (see Krumhansl, 1990, pp. 77–110, for details), proving both
effective and efficient in determining the tonality of excerpts varying in length,
position in the musical score, and musical style.
In general, the Krumhansl–Schmuckler key-finding algorithm has proved successful in key determination of musical scores and has been used extensively by
a number of authors for a variety of purposes (Cuddy & Badertscher, 1987;
Frankland & Cohen, 1996; Huron & Parncutt, 1993; Takeuchi, 1994; Temperley,
1999; Wright et al., 2000). Interestingly, however, there have been few explicit
explorations of the model's efficacy in predicting listeners' percepts of key,2
although Krumhansl and Toiviainen (2001), in a formalization of this algorithm
using a self-organizing map neural network, have provided at least one direct test
of this model.
2. Most of the research employing the Krumhansl–Schmuckler algorithm has used the model to quantify the tonal implications of the musical stimuli being used in experimental investigations. Such
work, although interesting and informative about the robustness of the model, is not, unfortunately,
a rigorous test of the algorithm itself.
In an attempt to assess more directly the model's ability to predict listeners'
percepts of tonality, Schmuckler and Tomovski (1997, 2002a) conducted a series
of experiments patterned after the original applications of the algorithm described
earlier. Using the probe-tone procedure and modeling this work after the first
application of the algorithm, Schmuckler and Tomovski gathered probe tone
ratings for the chromatic set, using as contexts the beginning segments (approximately four notes) of the 24 major and minor preludes of Bach (Well-Tempered
Clavier, Book I) and Chopin (opus 28); Fig. 7 presents some sample contexts
from this study. Listeners' ratings for preludes from both composers were correlated with the idealized tonal hierarchy values, with these correlations then compared with the key-finding algorithm's predictions of key strength based on the
same segments; these comparisons are shown in Table 2. For the Bach preludes,
both the algorithm and the listeners picked out the tonality quite well. The algorithm correlated significantly with the intended key (i.e., the key designated by
the composer) for all 24 preludes, producing the highest correlation with the
intended key in 23 of 24 cases. The listeners showed similarly good key determination, with significant or marginally significant correlations between probe
tone ratings and the tonal hierarchy of the intended key for 23 of 24 preludes and
producing the strongest correlation with the intended key for 21 of 24 preludes.
Accordingly, both the algorithm and the listeners were quite sensitive to the tonal
implications of these passages based on just the initial few notes.
Table 2 also displays the results for the Chopin preludes. In contrast to the
Bach, both the algorithm and the listeners had more difficulty in tonal identification. For the algorithm, the correlation with the intended key was significant
in only 13 of 24 preludes and was the highest correlation for only 11 of these
cases. The listeners performed even worse, producing significant correlations
with the intended key for 8 of 24 preludes, with only 6 of the correlations with
the intended key being the strongest relation. What is intriguing about these failures, however, is that both the algorithm and the listeners behaved similarly.
Figure 8 graphs the correlations shown in the final two columns of Table 2 and
reveals that the situations in which the algorithm failed to find the key were also
those in which listeners performed poorly and vice versa; accordingly, these two
sets of data were positively correlated. Thus, rather than indicating a limitation,
the poor performance of the algorithm with reference to the Chopin preludes
demonstrates that it is actually picking up on the truly tonally ambiguous implications of these short segments.
A subsequent study looked in more depth at the algorithm's modeling of listeners' developing tonal percepts, using some of the Chopin preludes in which
listeners did not identify the correct key. Other research, however, mirrored
Krumhansl and Schmuckler's third application by exploring the algorithm's
ability to track key modulation, or movement through different tonalities, within
a single piece. The ability to track key movement is considered a crucial asset
[Figure 7 panel labels: C Major, C Minor, C Minor, G# Minor, F# Major, D Major, F# Minor, Bb Minor.]
Figure 7. Sample contexts used in the probe tone studies of Schmuckler and Tomovski.
For the Bach contexts, both the algorithm and the listeners correctly determined the musical key. For
the Chopin segments, the first context shown is one in which both algorithm and listeners correctly
determined the musical key. The second context shown is one in which the algorithm (but not the
listeners) determined the musical key, and the third context shown is one in which the listeners
(but not the algorithm) determined the musical key. Finally, the fourth context is one in which neither
algorithm nor listeners determined the correct key.
for key-finding models and has been identified as a potentially serious weakness
for the Krumhansl–Schmuckler algorithm (Shmulevich & Yli-Harja, 2000;
Temperley, 1999).
Two studies looked in detail at the perception of key movement in Chopin's
E minor prelude (see Fig. 9), using the probe-tone methodology. In the first of
these studies, eight different probe positions were identified in this prelude.
Listeners heard the piece from the beginning up to these probe positions and then
Table 2. Correlations between listeners' probe-tone ratings and the algorithm's key predictions
with the intended key for the initial segments of preludes by Bach and Chopin
[Rows, one per prelude key: C major, C minor, C♯/D♭ major, C♯/D♭ minor, D major, D minor, D♯/E♭ major, D♯/E♭ minor, E major, E minor, F major, F minor, F♯/G♭ major, F♯/G♭ minor, G major, G minor, G♯/A♭ major, G♯/A♭ minor, A major, A minor, A♯/B♭ major, A♯/B♭ minor, B major, B minor.]
*p < .05
rated the various probe tones. By playing the different contexts sequentially (i.e.,
hearing all context and probe pairings up to probe position 1 before probe position 2, and so on; see Schmuckler, 1989, for a fuller discussion), perceived tonality as it unfolded throughout the entire composition was assessed. The second
study in this pair examined whether or not the percept of tonality is a local phenomenon, based on the immediately preceding musical context, or is more global
in nature, taking into account the entire history of previous musical events.
Because the data from these studies are complex, space limitations preclude
a detailed presentation of the findings from this work. However, the results of
these studies did reveal that the key-finding algorithm was generally successful
at modeling listeners' tonal percepts across the length of the prelude and thus can
be used to track tonal modulation. Moreover, these studies indicated that the tonal
implications of the piece are a relatively localized phenomenon, based primarily
on the immediately preceding (and subsequent) pitch material; this result has been
[Figure 8 horizontal axis, by key: C c C# c# D d D# d# E e F f F# f# G g G# g# A a A# a# B b.]
Figure 8. The relation between the algorithm's and the listeners' abilities to determine the tonality of the Chopin preludes.
both suggested and modeled by others (Shmulevich & Yli-Harja, 2000). Overall,
these results provide a nice complement to the third application of the
Krumhansl–Schmuckler algorithm (see Krumhansl, 1990) in which the key-finding algorithm successfully modeled expert music theorists' judgments of tonal
strength throughout a complete Bach prelude.
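The contrast between local and global tonal percepts can be sketched by running the same correlational key-finding step over two kinds of context: everything heard so far, or only the most recent notes. This is an illustrative sketch rather than the procedure used in the studies. It assumes the commonly tabulated Krumhansl and Kessler (1982) profiles, a note list of (pitch class, beats) pairs, and a window measured in notes; the two-phrase `piece` is a hypothetical example.

```python
from math import sqrt

# Probe-tone ratings indexed by semitone above the tonic,
# as commonly tabulated from Krumhansl and Kessler (1982).
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_key(durations):
    """Key whose rotated hierarchy correlates most strongly with `durations`."""
    strengths = {}
    for tonic in range(12):
        for mode, base in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            hierarchy = [base[(pc - tonic) % 12] for pc in range(12)]
            strengths[f"{PITCH_NAMES[tonic]} {mode}"] = pearson(durations, hierarchy)
    return max(strengths, key=strengths.get)


def track_keys(notes, window=None):
    """Key estimate after each note.

    window=None uses all preceding notes (global context);
    window=k uses only the k most recent notes (local context).
    """
    estimates = []
    for end in range(1, len(notes) + 1):
        start = 0 if window is None else max(0, end - window)
        durations = [0.0] * 12
        for pitch_class, beats in notes[start:end]:
            durations[pitch_class % 12] += beats
        estimates.append(best_key(durations))
    return estimates


# Hypothetical piece: a phrase centered on C, then the same phrase a tritone higher.
c_phrase = [(0, 2), (4, 1), (7, 1), (5, 1), (2, 1), (0, 2)]
fs_phrase = [((pc + 6) % 12, beats) for pc, beats in c_phrase]
piece = c_phrase + fs_phrase

local_estimates = track_keys(piece, window=6)
global_estimates = track_keys(piece)
print(local_estimates[5], "->", local_estimates[-1])
```

With a six-note window, the estimate at the end of the piece reflects only the second phrase, which is one simple way to model a percept driven by the immediately preceding pitch material.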
Summary of Tonality and Pitch Organization
The findings reviewed in the preceding sections highlight the critical role of
tonality as an organizing principle of pitch materials in Western tonal music.
Although other approaches to pitch organization, such as serial pattern and linguistic models, underscore important structural relations between pitches, none
of them provides as fundamental an organizing basis as that of tonality. And, in
fact, the importance of tonality often underlies these other approaches. Serial
pattern models (e.g., Deutsch & Feroe, 1981), for example, assume the existence
of a special alphabet of pitch materials to which the rules for creating well-formed
patterns are applied. This alphabet, however, is often defined in terms of tonal
sets (e.g., a diatonic scale), thus implicitly building tonality into the serial patterns. Similarly, many of the bases of the grouping preference rules central to the
Figure 9. Chopin's E minor prelude, opus 28. The eight probe positions are indicated by the
marking PP.
operation of the hierarchical reductions that are the goal of linguistic-type analyses (e.g., Lerdahl & Jackendoff, 1983) are based on notions of tonal movement
and stability, with less tonally important events resolving to, or being hierarchically subordinate to, more tonally important events.
Thus, tonality plays a fundamental role in the perception and organization of
pitch information. Given its ubiquity in Western music, it might even be reasonable to propose tonality's pattern of hierarchical relations as a form of invariant
structure to which listeners are sensitive to varying degrees; this point is returned
to in the final discussion. Right now, however, it is instructive to turn to consideration of the organization of pitch in time.
Pitch events are not only organized in frequency space, they are organized in
time as well. Unlike investigations of frequency, however, relatively little work
has examined the organization of pitch in time. Partly this is due to a different
emphasis in the temporal domain of music. Traditionally, concern with musical
organizations in time has described the metrical structure of musical events (e.g.,
Benjamin, 1984; Cooper & Meyer, 1960; Large & Palmer, 2002; Lerdahl &
Jackendoff, 1983; Lewin, 1984); accordingly, little attention has focused explicitly on how pitch events are organized temporally.
Work that has looked at this issue has tended to examine relations between
pairs of tones, investigating questions such as what tone is likely to follow another
given a particular event and so on. Examples of this approach can be found in
early work on information theory (Cohen, 1962; Coons & Kraehenbuehl, 1958;
Knopoff & Hutchinson, 1981, 1983; Kraehenbuehl & Coons, 1959; Youngblood,
1958) and persist to this day in work on musical expectancy (Carlsen, 1981, 1982;
Carlsen, Divenyi, & Taylor, 1970; Krumhansl, 1995; Schellenberg, 1996, 1997;
Unyk & Carlsen, 1987). The examinations of Narmour's implication-realization
model (Cuddy & Lunney, 1995; Krumhansl, 1995; Narmour, 1989, 1990, 1992;
Schellenberg, 1996, 1997; Schmuckler, 1989, 1990), for example, have been most
successful in explaining expectancy relations for single, subsequent events.
One perspective that has tackled the organization of pitch in time for more
extended pitch sequences has focused on the role of melodic contour in musical
perception and memory. Contour refers to the relative pattern of ups and downs
in pitch through the course of a melodic event. Contour is considered to be one
of the most fundamental components of musical pitch information (e.g., Deutsch,
1969; Dowling, 1978) and is the aspect of pitch structure most easily accessible
to subjects without formal musical training.
Given its importance, it is surprising that few quantitative theories of contour
structure have been offered. Such models would be invaluable, both as aids in
music-theoretic analyses of melodic materials and for modeling listeners' perceptions of and memory for musical passages. Fortunately, the past few years
have witnessed a change in this state of affairs. Two different approaches to
musical contour, deriving from music-theoretic and psychological frameworks,
have been advanced to explain contour structure and its perception. These two
approaches will be considered in turn.
Music-Theoretic Contour Models
Music-theoretic work has attempted to derive structural descriptions of melodic
contour based on the relative patterning of pitch differences between notes in
melodic passages (Friedmann, 1985, 1987; Marvin & Laprade, 1987; Quinn,
1999). Early work on this topic (Friedmann, 1985, 1987; Marvin & Laprade, 1987)
proposed a set of tools to be used in quantifying the contour of short melodic
patterns. These tools summarized the direction, and in some cases the size, of
relative pitch differences between all pairs of adjacent and nonadjacent tones
(called an interval in musical terms) within a pattern (see Schmuckler, 1999, for
an in-depth description of these tools). These summarized interval descriptions
could then be used to identify similarity relations between contours, with these
relations presumably underlying perceived contour similarity. Although these
authors do not provide direct psychological tests of their approaches, this work
does convincingly demonstrate the efficacy of these models in music-theoretic
analyses of short, 20th century atonal melodies.
This approach has culminated in a model by Quinn (1999) based again on the
pitch relations between adjacent and nonadjacent tones in a melody. Extending
the previous work, though, Quinn uses these tools to derive predicted similarity
relations for a set of seven-note melodies and then explicitly tests these predictions by looking at listeners' contour similarity ratings. In this work Quinn (1999)
finds general support for the proposed contour model, with similarity driven primarily by contour relations between adjacent (i.e., temporally successive) tones
and to a lesser extent by contour relations between nonadjacent tones.
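The idea of summarizing the direction of pitch change between every pair of tones can be sketched as a matrix of signs, with similarity taken as the proportion of tone pairs on which two contours agree. The sketch is loosely modeled on the comparison-matrix tools of this literature and is not a reproduction of any one author's measure. The function names are illustrative, and the restriction to equal-length melodies is an assumption of the sketch.

```python
def sign(x):
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def comparison_matrix(pitches):
    """Entry [i][j] is the direction of pitch change from tone i to tone j."""
    n = len(pitches)
    return [[sign(pitches[j] - pitches[i]) for j in range(n)] for i in range(n)]


def contour_agreement(a, b, adjacent_only=False):
    """Proportion of tone pairs (i < j) whose direction matches in both melodies.

    With adjacent_only=True, only temporally successive tones are compared.
    """
    if len(a) != len(b):
        raise ValueError("this sketch compares melodies of equal length only")
    ma, mb = comparison_matrix(a), comparison_matrix(b)
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if not adjacent_only or j == i + 1]
    return sum(ma[i][j] == mb[i][j] for i, j in pairs) / len(pairs)


# Hypothetical melodies, given as MIDI note numbers.
melody_a = [60, 64, 62]
melody_b = [60, 64, 66]
print(comparison_matrix(melody_a))
print(contour_agreement(melody_a, melody_b))
print(contour_agreement(melody_a, melody_b, adjacent_only=True))
```

Separating the adjacent-pair comparison from the all-pairs comparison makes it possible to ask, as in the similarity-rating work described above, how much each kind of relation contributes to perceived similarity.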
Psychological Contour Models
Although the previous models clearly capture a sense of contour and can
predict perceived similarity, they are limited in that they neglect an intuitively
critical aspect of melodic contour, namely, the contour's overall shape. In an
attempt to characterize this aspect of contour, Schmuckler (1999) proposed a
contour description based on time series, and specifically Fourier analyses, of
contour. Such analyses provide a powerful tool for describing contour in their
quantification of different cyclical patterns within a signal. By considering the
relative strengths of some or all of these cycles, such analyses thus describe the
repetitive, up-and-down nature of the melody, simultaneously taking into account
slow-moving, low-frequency pitch changes as well as high-frequency, point-to-point pitch fluctuation.
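A Fourier description of a contour can be sketched with a plain discrete Fourier transform. This is a minimal illustration, assuming the contour is a list of equally spaced pitch values; the scaling of the amplitudes (coefficient magnitude divided by the number of tones) and the function name `contour_spectra` are choices made for the sketch, not details taken from Schmuckler (1999).

```python
import cmath
import math


def contour_spectra(contour):
    """Amplitude and phase of each cyclical component of a contour.

    Component k completes k full up-and-down cycles over the melody.
    The mean is removed first, so transposing the melody changes nothing.
    """
    n = len(contour)
    mean = sum(contour) / n
    centered = [value - mean for value in contour]
    amplitudes, phases = [], []
    for k in range(1, n // 2 + 1):
        coefficient = sum(value * cmath.exp(-2j * math.pi * k * t / n)
                          for t, value in enumerate(centered))
        amplitudes.append(abs(coefficient) / n)
        phases.append(cmath.phase(coefficient))
    return amplitudes, phases


# Two hypothetical 12-note contours: one slow arch and one note-to-note zigzag.
slow_arch = [math.sin(2 * math.pi * t / 12) for t in range(12)]
zigzag = [0, 1] * 6

for name, contour in (("slow arch", slow_arch), ("zigzag", zigzag)):
    amplitudes, _ = contour_spectra(contour)
    strongest = max(range(len(amplitudes)), key=lambda i: amplitudes[i]) + 1
    print(f"{name}: strongest component completes {strongest} cycle(s)")
```

The slow arch concentrates its energy in the lowest-frequency component and the zigzag in the highest, which is the sense in which the spectrum separates slow-moving pitch changes from point-to-point fluctuation.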
Schmuckler (1999) initially tested the applicability of time series analyses as
a descriptor of melodic contour, looking at the prediction of perceived contour
similarity. In two experiments listeners heard 12-note melodies, with these stimuli
drawn either from the prime form of well-known 20th century pieces (Experiment 1) or from simple, tonal patterns (Experiment 2); samples of these melodies
appear in Fig. 10. In both experiments listeners rated the complexity of each
contour, and from these complexity ratings derived similarity measures were calculated. Although an indirect index, such measures do provide a reliable quantification of similarity (Kruskal & Wish, 1978; Wish & Carroll, 1974).
The derived similarity ratings were then compared with different models of
contour similarity based on the time series and music-theoretic approaches. For
the music-theoretic models, contour similarity was based on the degree of overlap
between contours in their interval content and the overlap of short contour subsegments (e.g., sequences of 3, 4, 5, and so on notes). For the time series model,
all contours were Fourier analyzed, with similarity determined by correspondences between melodies in the strength of each cyclical component (the amplitude spectra) and by the starting position in the cycle of each component (the
[Figure 10 panels: Experiment 1 melodies; Experiment 2 melodies.]
The lack of consistency for phase spectra is intriguing, in that one of the differences between the
two studies was that the stimuli of Experiment 2 were constructed specifically to contain different
phase spectra similarities between melodies. Hence, this study suggests that phase information can
be used by listeners when melodies explicitly highlight such phase relations. Whether listeners are
sensitive to such information when it is not highlighted, however, is questionable.
and as an aside, neither experiment found any influence of the tonality of the
melody on similarity. Although a lack of tonal effects seemingly belies the spirit
of the previous section, this result is understandable in that both studies actively
worked to ameliorate influences of tonality by randomly transposing melodies to
different keys on every trial.
Although supporting the Fourier analysis model, this work has some important limitations. First, and most significantly, the stimuli used in this work were
highly specialized, schematized melodies of equal length that contained no
rhythmic variation. To be useful as a model of melodic perception, however, this
approach must be applicable to melodies of differing lengths and note durations.
Second, there is the issue of the use of the derived similarity measure. Although
such a measure is psychometrically viable, it would be reassuring if such results
could be obtained with a more direct measure of perceived similarity. Moreover,
it is possible that even stronger predictive power might be seen if a direct measure
were employed.
Research has addressed these questions, employing naturalistic folk melodies
as stimuli and using a direct similarity rating procedure. Using melodies such as
shown in Fig. 11, listeners rated the perceived similarity of pairs of melodies,
with these ratings then compared with predicted similarity based on the Fourier
analysis model. These melodies were coded for the Fourier analysis in two
formats. The first, or nondurational coding, coded the relative pitch events in 0–n
format (with 0 given to the lowest pitch event in the sequence and n equal to the
number of distinct pitches), ignoring the different durations of the notes. This
coding essentially makes the sequence equitemporal for the purpose of contour
analysis. The second format, or durational coding, weights each element by its
duration. Thus, if the first and highest pitch event were 3 beats, followed by the
lowest pitch event for 1 beat, followed by the middle pitch event for 2 beats, the
code for this contour would be 2 2 2 0 1 1; these two types of codings also
appear in Fig. 11. These contour codes were Fourier analyzed, with correspondences in the resulting amplitude and phase spectra used to predict similarity.⁴
Figure 12 shows the results of a Fourier analysis of the nondurational coding of
two of these contours.
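
The two coding schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, pitches are represented as MIDI note numbers, and durations are whole numbers of beats. Note that with zero-based ranks the highest code is one less than the number of distinct pitches.

```python
def nondurational_code(pitches):
    """Rank each pitch event from 0 (lowest) upward, ignoring durations."""
    # Map every distinct pitch to its rank among the distinct pitches.
    ranks = {p: i for i, p in enumerate(sorted(set(pitches)))}
    return [ranks[p] for p in pitches]


def durational_code(pitches, beats):
    """Repeat each rank once per beat so that longer notes weigh more."""
    code = []
    for rank, n_beats in zip(nondurational_code(pitches), beats):
        code.extend([rank] * n_beats)
    return code


# Hypothetical three-note melody: highest pitch (3 beats),
# lowest pitch (1 beat), middle pitch (2 beats).
melody = [72, 60, 67]
durations = [3, 1, 2]
print(nondurational_code(melody))          # [2, 0, 1]
print(durational_code(melody, durations))  # [2, 2, 2, 0, 1, 1]
```

The second printed list reproduces the 2 2 2 0 1 1 example given in the text.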
Generally, the findings of this study confirmed those of Schmuckler (1999),
with contour similarity based on corresponding amplitude (but not phase) spectra
predicting listeners' perceived similarity. Interestingly, this study failed to find
differences in predictive power between the two forms of contour coding. Such
a finding suggests that it is not the correspondence of cyclical patterns in absolute
time that is critical (as would be captured in the durational but not the nondurational code), but rather it is the cyclical patterns over the course of the melody
itself (which is captured equally well by both coding systems) regardless of the
absolute length of the melody that is important. One implication here is that a
very short and a very long melody having comparable contours will be heard as
similar despite the differences in their lengths. Such a prediction has not been explicitly tested to date but, if true, implies that the melodies could be represented in
a somewhat abstract fashion in which only relative timing between events is
retained. This idea, in fact, fits well with the intuition that a melody retains its
essential properties irrespective of its time scale, with some obvious limits at the

FIGURE 11 A sample set of folk melodies used in the direct contour similarity study. Also
shown are the nondurational and durational contour codings. For both codings, contours are coded for
analysis in 0–n format, with n equal to the number of distinct pitches in the contour.
Durational 1122333333332222111122222222332211223333333322221122332211001111
Durational 2223444322210000222344455422333322234443222100002224444554223333
Durational 1112333311110000111233333331222211123333111100001112333322221111
Durational 0001222222203333333222222220111100012222222033345555444322110000
FIGURE 12 Fourier analysis results for two of the sample contours. This table lists the real
(Am), imaginary (Bm), amplitude (Rm), and phase (Mm) values for each cyclical component of the
contour. Similarity between contours can be assessed by correlating amplitude and phase values, calculating absolute difference scores, and so on; see Schmuckler (1999) for a fuller description.
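
As a rough illustration of the quantities named in the Figure 12 caption, the sketch below computes real, imaginary, amplitude, and phase values for each cyclical component of a contour code, then correlates the amplitude spectra of two contours. Several details are assumptions rather than the published procedure: the 2/N scaling, the use of atan2 for phase, the restriction to harmonics 1 through N/2, the truncation to shared harmonics when lengths differ, and all function names. Schmuckler (1999) describes the actual analysis.

```python
import math


def contour_spectrum(code):
    """Return a list of (A_m, B_m, R_m, M_m) for harmonics m = 1 .. N // 2."""
    n = len(code)
    rows = []
    for m in range(1, n // 2 + 1):
        # Real (cosine) and imaginary (sine) coefficients for harmonic m.
        a = 2.0 / n * sum(x * math.cos(2 * math.pi * m * k / n)
                          for k, x in enumerate(code))
        b = 2.0 / n * sum(x * math.sin(2 * math.pi * m * k / n)
                          for k, x in enumerate(code))
        # Amplitude and phase derived from the two coefficients.
        rows.append((a, b, math.hypot(a, b), math.atan2(b, a)))
    return rows


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def amplitude_similarity(code_a, code_b):
    """Correlate amplitude spectra over the harmonics both contours share."""
    ra = [row[2] for row in contour_spectrum(code_a)]
    rb = [row[2] for row in contour_spectrum(code_b)]
    shared = min(len(ra), len(rb))
    return pearson(ra[:shared], rb[:shared])


# Two hypothetical contours with the same rise-fall shape but different range.
narrow = [0, 1, 2, 1, 0, 1, 2, 1]
wide = [0, 2, 4, 2, 0, 2, 4, 2]
print(amplitude_similarity(narrow, wide))
```

Phase values could be compared in the same way, or absolute difference scores used in place of correlation, as the caption notes.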
One concern raised initially was that, although pitch organization in space and
time could be discussed independently, this division is nevertheless forced.
dimensions in perceiving single tones (e.g., Melara & Marks, 1990); in this case,
however, the question involves the primacy, or integrality versus separability, of
large-scale musical dimensions.
Finally, it is possible to determine how these dimensions function in different
musical contexts. What are the relative roles of tonality and contour in the
perceptual organization of passages, or in auditory object formation, versus, say,
the perceived similarity of musical passages? What are their roles in driving
expectancies or in structuring memory for passages? Clearly, integrating these
two sources of pitch structure provides a new, and potentially powerful, means
for understanding auditory and musical events.
perceptual apparatus itself (Gibson, 1966) as well as how one describes the environment with reference to the capabilities of the animal.
Along with the contextual nature of pitch perception, there has also been an
emphasis on the perceptual structures formed as a result of the operation of perceptual constraints (e.g., Balzano, 1986). Such constraints may, in fact, be operating to create different invariant structures critical for auditory and musical
processing, with the question then centered around listeners' sensitivity to such
invariant information. This review has identified at least two different constraints
in the perception of pitch: that of tonality⁵ and that of contour. Whether or not
these two aspects are truly ecological invariants, in the same way that the cardioidal strain transformation is for the perception of aging (e.g., Pittenger, Shaw,
& Mark, 1979; Shaw, McIntyre, & Mace, 1974; Shaw & Pittenger, 1977) or the
cross ratio is for perceiving rigidity (e.g., Cutting, 1986; Gibson, 1950; Johansson, von Hofsten, & Jansson, 1980), is an open, and ultimately metatheoretical
question. For the moment it is sufficient to recognize the potential role of such
abstractions as general organizing principles for the apprehension of musical
A final implication of much of the material discussed here, and clearly the
ideas presented in the sections on pitch structures, is that much of this work actually provides support for some of Gibson's more radical ideas, namely the argument for direct perception and for naive realism (Cutting, 1986; Gibson, 1967,
1972, 1973; Lombardo, 1987). Although these notions have a variety of meanings (see Cutting, 1986, or Lombardo, 1987, for discussions), some underlying
themes that arise from these ideas are that perception is not mediated by inference or other cognitive processes and that perception is essentially veridical.
Although it might, at first blush, appear somewhat odd and contradictory, both
the key-finding and contour models fit well with these ideas as they place the critical information for the detection of these structures within the stimulus patterns
themselves (e.g., patterns of relative durations of notes and cyclical patterns of
rises and falls in pitch), without the need for cognitively mediating mechanisms
to apprehend these structures.
Music cognition research in general has been quite concerned with specifying
the information that is available in the stimulus itself, often employing sophisticated analytic procedures in an attempt to quantify this structure (e.g., Huron, 1995,
1997, 1999). Moreover, music cognition work has benefited greatly from a close
association with its allied disciplines of musicology and music theory, which have
provided a continuing source of sophisticated, well-developed ideas about the
structure that is actually present in the musical stimulus itself. Intriguingly,
⁵ One concern with the idea of tonality as an invariant is that the details of tonal hierarchies vary
from culture to culture. In this regard, it is reassuring to see the results of cross-cultural investigations of tonality (e.g., Castellano et al., 1984; Kessler et al., 1984, reviewed earlier) reaffirm the
general importance of hierarchical organization in perceptual processing of musical passages, irrespective of the specific details of the composition of the hierarchy.
despite the fact that music research spends an inordinate amount of time and
energy analyzing and specifying its stimulus structure, work in this vein has not,
as with the ecological approach, seen fit to deny either the existence or importance of internal representation. One implication of this situation, and a departure from traditional orthodoxy in the ecological approach, is that even if the perceptual apprehension of
complex stimulus structure does not require internal representation, such representations may nevertheless exist and have a place in
perceptual and cognitive processing. Determining the form of such representations, and the situations in which the information contained within such structures may come into play, is a worthwhile goal.
Of course, the preceding discussions really only scratch the surface of an
attempt to delineate an ecological approach to pitch perception. Audition is typically the weaker sibling in discussions of perception, and hence theoretical innovations that have been worked out for other areas (i.e., vision) are often not easily
or obviously transplantable to new contexts. One possible avenue for future
thought in this regard is to turn one's attention to how ecological theory might
be expanded to incorporate the experience of hearing music, as opposed to the
reinterpretation (and sometimes deformation) of musical experience and research
to fit into the existing tenets of the ecological approach. How, then, might ecological psychology be modified, extended, or even reformulated to be more relevant to the very cognitive (and ecologically valid) behavior of music listening?
The hope, of course, is that extending the scope of both contexts will ultimately
provide greater insights into the theoretical framework of interest (the ecological
perspective) as well as the particular content area at hand (auditory perception).
Much of the work described in this chapter and preparation of the manuscript were supported by
a grant from the Natural Sciences and Engineering Research Council of Canada to the author. The
author would like to thank Katalin Dzinas for many helpful discussions about this chapter and Carol
Krumhansl and John Neuhoff for their insightful comments and suggestions on an earlier draft of
this work.
Adolph, K. E., Eppler, M. A., & Gibson, E. J. (1993). Development of perception of affordances. In
C. Rovee-Collier and L. P. Lipsett (Eds.), Advances in infancy research (Vol. 8, pp. 51–98).
Norwood: Ablex.
Balzano, G. J. (1986). Music perception as detection of pitch-time constraints. In V. McCabe and
G. J. Balzano (Eds.), Event cognition: An ecological perspective (pp. 217–233). Mahwah, NJ:
Lawrence Erlbaum Associates.
Barnes, R., & Jones, M. R. (2000). Expectancy, attention, and time. Cognitive Psychology, 41,
Dawe, L. A., Platt, J. R., & Welsh, E. (1998). Spectral-motion aftereffects and the tritone paradox
among Canadian subjects. Perception and Psychophysics, 60, 209–220.
Deliège, I. (1987). Grouping conditions in listening to music: An approach to Lerdahl & Jackendoff's grouping preference rules. Music Perception, 4, 325–360.
Deutsch, D. (1969). Music recognition. Psychological Review, 76, 300–307.
Deutsch, D. (1972a). Effect of repetition of standard and comparison tones on recognition memory
for pitch. Journal of Experimental Psychology, 93, 156–162.
Deutsch, D. (1972b). Mapping of pitch interactions in the pitch memory store. Science, 175,
Deutsch, D. (1972c). Octave generalization and tune recognition. Perception and Psychophysics, 11,
Deutsch, D. (1973a). Interference in memory between tones adjacent in the musical scale. Journal of
Experimental Psychology, 100, 228–231.
Deutsch, D. (1973b). Octave generalization of specific interference effects in memory for tonal pitch.
Perception and Psychophysics, 11, 411–412.
Deutsch, D. (1980). The processing of structured and unstructured tonal sequences. Perception and
Psychophysics, 28, 381–389.
Deutsch, D. (1982a). The influence of melodic context on pitch recognition judgment. Perception and
Psychophysics, 31, 407–410.
Deutsch, D. (1982b). The processing of pitch combinations. In D. Deutsch (Ed.), Psychology of music
(pp. 271–316). New York: Academic Press.
Deutsch, D. (1986a). A musical paradox. Music Perception, 3, 275–280.
Deutsch, D. (1986b). Recognition of durations embedded in temporal patterns. Perception and
Psychophysics, 39, 179–186.
Deutsch, D. (1987). The tritone paradox: Effects of spectral variables. Perception and Psychophysics,
41, 563–575.
Deutsch, D. (1991). The tritone paradox: An influence of language on music perception. Music Perception, 8, 335–347.
Deutsch, D. (1994). The tritone paradox: Some further geographical correlates. Music Perception, 12,
Deutsch, D. (1997). The tritone paradox: A link between music and speech. Current Directions in
Psychological Science, 6, 174–180.
Deutsch, D. (Ed.). (1999). The psychology of music (2nd ed.). San Diego: Academic Press.
Deutsch, D., & Feroe, J. (1981). The internal representation of pitch sequences in tonal music. Psychological Review, 88, 503–522.
Deutsch, D., Kuyper, W. L., & Fisher, Y. (1987). The tritone paradox: Its presence and form of distribution in a general population. Music Perception, 5, 79–92.
Deutsch, D., North, T., & Ray, L. (1990). The tritone paradox: Correlate with the listener's vocal
range for speech. Music Perception, 7, 371–384.
Dewar, K. M., Cuddy, L. L., & Mewhort, D. J. (1977). Recognition memory for single tones with
and without context. Journal of Experimental Psychology: Human Learning and Memory, 3,
Dibben, N. (1994). The cognitive reality of hierarchic structure in tonal and atonal music. Music
Perception, 12, 1–26.
Dowling, W. J. (1978). Scale and contour: Two components of a theory of memory for melodies.
Psychological Review, 85, 341–354.
Dowling, W. J. (2001). Perception of music. In E. B. Goldstein (Ed.), Blackwell handbook of
perception (pp. 469–498). Oxford: Blackwell Publishers.
Dowling, W. J., & Harwood, D. (1986). Music cognition. Orlando, FL: Academic Press.
Dowling, W. J., & Hollombe, A. W. (1977). The perception of melodies distorted by splitting into
octaves: Effects of increasing proximity and melodic contour. Perception and Psychophysics, 21,
Fletcher, H., & Munson, W. A. (1933). Loudness, its definition, measurement and calculation. Journal
of the Acoustical Society of America, 5, 82–108.
Fowler, C. A., & Dekle, D. J. (1991). Listening with eye and hand: Cross-modal contributions to
speech perception. Journal of Experimental Psychology: Human Perception and Performance,
17, 816–828.
Fowler, C. A., & Rosenblum, L. D. (1991). Perception of the phonetic gesture. In I. G. Mattingly and
M. Studdert-Kennedy (Eds.), Modularity and the motor theory (pp. 33–50). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Frankland, B. W., & Cohen, A. J. (1996). Using the Krumhansl and Schmuckler key-finding algorithm to quantify the effects of tonality in the interpolated-tone pitch-comparison task. Music Perception, 14, 57–83.
Friedmann, M. L. (1985). A methodology for the discussion of contour: Its application to Schoenberg's music. Journal of Music Theory, 29, 223–248.
Friedmann, M. L. (1987). My contour, their contour. Journal of Music Theory, 31, 268–274.
Giangrande, J. (1998). The tritone paradox: Effects of pitch class and position of the spectral envelope. Music Perception, 15, 253–264.
Gibson, E. J. (1969). Principles of perceptual learning and development. New York: Appleton Century
Gibson, E. J. (1982). The concept of affordances in development: The renascence of functionalism.
In W. A. Collins (Ed.), The concept of development: The Minnesota symposia on child psychology (Vol. 15, pp. 55–81). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gibson, E. J. (1984). Perceptual development from the ecological approach. In M. E. Lamb, A. L.
Brown, and B. Rogoff (Eds.), Advances in developmental psychology (Vol. 3, pp. 243–286). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gibson, E. J., & Pick, A. D. (2000). An ecological approach to perceptual learning and development.
New York: Oxford University Press.
Gibson, E. J., & Spelke, E. S. (1983). The development of perception. In J. H. Flavell and E. M.
Markman (Eds.), Handbook of child psychology: Vol. III, Cognitive development (pp. 1–76). New
York: John Wiley & Sons.
Gibson, J. J. (1950). The perception of the visual world. Boston: Houghton Mifflin.
Gibson, J. J. (1960). The concept of the stimulus in psychology. American Psychologist, 15, 694–703.
Gibson, J. J. (1961). Ecological optics. Vision Research, 1, 253–262.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Gibson, J. J. (1967). New reasons for realism. Synthese, 17, 162–172.
Gibson, J. J. (1972). A theory of direct visual perception. In J. R. Royce and W. W. Rozeboom (Eds.),
The psychology of knowing (pp. 215–240). New York: Gordon & Breach.
Gibson, J. J. (1973). Direct visual perception: A reply to Gyr. Psychological Bulletin, 79, 396–397.
Gibson, J. J. (1975). Events are perceivable but time is not. In J. T. Fraser and N. Lawrence (Eds.),
The study of time, II (pp. 295–301). New York: Springer-Verlag.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Grau, W. J., & Kemler Nelson, D. G. (1988). The distinction between integral and separable dimensions: Evidence for the integrality of pitch and loudness. Journal of Experimental Psychology:
Human Perception and Performance, 117, 347–370.
Handel, S. (1989). Listening: An introduction to the perception of auditory events. Cambridge, MA:
MIT Press.
Hartmann, W. M. (1996). Pitch, periodicity, and auditory organization. Journal of the Acoustical
Society of America, 100, 3491–3502.
Helmholtz, H. L. F. (1885/1954). On the sensations of tone. New York: Dover Publications.
Hirschberg, J. (2002). Communication and prosody: Functional aspects of prosody. Speech Communication, 36, 31–43.
House, W. J. (1977). Octave generalization and the identification of distorted melodies. Perception
and Psychophysics, 21, 586–589.
Huron, D. (1995). The Humdrum toolkit: Reference manual. Stanford, CA: Center for Computer
Assisted Research in the Humanities.
Huron, D. (1997). Humdrum and Kern: Selective feature encoding. In E. Selfridge-Field (Ed.), Beyond
MIDI: The handbook of musical codes (pp. 375–401). Cambridge, MA: MIT Press.
Huron, D. (1999). Music research using Humdrum: A user's guide. Stanford, CA: Center for Computer Assisted Research in the Humanities.
Huron, D., & Parncutt, R. (1993). An improved model of tonality perception incorporating pitch
salience and echoic memory. Psychomusicology, 12, 154–171.
Idson, W. L., & Massaro, D. W. (1978). A bidimensional model of pitch in the recognition of melodies.
Perception and Psychophysics, 24, 551–565.
Irwin, R. J., & Terman, M. (1970). Detection of brief tones in noise by rats. Journal of the Experimental Analysis of Behavior, 13, 135–143.
Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception and Psychophysics, 14, 201–211.
Johansson, G. (1975). Visual motion perception. Scientific American, 232, 76–89.
Johansson, G., von Hofsten, C., & Jansson, G. (1980). Event perception. Annual Review of Psychology, 31, 27–66.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and
memory. Psychological Review, 83, 323–345.
Jones, M. R. (1981). A tutorial on some issues and methods in serial pattern research. Perception and
Psychophysics, 30, 492–504.
Jones, M. R. (1993). Dynamics of musical patterns: How do melody and rhythm fit together? In T.
J. Tighe and W. J. Dowling (Eds.), Psychology and music: The understanding of melody and
rhythm (pp. 67–92). Hillsdale, NJ: Lawrence Erlbaum Associates.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review,
96, 459–491.
Jones, M. R., Boltz, M., & Kidd, G. (1982). Controlled attending as a function of melodic and temporal context. Perception and Psychophysics, 32, 211–218.
Jones, M. R., Boltz, M., & Klein, J. M. (1993). Expected endings and judged durations. Memory and
Cognition, 21, 646–665.
Jones, M. R., & Hahn, J. (1986). Invariants in sound. In V. McCabe and G. J. Balzano (Eds.), Event
cognition: An ecological perspective (pp. 197–215). Mahwah, NJ: Lawrence Erlbaum Associates.
Jones, M. R., Maser, D. J., & Kidd, G. R. (1978). Rate and structure in memory for auditory patterns.
Memory and Cognition, 6, 246–258.
Justus, T. C., & Bharucha, J. J. (2002). Music perception and cognition. In S. Yantis (Ed.), Stevens'
handbook of experimental psychology (3rd ed., Vol. 1: Sensation and perception, pp. 453–492).
New York: John Wiley & Sons.
Kallman, H. J. (1982). Octave equivalence as measured by similarity ratings. Perception and Psychophysics, 32, 37–49.
Kallman, H. J., & Massaro, D. W. (1979). Tone chroma is functional in melody recognition. Perception and Psychophysics, 26, 32–36.
Kessler, E. J., Hansen, C., & Shepard, R. N. (1984). Tonal schemata in the perception of music in
Bali and in the West. Music Perception, 2, 131–165.
Knopoff, L., & Hutchinson, W. (1981). Information theory for musical continua. Journal of Music
Theory, 25, 17–44.
Knopoff, L., & Hutchinson, W. (1983). Entropy as a measure of style: The influence of sample length.
Journal of Music Theory, 27, 75–97.
Kraehenbuehl, D., & Coons, E. (1959). Information as a measure of the experience of music. Journal
of Aesthetics and Art Criticism, 17, 510–522.
Krumbholz, K., Patterson, R. D., & Pressnitzer, D. (2000). The lower limit of pitch as determined by
rate discrimination. Journal of the Acoustical Society of America, 108, 1170–1180.
Krumhansl, C. L. (1979). The psychological representation of musical pitch in a tonal context. Cognitive Psychology, 11, 346–374.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. London: Oxford University Press.
Krumhansl, C. L. (1991). Music psychology: Tonal structures in perception and memory. Annual
Review of Psychology, 42, 277–303.
Krumhansl, C. L. (1995). Music psychology and music theory: Problems and prospects. Music Theory
Spectrum, 17, 53–80.
Krumhansl, C. L. (2000a). Rhythm and pitch in music cognition. Psychological Bulletin, 126,
Krumhansl, C. L. (2000b). Tonality induction: A statistical approach applied cross-culturally. Music
Perception, 17, 461–480.
Krumhansl, C. L., & Iverson, P. (1992). Perceptual interactions between musical pitch and timbre.
Journal of Experimental Psychology: Human Perception and Performance, 18, 739–751.
Krumhansl, C. L., & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89, 334–368.
Krumhansl, C. L., Sandell, G. J., & Sargeant, D. C. (1987). The perception of tone hierarchies and
mirror forms in twelve-tone serial music. Music Perception, 5, 31–78.
Krumhansl, C. L., & Schmuckler, M. A. (1986a). Key-finding in music: An algorithm based on pattern
matching to tonal hierarchies. Paper presented at the 19th annual Mathematical Psychology
Meeting, Cambridge, MA.
Krumhansl, C. L., & Schmuckler, M. A. (1986b). The Petroushka chord: A perceptual investigation.
Music Perception, 4, 153–184.
Krumhansl, C. L., & Shepard, R. N. (1979). Quantification of the hierarchy of tonal functions within
a diatonic context. Journal of Experimental Psychology: Human Perception and Performance, 5, 579–594.
Krumhansl, C. L., & Toiviainen, P. (2000). Dynamics of tonality induction: A new method and a new
model. In C. Woods, B. B. Luck, R. Rochard, S. A. O'Neil, and J. A. Sloboda (Eds.), Proceedings of the Sixth International Conference on Music Perception. Keele, Staffordshire, UK.
Krumhansl, C. L., & Toiviainen, P. (2001). Tonal cognition. Annals of the New York Academy of Sciences, 930, 77–91.
Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling. Beverly Hills, CA: Sage Publications.
Ladd, D. R. (1996). Intonational phonology. Cambridge, England: Cambridge University Press.
Lantz, M. E. (2002). The role of duration and frequency of occurrence in perceived pitch structure.
Queen's University, Kingston, ON, Canada.
Lantz, M. E., & Cuddy, L. L. (1998). Total and relative duration as cues to surface structure in music.
Canadian Acoustics, 26, 56–57.
Large, E. W., & Jones, M. R. (2000). The dynamics of attending: How we track time varying events.
Psychological Review, 106, 119–159.
Large, E. W., & Palmer, C. (2002). Perceiving temporal regularity in music. Cognitive Science, 26,
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Lewin, D. (1984). On formal intervals between time-spans. Music Perception, 1, 414–423.
Lombardo, T. J. (1987). The reciprocity of perceiver and environment: The evolution of James J.
Gibson's ecological psychology. Hillsdale, NJ: Lawrence Erlbaum Associates.
Marvin, E. W., & Laprade, P. A. (1987). Relating musical contours: Extensions of a theory for contour.
Journal of Music Theory, 31, 225–267.
McAdams, S., & Bigand, E. (Eds.). (1993). Thinking in sound: The cognitive psychology of human
audition. Oxford, UK: Oxford University Press.
McAdams, S., & Drake, C. (2002). Auditory perception and cognition. In S. Yantis (Ed.), Stevens'
handbook of experimental psychology (Vol. 1: Sensation and perception, pp. 397–452). New York:
John Wiley & Sons.
McCabe, V. (1986). Memory for meaning: The ecological use of language. In V. McCabe and G. J.
Balzano (Eds.), Event cognition: An ecological perspective (pp. 175–191). Mahwah, NJ:
Lawrence Erlbaum Associates.
Melara, R. D., & Marks, L. E. (1990). Perceptual primacy of dimensions: Support for a model of
dimensional interaction. Journal of Experimental Psychology: Human Perception and Performance, 16, 398–414.
Michaels, C. F., & Carello, C. (1981). Direct perception. Englewood Cliffs, NJ: Prentice-Hall.
Miller, G. A., & Heise, G. A. (1950). The trill threshold. Journal of the Acoustical Society of America,
22, 167–173.
Moore, B. C. J. (1993). Frequency analysis and pitch perception. In W. A. Yost, A. N. Popper, and
R. R. Fay (Eds.), Human psychophysics (pp. 56–115). New York: Springer-Verlag.
Moore, B. C. J. (2001a). Basic auditory processes. In E. B. Goldstein (Ed.), Blackwell handbook of
perception (pp. 379–407). Oxford, UK: Blackwell Publishers.
Moore, B. C. J. (2001b). Loudness, pitch and timbre. In E. B. Goldstein (Ed.), Blackwell handbook
of perception (pp. 408–436). Oxford, UK: Blackwell Publishers.
Nagel, H. N., Shapiro, L., & Nawy, R. (1994). Prosody and the processing of filler-gap dependencies. Journal of Psycholinguistic Research, 23, 473–485.
Narmour, E. (1989). The genetic code of melody: Cognitive structures generated by the
implication-realization model. Contemporary Music Review, 4, 45–63.
Narmour, E. (1990). The analysis and cognition of basic melodic structures. Chicago: University of
Chicago Press.
Narmour, E. (1992). The analysis and cognition of melodic complexity. Chicago: University of
Chicago Press.
Olsho, L. W., Schoon, C., Sakai, R., Turpin, R., & Sperduto, V. (1982). Preliminary data on auditory
frequency discrimination. Journal of the Acoustical Society of America, 71, 509–511.
Oram, N., & Cuddy, L. L. (1995). Responsiveness of Western adults to pitch-distributional information in melodic sequences. Psychological Research, 57, 103–118.
Palmer, C., & Krumhansl, C. L. (1987a). Independent temporal and pitch structures in determination
of musical phrases. Journal of Experimental Psychology: Human Perception and Performance,
13, 116–126.
Palmer, C., & Krumhansl, C. L. (1987b). Pitch and temporal contributions to musical phrases: Effects
of harmony, performance timing, and familiarity. Perception and Psychophysics, 51, 505–518.
Patterson, R. D., Peters, R. W., & Milroy, R. (1983). Threshold duration for melodic pitch. In W.
Klinke and W. M. Hartmann (Eds.), Hearing: Physiological bases and psychophysics (pp.
321–325). Berlin: Springer-Verlag.
Pierce, J. R. (1983). The science of musical sound. New York: Scientific American Books.
Pitt, M. A. (1994). Perception of pitch and timbre by musically trained and untrained listeners. Journal
of Experimental Psychology: Human Perception and Performance, 20, 976–986.
Pittenger, J. B., Shaw, R. E., & Mark, L. S. (1979). Perceptual information for the age level of faces
as a higher-order invariant of growth. Journal of Experimental Psychology: Human Perception
and Performance, 5, 478–493.
Platt, J. R., & Racine, R. J. (1985). Effect of frequency, timbre, experience, and feedback on musical
tuning skills. Perception and Psychophysics, 38, 345–353.
Plomp, R. (1967). Pitch of complex tones. Journal of the Acoustical Society of America, 41,
Preisler, A. (1993). The influence of spectral composition of complex tones and of musical experience on the perceptibility of virtual pitch. Perception and Psychophysics, 54, 589–603.
Pressnitzer, D., Patterson, R. D., & Krumbholz, K. (2001). The lower limit of melodic pitch. Journal
of the Acoustical Society of America, 109, 2074–2084.
Quinn, I. (1999). The combinatorial model of pitch contour. Music Perception, 16, 439–456.
Ralston, J. V., & Herman, L. M. (1995). Perception and generalization of frequency contours by a
bottlenose dolphin (Tursiops truncatus). Journal of Comparative Psychology, 109, 268–277.
Copyright 2004. Elsevier Academic Press. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted
under U.S. or applicable copyright law.
Ranvaud, R., Thompson, W. F., Silveira-Moriyama, L., & Balkwill, L.-L. (2001). The speed of pitch
resolution in a musical context. Journal of the Acoustical Society of America, 109, 30213030.
Rasch, R., & Plomp, R. (1999). The perception of musical tones. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 89112). San Diego, CA: Academic Press.
Ritsma, R. J. (1967). Frequencies dominant in the perception of the pitch of complex sounds. Journal
of the Acoustical Society of America, 42, 191198.
Robinson, K., & Patterson, R. D. (1995). The duration required to identify the instrument, the octave,
or the pitch chroma of a musical note. Music Perception, 13, 115.
Rosch, E. (1975). Cognitive reference points. Cognitive Psychology, 7, 532547.
Rossing, T. D. (1982). The science of sound. Reading, MA: Addison-Wesley.
Schellenberg, E. G. (1996). Expectancy in melody: Tests of the implication-realization model. Cognition, 58, 75125.
Schellenberg, E. G. (1997). Simplifying the implication-realization model of musical expectancy.
Music Perception, 14, 29452318.
Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes.
Music Perception, 7, 109150.
Schmuckler, M. A. (1990). The performance of global expectations. Psychomusicology, 9, 122147.
Schmuckler, M. A. (1999). Testing models of melodic contour similarity. Music Perception, 16,
Schmuckler, M. A. (2001). What is ecological validity? A dimensional analysis. Infancy, 2, 419436.
Schmuckler, M. A., & Tomovski, R. (1997, November). Perceptual tests of musical key-finding. Paper
presented at the 38th annual meeting of the Psychonomic Society, Philadelphia.
Schmuckler, M. A., & Tomovski, R. (2002a). Perceptual tests of an algorithm for musical key-finding.
Unpublished manuscript.
Schmuckler, M. A., & Tomovski, R. (2002b). Tonal hierarchies and rare intervals in musical keyfinding. Unpublished manuscript.
Seebeck, A. (1843). ber die Sirene. Annals of Physical Chemistry, 60, 449481.
Selkirk, E. O. (1984). Phonology and syntax: The relation between sound and structure. Cambridge,
MA: MIT Press.
Semal, C., & Demany, L. (1991). Dissociation of pitch from timbre in auditory short-term memory.
Journal of the Acoustical Society of America, 89, 24042410.
Semal, C., & Demany, L. (1993). Further evidence for an autonomous processing of pitch in auditory short-term memory. Journal of the Acoustical Society of America, 94, 13151322.
Sergeant, D. (1982). Octave generalization in young children. Early Child Development and Care, 9,
Sergeant, D. (1983). The octave: Percept or concept. Psychology of Music, 11, 318.
Shaw, R. E., McIntyre, M., & Mace, W. (1974). The role of symmetry in event perception. In R. B.
MacLeod, and H. L. Pick, Jr. (Eds.), Perception: Essays in honor of James J. Gibson (pp.
279310). Ithaca, NY: Cornell University Press.
Shaw, R. E., & Pittenger, J. B. (1977). Perceiving the face of change in changing faces: Implications
for a theory of object recognition. In R. E. Shaw, and J. Bransford (Eds.), Perceiving, acting and
knowing: Toward an ecological psychology (pp. 103132). Hillsdale, NJ: Lawrence Erlbaum
Shepard, R. N. (1964). Circularity in judgments of relative pitch. Journal of the Acoustical Society
of America, 36, 23462353.
Shepard, R. N. (1982a). Geometrical approximations to the structure of musical pitch. Psychological
Review, 89, 305333.
Shepard, R. N. (1982b). Structural representations of musical pitch. In D. Deutsch (Ed.), The psychology of music (pp. 343390). San Diego: Academic Press.
Shepard, R. N. (1999a). Cognitive psychology and music. In P. R. Cook (Ed.), Music, cognition, and
computerized sound: An introduction to psychoacoustics (pp. 2135). Cambridge, MA: MIT
Copyright 2004. Elsevier Academic Press. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted
under U.S. or applicable copyright law.
Ecological Psychoacoustics
Shepard, R. N. (1999b). Pitch perception and measurement. In P. R. Cook (Ed.), Music, cognition,
and computerized sound (pp. 149165). Cambridge, MA: MIT Press.
Shepard, R. N. (1999c). Tonal structure and scales. In P. R. Cook (Ed.), Music, cognition, and computerized sound: An introduction to psychoacoustics (pp. 187194). Cambridge, MA: MIT Press.
Shmulevich, I., & Yli-Harja, O. (2000). Localized key finding: Algorithms and applications. Music
Perception, 17, 531544.
Siegel, R. J. (1965). A replication of the mel scale of pitch. American Journal of Psychology, 78,
Simon, H. A. (1972). Complexity and the representation of patterned sequences of symbols. Psychological Review, 79, 369382.
Simon, H. A., & Kotovsky, H. (1963). Human acquisition of concepts for sequential patterns. Psychological Review, 70, 534546.
Simon, H. A., & Sumner, R. K. (1968). Pattern in music. In K. B. (Ed.), Formal representation of
human judgment (pp. 219250). New York: John Wiley & Sons.
Singh, P. G., & Hirsch, J. J. (1992). Influence of spectral locus and F0 changes on the pitch and timbre
of complex tones. Journal of the Acoustical Society of America, 92, 26502661.
Smith, N. A., & Schmuckler, M. A. (2000). Pitch-distributional effects on the perception of tonality.
In C. Woods, B. B. Luch, R. Rochard, S. A. ONei, and J. A. Sloboda (Eds.), Proceedings of the
Sixth International Conference on Music Perception and Cognition. Keele, Staffordshire, UK.
Smith, N. A., & Schmuckler, M. A. (2004). The perception of tonal structure through the differentiation and organization of pitches. Journal of Experimental Psychology: Human Perception & Performance, 30, 268286.
Stevens, S. S., & Volkmann, J. (1940). The relation of pitch to frequency: A revised scale. American
Journal of Psychology, 53, 329353.
Stevens, S. S., Volkmann, J., & Newman, E. B. (1937). A scale for the measurement of the psychological magnitude of pitch. Journal of the Acoustical Society of America, 8, 185190.
Straub, K., Wilson, C., McCollum, C., & Badecker, W. (2001). Prosodic structure and wh-questions.
Journal of Psycholinguistic Research, 30, 379394.
Suen, C. Y., & Beddoes, M. P. (1972). Discrimination of vowel sounds of very short duration. Perception and Psychophysics, 11, 417419.
Takeuchi, A. (1994). Maximum key-profile correlation (MKC) as a measure of tonal structure in
music. Perception and Psychophysics, 56, 335346.
Temperley, D. (1999). Whats key for key? The Krumhansl-Schmuckler key-finding algorithm reconsidered. Music Perception, 17, 65100.
Toiviainen, P., & Krumhansl, C. L. (2003). Measuring and modeling real-time response to music:
Tonality induction. Perception, 32(6), 741766.
Turvey, M. T. (1977). Preliminaries to a theory of action with reference to vision. In R. E. Shaw &
J. Bransford (Eds.), Perceiving, acting, and knowing (pp. 211265). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Turvey, M. T. (1990). The challenge of a physical account of action: A personal view. In H. T. A.
Whiting, O. G. Meijer, and P. C. Wieringen (Eds.), The natural-physical approach to movement
control (pp. 5793). Amsterdam: Free University Press.
Turvey, M. T., & Carello, C. (1986). The ecological approach to perceiving and acting: A pictorial
essay. Acta Psychologica, 63, 133155.
Turvey, M. T., & Shaw, R. E. (1979). The primacy of perceiving: An ecologic reformulation of perception for understanding memory. In L. G. Nilsson (Ed.) Perspectives on memory research (pp.
167189). Hillsdale, JN: Erlbaum.
Turvey, M. T., Carello, C., & Kim, N.-G. (1990). Links between active perception and the control of
action. In H. Haken & M. Stadler (Eds.), Synergetics of cognition (pp. 269295). New York:
Unyk, A. M., & Carlsen, J. C. (1987). The influence of expectancy on melodic perception. Psychomusicology, 7, 323.
Copyright 2004. Elsevier Academic Press. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted
under U.S. or applicable copyright law.
Van den Toorn, P. C. (1983). The music of Igor Stravinsky. New Haven, CT: Yale University Press.
Van Egmond, R., & Butler, R. (1997). Diatonic connotations of pitch-class sets. Music Perception,
15, 131.
Van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences. Eindhoven
University of Technology, Eindhoven, The Netherlands.
Vos, P. G. (2000). Tonality induction: Theoretical problems and dilemmas. Music Perception, 17,
Vos, P. G., & Leman, M. (2000). Guest editorial: Tonality induction. Music Perception, 17, 401402.
Wagman, J. B., & Miller, D. B. (2003). Nested reciprocities: The organism-environment system in
perception-action development. Developmental Psychobiology, 42(4), 362367.
Wapnick, J., & Freeman, P. (1980). Effects of dark-bright timbral variation on the perception of flatness and sharpness. Journal of Research in Music Education, 28, 176184.
Warren, W. H. J., & Verbrugge, R. R. (1984). Auditory perception of breaking and bouncing events:
A case study in ecological acoustics. Journal of Experimental Psychology: Human Perception
and Performance, 10, 704712.
Warrier, C. M., & Zatorre, R. J. (2002). Influence of tonal context and timbral variation on perception of pitch. Perception and Psychophysics, 62, 198207.
Whitfield, I. C. (1979). Periodicity, pulse interval and pitch. Audiology, 18, 507517.
Wier, C. C., Jesteadt, W., & Green, D. M. (1977). Frequency discrimination as a function of frequency
and sensation level. Journal of the Acoustical Society of America, 61, 178184.
Wish, M., & Carroll, J. D. (1974). Applications of individual differences scaling to studies of human
perception and judgment. In E. C. Carterette, and M. P. Friedman (Eds.), Handbook of perception (vol. 2, pp. 449491). New York: Academic Press.
Wright, A. A., Rivera, J. J., Hulse, S. H., Shyan, M., & Neiworth, J. J. (2000). Music perception and
octave generalization in rhesus monkeys. Journal of Experimental Psychology: General, 129,
Youngblood, J. E. (1958). Style as information. Journal of Music Theory, 2, 2435.
Copyright 2004. Elsevier Academic Press. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted
under U.S. or applicable copyright law.
Copyright 2004. Elsevier Academic Press. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted
under U.S. or applicable copyright law.
Robert S. Schlauch
Loudness is the subjective magnitude of a sound. It is a concept that has
implicit meaning for nearly everyone, but a formal definition is the purview of
psychoacoustics, the field concerned with relating acoustics and perception.
Loudness has been a topic of considerable scientific inquiry since 1920, when a
renaissance in psychoacoustics followed the invention of the vacuum tube, a
device that enabled the accurate quantification of sound levels (Boring, 1942).
One goal of these early studies was to define the functional dependences among
loudness, intensity, duration, bandwidth, and frequency. Another goal of these
classic studies of loudness was to relate loudness to the underlying physiological processes responsible for perception. The classic approach continues today
as we discover more about the function of the cochlea and the central auditory
Throughout its long history, loudness has almost exclusively been explored
using simple stimuli, specifically stimuli that are simple in their temporal envelope. But listen to the sounds in your environment; virtually nothing we hear
outside the laboratory has the temporal characteristics of those laboratory stimuli.
What characterizes environmental sounds? It is their spectral and temporal complexity and their nearly constant mélange of auditory stimulation. Of course, prior
to the programmable laboratory computer and the capabilities of accurate analog-to-digital and digital-to-analog converters, it was virtually impossible to explore
these issues rigorously, but now we can, and in the past 25 plus years we have
made only a modest amount of progress.
The goal of this chapter is to review studies of ecological loudness (the relation between loudness and the naturally occurring events that they represent)
and to contrast them with the more traditional studies. The topics addressing
issues in ecological loudness in this chapter are diverse, but so is the auditory
world. The areas of inquiry include loudness constancy, dynamically changing
sounds, source segregation, selective attention, the middle ear and the acoustic
reflex, and uncomfortably loud sounds. Before delving into these topics, I review
some methodological issues related to loudness measurement and the results of
some studies that attempt to relate loudness to the physiological mechanisms
responsible for its percept.
In the method of paired comparisons, listeners are presented with two sounds in succession and are
asked to select the one that is louder. Multiple judgments using this method yield
useful but limited quantitative information about the relative loudness of sounds
(i.e., the percentage of time that one is judged louder than the other). The experimenter can order sounds according to their relative loudness (an ordinal scale),
but based on this type of measure one cannot conclude that one sound is twice
as loud or four times as loud as another sound. For such comparisons, one needs
to derive a scale of loudness using a method such as magnitude scaling.¹
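As a minimal sketch of why paired comparisons yield only ordinal information, the following Python tallies the proportion of trials on which each sound was judged louder and ranks the sounds accordingly. The sound labels and trial data are hypothetical and were invented for illustration; they do not come from any study cited here.

```python
from collections import defaultdict

def proportion_louder(trials):
    """trials: iterable of (sound_a, sound_b, winner) tuples, where winner
    is whichever of the two sounds the listener judged louder."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for a, b, winner in trials:
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    return {sound: wins[sound] / appearances[sound] for sound in appearances}

def ordinal_ranking(trials):
    """Order sounds from softest to loudest by proportion judged louder."""
    proportions = proportion_louder(trials)
    return sorted(proportions, key=proportions.get)

# Hypothetical judgments for three sounds (labels are illustrative only).
trials = [
    ("A", "B", "B"), ("A", "B", "B"), ("A", "B", "A"),
    ("B", "C", "C"), ("B", "C", "C"), ("A", "C", "C"),
]
ranking = ordinal_ranking(trials)
print(ranking)
```

The ranking orders the sounds, but the proportions say nothing about how many times louder one sound is than another, which is the limitation noted above.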
Magnitude scaling (Hellman & Zwislocki, 1963, 1964; Stevens, 1975), which
includes magnitude estimation and magnitude production, is the most widely
accepted method for measurement of loudness in psychoacoustical experiments.
In magnitude estimation, listeners assign numbers to match the perceived stimulus magnitude. In magnitude production, a listener adjusts a stimulus level to
produce a sensation equal in perceived magnitude to a stimulus number. Both of
these methods are sometimes combined to yield an estimate of a loudness function.² The product of magnitude scaling, under optimal circumstances, is a ratio
scale of sensation.
Figure 1 illustrates loudness functions obtained using magnitude scaling for
long-duration tones at three frequencies. For tones in the most sensitive range of
hearing (0.5 to 10.0 kHz), loudness doubles for every 10-dB increase in the stimulus level once the level exceeds about 30 dB SPL. This region of the loudness
function, between 30 and 90 dB SPL in the most sensitive region of hearing, is
well described as a compressive function of intensity, a power function with an
exponent of 0.3 (Hellman, 1991; Hellman, Takeshima, Suzuki, Ozawa, & Sone,
2001). Hellman (1991) summarized 78 studies of magnitude estimation (ME)
for 1.0-kHz tones and found that the average exponent was 0.3 with a standard
deviation of 0.045.
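The link between the 0.3 exponent and the doubling of loudness for every 10 dB can be checked numerically. The sketch below assumes a pure power law (loudness proportional to intensity raised to the power 0.3), which the text says holds only between about 30 and 90 dB SPL. The function names and the noise-free data are illustrative and are not taken from the studies Hellman summarized.

```python
import math

EXPONENT = 0.3  # average exponent reported in the text for 1.0-kHz tones

def loudness_ratio(delta_db, exponent=EXPONENT):
    """Loudness ratio produced by a level change of delta_db decibels,
    assuming loudness is proportional to intensity ** exponent."""
    intensity_ratio = 10 ** (delta_db / 10)
    return intensity_ratio ** exponent

def fit_exponent(levels_db, magnitudes):
    """Least-squares slope of log10(magnitude) against log10(intensity)."""
    xs = [level / 10 for level in levels_db]  # log10(intensity), up to a constant
    ys = [math.log10(m) for m in magnitudes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical, noise-free magnitude estimates that follow the power law exactly.
levels_db = [40, 50, 60, 70, 80]
magnitudes = [loudness_ratio(level - 40) for level in levels_db]

print(round(loudness_ratio(10), 3))                   # close to 2: a doubling per 10 dB
print(round(fit_exponent(levels_db, magnitudes), 3))  # recovers the exponent
```

Real magnitude-estimation data are noisy, so a fitted exponent would scatter around 0.3 rather than reproduce it exactly.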
¹ Paired comparisons obtained with sounds that differ only in their level are used to obtain an estimate of the just-noticeable difference in intensity (JNDI). Fechner (1860) theorized that loudness
could be inferred by summing JNDI, the unit of sensation for his scale. A number of assumptions
were required to derive a loudness function from JNDI, and subsequent studies have proved that
Fechner's theory for relating these measures was incorrect (e.g., Hellman, Scharf, Teghtsoonian, &
Teghtsoonian, 1987; Newman, 1933). Nonetheless, the appeal of an objective discrimination measure
has motivated researchers to pursue this approach to generating a loudness function for more than
140 years (e.g., Allen & Neely, 1997; Hellman & Hellman, 1990; Schlauch, Harvey, & Lanthier,
1995). Modern efforts to relate loudness and intensity discrimination still require assumptions (e.g.,
the variance of neural population response as a function of stimulus level), and the results are still
compared with loudness functions obtained using subjective methods, typically magnitude scaling
(Hellman & Hellman, 1990), a method that relates numbers to perceived magnitude to derive a scale
of sensation.
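To see why summing JNDs leads to a different loudness function than magnitude scaling, consider the following worked sketch. It assumes that Weber's law holds exactly, with a constant Weber fraction c. That assumption is introduced here for illustration only; it is not stated in the text and is only approximately true for intensity discrimination.

```latex
% Assumption (not from the text): Weber's law holds exactly, \Delta I = c\,I.
% Each JND multiplies intensity by (1 + c), so N JNDs above threshold I_0 give
I = I_0 (1 + c)^{N}
\quad\Longrightarrow\quad
N(I) = \frac{\ln (I / I_0)}{\ln (1 + c)} .
% A sum of JNDs is therefore logarithmic in intensity (linear in decibels):
L_{\text{Fechner}} \propto \log I ,
% whereas magnitude scaling gives a power function, in which the logarithm
% of loudness (not loudness itself) is linear in decibels:
L_{\text{scaling}} \propto I^{0.3}
\quad\Longrightarrow\quad
\log L_{\text{scaling}} = 0.3 \log I + \text{const} .
```

Under this assumption the two approaches cannot both describe the same scale, which is consistent with the conclusion reported above that Fechner's method of relating the measures was incorrect.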
² The combined methods of magnitude estimation and magnitude production are believed to reduce
potential judgment bias in the slope of the loudness function due to subjects shortening the metric
that they adjust (Stevens, 1972). This bias, labeled the regression effect, is well known, but many experimenters base their loudness measures on magnitude estimation alone.
Figure 1. Loudness functions for 100-, 250-, and 1000-Hz tones obtained with magnitude
scaling, plotted as loudness in loudness units against sound-pressure level in decibels. Detection thresholds for the tones are denoted by crosses. (From Hellman & Zwislocki, 1968.)
The loudness functions for sustained broadband noise and for speech are
similar to the ones for tones from sensitive regions of hearing. However, a direct
comparison shows that at low sound levels the loudness of a wideband stimulus,
such as white noise, grows more rapidly than the loudness for a tone in the most
sensitive region of hearing (Scharf, 1978).
verbal descriptors are not precise. Examples of loudness functions obtained with
category rating are shown in Fig. 2. These functions are much different from the
ones obtained with magnitude scaling procedures shown in Fig. 1. Category
rating is believed to yield an ordinal scale of sensation.
Figure 2. Loudness growth data for narrowband noise obtained from category rating using a
seven-point scale: 0 (inaudible) to 6 (too loud). Average data are shown for 11 listeners with normal
hearing. The center frequency of the noise (0.25, 0.50, 1.00, 2.00, or 4.00 kHz) is shown in the top right corner of each panel. The lower
right panel shows equal-loudness contours derived from the rating data in the other panels. (From
Allen et al., 1990.)
Cross-modality matching works on the basis of the same principle, but for this task, the perceived
magnitude of a sound is compared with the perceived magnitude of a stimulus
from a modality other than hearing, such as the brightness of a light or the length
of a line (Stevens, 1975). Line length judgments and the matching function
between a sound and line length are used to derive a loudness function (Hellman
& Meiselman, 1988).
Loudness-matching and cross-modality matching procedures are often cited
as methods for validating data obtained using category rating and magnitude
scaling.³ Loudness functions obtained with magnitude scaling agree with those derived from
line length judgments combined with cross-modality matches between a sound and line length. Loudness matching also shows good agreement with
data from scaling procedures. Loudness matches are published for a variety of
conditions. Loudness matches are often made between tones of different frequencies or between tones presented to an ear with normal hearing and one with
a cochlear threshold shift. When such results are compared with matching functions derived from category rating and magnitude scaling procedures for the same
sounds used in the matching task, they usually yield the same results (e.g.,
Hellman & Zwislocki, 1964; McCleary, 1993) or, as in the case of Allen, Hall,
and Jeng (1990), results that have the general shape for equal-loudness contours
obtained using a matching procedure (see the lower right panel of Fig. 2).
However, Hellman (1999) cautions that loudness functions derived from category
rating procedures sometimes yield a reduced rate of loudness growth relative
to the actual state of a cochlear-impaired person's auditory system. Given that
auditory nerve disorders often result in shallower loudness functions for tones
than cochlear disorders, this bias observed in loudness functions measured
using category rating possibly could lead to an incorrect labeling of the type of
hearing loss.
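One way to derive equal-loudness contours from category ratings, as Allen, Hall, and Jeng (1990) are described as doing, is to find for each frequency the level at which the rating reaches a fixed value. The sketch below uses linear interpolation for that step. The interpolation method and all of the rating values are assumptions made for illustration; they are not the procedure or the data of Allen et al.

```python
def level_for_rating(levels_db, ratings, target):
    """Linearly interpolate the level at which the mean rating reaches target.
    Assumes ratings do not decrease as level increases."""
    points = list(zip(levels_db, ratings))
    for (l0, r0), (l1, r1) in zip(points, points[1:]):
        if r0 <= target <= r1:
            if r1 == r0:
                return l0
            return l0 + (target - r0) * (l1 - l0) / (r1 - r0)
    raise ValueError("target rating outside the measured range")

def equal_loudness_contour(data, target):
    """Map each center frequency (kHz) to the level giving the target rating."""
    return {freq: level_for_rating(levels, ratings, target)
            for freq, (levels, ratings) in data.items()}

# Hypothetical mean ratings on a 0 (inaudible) to 6 (too loud) scale.
rating_data = {
    0.25: ([20, 40, 60, 80, 100], [0.0, 1.0, 2.5, 4.0, 6.0]),
    1.00: ([10, 30, 50, 70, 90], [0.0, 1.0, 2.5, 4.0, 6.0]),
}

contour = equal_loudness_contour(rating_data, target=3.0)
print(contour)
```

Because category rating is believed to yield only an ordinal scale, a contour built this way identifies levels that received the same rating; it does not say how loud those sounds are on a ratio scale.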
There is considerable evidence that loudness measurement methods, including category rating and magnitude scaling, are affected by response biases, and
some critics believe that this limits their value as scaling techniques (Mellers &
Birnbaum, 1982; Stevens & Galanter, 1957). In any event, these methods should
not be thought of as producing a scale that is a direct map to underlying
³ A loudness summation procedure has also been used to validate loudness functions obtained using
magnitude scaling (Buus, Musch, & Florentine, 1998). In this procedure, loudness matches between
multiple narrowband sounds processed in separate auditory channels and a single narrowband sound
are made. The decibel difference at equal loudness represents the loudness ratio corresponding to the
number of stimuli in the complex sound. This is based on the assumption that loudness grows in proportion to the number of sounds in the stimulus (Scharf, 1978). Loudness functions derived using this
method show excellent agreement with ones obtained with magnitude scaling for low stimulus levels
(below about 40 dB), but for higher levels the assumption of independence of the channels is violated
by the well-known physiological spread of excitation in the cochlea (Buus et al., 1999).
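The arithmetic behind the loudness summation procedure can be sketched as follows. The sketch assumes, as the procedure does, that n components in independent channels are n times as loud as one component. The function names are hypothetical, and the exponent is expressed relative to intensity.

```python
import math

def local_exponent(n_components, delta_db):
    """Exponent of the loudness function (re intensity) implied by one match:
    a single sound delta_db higher in level is as loud as n_components sounds,
    so a level change of delta_db corresponds to a loudness ratio of n_components."""
    return math.log10(n_components) / (delta_db / 10)

def predicted_delta_db(n_components, exponent):
    """Level difference at equal loudness predicted for a given exponent."""
    return 10 * math.log10(n_components) / exponent

print(round(local_exponent(2, 10), 3))        # two components matched by a 10-dB increase
print(round(predicted_delta_db(4, 0.3), 1))   # level difference expected for four components
```

As the text notes, the independence assumption fails above about 40 dB because excitation spreads in the cochlea, so estimates of this kind are meaningful only at low levels.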
studies of the cochlea, such as von Békésy's, and attempted to relate behavioral
masking-detection data for different frequencies to different places along the
basilar membrane (Fletcher & Munson, 1933). This frequency-to-place mapping
along with knowledge about threshold elevation caused by a masking noise
became the basis for a loudness model that was able to predict loudness growth
based on a listener's hearing thresholds and, if present, type of hearing loss. The
pattern of masking produced by a stimulus was assumed to represent closely its
excitation pattern or its internal representation in the cochlea, and the summed
area under this pattern was used as an estimate of loudness.
Steinberg and Gardner (1937), Fletcher's colleagues at Bell Laboratories,
compared the predictions of their model of loudness with loudness-matching data
for tones obtained from persons with different causes of hearing loss. Fletcher's
model made excellent predictions. The accuracy of these predictions is noteworthy because loudness functions vary with frequency and the degree and type
of hearing loss in a nonlinear manner. Persons with cochlear threshold shifts, due
to hearing loss or the presence of a noise masker, display an increased rate of
loudness growth over a region of their dynamic range between 4 and 30 dB sensation level, a phenomenon known as recruitment (Buus & Florentine, 2002;
Fowler, 1928, 1937; Hellman & Meiselman, 1990; Hellman & Zwislocki, 1964;
Stevens & Guirao, 1967). By contrast, persons with elevated thresholds due to
a middle-ear disorder, which results in a simple attenuation of sound entering the
cochlea, do not show recruitment. Instead, the loudness function is simply displaced along the abscissa by the amount of the loss. Representative loudness functions for normal hearing and cochlear hearing loss are shown in Fig. 3.
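The statement that a middle-ear (conductive) loss simply displaces the loudness function along the abscissa can be written in a few lines. This sketch assumes an illustrative power-law loudness function with an exponent of 0.3 and an arbitrary reference level. It models only the conductive case, not recruitment, and the function names are hypothetical.

```python
def normal_loudness(level_db_spl, exponent=0.3, reference_db=40.0):
    """Illustrative power-law loudness, scaled to 1 unit at reference_db.
    The reference level is an arbitrary choice made for this sketch."""
    return 10 ** (exponent * (level_db_spl - reference_db) / 10)

def conductive_loss_loudness(level_db_spl, loss_db):
    """A middle-ear loss attenuates the sound entering the cochlea, which
    shifts the loudness function along the level axis by loss_db."""
    return normal_loudness(level_db_spl - loss_db)

# With a 30-dB conductive loss, a 70-dB tone is as loud as a 40-dB tone
# heard by a normal ear.
print(conductive_loss_loudness(70, 30), normal_loudness(40))

# The growth per 10 dB is unchanged, so there is no recruitment.
normal_growth = normal_loudness(60) / normal_loudness(50)
shifted_growth = conductive_loss_loudness(90, 30) / conductive_loss_loudness(80, 30)
```

A cochlear loss could not be modeled by a shift alone, because its loudness function is steeper than normal over part of the dynamic range.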
Early physiological evidence that the peripheral auditory system plays an
important role in loudness encoding came from measurements of the cochlear
microphonic (CM) (Wever & Bray, 1930).⁴ The CM is a voltage that can be measured by placing electrodes within the cochlea or by recording from an electrode
placed in close proximity to the cochlea. The CM faithfully reproduces the input
stimulus for stimulus levels as high as 80 dB SPL; for higher levels, saturation is
observed (Moller, 2000). We know now that the CM is produced when the cochlea
converts mechanical to electrical energy. The motile outer hair cells are the source
of this voltage (Moller, 2000).
Stevens and Davis (1936) measured the CM as a function of level in guinea
pig and found good agreement between the magnitude of the CM with level for
a 1000-Hz tone below levels that produced saturated responses and loudness
functions obtained from humans for the same stimulus conditions. They also
reported good agreement between an equal-loudness contour across frequency (a
1000 Hz, 45 dB SPL tone was the standard) and the levels required to produce
equal-magnitude CM responses.
⁴ Wever and Bray, at the time of their measurements, thought the auditory nerve was the source of
what we now call the cochlear microphonic.
Figure 3. Loudness functions for normal hearing participants (open circles) and groups of listeners with different amounts of cochlear hearing loss (filled symbols). The amount of the hearing
loss (55, 65, or 75 dB) is shown beneath each group's loudness function. These functions were fitted using an equation
suggested by Zwislocki (1965). Note that the functions for cochlear hearing impairment are steeper
than the ones for normal hearing, a phenomenon known as loudness recruitment. (From Hellman &
Meiselman, 1990.)
Physiological evidence reinforces the notion that cochlear output shapes loudness responses. Direct and inferred measures of the basilar membrane (BM)
input-output (I/O) function (Ruggero, Rich, Recio, Narayan, & Robles, 1997;
Yates, Winter, & Robertson, 1990) for tones in the most sensitive region of
hearing have a compressive shape, as do loudness functions. The striking similarity in these shapes, for the loudness function and for the BM I/O function, is
shown in Fig. 4. This compressive shape is linked to the motility or movement
of outer hair cells (OHCs), the source of the CM. The motility of OHCs is associated with an active process that provides a significant amount of amplification
or gain for low-level sounds that becomes progressively smaller as level is
increased. This nonlinear cochlear amplifier improves the ability to discern
sounds that are similar in frequency. When OHCs are damaged, as in cochlear
hearing loss caused by disease or ototoxic drugs, the BM I/O function shows an
elevated threshold and the response becomes linear (and steeper) (Ruggero et al.,
1997). Changes in the loudness function due to cochlear hearing loss are also
consistent with what is known about the vibration pattern of the basilar membrane (Moore, Glasberg, & Baer, 1997; Schlauch, DiGiovanni, & Ries, 1998).
All of this evidence is consistent with the notion that loudness growth is a compressive function of intensity in normal hearing and that the compressive aspects
are determined by the nonlinear response of the BM (see also Buus, Musch, &
Florentine, 1998; Buus & Florentine, 2002).
Figure 4. Loudness data obtained with magnitude scaling (Hellman) plotted with basilar membrane input-output (BM I/O) functions: 10 kHz (animal 113), 10 kHz CF (animal 110), 9 kHz CF (animal 125), and 8 kHz CF (animal 126). Axis: basilar membrane velocity squared (mm2/s).
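The level-dependent gain attributed to the outer hair cells can be illustrated with a toy input-output function. It assumes a gain that is constant at low levels and falls linearly to zero at high levels. The knee points and the 50-dB maximum gain are hypothetical values chosen for illustration, not measurements from the studies cited above.

```python
def cochlear_gain_db(input_db, max_gain_db=50.0,
                     low_knee_db=30.0, high_knee_db=100.0):
    """Toy active gain: full gain below low_knee_db, none above high_knee_db,
    and a linear decline in between (all parameter values are hypothetical)."""
    if input_db <= low_knee_db:
        return max_gain_db
    if input_db >= high_knee_db:
        return 0.0
    fraction = (high_knee_db - input_db) / (high_knee_db - low_knee_db)
    return max_gain_db * fraction

def bm_output_db(input_db, max_gain_db=50.0):
    """Basilar membrane response in dB: the input plus the active gain."""
    return input_db + cochlear_gain_db(input_db, max_gain_db=max_gain_db)

def io_slope(input_db, max_gain_db, step_db=10.0):
    """Growth of the response in dB per dB of input."""
    rise = bm_output_db(input_db + step_db, max_gain_db) - bm_output_db(input_db, max_gain_db)
    return rise / step_db

healthy_slope = io_slope(50.0, max_gain_db=50.0)  # compressive, well below 1 dB/dB
damaged_slope = io_slope(50.0, max_gain_db=0.0)   # linear once the gain is lost
print(round(healthy_slope, 3), damaged_slope)
```

Setting the gain to zero reproduces both effects described for outer hair cell damage: the response at low levels drops by the lost gain, which amounts to an elevated threshold, and growth becomes linear and steeper.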
The loudness function measured using magnitude scaling procedures is a compressive function of intensity with an exponent of 0.3 for frequencies between
roughly 500 and 10,000 Hz (Hellman et al., 2001). For frequencies below
350 Hz, where thresholds become higher, the loudness function becomes steeper
(Hellman & Zwislocki, 1968). One explanation for this threshold elevation and
the corresponding steepening of the loudness function is that cochlear gain is
reduced in these lower frequency regions relative to higher frequencies. Estimates
of cochlear characteristics, both physiological (Cooper & Rhode, 1996) and
behavioral (Hicks & Bacon, 1999), suggest that the cochlea is more linear in low
frequencies compared with high frequencies, but the frequency at which the
reduction in nonlinearity occurs and the amount of the reduction and its relation
to cochlear gain/amplification are unknown. Unfortunately, direct physiological
measures of basilar membrane compression, which would resolve this issue, are
difficult to obtain from regions of the cochlea responsible for coding low
frequencies, and, even if they were available, the animals used to obtain these
measures might code sounds at these frequencies differently than humans.
Attempts to model loudness based on physiological responses more central
than the cochlea have been made as well. One popular approach relates stimulus
power to peripheral spike counts in the auditory nerve, but this method fails even
for pure tones according to an analysis of neural data by Relkin and Doucet
(1997). Other researchers report a relation between loudness and brainstem
evoked potentials (Serpanos, O'Malley, & Gravel, 1997) and cortical evoked
potentials (Stevens, 1970) obtained by placing electrodes on a person's scalp.
A model for predicting loudness from physical spectra of long-duration sounds
was proposed by Moore et al. (1997). This model represents an extension to
the masking-excitation models proposed earlier (Steinberg & Gardner, 1937;
Zwicker, 1970). In the current model, the stimulus is shaped initially by the
transfer functions for the external and middle ears. Then, in the next two stages,
physiological response is altered by the gain and frequency selectivity of the
cochlea. The transformation of cochlear excitation to loudness is accomplished
using the compressive function of intensity, as observed in BM I/O functions.
The integral of the loudness excitation pattern predicts the overall loudness for
a given stimulus. This physiologically weighted, spectral-based model accounts
well for many loudness phenomena, including monaural and binaural threshold
and loudness effects, the loudness of complex sounds as a function of bandwidth,
and equal-loudness measures across frequency (equal-loudness contours).
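The stages described above can be illustrated with a deliberately simplified sketch. It is not the published model: the compressive exponent `ALPHA`, the treatment of each band as an independent auditory channel, and the function names are all assumptions made for illustration. The sketch only shows why a compressive transformation followed by integration across channels predicts that loudness grows with bandwidth when total power is fixed.

```python
import math

ALPHA = 0.2  # assumed compressive exponent, chosen for illustration only


def db_to_power(level_db):
    """Convert a level in dB to relative power."""
    return 10.0 ** (level_db / 10.0)


def toy_loudness(band_levels_db, ear_gain_db=None):
    """Sum compressed excitation over bands (each band = one assumed channel)."""
    if ear_gain_db is None:
        ear_gain_db = [0.0] * len(band_levels_db)
    total = 0.0
    for level, gain in zip(band_levels_db, ear_gain_db):
        excitation = db_to_power(level + gain)  # outer/middle-ear shaping, then excitation
        total += excitation ** ALPHA            # compressive transformation, then integration
    return total


def spread_power(total_db, n_bands):
    """Split a fixed total power evenly over n_bands; return per-band levels in dB."""
    per_band_db = total_db - 10.0 * math.log10(n_bands)
    return [per_band_db] * n_bands


narrow = toy_loudness(spread_power(60.0, 1))  # all power in one channel
wide = toy_loudness(spread_power(60.0, 4))    # same power spread over four channels
print(narrow, wide)
```

Under these assumptions the wider stimulus yields the larger summed value, because each channel is compressed before the channels are added.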
Perceptual constancy is the ability to perceive a physical property of a stimulus as the same even as the proximal stimulus varies (Goldstein, 2002). For
instance, retinal images of objects vary depending on the lighting, angle, and distance of the object from the retina of the observer. Despite this instability of the
proximal stimulus under everyday conditions, some perceived properties of the
distal stimulus remain stable. In vision, size constancy is frequently cited as an
example of perceptual constancy. For example, observers are able to infer or estimate the actual size of an object by taking into account the distance of an object
from the retina. If this did not occur, persons would appear to become smaller as
Most prior studies of loudness have used steady-state stimuli that were turned
on and off abruptly. That is, they reach their maximum amplitude immediately
after being turned on (rise time) and their minimum amplitude immediately
after being terminated (decay time). In some instances, experimenters employed
more gradual onsets and offsets of a few milliseconds to limit spectral splatter.
This technique enabled accurate quantification of loudness as functions of
stimulus duration and frequency, the goal of the traditional psychoacoustical
Many natural sounds change dynamically in frequency and intensity. For
instance, an approaching sound source produces a rising intensity and the sensation of a rising pitch even though frequency is falling, a phenomenon known as
the Doppler illusion (Neuhoff & McBeath, 1996). In this case, loudness influences pitch. A study by Neuhoff, McBeath, and Wanzie (1999) demonstrated that
dynamic frequency change also influences loudness and that this change occurs
centrally in the auditory system. In one of their experiments, changes in the loudness of a noise presented to one ear were influenced by a harmonic tonal complex
presented to the other ear. Given that equal loudness and equal pitch contours are
determined by the peripheral auditory system (Stevens & Davis, 1936), the interaction of pitch and loudness of dynamically varying sounds is not accounted for
using traditional approaches.
Marks (1988, 1992, 1994, 1996) and Marks and Warner (1991) have reported
other examples in which loudness judgments of a tone are influenced by stimulus context in a sequence of tones of different frequencies. When a relatively loud
sound at one frequency precedes a soft sound at a different frequency, the soft
sound is judged louder than it is in isolation. The loudness change induced by
this contextual effect is equivalent to a level change of 24 dB for some stimulus
combinations. Marks (1996) showed that ipsilateral contextual effects are larger
than contralateral ones. Further, the effect occurs only for stimuli outside the
analysis bandwidth of the cochlea (i.e., in separate channels) (Marks, 1994;
Marks & Warner, 1991). These results are argued to be consistent with a perceptual phenomenon as opposed to a response bias (Marks, 1996). Parker and
Schneider (1994) believe that this effect has potential ecological significance. For
FIGURE 5 Examples of waveforms for naturally occurring damped sounds. The top waveform
is for a single strike of a snare drum. The middle and lower waveforms are for the word "time" spoken
in a near-anechoic environment and in a simulated, highly reverberant environment, respectively. The
amount of reverberation corresponds to that of an auditorium.
in the response consistent with their results for low-frequency tones. The
model predicted no difference for high-frequency tones even though one was
Stecker and Hafter (2000) proposed a cognitive explanation for loudness and
other perceptual differences between ramped and damped sounds. According to
their explanation, the auditory system parses damped sounds into segments representing direct sound, which provides information about the source, and reverberant sound, which provides information about the listening environment. This
would provide the listener with more precise detail about the distal stimulus, a
form of perceptual constancy. The framework for this approach was developed
by Rock (1997), who proposed two modes for interpreting features of stimuli. In
the literal mode, participants judge a stimulus according to the proximal stimulus. In constancy mode, the stimulus is reinterpreted to recover information about
the distal stimulus. Applied to the current situation, listeners judge ramped sounds
in the literal mode because with an abrupt decay there is no information about
reverberation in the stimulus. By contrast, damped sounds are judged in the constancy mode to maintain accurate information about the sound source. This
implies that listeners ignore a portion of the decay of a damped sound, a finding
consistent with studies of the subjective duration that show that damped sounds
are judged to be about half as long as ramped sounds and sounds that begin and
end abruptly (rectangular gated sounds) (Grassi & Darwin, 2001; Schlauch, Ries,
& DiGiovanni, 2001). These findings from studies of subjective duration are also
consistent with the loudness studies; that is, damped sounds that are perceived to
have a shorter duration are perceived as less loud.
Schlauch, DiGiovanni, & Donlin (submitted) presented evidence that supports
Stecker and Hafter's (2000) idea that listeners may use two modes for judging
the perceptual attributes of ramped and damped sounds. Two groups of subjects
participated in a subjective duration experiment and each group was instructed
differently when asked to match the duration of ramped and damped sounds to
a rectangular gated sound (abrupt onset and offset). The key aspects of the two
instruction sets were (1) "simply match the duration" and (2) "include all aspects of
the sounds." Judgments of damped sounds were affected significantly by these
instruction sets. Asking listeners to include all aspects of the sounds in their judgments increased significantly the perceived duration of damped sounds, consistent with the idea that listeners are able to switch between literal mode and
constancy mode, as predicted by the explanation offered by Stecker and Hafter
(2000) for the ramped-damped perceptual differences.
Neuhoff (1998) has taken a different approach to the study of perceptual differences between ramped and damped sounds. In one study, Neuhoff (1998) asked
listeners to estimate the change in loudness of ramped and damped sounds rather
than to estimate their overall loudness. The stimuli were tones, a harmonic
complex, and broadband noise. The level of these sounds changed by 15 dB from
their onset to their offset. Listeners reported that ramped sounds changed in loudness more than damped sounds for the sounds with tonal components. The broadband noise did not show an asymmetry in the loudness. In a second study, rhesus
monkeys oriented longer to a rising intensity harmonic complex than a falling
intensity harmonic complex (Ghazanfar, Neuhoff, & Logothetis, 2002). An
orienting preference was not shown for rising and falling intensity white noise.
These findings represent a perceptual bias for rising level tones, a bias that
Neuhoff argues has ecological import. For instance, an increase in intensity is
one cue that indicates an approaching sound source, and overestimation of this
intensity change could provide the listener with a selective advantage for earlier
preparation for the arrival of the source. It was argued that the effect was not seen
with noise because noise is often associated with background stimuli, such as the
rustling of leaves in the wind. This line of work is discussed in more detail in
Chap. 4, localization and motion.
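Stimuli of the kind described above, whose level changes by 15 dB between onset and offset, can be sketched as follows. The linear-in-decibels trajectory, the 1000-Hz carrier, the duration, and the sampling rate are assumptions made for illustration; the studies cited may have used different envelopes and parameters.

```python
import math


def level_ramp_db(n_samples, change_db=15.0, rising=True):
    """Per-sample gain in dB, changing linearly by change_db over the sound."""
    step = change_db / (n_samples - 1)
    ramp = [-change_db + i * step for i in range(n_samples)]  # -change_db up to 0 dB
    return ramp if rising else ramp[::-1]


def make_tone(freq_hz, dur_s, fs, rising):
    """Sine tone whose level rises (ramped) or falls (damped) by 15 dB."""
    n = int(round(dur_s * fs))
    gains_db = level_ramp_db(n, rising=rising)
    return [
        10.0 ** (g / 20.0) * math.sin(2.0 * math.pi * freq_hz * i / fs)
        for i, g in enumerate(gains_db)
    ]


ramped = make_tone(1000.0, 0.5, 16000, rising=True)   # rising level
damped = make_tone(1000.0, 0.5, 16000, rising=False)  # falling level
```

In this sketch the two envelopes are time reversals of each other, so the sounds share the same long-term level trajectory in opposite order and differ only in the direction of the change.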
Whether ramped-damped differences in perception are under complete cognitive control or mediated by a more automatic, subcortical mechanism is a topic
requiring further study. The work of Schlauch et al. (submitted) with different
instruction sets is consistent with the idea that listeners can perceive damped
sounds in their entirety or ignore the decay portion, which is the natural bias subjects have when neutral instructions are given. This explanation is also consistent with Neuhoff's finding that listeners judging the loudness change of ramped
and damped sounds perceive damped sounds to change less in loudness than
ramped sounds. That is, if listeners ignore a portion of the decay of a damped
sound, it would not change as much in loudness as a ramped sound that was
judged over its entire duration or a damped sound that adapted over time to a
subaudible level. On the other hand, neurons in subcortical and cortical regions of the auditory pathway respond differently to ramped and damped
sounds. For instance, two studies in the auditory brainstem of guinea pigs (Neuert,
Pressnitzer, Patterson, & Winter, 2001; Pressnitzer, Winter, & Patterson, 2000)
and one in the auditory cortex of monkeys (Lu, Liang, & Wang, 2001) found asymmetries in the response of neurons to ramped and damped sounds. In work relevant
to Neuhoff's hypothesis about a bias for looming auditory motion with its accompanying increase in level as a sound approaches, the brainstem of the echolocating moustache bat shows a greater response for sounds approaching its receptive
field than for those leaving it (Wilson & O'Neill, 1998). A neural imaging study
(functional magnetic resonance imaging) of ramped and damped sounds in
humans also showed an asymmetry in response to sounds with ramped and
damped envelopes; cortical regions of the brain responsible for space recognition, auditory motion perception, and attention were more active for ramped
sounds than for damped sounds (Seifritz et al., 2002). Although physiological
measurements in the auditory pathways show asymmetries in the response for
ramped and damped sounds, the relation to psychophysical measures is not
straightforward (Neuert et al., 2001). Additional work, combining physiological
and psychophysical approaches, needs to be done in this area.
frequency separation, there is no peripheral physiological explanation for threshold elevation for the tone due to the presence of the noise. However, in all conditions with noise, children's thresholds for the tone were elevated compared with
the quiet condition. Adults' thresholds were unaffected by the noise. This finding
is evidence that sound-source segregation and selective listening is an ability that
develops with age and perhaps experience.
Leibold and Werner (2002a, 2002b) have presented evidence that infants listen
less selectively than adults and that this behavior affects loudness growth. Loudness growth rates for infants were inferred from reaction time measures. Reaction time measures in adults show an inverse relation with level, and the infant
loudness functions inferred from reaction time measures show a steeper rate of
growth than the functions from adult listeners. To demonstrate that nonselective
listening is consistent with a steeper loudness function, they had adult listeners
perform a competing attention task while measuring loudness growth using magnitude estimation (Leibold & Werner, 2001). They found that the adults, when
forced by task demands to monitor frequency regions other than the one of the
target tone that was scaled in loudness, yielded steeper loudness functions. In
other words, when the competing task diverted their attention from the target
tone's frequency, the loudness function grew at a faster rate. This result is consistent with the steeper functions measured in infants and the idea that infants
listen nonselectively under normal listening conditions.
The middle ear of mammals, an air-filled cavity in the temporal bone that
contains the eardrum and ossicles, also contains a tiny muscle that contracts in
the presence of loud sounds. In humans this occurs for the stapedius muscle; in
some other mammals it is the tensor tympani muscle. Numerous studies have
demonstrated a close correspondence between stimulus conditions, loudness
behavior, and middle-ear muscle contractions (Hellman & Scharf, 1984). Ecological considerations for this relation have been proposed, but some are more
convincing than others.
The primary role of the middle ear of mammals is to convert acoustic energy
into mechanical energy with a minimal loss in gain as the ossicles provide an
efficient transmission path for vibrations of the eardrum caused by sound to enter
the cochlea. Although the middle-ear muscles may play a secondary role in this
efficient transmission system by keeping the ossicles in proper alignment for
precise functioning of the middle ear, other theories have been proposed that
suggest a function independent of this role (Borg, Counter, & Rosler, 1984).
To gain an appreciation of the theories proposed to account for the presence
of middle-ear muscles, it is important to understand the conditions under which
they contract. The middle-ear muscles contract (1) prior to and during vocalization, (2) randomly and intermittently, (3) in the presence of loud sounds, (4)
during mastication, and (5) in some cases, volitionally (Moller, 2000; Simmons,
1964). The primary effect of this contraction is the attenuation of low-frequency
sounds entering the cochlea (Moller, 2000).
A number of theories for the existence of the acoustic reflex have been proposed, but the theory receiving the most attention is protection from loud
sounds (Borg et al., 1984). According to this theory, the delicate sensory cells
in the inner ear are protected from overstimulation when the middle-ear muscles
contract and attenuate the level of sound entering the cochlea. There are several
criticisms of this theory that call into question its viability (Borg et al., 1984).
For instance, it is only within the last 100 years or so that industrialization has
produced levels of sound that make noise-induced hearing loss a major concern,
so it is difficult to identify evolutionary forces that might have produced such a
mechanism. Also, prior to industrialization, intense sounds that could result in
inner ear damage occurred infrequently and were probably impulsive in nature,
such as thunder. Given that the middle-ear muscles are slow to contract, contraction of the muscles would offer little or no protection from impulsive sounds.
Furthermore, the attenuation afforded by these muscle contractions affects the
amplitude of low-frequency vibrations entering the cochlea, whereas it is the
regions of the cochlea responsive to high frequencies that are most susceptible
to damage from intense sounds.
Simmons (1964) proposed a perceptual theory of middle-ear muscle function that focuses on its ecological importance. He noted that there are a number
of conditions that result in a contraction of the middle-ear muscles that are not a
result of stimulation by intense, external sounds and that these situations may be
the ones that provide insight into the role of these muscles. For instance, he noted
that these muscles contract during mastication and prior to and during vocalization. This would render chewing and self-vocalizations less loud. In the case of
an animal grazing, the reduction of low-frequency noise could increase the likelihood of detecting an approaching predator. In some specialized mammals, such
as the bat, the attenuation of self-vocalizations used for echolocation would help
the bat distinguish among self-produced sound and echoes that may represent its
next meal (Simmons, 1964).
Simmons (1964) presented evidence for another facet of his perceptual theory
from measurements made in cats. In his study, he presented high-frequency
sounds to cats while monitoring their middle-ear muscle activity along with the
cochlear microphonic, a voltage produced in the cochlea that provides an estimate of the level of sound entering the cochlea. He found that presentation of a
novel stimulus resulted in contractions of the middle-ear muscles, which in turn
modulated the level of high-frequency sound entering the cochlea. Middle-ear
muscle contraction normally results in the attenuation of low-frequency sounds,
but cats have a middle ear composed of two cavities, and these two cavities result
in an antiresonance that causes a significant dip in the transfer function of the cat
middle ear at around 5.0 kHz. The contraction of middle-ear muscles changes the
frequency of this antiresonance, which Simmons suggests a cat might use to
Volume is a colloquial term signifying the strength or loudness of a sound (e.g., the volume control
on a radio), but in the psychophysical literature loudness and volume are not synonyms (Stevens,
Guirao, & Slawson, 1965).
& Prosek, 1987). Although this method seems straightforward, the result is highly
dependent on the instructions presented to the participants in studies as well as
on the psychophysical procedures (Hawkins et al., 1987). Loudness discomfort
levels across studies range from 88 to nearly 120 dB SPL, the level reported by
Silverman (1947) after multiple sessions. Most studies find average levels nearer
to the lower end of this range. Silverman's results are anomalous, perhaps due to
his instructions (Skinner, 1980). He asked listeners to indicate when they would
remove their headphones rather than when the level reached a value to which the
listener would not want to listen for a long time, a common instruction in the
more recent studies.
The physical properties of sounds influence their loudness and also the levels
of sounds that are judged uncomfortably loud. Scharf (1978) reports that
an increase in stimulus bandwidth results in increased loudness when overall
stimulus power is held constant, and there is some empirical evidence that this
summation of loudness affects discomfort levels as well (Bentler & Pavlovic,
1989). There is also evidence that peaks in the stimulus waveform influence
loudness. For instance, a series of harmonically related tones added in cosine
phase, which has a high crest factor (the ratio of peak to root-mean-square [rms]
value), is judged louder than noise bands or harmonically related tones added
in random phase when these stimuli have equal rms levels (Gockel, Moore, &
Patterson, 2002).
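The crest-factor comparison described above can be checked numerically. The sketch below assumes 20 equal-amplitude harmonics of a 100-Hz fundamental sampled at 8000 Hz; these values and the function names are illustrative and are not taken from the cited study. It shows that the two phase conditions have the same rms value while the cosine-phase complex has the higher peak.

```python
import math
import random


def harmonic_complex(f0, n_harmonics, phases, fs, n_samples):
    """Sum of equal-amplitude harmonics of f0 with the given starting phases."""
    return [
        sum(
            math.cos(2.0 * math.pi * f0 * k * i / fs + phases[k - 1])
            for k in range(1, n_harmonics + 1)
        )
        for i in range(n_samples)
    ]


def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def crest_factor(x):
    """Ratio of peak absolute amplitude to rms value."""
    return max(abs(v) for v in x) / rms(x)


FS, F0, N_HARM = 8000, 100.0, 20
N = int(FS / F0)  # exactly one period of the fundamental

rng = random.Random(0)  # fixed seed so the random-phase case is repeatable
cosine_phase = harmonic_complex(F0, N_HARM, [0.0] * N_HARM, FS, N)
random_phase = harmonic_complex(
    F0, N_HARM, [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N_HARM)], FS, N
)

crest_cos = crest_factor(cosine_phase)
crest_rand = crest_factor(random_phase)
print(crest_cos, crest_rand)
```

Because all components peak together in the cosine-phase case, its peak equals the sum of the component amplitudes, which is the largest peak any phase arrangement can produce at a fixed rms level.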
Filion and Margolis (1992) attempted to test the validity of laboratory
measurements of loudness discomfort levels. They recruited one study group
from a nightclub and compared their laboratory judgments of loudness discomfort levels with their responses on a questionnaire regarding the comfort or loudness levels in the nightclub. Actual levels recorded in the nightclub at the
listeners' ears were judged to be comfortably loud even though they exceeded the
levels found uncomfortably loud in the laboratory setting more than 90% of the
time. This finding shows the possible influence of cognitive-social factors in
judgments of loudness discomfort.
The role of cognitive-social factors has been documented extensively in
studies of sound annoyance (e.g., Berglund & Lindvall, 1995), which is correlated with loudness judgments (Hellman, 1982, 1985). Annoyance, which might
play a role in loudness discomfort measures, is influenced by culture, expectation, and risk or fear. For instance, some cultures are more tolerant of loud sounds
than others (Jonsson, Kajland, Paccagnella, & Sorensen, 1969; Namba, Kuwano,
& Schick, 1986). Also, listeners often report that sound levels that are annoyingly
loud indoors are acceptably loud outdoors. Outdoors there is more flexibility on
the part of the listener to move away from the annoying sound environment.
Finally, the perceived risk or fear associated with a particular sound influences
its annoyance (Borsky, 1979). For example, aircraft noise levels would be more
annoying following a recent crash than just prior to the crash.
Although there is a large cognitive component that influences measures of
uncomfortable loudness levels, there are physiological limitations on the levels
of exposure that our ears can process without harm. Listening to speech at levels
judged uncomfortably loud often corresponds to a reduction in speech recognition ability compared with conditions in which speech is presented at levels
judged comfortably loud (Skinner, 1980; Studebaker, Sherbecoe, McDaniel, &
Gwaltney, 1999). Also, prolonged exposure to levels exceeding 90 dB SPL can result
in permanent damage to the delicate sensory cells in the inner ear. The Occupational Safety and Health Administration (OSHA, 1983) mandates that the average
level for an 8-hour workday not exceed 90 dBA and that no single exposure be
greater than 115 dBA, except for impulsive sounds. The risk increases for shorter
exposures as the level increases. There is evidence that sound levels of 125 dB
SPL presented for only 10 seconds can result in permanent hearing loss (Hunter,
Ries, Schlauch, Levine, & Ward, 1999). Although these high levels of exposure
are sometimes accompanied by dizziness, a symptom reported by some of
Silverman's (1947) participants in his study of the threshold of pain, the hearing
loss caused by long periods of exposure to less intense sounds is often insidious.
There are no pain receptors in the inner ear, and the hearing loss associated with
exposure to intense sounds often happens without discomfort. The loss usually
remains undetected by the victim until an extensive amount of damage has
occurred. It is well known that rock-and-roll musicians suffer from hearing loss
due to exposure to sound produced by the powerful amplifiers used during their
concerts (Palin, 1994; Speaks, Nelson, & Ward, 1970), but members of orchestras also suffer from losses due to exposure to the playing of their own instruments (Royster, Royster, & Killion, 1991), which one might expect would
be unlikely to be judged uncomfortably loud. Evidently, persons can adapt to, or
in some extreme cases even become addicted to, listening in environments with
intense sound levels that are potentially harmful to hearing (Florentine, Hunter,
Robinson, Ballou, & Buus, 1998). The role of cultural or generational factors in
this process is captured in the spirit of a statement promoted by some rock-and-roll musicians in the 1970s: "If it's too loud, you're too old."
What this statement does not convey is that intense sounds, even enjoyable ones
that are judged comfortably loud, can result in a permanent hearing loss.
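The trade-off between level and permissible exposure time in the OSHA limits mentioned above can be made concrete. The sketch assumes the 5-dB exchange rate used in the OSHA occupational noise standard, a detail that comes from the regulation and not from this chapter; the function name and default arguments are illustrative.

```python
def permissible_hours(level_dba, criterion_dba=90.0, exchange_db=5.0,
                      criterion_hours=8.0):
    """Allowed daily exposure time, halving for every exchange_db above criterion.

    Assumes a 5-dB exchange rate anchored at 90 dBA for 8 hours.
    """
    return criterion_hours / 2.0 ** ((level_dba - criterion_dba) / exchange_db)


for level in (90, 95, 100, 105, 110, 115):
    print(level, "dBA:", permissible_hours(level), "hours")
```

Under this assumption the permissible time at 115 dBA is a quarter of an hour. The calculation expresses a regulatory limit only and says nothing about whether a listener would judge the sound uncomfortably loud.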
Traditional models of loudness based on excitation in the auditory periphery
make accurate predictions for many laboratory sounds. The limitations of these
models become apparent when the loudness of natural sounds is examined. For
instance, the traditional models fail in cases in which loudness is judged in a
reverberant environment, in the context of varying pitch, and when the loudness
of sounds being judged has asymmetrical temporal envelopes. Future studies
should explore the relation between the loudness of natural sounds and their
representation in the central auditory pathways as well as the relation between
the loudness of natural sounds and reverberation in sound-field or virtual sound-field environments. Such studies will help to resolve some of the shortcomings
of the current body of loudness theory in dealing with the realities of listeners' ecological listening experiences.
Allen, J. B., Hall, J. L., & Jeng, P. S. (1990). Loudness growth in 1/2-octave bands (LGOB): A procedure for the assessment of loudness. Journal of the Acoustical Society of America, 88, 745
Allen, J. B., & Neely, S. T. (1997). Modeling the relation between the intensity just noticeable difference and loudness for pure tones and wide-band noise. Journal of the Acoustical Society of
America, 102, 3628–3646.
von Bekesy, G. (1960). Experiments in hearing (pp. 404–429). New York: McGraw-Hill. Original
German version appeared in 1928.
von Bekesy, G. (1960). Experiments in hearing (pp. 257–267). New York: McGraw-Hill. Original
German version appeared in 1936.
Bentler, R. A., & Pavlovic, C. V. (1989). Comparison of discomfort levels obtained with pure tones
and multitone complexes. Journal of the Acoustical Society of America, 86, 126–132.
Berglund, B., & Lindvall, T. (1995). Community noise. Stockholm: Center for Sensory Research.
Borg, E., Counter, S. A., & Rosler, G. (1984). Theories of middle-ear function. In S. Silman (Ed.),
The acoustic reflex: Basic principles and clinical applications (pp. 63–99). New York: Academic
Boring, E. G. (1942). Sensation and perception in the history of experimental psychology. New York:
Appleton-Century Crofts.
Borsky, P. N. (1979). Sociopsychological factors affecting the human response to noise exposure.
Otolaryngologic Clinics of North America, 12, 521–535.
Bregman, A. (1990). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA:
MIT Press.
Broadbent, D. E. (1958). Perception and communication. Oxford: Pergamon Press.
Buus, S., & Florentine, M. (2002). Growth of loudness in listeners with cochlear hearing loss:
Recruitment reconsidered. Journal of the Association for Research in Otolaryngology, 3, 120
Buus, S., Musch, H., & Florentine, M. (1998). On loudness at threshold. Journal of the Acoustical
Society of America, 104, 399–410.
Cooper, N. P., & Rhode, W. S. (1996). Fast travelling waves, slow travelling waves and their interactions in experimental studies of apical cochlear mechanics. Auditory Neuroscience, 2, 289–299.
Fechner, G. (1966). Elements of psychophysics (H. E. Adler, Trans.; D. H. Howes & E. C. Boring,
Eds.). New York: Holt, Rinehart, & Winston. Originally published in 1860.
Filion, P. R., & Margolis, R. H. (1992). Comparison of clinical and real-life judgments of loudness
discomfort. Journal of the American Academy of Audiology, 3, 193–199.
Fletcher, H. (1995). The ASA edition of speech and hearing in communication (J. Allen, Ed.).
Woodbury, NY: Acoustical Society of America. Reissue of Fletcher's (1953) Speech and hearing
in communication.
Fletcher, H., & Munson, W. A. (1933). Loudness, its definition, measurement and calculation. Journal
of the Acoustical Society of America, 5, 82–108.
Florentine, M., Hunter, W., Robinson, M., Ballou, M., & Buus, S. (1998). On the behavioral characteristics of loud-music listening. Ear and Hearing, 19, 420–428.
Fowler, E. P. (1928). Marked deafened areas in normal ears. Archives of Otolaryngology, 8, 151–155.
Fowler, E. P. (1937). The diagnosis of diseases of the neural mechanisms of hearing by the aid of
sounds well above threshold. American Otological Society, 27, 207–219.
Ghazanfar, A. A., Neuhoff, J. G., & Logothetis, N. K. (2002). Auditory looming perception in rhesus
monkeys. Proceedings of the National Academy of Sciences of the United States of America, 99,
Gockel, H., Moore, B. C. J., & Patterson, R. D. (2002). Influence of component phase on the loudness of complex tones. Acta Acustica united with Acustica, 88, 369377.
Goldstein, E. B. (2002). Sensation and perception (6th ed.). Belmont, CA.: Wadsworth.
Grassi, M., & Darwin, C. J. (2001). Perception of the duration of ramped and damped sounds with
raised cosine amplitude. In E. Sommerfeld, R. Kompass, & T. Lachmann (Eds.), Proceedings
of the Seventeenth Annual Meeting of the International Society of Psychophysics (pp. 385–390).
Hawkins, D. B., Walden, B. E., Montgomery, A., & Prosek, R. A. (1987). Description and validation
of an LDL procedure designed to select SSPL90. Ear and Hearing, 8, 162–169.
Hellman, R. P. (1976). Growth of loudness at 1000 Hz and 3000 Hz. Journal of the Acoustical Society
of America, 60, 672–679.
Hellman, R. P. (1982). Loudness, annoyance, and noisiness produced by single-tone-noise complexes.
Journal of the Acoustical Society of America, 72, 62–73.
Hellman, R. P. (1985). Perceived magnitude of two-tone-noise complexes: Loudness, annoyance, and
noisiness. Journal of the Acoustical Society of America, 77, 1497–1504.
Hellman, R. P. (1991). Loudness measurement by magnitude scaling: Implications for intensity
coding. In S. J. Bolanowski, Jr., & G. A. Gescheider (Eds.), Ratio scaling of psychological magnitude (pp. 215–227). Hillsdale, NJ: Lawrence Erlbaum.
Hellman, R. P. (1999). Cross-modality matching: A tool for measuring loudness in sensorineural
impairment. Ear and Hearing, 20, 193–213.
Hellman, W. S., & Hellman, R. P. (1990). Intensity discrimination as the driving force for loudness:
Application to pure tones in quiet. Journal of the Acoustical Society of America, 87, 1255–1265.
Hellman, R. P., & Meiselman, C. H. (1988). Prediction of individual loudness exponents from cross-modality matching. Journal of Speech and Hearing Research, 31, 605–615.
Hellman, R. P., & Meiselman, C. H. (1990). Loudness relations for individuals and groups in normal
and impaired hearing. Journal of the Acoustical Society of America, 88, 2596–2606.
Hellman, R. P., & Scharf, B. (1984). Acoustic reflex and loudness. In S. Silman (Ed.), The acoustic
reflex: Basic principles and clinical applications (pp. 469–510). New York: Academic Press.
Hellman, R. P., Scharf, B., Teghtsoonian, M., & Teghtsoonian, R. (1987). On the relation between
the growth of loudness and the discrimination of intensity for pure tones. Journal of the Acoustical Society of America, 82, 448–453.
Hellman, R. P., Takeshima, H., Suzuki, Y., Ozawa, K., & Sone, T. (2001). Equal-loudness contours
at high frequencies reconsidered. Journal of the Acoustical Society of America, 109, 2349.
Hellman, R. P., & Zwislocki, J. (1961). Some factors affecting the estimation of loudness. Journal of
the Acoustical Society of America, 33, 687–694.
Hellman, R. P., & Zwislocki, J. (1963). Monaural loudness function at 1000 cps and interaural summation. Journal of the Acoustical Society of America, 35, 856–865.
Hellman, R. P., & Zwislocki, J. (1964). Loudness function of 1000-cps tone in the presence of masking
noise. Journal of the Acoustical Society of America, 36, 1618–1627.
Hellman, R. P., & Zwislocki, J. (1968). Loudness determination at low sound frequencies. Journal
of the Acoustical Society of America, 43, 60–63.
Helmholtz, H. L. F. (1954). On the sensations of tone. New York: Dover. Original German edition
appeared in 1862.
Hicks, M. L., & Bacon, S. P. (1999). Psychophysical measures of auditory nonlinearities as a function of frequency in individuals with normal hearing. Journal of the Acoustical Society of America,
105, 326–338.
Hunter, L. L., Ries, D. T., Schlauch, R. S., Levine, S. C., & Ward, W. D. (1999). Safety and clinical
performance of acoustic reflex tests. Ear and Hearing, 20, 506–514.
Jesteadt, W., Luce, R. D., & Green, D. M. (1977). Sequential effects in judgments of loudness. Journal
of Experimental Psychology: Human Perception and Performance, 3, 92–104.
Jonsson, E., Kajland, A., Paccagnella, B., & Sorensen, S. (1969). Annoyance reactions to traffic noise
in Italy and Sweden. Archives of Environmental Health, 19, 692–699.
Lawless, H. T., Horne, J., & Spiers, W. (2000). Contrast and range effects for category, magnitude
and labeled magnitude scales in judgments of sweetness intensity. Chemical Senses, 25, 85–92.
Leibold, L. J., & Werner, L. A. (2001). The effect of listening strategy on loudness growth in normal-hearing adults. Paper presented at the annual meeting of the Association for Research in
Otolaryngology, St. Petersburg Beach, FL.
Leibold, L. J., & Werner, L. A. (2002a). The relationship between intensity and reaction time in
normal-hearing infants and adults. Ear and Hearing, 23, 92–97.
Leibold, L. J., & Werner, L. A. (2002b). Examining reaction time (RT)-intensity functions in normal-hearing infants and adults. Paper presented at the annual meeting of the Association for Research
in Otolaryngology, St. Petersburg Beach, FL.
Lu, T., Liang, L., & Wang, X. (2001). Neural representation of temporally asymmetric stimuli in the
auditory cortex of awake primates. Journal of Neurophysiology, 85, 2364–2380.
Marks, L. E. (1981). What (good) are scales of sensation? Behavioral and Brain Sciences, 4, 199–200.
Marks, L. E. (1988). Magnitude estimation and sensory matching. Perception and Psychophysics, 43,
Marks, L. E. (1992). The contingency of perceptual processing: Context modifies equal-loudness
relations. Psychological Science, 3, 285–291.
Marks, L. E. (1994). Recalibrating the auditory system: The perception of loudness. Journal of
Experimental Psychology: Human Perception and Performance, 20, 382–396.
Marks, L. E. (1996). Recalibrating the perception of loudness: Interaural transfer. Journal of the
Acoustical Society of America, 100, 473–480.
Marks, L. E., & Warner, E. (1991). Slippery context effect and critical bands. Journal of Experimental
Psychology: Human Perception and Performance, 17, 986–996.
McCleary, E. A. (1993). The ability of category rating and magnitude estimation to predict loudness
growth in ears with noise simulated hearing loss. Unpublished master's thesis, University of
Minnesota, Minneapolis.
Mellers, B. A., & Birnbaum, M. H. (1982). Loci of contextual effects in judgment. Journal of
Experimental Psychology: Human Perception and Performance, 8, 582–601.
Moller, A. R. (2000). Hearing: Its physiology and pathophysiology. Boston: Academic Press.
Moore, B. C. J., Glasberg, B. R., & Baer, T. (1997). A model for the prediction of thresholds, loudness, and partial loudness. Journal of the Audio Engineering Society, 45, 224–240.
Namba, S., Kuwano, S., & Schick, A. (1986). A cross-cultural study on noise problems. Journal of
the Acoustical Society of Japan, 7, 279–289.
Neuert, V., Pressnitzer, D., Patterson, R. D., & Winter, I. M. (2001). The responses of single units in
the inferior colliculus of the guinea pig to damped and ramped sinusoids. Hearing Research, 159,
Neuhoff, J. G. (1998). Perceptual bias for rising tones. Nature, 395, 123–124.
Neuhoff, J. G. (2001). An adaptive bias in the perception of looming auditory motion. Ecological
Psychology, 13, 87–110.
Neuhoff, J. G., & McBeath, M. K. (1996). The Doppler illusion: The influence of dynamic intensity
change on perceived pitch. Journal of Experimental Psychology: Human Perception and Performance, 22, 970–985.
Neuhoff, J. G., McBeath, M. K., & Wanzie, W. C. (1999). Dynamic frequency change influences
loudness perception: A central, analytic process. Journal of Experimental Psychology: Human
Perception and Performance, 25, 1050–1059.
Newman, E. B. (1933). The validity of the just-noticeable difference as a unit of psychological
magnitude. Transactions of the Kansas Academy of Sciences, 36, 172–175.
Occupational Safety and Health Administration (OSHA). (1983). 1910.95 Occupational noise exposure. 29 CFR 1910.95 (May 29, 1971). Federal Register, 36, 10466. Amended March 8, 1983.
Federal Register, 48, 9776–9785.