Intonation Accent and Rhythm
Advisory Board
Irena Bellert, Montreal
Maria-Elisabeth Conte, Pavia
Teun A. van Dijk, Amsterdam
Wolfgang U. Dressler, Wien
Peter Hartmann, Konstanz
Robert E. Longacre, Dallas
Roland Posner, Berlin
Hannes Rieser, Bielefeld
Volume 8
Walter de Gruyter · Berlin · New York
1984
Intonation, Accent
and Rhythm
Studies in Discourse Phonology
Edited by
Dafydd Gibbon and Helmut Richter
Library of Congress Cataloging in Publication Data
Main entry under title:
Manfred Krause
Recent Developments in Speech Signal Pitch Extraction 243
D. Robert Ladd
English Compound Stress 253
Hans-Heinrich Lieb
A Method for the Semantic Study of Syntactic Accents 267
Helmut Richter
An Observation Concerning Intensity as a Predictable Feature of Intonation 283
Mitsou Ronat
Logical Form and Prosodic Islands 311
Peter Winkler
Interrelations Between Fundamental Frequency and Other Acoustic Parameters of Emphatic Segments 327
Name Index 339
Subject Index 342
The Authors and their Affiliations
Janet Bing
Department of English,
Old Dominion University,
Norfolk, Virginia,
USA
L. Boves
Instituut voor Fonetiek,
Katholieke Universiteit,
Nijmegen,
Netherlands
David Brazil
Dept. of English Language
and Literature,
University of Birmingham,
England
Alan Cruttenden
Dept. of General Linguistics,
University of Manchester,
England
Anne Cutler
Medical Research Council
Applied Psychology Unit,
Cambridge,
England
Grzegorz Dogil
Fakultät für Linguistik
und Literaturwissenschaft,
Universität Bielefeld,
Federal Republic of Germany
Anthony Fox
Dept. of Linguistics
& Phonetics,
University of Leeds,
England
Anna Fuchs
Seminar für Deutsche Philologie,
Universität Göttingen,
Federal Republic of Germany
Dafydd Gibbon
Fakultät für Linguistik
und Literaturwissenschaft,
Universität Bielefeld,
Federal Republic of Germany
Johan 't Hart
Instituut voor Perceptie Onderzoek,
Eindhoven,
Netherlands
B. L. ten Have
Instituut voor Fonetiek,
Katholieke Universiteit,
Nijmegen,
Netherlands
D. R. Hill
Department of Acoustics,
University of Calgary,
Canada
Wiktor Jassem
Acoustic Phonetics Research Unit,
Polish Academy of Science,
Poznań, Poland
Gerald Knowles
School of English,
University of Lancaster,
England
Manfred Krause
Technische Universität Berlin,
Federal Republic of Germany
D. Robert Ladd
Department of Experimental Psychology,
University of Sussex,
Brighton,
England
Hans-Heinrich Lieb
Fachbereich Germanistik,
Freie Universität Berlin,
Federal Republic of Germany
Helmut Richter
Fachbereich Germanistik,
Freie Universität Berlin,
Federal Republic of Germany
Mitsou Ronat
Centre National de Recherche Scientifique,
Paris,
France
Wilhelm H. Vieregge
Instituut voor Fonetiek,
Katholieke Universiteit,
Nijmegen,
Netherlands
Peter Winkler
Sozialwissenschaftliche Fakultät,
Universität Konstanz,
Federal Republic of Germany
I. H. Witten
Department of Acoustics,
University of Calgary,
Canada
DAFYDD GIBBON AND HELMUT RICHTER
in general, but other papers (cf. Cutler, Gibbon) use this aspect to extend
the descriptive power of linguistic descriptions. Indeed, a concern with
temporal organisation (e. g. pitch contours as a function of time, rhythm,
tempo) and therefore, implicitly or explicitly, with 'processes' might be
considered an a priori condition for any treatment of suprasegmental,
prosodic, discourse phonetic or discourse phonological matters.
The contributors (except Cutler and Gibbon) are mainly concerned with
the 'post-production' phases of phonetics and phonology: signal process-
ing and the experimental phonetics papers on the one hand, and interpret-
ative linguistic analysis on the other. Issues in articulatory phonetics are
not dealt with; on the signal processing side, an introductory overview
(Krause) is included in view of the increasing methodological importance
of computer supported acoustic analysis of pitch.
There are three issues which stand out particularly, in the set of contrib-
utions taken as a whole, as being of lasting concern in this field:
i. The nature of intonational meanings, with a number of different ap-
proaches crystallising out. Two of the main lines might be thought of as
the 'basic meaning' approach in various forms (cf. Bing, Cruttenden), with
a 'system relative' version proposed by its critics (cf. Knowles), on the one
hand, and the 'configurational' approaches on the other, including a 'pat-
tern indexing' approach (cf. Fox, Gibbon) and a 'cohesion marking' ap-
proach with categories of 'focus', 'anaphora' and the like (Fuchs, Ladd,
Ronat). A different framework is provided by Lieb in his formal recon-
struction of 'speaker attitudes'.
ii. The concept of 'normal intonation'/'normal accentuation' (especially
Fuchs, Winkler; also Ladd, Ronat, Lieb), in particular with respect to the
position of an utterance within discourse (e.g. initial; 'first instance' vs.
'second instance') or in a specific discourse type (e. g. quotation; reading
aloud). Clarification of this notion takes place here by careful study of,
among other things, a range of different 'non-normal' forms: contrastive
accent, the hitherto poorly understood 'default accent' and other types
(cf. Fuchs, Ronat); emphasis (Winkler).
iii. The autonomy of discourse phonological systems relative to specific locu-
tionary domains such as the sentence (Gibbon, Knowles, Ronat; most
others, implicite et passim). While not all would agree on details, Fuchs'
statement would probably not be contested too hotly by the contributors:
"The syntactic hierarchy of a sentence does not determine accent choice,
but plays an important role as a framework for the choices to be effected."
Generative phonology, which applied a non-autonomy hypothesis to pros-
odic features in the first two decades of its existence, has also adopted var-
ious versions of the autonomy hypothesis during the past decade (Dogil,
Ronat).
It is perhaps the last of these issues which will ultimately provide a key
to the solution of the first two. The question whether intonation struc-
Phonology and Discourse 3
1 Richter, Η., & D. Wegner. "Die wechselseitige Ersetzbarkeit sprachlicher und nichtsprach-
licher Zeichensysteme", in: Posner, R. und H.-P. Reinecke (eds.), Zeichensysteme, Wies-
baden, Athenaion, 1977.
Note that our condition for a linguistic structure to have acoustic corre-
lates is not a strong one. A surjection is postulated (since none of the ele-
ments of a structure which is said to possess correlates may be allowed to
have no inverse image). However, it is not required that the order of the
structural elements $e_i$ be pre-established in the organisation of the $\mathfrak{S}_j$. Nor
is it required that Corr be an injection (where every value $e_i$ is assigned to
only one argument $\mathfrak{S}_j$), because the realisation of a structural element can
also be conceived as the joint effect of several (provided that these are
discernable as its correlates, i.e. that there is a mapping relation at all).
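The condition just stated can be made concrete with a small sketch. It assumes a hypothetical structure of three elements and four acoustic correlates; the labels and the particular assignment are illustrative only and are not taken from the text. The mapping is a surjection (every structural element has an inverse image) without being an injection (one element is realised jointly by two correlates).

```python
# Hypothetical labels: three structural elements and four acoustic correlates.
structural_elements = {"e1", "e2", "e3"}

# Corr maps each acoustic correlate (argument) to a structural element (value).
# The order of the correlates need not follow the order of the elements.
corr = {"S1": "e2", "S2": "e1", "S3": "e1", "S4": "e3"}


def is_surjection(mapping, codomain):
    """True if every element of the codomain has at least one inverse image."""
    return set(codomain) <= set(mapping.values())


def is_injection(mapping):
    """True if no value is assigned to more than one argument."""
    values = list(mapping.values())
    return len(values) == len(set(values))


def inverse_images(mapping, value):
    """All arguments that are mapped onto the given value, in sorted order."""
    return sorted(arg for arg, val in mapping.items() if val == value)


print(is_surjection(corr, structural_elements))
print(is_injection(corr))
print(inverse_images(corr, "e1"))
```

In this toy case the element `e1` is the joint effect of two correlates, which is exactly the situation the weak condition is meant to allow.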
Notwithstanding a fairly liberal understanding of 'having acoustic
correlates', it would appear to be grossly and ineffectively simplistic to in-
terpret the $\mathfrak{S}_j$ in terms of the raw output of the equipment (which is itself
derived rather than raw). With regard both to the complexities of 'etic-
emic' relationships and to the evolving computer-aided practice of pro-
cessing the speech signal to phonetic ends, the $\mathfrak{S}_j$ seem instead to presup-
pose several steps of derivation.
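Figure 1
[Matrix of measured values $m_{i,j}$ ($i = 1, \dots, M$; $j = 1, \dots, N$), with rows indexed by $\pi_1, \dots, \pi_M$ and columns by $\tau_1, \dots, \tau_N$.]
$m_{i,j} = f(\pi_i, \tau_j)$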
(x, y > 0), we can mark the case that the set of scalars $v$ can be partitioned
into $k$ subsets containing $n$ elements each and that the $v$ can be arranged
(cf. Figure 2) into a matrix $\mathfrak{V}$ with elements $v_{ij}$ such that
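Figure 2
$$
\mathfrak{V} =
\begin{array}{c|ccccc}
 & t_1 & \cdots & t_j & \cdots & t_n \\ \hline
P_1 & v_{11} & \cdots & v_{1j} & \cdots & v_{1n} \\
\vdots & \vdots & & \vdots & & \vdots \\
P_i & v_{i1} & \cdots & v_{ij} & \cdots & v_{in} \\
\vdots & \vdots & & \vdots & & \vdots \\
P_k & v_{k1} & \cdots & v_{kj} & \cdots & v_{kn}
\end{array}
$$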
$v_{ij} = f(P_i, t_j)$
can be said to be values of parameters $P_i$ at valuation times $t_j$. While the en-
tries of $\mathfrak{M}$ are to be understood as the results of procedures such as $F_0$-ex-
traction or intensity measurement, $\mathfrak{V}$ represents a variable accounting for
the speech signal being in one respect or another problem-oriented. With
the exception of a trivial transition from $\mathfrak{M}$ to $\mathfrak{V}$, this implies that parame-
ters are no longer the dimensions of measurement themselves but varying bun-
dles of these activated with varying 'density' along the temporal axis (cf. Fi-
gure 3).
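As a rough illustration of the arrangement described here, the following sketch builds a matrix whose entry in row i and column j is the value of parameter i at valuation time j. The two parameter functions and the four valuation times are placeholders invented for the example; they stand in for whatever derived, problem-oriented parameters an analysis would actually use.

```python
import math

# Hypothetical parameter functions P_i: each maps a valuation time t (seconds)
# to a scalar. These formulas are placeholders, not measured speech data.
parameters = {
    "pitch_like": lambda t: 120.0 + 20.0 * math.sin(2.0 * math.pi * t),
    "intensity_like": lambda t: 60.0 + 5.0 * t,
}

# Valuation times t_1 ... t_n (n = 4 in this example).
times = [0.0, 0.25, 0.5, 0.75]


def build_matrix(params, valuation_times):
    """Return a k x n matrix with entries v[i][j] = P_i(t_j)."""
    return [[p(t) for t in valuation_times] for p in params.values()]


def flatten(matrix):
    """Recover the flat collection of k * n scalars from the matrix."""
    return [v for row in matrix for v in row]


V = build_matrix(parameters, times)
print(len(V), len(V[0]))  # number of parameters k, number of valuation times n
print(V[1])               # values of the second parameter at t_1 ... t_n
```

The k times n scalars are thus partitioned into k rows of n elements each, one row per parameter.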
Figure 3
It is obvious that the ambiguity of sentence (1) occurs only in the written
language.
When the sentence is spoken, the ambiguity disappears because the into-
nation contour on the final word will indicate whether Eunice is being
spoken to or about. When sentence (2) is spoken or read, the interpreta-
tion is that Eunice is my sister.
When (3) is read, the interpretation is that the sentence is being addressed
to someone named Eunice.
I will propose in this paper that the contour found on the final word of (3)
has a special discourse function. This contour indicates that certain parts
of the sentence do not contribute to the truth conditions of the sentence,
but deal instead with speaker-listener relationships. The contour, which I
will label the D contour, has in the past usually been identified either as a
continuation of a previous contour or as the low rising contour, a contour
which is also found on questions and non-final intonation phrases.
1 I would like to thank Dwight Bolinger, Alan Cruttenden, Bruce Downing, Dafydd Gib-
bon, Kathleen Houlihan, and Michael Kac for helpful comments on an earlier version of
this paper. The problems which remain are solely my responsibility.
2 I have used Pike's (1945) notation for marking intonation rather than the more commonly
used notation which is in parentheses after examples (2)-(7). The notation commonly used
by British linguists does not distinguish between the D contour and the C contour, as I
have defined them in this paper. The contours are simplified, and usually show only the
nuclear and sentence-final rises and falls.
A Discourse Domain 11
As Ladd (1978) has made clear, there are a limited number of intonation
contours in English which are recognised by everyone who studies intona-
tion. These 'consensus contours' include those on sentences (4)-(7). The
intonation patterns marked on these and subsequent examples are simpli-
fied, and show only nuclear and phrase-final rises and falls.
3 The names of the contours are based on the labels given to pitch accents by Bolinger
(1957).
4 For readers who find it difficult to distinguish this contour from the others, the following
suggestions might help. This contour sometimes is used on sentences where there seems to
be some type of implication. For example, sentence (5) might have the implication, "but he
doesn't have something else." The A-rise contour is sometimes used on sentences in order
to communicate a "just any" interpretation:
We won't invite anyone. (Meaning: We won't invite just anyone, only certain people.)
5 The nuclear rise refers to the last rise or fall adjacent to or on the nuclear syllable. Ameri-
can linguists usually refer to the nuclear syllable as the syllable with sentence stress. The
fact that the A-rise contour and the C contour have the same configuration is not acciden-
tal. In Bing (1979) I argue that these two contours do not contrast, although the common
assumption is that they do.
12 J. Mueller Bing
Notice that if the C contour occurs on the vocative after (6), the resulting
sentence is ill-formed.
6 Dwight Bolinger suggests that (11) might be normal as a protest in the following context:
i. He has the money, Eunice! If we kick him out now, how are we going to pay the tab?
For me, this is one of the possible interpretations for (9), which can also be pronounced:
The D contour is the only one which can occur on vocatives in sentence-
medial and sentence-final position, as the following sentences illustrate:
Eunice can occur with the A contour, as in (2) or with the A-rise contour,
but in these cases, it is not interpreted as a vocative.
If the D contour is a continuation of the previous contour, why con-
sider it an independent contour at all? Why not simply assume that it is a
'tail' or continuation of the contour which precedes it? The strongest ar-
gument for considering the D contour an independent entity is the fact
that it is clearly set off by phrase boundaries. Although there is not always
an actual pause before a phrase which has a D contour, there is always
lengthening of the syllable before it. Because of this and other rhythmic
differences,7 listeners can reliably distinguish between sentences which
have the same fundamental frequency but different phrase boundaries as
in the pairs of examples which follow.
However, if one consciously supplies a real pause between the sentence and the vocative, it
is possible to get this interpretation on (iii), which is a variant of (ii) and (9), but not possi-
ble to get this interpretation on (iv).
7 Liberman (1975: 292-293) provides very specific data about the intonation and timing of
the sentences in (15).
8 Notice that there is no phrase-final rise on (15b) or on (26). In sentence-final position, the
rise is always optional. My informal observation is that men tend to omit the rise more than
women do in American English.
The fact that the D contour is always set off by phrase boundaries is re-
flected in English orthography by the fact that the phrases which have the
D contour are always set off by commas. In both of the (b) examples in
(15)—(16) there is a potential for pause at the place marked by a comma.
This is in contrast to the (a) examples in which pause is impossible be-
tween the corresponding pairs of words.9
9 Alan Cruttenden disagrees with my judgement that there is always potential pause before
the D contour. Specifically, pause is impossible for him on sentence (10). For me it is not.
The lack of agreement may arise from the fact that I am considering pre-boundary length-
ening as one of the cues to the presence or absence of intonation boundaries.
10 I am using the phrase 'truth conditions' informally here to mean those conditions in the
real world which are necessary and sufficient to determine whether a sentence is true or
false.
One can even make a fairly innocuous phrase into an epithet by using the
D contour.
The D contour on (23) contrasts quite clearly with the A-rise contour on
(24).
In (24) the sentence is true only if the neighbors who drank the beer were
the Finks and not the Steins. In (23) the sentence can be true if the neigh-
bors drank all the beer regardless of whether they were the Finks or the
Steins, and regardless of whether they were 'finks' or wonderful people.
Another class of expressions which always have the D contour in sen-
tence-medial and sentence-final position are the expressions which occur
with direct quotations and include 'verbs of saying.'
11 Dwight Bolinger has provided a context in which (23) can be ambiguous:
(i) My neighbors, the finks/Finks, and my friends, the finks/Finks, have drunk all my
beer.
Similarly, (3) could be ambiguous in the context given in (ii).
(ii) This is my sister, Eunice, and this is my cousin, Eunice.
In spite of these counterexamples, in most contexts, one tends to get unambiguous read-
ings for both (23) and (3). In addition, it is possible to get a sentence-final rise on (ii) only
with the vocative reading. There also seem to be rhythmic differences between the two
readings, although they are subtle.
Notice that in (25) and (26) the truth conditions for the whole sentence
are quite different from the truth conditions for the part which is quoted.
Compare the truth conditions for (25) and (27).
(27) This party is a bore.
Sentence (27) is true if there is a party, and it is a bore, and false other-
wise. This assertion in (25) is not affected by the phrase, she whispered,
and its truth value is not changed if, in fact, the speaker shouted. The
truth conditions for (25) are different. If the party is a bore, but the
speaker shouted, the sentence is false.
Although the expressions which introduce quotations are different from
vocatives, expletives and epithets in the way in which they affect the truth
value of the sentence, the D contour functions in a similar way in all three
cases. It marks part of the sentence as a domain independent of the rest.
This domain, which I will call a 'discourse domain' is speaker/listener-
oriented rather than message-oriented.
Allerton and Cruttenden (1974) note that certain adverbials such as
frankly are speaker/listener oriented. They also claim that frankly can oc-
cur only with a low rising contour. The following example is theirs:
In each of the (b) sentences the adverbial with the D contour expresses the
speaker's attitude, and does not contribute to the truth value of the sen-
tence. In each of the (a) sentences the adverbial does contribute to the
12 Allerton and Cruttenden (1974: 51).
truth value. For example, in (30a) the sentence is true if she married him
and was happy; it is false if she married him and was not happy. In (30b)
happily may or may not correctly represent the speaker's attitude towards
the wedding, but this is not relevant to the truth or falsity of the statement.13
If, as I am claiming, at least one function of the D contour is to mark a
domain separate from the sentence, a domain concerned with speaker/lis-
tener relationships, one would expect to get this interpretation consist-
ently even when some ambiguity is possible. This seems to be the case, as
(32) and (33) illustrate. In (32) the phrase, the poor child, is not a description
of Jenny; in (33) it is.
In (32) the message of the sentence is only that someone is named Jenny
and that Jenny fell down. The phrase, the poor child, expresses the
speaker's attitude towards her. Since the primary message of the phrase is
one of expressing sympathy, it is not necessary for Jenny to be poor and a
child for the sentence to be true, as it would be for (33).
Based on the hypothesis I am proposing, one might expect that any ex-
pressions which deal only with speaker-listener relationships would occur
with the D contour. This seems to be the case. Polite expressions let the
listener know that the speaker is sensitive to his feelings, but add nothing
substantial to the message. These expressions always occur with the D
contour in medial and final position.
13 It is also possible to find examples of adverbial expressions which occur with the D con-
tour and do affect the truth value of a sentence. For example:
taining the epistemic verbs such as think, know, and suppose, seem to re-
flect a judgment about the truth or falsity of a statement.
In fact, the opinion of the speaker about the probability that the sentence
is true does not in any way affect the truth conditions for the sentences.
Sentences (36) and (37) are true if Claude went to the party and false if he
did not, regardless of the opinion of the speaker. This is also true for sen-
tences with adverbials such as possibly and probably.
ions, and speaker-listener relationships are dealt with. The phrases which
have the D contour are interpreted independently of the rest of the sen-
tence and do not contribute to the truth value of the sentence.
References
Allerton, D. J. and Cruttenden, A. (1974). English sentence adverbials: their syntax and their
intonation in British English. Lingua 34: 1-30.
Bing, J. Mueller (1979). Aspects of English Prosody. Ph. D. dissertation. University of Massa-
chusetts.
Bolinger, D. (1957). A Theory of Pitch Accent in English. Word 14: 109-149.
Ladd, D. Robert (1978). The Structure of Intonational Meaning. Bloomington: Indiana Uni-
versity Press.
Liberman, M. (1975). The Intonational System of English. Ph. D. dissertation. M. I. T.
Pike, K. (1945). The Intonation of American English. Ann Arbor: University of Michigan
Press.
L. BOVES, B. L. TEN HAVE, W. H. VIEREGGE
1. Introduction
Table 1: Inventory of perceptually relevant pitch movements in Dutch. The position of the
movements in the syllable is defined with respect to the vowel onset. For a further
explanation of the column 'prominence lending' see text.
Automatic Transcription 23
ing stressed. Equally important: in its present form, the transcription sys-
tem does not provide any means for distinguishing primary (pitch) accents
from secondary stresses; as far as the transcription system is concerned,
stress is a strictly binary feature - a syllable is stressed (carries a pitch ac-
cent) or it is unstressed; there are no half- or weak-stressed syllables.
c) The physical properties of the F0 movements underlying the percept-
ually relevant pitch movements are defined rather precisely, the more so if
compared with other transcription systems known from the literature.
This precision, which would seem to make objective verification possible,
has been a major impetus for the present investigations.
d) The twelve perceptually relevant pitch movements do not have equal
probabilities of occurrence. The movements of type 1, 2, A, B, 0 and Ø oc-
cur more frequently than the movements of type 3, 4, C and D; move-
ments of type 5 and E seem to be rather rare. The order in which the
movements have been discovered corresponds to the frequency of their
usage; our knowledge of both the applicability and the properties of the
underlying F0 movements is accordingly greater for the 'older' move-
ments.
does not really disable the perception of a high-ending pitch). The mini-
mum method, on the other hand, shows a very great distance between the
cluster formed by the contours 10 and 13 and the contours 11 and 15.
This suggests that the prominence lending pitch rises of type 1 and type 3
are perceptually different indeed.
The close association in both solutions between the contours 4 and 9 is
very remarkable if one realizes that the addition of the non-prominence-
lending rise of type 2 in the last syllable turns contour 9 into a question,
whereas contour 4 is a neutral declaration. This supports the reasonable-
ness of the decision by 't Hart and his co-workers not to use syntactic and
semantic categories in the design of their transcription system.
Another striking result is the dissociation between plain and dented
hat-patterns (1 and 2, 3 and 4, 5 and 6), all being neutral declarative into-
nations, especially the distance between contours 5 and 6. This can only be
explained by supposing that not only the number and place of pitch ac-
cents count, but also the manner in which they are realized. Accentuation
by means of a pitch fall seems to be clearly different from an accent
caused by a combined rise and fall on one syllable. This seems to under-
score the perceptual importance of the otherwise inconspicuous movement
of type B. The close association in both methods between the contours 5
and 14 should also be pointed out; it suggests that under certain circum-
stances the sequence of two falls of type E might easily be mistaken for a
single fall of type A.
For the rest, the results seem to resist any simple and consistent expla-
nation. The number and placement of pitch accents seem to be of influ-
ence, but it is not quite clear exactly how. Nevertheless, the results have
enhanced our confidence in the reality of the perceptually relevant pitch
movements as defined by 't Hart et al.
both are prominence-lending, but rises of type 1 are said to occur early in
the syllable whereas rises of type 3 occur late. There are more instances of
movements that differ primarily in their position within the syllable, but
then one is prominence-lending and the other is not. Therefore we
thought that a proof of the possibility of discriminating between instances
of rises of type 1 and rises of type 3 solely on the basis of acoustic registra-
tions would strongly support our hypothesis that it will eventually be fea-
sible to base the entire system on an interpretation of physical registra-
tions.
Obviously in natural speech two sources of variation in F0 exist, viz. in-
ter-speaker and intra-speaker variations. In order to estimate if inter-
and/or intra-speaker differences in each type of pitch rise would be so
great as to exceed the differences between the movements, we needed a
great number of 'sure' tokens of both movements produced by a number
of speakers.
We decided to try to collect our material by asking subjects to imitate
simple sentences containing clear examples of the movements under inves-
tigation.
Four sentences, each spoken with five different intonation contours,
were recorded onto magnetic tape by J. 't Hart. Three sentences and three
intonation contours were meant as distractor items, the remaining sen-
tence with two different intonation contours served as test items. Ten sti-
mulus tapes were constructed by copying the five tokens of each sentence
in random order, but blocked per sentence. All ten stimulus tapes were im-
itated once by two subjects in one session. Next, nine stimulus tapes were
imitated by the same two subjects one tape a day on nine successive work-
ing days. This design enables the comparison of relatively short-term fluc-
tuations in the properties of pitch movements (fluctuations within a single
recording session) with long-term fluctuations, e. g. fluctuations between
recording sessions on different days.
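The construction of the stimulus tapes can be sketched as follows. This is only an illustration under stated assumptions: the sentence and contour labels are placeholders, the order of the sentence blocks is taken to be fixed, and a seeded pseudo-random generator stands in for whatever randomisation procedure was actually used.

```python
import random

# Hypothetical labels: the sentence and contour identifiers are placeholders.
SENTENCES = ["sentence_1", "sentence_2", "sentence_3", "sentence_4"]
CONTOURS = ["contour_1", "contour_2", "contour_3", "contour_4", "contour_5"]


def build_tape(rng):
    """Return one tape as a list of (sentence, contour) tokens.

    Tokens are blocked per sentence, and the five contours are shuffled
    independently inside each block.
    """
    tape = []
    for sentence in SENTENCES:
        block = CONTOURS[:]  # copy, so the master list keeps its order
        rng.shuffle(block)
        tape.extend((sentence, contour) for contour in block)
    return tape


def build_tapes(n_tapes, seed=0):
    """Build n_tapes tapes from one seeded generator (the seed is an assumption)."""
    rng = random.Random(seed)
    return [build_tape(rng) for _ in range(n_tapes)]


tapes = build_tapes(10)
print(len(tapes), len(tapes[0]))  # number of tapes, tokens per tape
```

Each tape then holds twenty tokens, five per sentence, so that every contour of every sentence is imitated once per tape.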
As test sentence, the almost meaningless but at the same time almost en-
tirely voiced clause Wie naait moet een naald gebruiken (Who sews has to
use a needle) was used. The intonation contours under test are fairly well
established, viz. a hat pattern and a cap pattern ('t Hart & Collier, 1975)
and are shown in Fig. 2. The hat pattern has a prominence-lending rise of
type 1 on the second syllable, where the cap pattern has an - equally
prominence-lending - rise of type 3 instead. As will be clear from Fig. 3,
the patterns show considerable difference in the second part, the hat pat-
tern having a prominence-lending fall of type A on the fifth syllable,
whereas the cap pattern remains on the high declination until the fall of
type C at the very end of the utterance. The hat pattern occurs extremely
often and gives an utterance the character of a neutral declaration. The
cap pattern occurs somewhat less frequently and is also less neutral, giving
more or less the impression of a surprised exclamation.
Figure 2: Two intonation patterns for the Dutch sentence 'Wie naait moet een naald gebrui-
ken'.
A: Hat pattern with prominence lending rise of type 1 on second syllable
B: Cap pattern with prominence lending rise of type 3 on second syllable
From all imitations and from the two example utterances simultaneous
registrations of F0, intensity and the oscillogram were made, using the
same apparatus as described in experiment III below. On the basis of the
oscillogram and the intensity curve the onset of the vowel in the second,
stressed syllable (naait) was established. The vowel onset was defined as
Figure 3a-e: F0 contours in the first two syllables of the sentence "Wie naait moet een naald
gebruiken".
The upper panels contain averages from 10 imitations during a single session;
the middle panels show averages for 9 imitations on 9 consecutive days. The
leftmost column pertains to speaker IS, the rightmost to speaker TR.
Vertical bars denote standard deviations. The arrows mark syllable boundaries.
The lower panel shows the F0 contours in the corresponding syllables in the
example utterance, spoken by J. 't Hart.
Curves labelled '3' pertain to the rise taken from the Cap-pattern, whereas
curves labelled '1' are taken from the Hat-pattern.
other two (IS and LB) had some previous experience in using the system.
The remaining subjects knew the system only in an abstract way, from the
literature.
Stimuli                      Scored on            Judge: JH    LB    IS    TR    W     PW
Speaker IS, 10 imitations    complete sentences          100   100   100   0     30    0
in one session               gated segments              30    80    85    15    10    15
Speaker IS, 9 imitations     complete sentences          -     100   100   5     -     0
on consecutive days          gated segments              -     94    100   0     -     0
Speaker TR, 10 imitations    complete sentences          100   100   100   35    50    15
in one session               gated segments              20    90    80    30    30    15
Speaker TR, 9 imitations     complete sentences          -     100   100   17    -     17
on consecutive days          gated segments              -     83    83    11    -     17
Table 2: Percentage correct scores on judging the type of pitch rise in the second syllable of
the Dutch sentence Wie naait moet een naald gebruiken.
The upper part of the rows refers to scores on complete sentences, the lower part to
scores on the gated segments consisting of the first and second syllable only.
From these results we may only conclude that all imitations constituted
acceptable realizations of hat or cap patterns, as all three 'trained' judges
reported that they had based their decisions entirely on the contour as a
whole. The confusions made by the untrained subjects show that they can
distinguish both contours (for speaker IS somewhat more easily than for
speaker TR), but cannot label them correctly.
In order to prevent our subjects from taking recourse to the differences
in the second halves of the contours and forcing them instead to give real
'early-late' judgements, we prepared stimulus tapes that contained only
the first two syllables of all imitations. These tapes were given to the same
subjects with the same instruction: indicate whether the pitch rise is late or
early in the second syllable. The results are summarized in the lower half
of the rows of Table 2, again in terms of percentage 'correct' scores; the
quotes indicate that correct assignments in terms of actual timing are
hardly to be expected; the best we can expect being correct labelling in
terms of rises intended as specimens of type 1 or type 3. The results look
rather strange at first sight, especially the fact that now the 'moderately
trained' subjects LB and IS outperform the 'highly trained' subject JH
whose correct scores were reduced to the same percentage as that of the
'untrained' subjects. This result can, however, be explained by the fact
that the 'moderately trained' judges continued judging in terms of rises of
types 1 and 3 (and have succeeded to a large extent in doing so) whereas
the 'highly trained' judge JH really used the position of the rise in the syl-
lable as his criterion. This explanation is supported by a detailed analysis
of the individual imitations and the corresponding judgments. As can be
seen from Fig. 3, speaker IS begins both types of rises rather early in the
syllable; in perfect accord with this, JH denotes the great majority of rises
produced by this speaker as type 1. Only those rises that indeed occur re-
latively late are judged to be of type 3 by JH. With speaker TR, who be-
gins all rises in the middle of the syllable, no consistent explanation for
correct scores can be given. From the fact that there is no relation between
the correct scores of JH and those of the untrained subjects it must be hy-
pothesized that a great amount of training is necessary in order to be able
to determine the position of a pitch movement within a syllable correctly.
The better than chance performance of subjects IS and LB suggests that
the first two syllables contain much information as to the type of contour
the rise is taken from, but it is not clear what exactly this information is.
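The pattern described in this paragraph can be checked against the figures of Table 2 with a short sketch. The scores below are transcribed from the table; the averaging itself is an illustration rather than part of the original analysis, and it is an unweighted mean over the available cells, which ignores the difference between sessions of 10 and 9 imitations.

```python
# Percentage-correct scores transcribed from Table 2; None marks a missing cell.
# Cell order: speaker IS (one session), speaker IS (consecutive days),
#             speaker TR (one session), speaker TR (consecutive days).
COMPLETE = {
    "JH": [100, None, 100, None],
    "LB": [100, 100, 100, 100],
    "IS": [100, 100, 100, 100],
    "TR": [0, 5, 35, 17],
    "W": [30, None, 50, None],
    "PW": [0, 0, 15, 17],
}
GATED = {
    "JH": [30, None, 20, None],
    "LB": [80, 94, 90, 83],
    "IS": [85, 100, 80, 83],
    "TR": [15, 0, 30, 11],
    "W": [10, None, 30, None],
    "PW": [15, 0, 15, 17],
}


def mean_score(scores):
    """Unweighted mean of the cells that are present."""
    present = [s for s in scores if s is not None]
    return sum(present) / len(present)


for judge in COMPLETE:
    print(judge, mean_score(COMPLETE[judge]), mean_score(GATED[judge]))
```

On the gated segments the mean for JH falls into the range of the untrained judges, while LB and IS stay well above it, in line with the explanation offered in the text.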
Because we did not see ways to separate the effects of speaker idiosyn-
crasies and problems in the imitation of non-neutral intonation contours,
we decided to discard both the imitation technique and the use of pitch
movements that are likely to be used in non-neutral utterances.
Figure 4: On defining the slope of F0 movements (panels A, B and C; F0 in Hz against time t).
A: Purely real F0 movement; slope is Δf/Δt.
B: Purely virtual F0 movement; no sensible definition of slope possible.
C: Partly virtual, partly real F0 movement; formal definition of slope is very diffi-
cult. An ad hoc definition that appears to work well in practice is slope = Δf_total/Δt_real.
Automatic Transcription 37
set (i.e. the width of the movements in Hz, their slopes as defined in
Fig. 4, and their position in the syllable) were compared in great detail with
the transcription given by 't Hart. We paid special attention
to the syllables that were assigned one or two prominence-lending
pitch movements in the transcription of 't Hart, checking whether these
syllables showed corresponding F0 movements and, if so, collecting their
properties in tables. In this way we tried to formulate a set of explicit
criteria which an F0 movement must fulfil in order to be acceptable as a rise or
fall, and similar sets of criteria for rises and falls separately that must be
met for a movement to be prominence-lending. Movements that are not
rises or falls are, by consequence, declinations; whether a declination is
high (∅) or low (0) depends primarily on its being preceded by a rise or
not.
There are various ways to assess the quality of a set of rules that map F0
movements onto a transcription of the intonation. An obvious possibility
is to compare the transcription by rule on a per-movement or per-syllable
basis with the transcription by 't Hart. For two reasons we did not settle
on this possibility. The first reason is that we had in mind the comparison
of the transcription by rule with stress scores by listener subjects as an
ultimate criterion; the other reason is that we do not know how to interpret
possible shifts by one or two syllables of non-prominence-lending
movements.
Assuming that all main stresses are pitch accents, we can easily derive a
classification of all syllables as stressed or unstressed from the
transcription by 't Hart. A similar classification follows directly from the stress
scores of the listener subjects. We therefore decided to use the number of
syllables correctly assigned prominence-lending or non-prominence-lending
pitch movements by applying the rules as a measure for the quality of
the rules. We can then use a very simple formula, due to Miller (1969), for
the assessment of the quality, viz.:
\[ Q = \frac{2J}{C + R} \]
where Q (0 ≤ Q ≤ 1) is the quality measure, R the number of syllables
assigned a prominence-lending pitch movement by our rules, C the number
of syllables stressed according to the criterion (a prominence-lending pitch
movement in the transcription of 't Hart, or stressed according to the
majority of the listener subjects), and J the number of syllables jointly
classified as stressed by rules and criterion.
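As an illustration only, the following minimal sketch evaluates the formula for the one pairing whose joint count is stated explicitly in section 5.6: for the 432-syllable weather-forecast set, C = 122 (stressed according to 't Hart), R = 106 (stressed according to the rules) and J = 101 (stressed according to both). The function name miller_quality and the guard against an empty denominator are assumptions made for this example; they are not taken from Miller (1969).

```python
def miller_quality(joint: int, criterion: int, rules: int) -> float:
    """Quality measure Q = 2J / (C + R).

    joint     -- J, syllables classified as stressed by both rules and criterion
    criterion -- C, syllables stressed according to the criterion
    rules     -- R, syllables assigned a prominence-lending movement by the rules
    """
    if criterion + rules == 0:
        # Assumption for this sketch: Q is undefined when nothing is stressed.
        raise ValueError("Q is undefined when C + R is zero")
    return 2 * joint / (criterion + rules)


# Figures reported in section 5.6 for the 432-syllable set.
q_rules_vs_hart = miller_quality(joint=101, criterion=122, rules=106)
print(round(q_rules_vs_hart, 2))  # 202 / 228, i.e. about 0.89
```

Because J counts only the syllables on which both classifications agree, Q reaches 1 only when rules and criterion mark exactly the same syllables, and it drops both when the rules miss stresses and when they assign too many.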
criterion applies equally to both types of falls. If a fall is not of type A it
must consequently be of type B.
Note that the rules described so far preclude intonation contours
transcribed as '0101' or '011' and '0BA', '0AB' or '0AA'. The intonation
grammar allows for contours that begin on a high declination (e.g. '00A0'), but
utterances having an intonation like this do not sound neutral most of the
time. Therefore we have excluded this possibility in our system of rules.
We must, however, account for utterances with a rise of type 1 on their
first syllable, realized by means of a virtual F0 movement, in fact by means
of an F0 starting at a high level. This rather frequent situation necessitated
the introduction of an extra rule. The occurrence of an intonation of the
form '00A0' will also invoke the application of that rule, thus incorrectly
assigning a rise of type 1 to the first syllable.
The classification of pitch movements proceeds from left to right
through the utterance. Having reached the end of the utterance, we return
to the first syllable in order to see whether the extra 'virtual rise of type 1'
rule applies. This extra rule states that the first syllable of an utterance
contains a virtual rise of type 1 if its pitch at some point exceeds
1.8 times the starting value of the type 1 rise that has the lowest
starting point among those occurring later in the utterance.
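The extra rule can be sketched as follows. This is only a hedged illustration: the function name, the representation of the first syllable as a list of F0 samples in Hz, and the decision to return False when the utterance contains no later rise of type 1 are all assumptions made for the example; only the factor 1.8 comes from the text.

```python
from typing import Sequence

VIRTUAL_RISE_FACTOR = 1.8  # threshold stated in the text


def has_virtual_rise(first_syllable_f0: Sequence[float],
                     later_rise_onsets: Sequence[float],
                     factor: float = VIRTUAL_RISE_FACTOR) -> bool:
    """Decide whether the first syllable carries a virtual rise of type 1.

    first_syllable_f0 -- F0 samples (Hz) measured in the first syllable
    later_rise_onsets -- starting F0 values (Hz) of the type 1 rises found
                         later in the utterance by the left-to-right pass
    """
    if len(first_syllable_f0) == 0 or len(later_rise_onsets) == 0:
        # Assumption: without a later type 1 rise there is no reference value,
        # so the rule is taken not to apply.
        return False
    reference = min(later_rise_onsets)  # rise with the lowest starting point
    return max(first_syllable_f0) > factor * reference


# Hypothetical values: the first syllable peaks at 210 Hz and the lowest
# later rise starts at 110 Hz, so the threshold is 1.8 * 110 = 198 Hz.
print(has_virtual_rise([200.0, 210.0, 205.0], [110.0, 120.0]))
```

Running the check only after the left-to-right pass has finished matches the order described above, since the reference value depends on rises found later in the utterance.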
5.6. Results
The first criterion for the quality of the rules is, of course, the agreement
between the output of the rules and the transcription by 't Hart. The
second criterion for testing the quality of the rules derives from the fact that
assigning pitch movements to syllables implies a classification into two
categories, viz. stressed and unstressed syllables. This classification can be
compared with stressed-syllable scores by a number of subjects.
We have tested the quality of the rules in three different situations. The
first one was identical with the situation in the design set, i.e. weather
forecasts read by professional speakers. The second test situation con-
sisted of the reading of a magazine article by non-professional speakers
whereas the third situation, a spontaneous interview, differed consider-
ably from the design set. By comparing results obtained with different
speakers we can see whether or not the results are speaker-dependent.
As stated above, the comparison of the output of the rules and the
transcription by 't Hart will be confined to prominence-lending movements.
In the first situation, the weather forecasts, both design set and test set
are treated as homogeneous entities, despite the fact that the design set
consists of renderings by three different speakers and the test set of two.
The design set contains 432 syllables, of which 122 were assigned a
prominence-lending pitch movement by 't Hart, 106 received such a movement
from our rules (of which 101 were also considered stressed by 't Hart) and
89 syllables were stressed according to our panel of four subjects. Of
the 249 syllables in the test set, 73 were judged stressed by 't Hart, 66
received a prominence-lending pitch movement from our rules, and 68 were
considered stressed by our subjects. Using Miller's formula, we can
compute quality measures for all possible combinations of stress scores. The
results are shown in Table 3. It is surprising that the agreement is always
better for the test set than for the design set.
Table 3: Quality scores derived using Miller's formula for the agreement between several
pairs of stress-assignments for weather forecasts read by professional speakers.
5.7. Discussion
It seems worthwhile to discuss the results of the first test in some detail. A
comparison of the transcription by 't Hart and the output of the rules has
shown that the rules are somewhat more successful in detecting
prominence-lending rises of type 1 than in finding prominence-lending falls of
type A. Of the 21 syllables in the design set that are stressed according
to 't Hart, but not according to the rules, 10 involve a missed A and 11
a missed 1. These figures should be compared with a total of 99 rises of
type 1, 4 falls of type A (of which 3 are wrongly assigned to syllables
where 't Hart has transcribed a non-prominence-lending fall of type B)
and 3 instances of the combination 1&A in the output of the rules. In the
transcription by 't Hart, we have 93 rises of type 1, 7 falls of type A, 11
combinations 1&A, 3 instances of the combination A&2 and 6 instances of the
unexpected combination 5&A; the remaining two prominence-lending
movements were isolated rises of type 5 that could also be considered as
somewhat high-starting rises of type 1 (according to 't Hart's comments).
Because falls of type A typically occur at the end of an utterance or
clause (which gave them the name "final fall"), missing them does not
strongly influence the overall transcription. Although we have no formal
means to compare the details of 't Hart's transcription and the output of
our rules, the correspondence seems to be quite good, except for
the falls of type A. In the test set our rules find 55 instances of a rise of
type 1, 3 falls of type A and 3 combinations 1&A. The transcription by 't
Hart also shows 55 rises of type 1, 4 falls of type A, 10 combinations 1&A,
1 instance of the combination A&2 and 3 instances of the combination 5&A. For
reasons that are not quite clear, our rules seem to be more successful in
finding falls of type A in the test set. This explains the slightly better
correspondence with the transcription of 't Hart.
As stated before, our four listener subjects clearly assigned fewer stresses
than 't Hart, and also fewer than our rules. As the rules were designed to
mimic 't Hart's transcription as closely as possible, the latter result is not
surprising. We have not been able to find a simple and consistent
explanation for the details of the behaviour of our subjects. They are somewhat
reluctant to assign stress scores to adjacent syllables, an aversion which is
clearly not shown by 't Hart and, consequently, also not by our rules,
but this refusal to accumulate great numbers of stress scores in small
stretches of speech cannot explain the whole discrepancy. The behaviour
of our subjects is, by the way, in good correspondence with results of
Bouwhuis (1973), who found that subjects are not likely to consider more
than approximately 25% of the syllables in a neutrally read text as
stressed. This suggests that a trained phonetician must be expected to hear
more stresses than a naive listener. Because we are dealing with a system
which treats stress as a binary feature, we cannot say whether or not this is
due to a greater chance of weak stresses being heard by a trained subject.
It must be kept in mind that 't Hart did not give stress scores like our
small panel of subjects. Instead, he gave a transcription of the pitch
contours of the utterances, from which we have derived stress assignments.
Given the extraordinary capability of this single subject to correctly
describe the pitch movements, we cannot exclude the possibility that he has
transcribed a number of pitch movements as prominence-lending because
their physical properties were in accordance with those of 'standard
prominence-lending movements' whilst other parameters usually contributing
to the perception of prominence had values typical of non-prominent
syllables. Such a self-contradictory set of parameter values could have
restrained our panel from considering a syllable as stressed, but not 't Hart.
It is interesting to note that there are virtually no syllables called
stressed by our panel but not by 't Hart. This means that the professional
speakers under study do indeed use pitch as the primary cue to signal
stress; virtually all accents are pitch accents, or, at least, also pitch accents.
This statement cannot be refuted by the hypothesis that 't Hart hears
stresses first and then assigns a prominence-lending pitch movement to
the stressed syllables, because of the very good correspondence between
his transcription and the output of our physically based rules.
                                                 Reading a           Spontaneous
                                                 magazine article    speech
                                                 text 6    text 7    text 8    text 9
                                                 (LB)      (TR)      (LB)      (TR)
total number of syllables                        240       240       174       216
number of syllables stressed according to rules  47        54        17        35
number of syllables stressed according to
subjects                                         38        49        31        36
quality of the rules computed by means of
Miller's formula, using subjects' scores as
criterion                                        .81       .93       .66       .93
Table 4: Summary of the results for the tests with non-professional speakers
For the readings of a magazine article the figures show the pattern
which is by now familiar: whilst the overall correspondence between the
output of the rules and the stress scores of the subjects is reasonable, the
rules are more liberal in assigning stress judgments than the subjects. For
speaker LB the quality of the rules can be said to be fair, for speaker TR it
is excellent.
If we turn to the spontaneous speech, the picture changes dramatically.
For both speakers the subjects consider more syllables stressed than the
rules; for speaker LB, indeed, almost twice as many. According to the
scores of the subjects both speakers use slightly less than 20% stressed
syllables, not an unusual figure for spontaneous speech.
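The proportions mentioned here can be checked directly against the counts in Table 4. The sketch below is merely a worked calculation; the helper name stressed_share and the dictionary layout are assumptions made for this example.

```python
def stressed_share(stressed: int, total: int) -> float:
    """Fraction of syllables counted as stressed."""
    return stressed / total


# Counts for the spontaneous speech, taken from Table 4.
spontaneous = {
    "text 8 (LB)": {"syllables": 174, "rules": 17, "subjects": 31},
    "text 9 (TR)": {"syllables": 216, "rules": 35, "subjects": 36},
}

for name, counts in spontaneous.items():
    share = stressed_share(counts["subjects"], counts["syllables"])
    ratio = counts["subjects"] / counts["rules"]
    # Percentage of syllables stressed by the subjects, and how many stresses
    # the subjects assign for every stress assigned by the rules.
    print(f"{name}: {100 * share:.1f}% stressed, subjects/rules = {ratio:.2f}")
```

For speaker LB this gives 31/174, just under 18%, and a subjects-to-rules ratio of 31/17, about 1.8; for speaker TR the corresponding values are 36/216, just under 17%, and 36/35.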
The F0 curves in the spontaneous speech of speaker LB are virtually
flat, confirming that his speech, although quite normal after all, is almost
monotonous. Nevertheless it appears to contain clearly perceptible
accents, which must have been signalled by otherwise concomitant parameters
such as duration and loudness. The role of those parameters in cueing stress
has been confirmed in an informal experiment. Three utterances taken
from the spontaneous speech of LB were processed by means of an LPC
vocoder modified to become an Intonator. From all utterances a number
of different versions were produced, among which an original (standard
LPC vocoder action), a fully monotonized version and two or more ver-
6. Conclusion
The ultimate aim of our research was to prove that it is possible to develop
a set of rules for converting registrations of F0, intensity and oscillogram
of normal Dutch speech into a transcription of the intonation according
to the system due to 't Hart et al. Characteristic of that system is that it
describes the pitch of utterances as moving between two levels, while
transitions from one level to the other will give rise to the perception of
stress in some well-defined conditions and will not do so in all other
conditions. Considering only two levels, it is clearly the movements which are
important, not the levels.
We have succeeded in developing those rules. Using the twelve
perceptually relevant pitch movements (or rather a subset thereof) as a
background, it is possible to stylize and describe the raw F0 tracings in terms of
one or two movements per syllable.
The output of the rules appeared to agree quite well with the
transcriptions made by an expert transcriber. It also appeared to explain the stress
scores given by a small panel of subjects, at least as long as neutrally read
texts were considered. This must be taken to prove the conjecture that
under normal conditions specific pitch movements are the most important
factor in signalling prominence in Dutch.
The rules proved to be able to explain the stress scores of our subjects
in the spontaneous speech of one speaker too. For the spontaneous speech
of a second speaker they failed almost completely. This speaker, who is
judged as normal although rather exceptional, seems to shift to another
strategy for signalling prominence if he changes his style of speech from
neutral reading to conversation. The very fact that he is considered as ex-
ceptional and that the rules perform fairly well even for him when read-
ing, suggests that at least for more or less formal Dutch it is possible to
derive a meaningful description of intonation by means of applying a
simple set of rules to registrations of F0, intensity and oscillogram.
Acknowledgement
This research was supported by the Dutch Organisation for the Advance-
ment of Pure Science (Z.W.O.).
We are specially indebted to J. 't Hart, who put much time and effort
not only into making numerous transcriptions of not very exciting texts
but also into fruitful and above all encouraging discussions.
References
Atkinson, J. E. (1976). Inter and intra speaker variability in fundamental voice frequency.
J. Acoust. Soc. Am. 60: 440-445.
Bouwhuis, D. (1973). Het klemtoonoordeel, persoonlijk of eenstemmig? (Stress-judgements,
personal or unanimous?). IPO Rapport 253 (in Dutch). Eindhoven.
Cohen, A. & 't Hart, J. (1967). On the anatomy of intonation. Lingua 19: 177-192.
Collier, R. (1970). The optimum position of prominence lending pitch rises. IPO Annual
Progress Report 5: 82-85.
Collier, R. (1975). Physiological correlates of intonation patterns. J. Acoust. Soc. Am. 58:
249-255.
Crystal, D. (1969). Prosodic systems and intonation in English. Cambridge: Cambridge
University Press.
't Hart, J. & Cohen, A. (1973). Intonation by rule - a perceptual quest. Journal of Phonetics 1:
309-327.
't Hart, J. & Collier, R. (1975). Integrating different levels of intonation analysis. Journal of
Phonetics 3: 235-255.
Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika 32: 241-254.
Katwijk, A. F. V. van (1974). Accentuation in Dutch. Assen: van Gorcum.
Lieberman, P. (1965). On the acoustic basis of the perception of intonation by linguists.
Word 21: 40-51.
Miller, G. (1969). A psychological method to investigate verbal concepts. J. Math.
Psychology 6: 169-191.
Rietveld, T. & Boves, L. (1979). Automatic detection of prominence in the Dutch language.
Proceedings Inst. of Phonetics, Nijmegen 3: 72-78.
Romportl, M. (1974). Zur Synthese der Intonation. In: Romportl, M. & Janota, P. (Eds.),
Acta Universitatis Carolinae, Philologica, 2/1974 (= Phonetica Pragensia IV): Prague.
15-29.
Rossum, N. van & Boves, L. (1978). An analog pitch-period extractor. Proceedings Inst. of
Phonetics, Nijmegen 2: 1-17.
Vogten, L. L. M. & Willems, L. F. (1977). The formator, a speech analysis-synthesis system
based on formant extraction from linear prediction coefficients. IPO Annual Progress
Report 2: 47-54.
Willems, L. F. (1966). The intonator. IPO Annual Progress Report 1: 123-125.
DAVID BRAZIL
1 The research on which this paper is based was carried out as part of a programme
sponsored by the Social Sciences Research Council during the period 1975-78. The paper is a
revised version of one of seven that were appended to the Final Report on that programme.
Sentences Read Aloud 47
2 The original transcriptions were made in accordance with the conventions used by Halli-
day (1967), but for the sake of consistency I have used my own conventions here.
48 D. Brazil
Similarly, certain adverbials elicited a falling tone while others had a fall-
rise:
// p in any case // (you'd better go)
// r in that case // (you'd better go)
In interpreting these results, it was assumed that any attempt to character-
ise the significance of such co-occurrences of lexico-grammatical choice
and intonation choice would be set in a theoretical framework which han-
dled the semantics of the sentence as a complete and separate event. Sub-
sequent work on discourse intonation has made it possible to reinterpret
these earlier findings in a different way. It does not seem proper to assume
that, because sentences are presented as isolated entities in test material,
they are necessarily processed as such by readers. Indeed, it is an important
implication of the approach that only by adopting an oblique stance can a
reader avoid making decisions about intonation choices that presuppose
some view of a putative context of interaction for the item he is reading.
The application of this principle illuminates the readings of the sentences
quoted above. In the case of
// r the policemen // (had taken him away)
the presence of the definite article in the subject could well be expected to
encourage a reader to behave as if its referent were somehow present to
the consciousness of a hearer and so available for reference. There would
be no corresponding encouragement to regard A policeman . . . in this
light, a fact which one might connect with the tendency of readers to not
assign referring tones to subjects which have indefinite articles. Alongside
these two we may place sentences like 'Policemen were taking him away',
where the fact that readers show no clear preference for either referring
or proclaiming tone can be connected with the absence of anything to sug-
gest that the referent is, or is not, present in the common ground of the
context of interaction.
We may ask what significance attaches to the fact that, in some but by no
means all cases, it is possible to predict certain intonation choices a reader
will associate with an item presented to him for reading aloud. A working
hypothesis would be that whenever there is general agreement among
readers about such choices (as, for instance, about the assignment of refer-
ring tone to a certain constituent) it is because the presented item is
strongly evocative of just those contextual features on which the choice
depends. The possibility then suggests itself that pairs of sentences might
be constructed, members of each pair being contrasted with respect to a
single lexical or grammatical feature in such a way that:
(i) they will elicit consistently contrasted readings with respect to one of
the phonological oppositions a description postulates; and
(ii) it will be possible to say with some measure of confidence just what
are the imputed discourse conditions that motivate the different re-
sponses.
Method
Two lists of sentences were prepared, those in the first list differing from
those in the second with respect to some feature which it was expected
would result in different choices of tone. Informants were told that the ex-
perimenter was interested in discovering how they would 'normally' read
aloud certain sentences and what degree of conformity one could expect
among readers. They were asked to read aloud each sentence from the
first list, having first read the whole list silently to themselves. Other read-
ing material was then provided for similar treatment - a prose passage and
a piece of dialogue required for a slightly different experiment - in order
to reduce the danger of remembered patterns influencing informants' per-
formances when they were asked to read sentences from the second list.
They were warned that they would probably be aware of minor differ-
ences between otherwise similar sentences in the two lists, a precaution
which was found to reduce the distorting effects of contrastive intonation
in second members of pairs. The readings of both lists were recorded and
subsequently transcribed.
To be useful, pairs of sentences had to satisfy two conditions.
(i) There had to be a difference in the intonation treatment of the two
members.
(ii) At least one member of the pair had to elicit a high measure of agree-
ment as to the intonation treatment of the differentiating feature.
So, for instance, having established that readers regularly select r tone
for a certain constituent of one member of a pair, one might propose a
discourse explanation of the fact that in the other member p tone is
selected regularly for the same constituent, or of the fact that in the second
member p tone or r tone appears to be selected indiscriminately.
The testing procedure was planned with a view to assembling a battery
of pairs which satisfied the above demands. Anticipating which sentences
would elicit agreed readings of a particular kind proved to be a difficult
business, and a large number of items that were tried out had to be aban-
doned. After I had obtained five readings of the first pair of lists I ex-
amined the results, so that items which were evidently not going to be use-
ful could be eliminated and replaced by others in new lists. The process
was continued until patterns of similarities and differences had been main-
tained with reasonable consistency through twenty readings of any parti-
cular pair.
It was evident that consistency was dependent on at least two variables:
the ease with which relevant features of a hypothetical discourse setting
could be 'recovered' from the printed item; and the reader's skill and ex-
perience in reading aloud. The filtering procedure I used was taken to
have resulted in the selection of items in which relevant contextual
features are relatively accessible to a reader. With regard to the second
variable, informants were all people who could be expected to have a
higher-than-average level of verbal facility - language teachers and students of
English at University. By using a selected population I sought to reduce
the effect of differential skill to a minimum. There is clearly scope, once a
set of 'norms' has been established, for investigating differences among
readers, including non-native readers, but the immediate focus of interest
was on the construction to be placed upon differential readings of pairs of
sentences when readers were agreed in producing them. What is taken as
'agreement' is made clear in relation to each pair of sentences as it is dis-
cussed. Different sets of informants were used for each revised list of
items, so that the analysed results of readings of 61 sentences reflected the
performances of a total of 50 different readers.
however, say, that when R tone occurs in readings of this particular sen-
tence it is always associated with 'what you want'. An important consider-
ation in constructing examples to investigate tone choice was to find sen-
tences in which most readers did select R tone at some point. In this
particular case, sixteen out of the twenty readers satisfied the condition.
The analysis which follows begins with an examination of the elicited
data.
Analysis
1. When readers associate a referring tone with one of two tone units in a sen-
tence, a change in the order of the constituents results in a corresponding
change in the distribution of tones.
Example:
(1) // R what you want // P is a new car //
(2) // P a new car // R is what you want //
Interpretation: When either of these items is presented in isolation the
thematisation arrangements exert a powerful influence on the reader to
treat 'what you want' as an assumed focus of interest and 'a new car' as
matter introduced as new into a hypothetical context of interaction. Sen-
tence (2) can be interestingly compared with the very different contextual
implications of
(3) // R but a new car // P is just what you want //
where the situation reflected in the tone choice is one in which the possi-
bility of someone having 'a new car' must have been introduced into the
conversation in order for it to be rejected.
Measure of agreement:
Sentence (1) : 18 informants assigned R tone to the first tone unit
Sentence (2) : 17 informants assigned R tone to the second tone unit
The remaining five readings had P tone in both tone units and would thus
be explicable in terms of our postulated oblique orientation, the reading-
out process being carried out without reference to any reader-hypothesis
as to the sentence's interactive implications. There are no counter-exam-
ples, among the 40 readings examined, to our prediction that when R tone
is selected it will occur in a certain constituent.
Other pairs of sentences in which the ordering of constituents seems to
2. When readers associate a referring tone with one of two tone units in a sen-
tence, a change other than a change in the order of constituents results in the R
tone occurring in a different constituent.
Example:
(12) // R if it rains // P we shall expect you //
(13) // P even if it rains // R we shall expect you //
Interpretation: Sentence (12) is treated rather like sentence (10) above. The
if-clause is assumed to represent a possibility already foreseen in a hypo-
thetical context of interaction and so available for reference. By contrast,
the 'Even if. . .' of (13) would usually introduce a possibility as though it
had hitherto not been considered. There would seem to be a strong possi-
bility that such a new consideration would be introduced in relation to
something that was already conversationally in play, a prediction we can
associate with the tendency for readers to associate R tone with 'we shall
expect you' in (13).
Measure of agreement:
Sentence (12): 17 informants assigned R tone to the first tone unit
Sentence (13): 10 informants assigned R tone to the second tone unit
There were no counter-examples, that is to say no readings having R tone
in the 'wrong' constituent. Two had an additional R tone in 'even'. Ex-
cluding these, and also those in which there is no occurrence of R tone
(and therefore, we would argue, no evidence of the matching of the sen-
tence with a hypothetical discourse setting) the presence or absence of
'even . . .' influences all readers in the same way.
Other examples that can be compared with the pair (12) and (13) are:
(14) // R she was trying to convince me // P she was only twenty //
(15) // P it was hard to believe // R she was only twenty //
(16) // R to tell you the truth // P i don't like it //
(17) // P i can assure you // R i don't like it //
(18) // R where it came from // P is a mystery //
Measure of agreement: Table 2 shows the incidence of R tone in the six pairs
of sentences considered above.
Comment: The recording of three counter-examples for (19) can probably
be related to the fact that a number of informants failed to read this sen-
tence fluently at the first attempt. Hesitations, misreadings and self-cor-
rection are likely to be due to the opposing pressures of thematisation and
the late occurrence of a definite article.
The consistently lower figures in the last column of Table 2 suggest that
readers find it more difficult to contextualise an item in which the ex-
pected R tone is in the second of two tone units, and so tend to make a
noncommittal choice of P tone in both tone units. It is evident from the
variation in the figures, moreover, that the relative probability of such
noninitial R tones occurring in different sentences could only be investi-
gated by a large-scale statistical study.
3. When some readers associate R tone with the first, but none with the sec-
ond, of two tone units, a change in the sentence affects the likelihood of the R
tone occurring.
Example:
(24) // R the parcel of books // P was lying on the table //
(25) // R a parcel of books // P was lying on the table //
Interpretation: The presence of the definite article in the subject gives
greater encouragement to the reader to regard 'the parcel of books' as being
conversationally present.
Measure of agreement:
(24) 17 informants assigned R tone to the first tone unit
(25) 2 informants assigned R tone to the first tone unit
In none of the 40 readings was R tone associated with the second tone
unit.
4. In cases where there is more variation among readers, regularity can be ob-
served in the way changes in the sentence affect the tone assigned to a particu-
lar word.
Example:
(28) // R i HOPE he'll like it //
     // P i hope // R he'll like it //
(29) // P i woNDer whether he'll like it //
     // R i woNDer whether // P he'll like it //
     // R i wonder // R whether // P he'll like it //
In spite of the variation among readings of these two sentences, they can
be compared on the basis of the tone choice associated with 'like':
(28) // R (i hope he'll) like it //
(29) // P (i wonder whether he'll) like it //
Interpretation: Sentences like (28) seem likely to present the question of
whether he'll like it as a matter already established as of common concern
among the participants in a conversation. It is assumed that the hearers
share the speaker's interest in whether he does or not. Whether he as-
sumes also that his hoping so is also shared determines whether 'hope' is
included in the same tonic segment, and since this depends on more spe-
cific knowledge of the situation it is to be expected that the intonation
treatment of 'hope' will be less predictable. Sentence (29) differs in that it
seems likely to occur either as thinking aloud, or as a yes/no elicitation
equivalent to 'Do you think he will like it?'. In either case 'like' would be
located outside the area of convergence between speaker and hearer, and
so would have P tone.
Measure of Agreement:
(28) 17 informants assign R tone to 'like'
(29) No informants assign R tone to 'like'.
Other pairs of sentences that can be compared on a similar basis are
(30) / / R (we can't afford) both of them / /
(31) / / Ρ (we can't afford) either of them / /
(32) / / R (i) could (have been mistaken) / /
(33) / / Ρ (i) couldn't (have been mistaken) / /
(34) / / R (he could) POSsibly (be right) / /
(35) / / P (he couldn't) POSsibly (be right) / /
(36) / / R (i shouldn't try to go in) august / /
(37) / / P (i should try to go in) august / /
Interpretation: In (30) 'both' is likely to be a substitute for two things pre-
viously mentioned; but 'either' in (31) serves usually to correct a misappre-
hension that one or other can be afforded and is thus presented with Ρ
tone as a potential alteration of shared assumptions.
'Could' in sentences like (32) typically concedes a possibility already in-
troduced or implied by another speaker. In uttering it one would normally
be acknowledging the assumed possibility as a basis for proceeding.
'Couldn't', on the other hand, denies an implied possibility and so has Ρ
tone. Similar considerations apply to sentences (34) and (35), where the
relevant tone choice occurs in the (situationally redundant) word 'possi-
bly'.
(36) would normally make retrospective reference to the hearer's ex-
pressed intention of going in August, while (37) encourages the reader to
treat 'August' as if it were a (new) suggestion.
Measure of agreement: Table 3 shows the choice of tone in the word speci-
fied in each of the sentences (28) to (37).
Sentence   Word        R     P
(33)       COULDN'T    -     18
(34)       POSSIBLY    18    2
(35)       POSSIBLY    -     20
(36)       AUGUST      18    1
(37)       AUGUST      -     20
Comment: The ten readings not accounted for in this table are those in
which readers did not assign a tonic syllable to the word in question. Of
the 83 occurrences of R tone, none occurred where Ρ tone would have
been 'expected'. On the other hand there are nine substitutions of Ρ tone
for the predicted R tone, a fact which is in line with our hypothesis that
failure to contextualise a sentence may result in orientation change and
consequent assignment of Ρ tone to any constituent.
Table 4: Tone choices in first and second tone units. Expected tones are boxed; sequence R/P was
expected in the upper group, P/R in the lower.
(Sentences tabulated: 1, 3, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 27, 32, 34, 36, 38, 42, 43,
48, 50, 54, 55, 56, 57, 58, 59, 60, 61.)
17 5 12
17 13 3
15 10 5
20 3 17
11 17 11 6
13 10 4 6
15 16 13 3
17 8 Q
0
19 6 2
21 3 2
23 8
33 6 5 1
39 13 5 8
40 17 14 3
41 10 2 8
51 3 2 1
stituent will be equally well satisfied in so far as they lead him not to select p
tone by another reader's choice of r+.
There is much variation among sentences in the way readers tend to
choose between r tone and r+ tone. There are a number of cases where
our group of informants has selected r tone exclusively. Perhaps signifi-
cantly, there are no sentences which lead to a similarly marked preference
for r+ tone, a fact which is the more interesting in view of our deliberate
attempts to elicit r+ tones for contrastive purposes (see below). In fact
there is no instance of r+ tone being associated with a particular sentence
by some informants and r tone being associated with the same sentence by
none. These observations are consistent with our working hypothesis that
a reader who sees cause to represent a particular constituent as having re-
ferring status will select r tone unless he also perceives role-implications in
the sentence of a kind that lead him to assume dominance. That the latter
implications are far less predictably set up by the prepared sentences
we have used is shown by the great variation in the proportion of r tone to
r+ tone choices.
To try to show what motivates choice of r+ tone specifically, pairs of
sentences were constructed in which it was expected that opposing tenden-
cies (towards r and r+) could be associated with grammatical and/or lexi-
cal alternatives. It was found that (cf. also second part of Table 4):
Example:
(38) / / R when we'd passed the traffic lights / / (we took the next street
on the left)
(39) / / R when you've passed the traffic lights / / (take the next street
on the left)
Interpretation: Sentence (39) would be most likely to occur when the
speaker was giving directions, an activity in which it is customary to as-
sume dominant role. There is a sense in which the enquirer accepts the 'in-
ferior' conversational role by the simple act of enquiring, and the other
party responds by adopting the role of one who, for the time being, ex-
pects to be listened to. Sentence (38) could be a response, from a non-
dominant position, to a request for an explanation of how the speaker
reached a certain destination. It could, for instance, be part of an apology
for having lost one's way and arrived late for an appointment.
Comment: This pair of sentences illustrates a problem that seems to be in-
separable from an attempt to use the sentence-citation technique to inves-
tigate this particular tone contrast. We have already noted that r tone is al-
ways likely to be substituted for the 'expected' r+ tone. Eight readers
actually preferred it in the case of (39). Moreover, our prediction that (38)
Table 5: Realisation of R tone
Sentence   r     r+
(38)       15    4
(39)       8     11
r r+
64 6 14
speaker assumes the dominant role of one who determines what the other
shall do, an implication that is absent from (40).
Adding the logically redundant second part of '. . . whether he can or
whether he can't' would seem generally, because of its suggestion of
something like truculence or irritation, to be the prerogative of the 'supe-
rior' party to a conversation, or one who can properly assert superiority.
By contrast, 'he' and 'she' in sentence (42) are non-transparent alternatives
and require no special role in the interaction to make their use acceptable.
The inclusion of 'but' and 'just' in (3) has already been shown to result
in a different distribution of R and P tones from (2) 'A new car is what you
want'. We may now note that, by suggesting an act of contradiction or dis-
pute, they incline the reader towards choice of the dominant option in an
initial R tone unit.
The differential treatment of initial and final 'apparently' is slightly less
easy to explain. If 'apparently' is regarded as being largely social in its im-
plications, as insinuating a generalised sense of solidarity rather than qual-
ifying the associated assertion, then it seems appropriate that an assertion
followed by the consolidatory gesture should be placed by a reader in a
context of assumed speaker superiority. When the gesture precedes the as-
sertion, it seems more tentative and accommodating. To put it another
way, a relationship which leads one to invoke social common ground be-
fore a statement might be judged more deferential than one which permits
adding on the social gesture as an afterthought. The treatment of sen-
tences (8) and (9) can usefully be compared with readings of
(53) Tom's got one, hasn't he.
Like other assertion + tag sentences, this includes among its possible reali-
sations:
(a) / / R tom's got one / / Ρ hasn't he / /
(b) / / Ρ tom's got one / / R hasn't he / /
In (a) the speaker introduces the assertion as if it were common ground
and then asks to have it confirmed as one who does not know. In (b) the
assertion is made as if from a vantage point outside the area of assumed
convergence and the hearer is then asked to agree that it is now common
ground. Readers who followed the pattern of (b) reflected the fact that
Table 7: Readings of sentence 53
Realisation of R tone
Sequence   r     r+
R/P        6     -
P/R        -     7
Others     7
Sentence   r     r+
38         15    4
39         8     11
40         16    4
41         3     14
42         17    1
43         10    9
1          11    7
3          4     11
8          17    3
9          3     17
Comment: With the exception of the sentence pair 42/43, the alteration in
the presented item results roughly in a reversal of the proportion of readers
selecting tones r and r+. The figures for sentence (43) indicate an upsetting
of the very strong preference for r tone in (42).
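
The reversal described in this comment can be checked directly against the counts tabulated above. The following is only an illustrative sketch: the function names are hypothetical, and treating a "reversal" as the majority choice switching between r and r+ (a threshold of one half) is an assumption made here, not a criterion stated in the text.

```python
# Counts of (r, r+) readings per sentence, copied from the table above.
counts = {
    38: (15, 4), 39: (8, 11),
    40: (16, 4), 41: (3, 14),
    42: (17, 1), 43: (10, 9),
    1: (11, 7), 3: (4, 11),
    8: (17, 3), 9: (3, 17),
}

# Sentence pairs as they are contrasted in the text.
pairs = [(38, 39), (40, 41), (42, 43), (1, 3), (8, 9)]


def r_plus_share(sentence):
    """Proportion of R-tone readings realised as r+ for one sentence."""
    r, r_plus = counts[sentence]
    return r_plus / (r + r_plus)


def reversed_preference(first, second):
    """True when the majority choice (r vs. r+) differs between the two
    sentences. The 0.5 cut-off is an assumption for illustration only."""
    return (r_plus_share(first) < 0.5) != (r_plus_share(second) < 0.5)


for first, second in pairs:
    print(first, second,
          round(r_plus_share(first), 2),
          round(r_plus_share(second), 2),
          reversed_preference(first, second))
```

On these figures every pair except 42/43 shows a switch of majority, while for 42/43 the share of r+ rises sharply without reaching one half, which is the pattern the comment describes.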
References
Brazil, D. (1972). An investigation of the relationship between intonation and grammar in the
reading aloud of citation items. Unpublished M. A. Thesis, University of Birmingham.
Brazil, D. (1975). Discourse intonation. Discourse analysis monographs 1. University of Bir-
mingham.
Brazil, D. (1978). Discourse intonation II. Discourse analysis monographs 2. University of Bir-
mingham.
Brazil, D., M. Coulthard, and C. Johns (1980). Discourse intonation and language teaching.
London: Longman.
Halliday, M.A.K. (1967). Intonation and grammar in British English. The Hague: Mouton.
ALAN CRUTTENDEN
Any theory of context has to make a basic division between linguistic con-
text and situational context. For example, Halliday (1967:23) has a basic
division between two types of 'given' elements: an element 'has either been
mentioned before or is present in the situation'. An element that has been
mentioned before is typically referred to by anaphoric reference involving
the use of a pronoun or the definite article or by what Ladd (1979:105)
calls 'default accent', i. e. moving the accent forward away from its most
common position on the last lexical item, e. g.
(1) JOHN didn't complain about the result.
But such formal features (pronouns, definite article, default accent) do
not only refer to elements that have been mentioned before verbatim.
They can also be used in cases of 'associative anaphora' (Hawkins, 1978;
Cruse, 1980), e.g.
(2) (Bill didn't achieve his aim) but JOHN succeeded.
The other type of context, being 'present in the situation', usually refers to
the immediate situation, e.g.
(3) [Taking a snake out of a bag.]
That's a POISONOUS snake.
But anaphoric features may possibly refer to a wider situation, e. g.
(4) (Did anything happen while I was out?)
THE DOCTOR called.
In this example the use of the definite article constitutes one anaphoric
usage and the occurrence of a non-final accent on 'doctor' is a type of
usage which has also often been explained as a type of anaphora. 'Call' is
what doctors typically do and is therefore said to be 'culturally given' or
predictable (Bolinger, 1972). There are however other explanations of this
type of example, which explain it either in terms of nouns being innately
more accentable than verbs (Bing, 1979; Ladd, 1979), or in terms of a spe-
cial pattern for 'event' sentences (Allerton and Cruttenden, 1979; also cf.
Fuchs, 1980). Any influence of wider situational expectations on accentua-
tion (and intonation generally) is uncertain.
In addition to textual/associative anaphora and immediate/?wider situ-
ation, there are two additional types of context which are specially rele-
vant to intonation. Firstly, any immediately preceding intonation is rele-
vant. For example, in English the low falling tune is one of the most
Preceding            Co-occurring
(a) intonation       (c) lexis/grammar
(b) lexis            (d) immediate situation
                         ?wider situation
seem very different, but in fact the two attitudes often correlate with
whether the speaker is agreeing or disagreeing with the preceding speaker,
e.g.
(10) (He's got two wives.)
I "know.
(11) (I don't like the man.)
You've never even "spoken to him.
Any one tune can occur in a wide variety of contexts and hence result in a
wide variety of conveyed attitudes. The basic meaning associated with a
particular tune must therefore be of a much more abstract kind than any
of these individual conveyed attitudes. A description of intonation should
at least try to extract the meaning common to all or most occurrences of a
particular tune, however abstract that meaning may be.
In associating different tunes with different lexical items, it is generally
true that any tune may be associated with almost any co-occurring gram-
matical structure and lexical items. In order to factorize out an abstract in-
tonational meaning for one tune, it is therefore necessary to examine a
large number of varied examples of the use of that tune. However a more
economical procedure may be possible precisely because the possibilities
of combination between intonation and grammar/lexis are only almost
free. In other words, the few occasions where particular tunes are impossible
may be especially revealing. They may indicate something about the
meaning of a tune by indicating what is completely disharmonious with it.
Of course the charge might be made that there are as many negative
aspects to the meaning of a tune as there are positive aspects but prima fa-
cie this seems unlikely when the cases where particular tunes are impossi-
ble are so much more limited than those where they are possible. At any
rate, it seems probable that a systematic examination of such intonational
misfits might reveal something about the meaning of some tunes (and pos-
sibly also something about the characteristics of particular grammatical
structures and/or lexical items or sets of lexical items). As an initial con-
tribution to the analysis of such intonational misfits, I will examine four
such areas in English: (a) tag sentences; (b) adverbials; (c) negatives; (d)
superlatives.
(B) Adverbials
meanings of rises and falls in English (at least in R. P. and in many other
dialects). And quite apart from the intonational relevance of this division
among sentence adverbials, where in any syntactic or semantic description
would we find any classification scheme which would separate regularly
and usually? In this case intonational misfits have been informative both
about intonation and about the semantics of adverbs.
It is not only sentence adverbials which show this division into a rising
type and a falling type. Adverbials which function as adjectival modifiers
of degree show a similar division, e. g.
(40) He's "partially wrong.
(41) He's 'completely wrong.
Indeed this limiting/reinforcing division is apparent more generally in
English, e.g.
(42) I "thought he was married.
(43) I "know he was married.
(C) Negatives
The complex interaction between intonation and negatives has been dis-
cussed by many writers (e.g. Palmer, 1922; Schubiger, 1935, 1957; Lee,
1956; Halliday, 1967; Liberman and Sag, 1974; Ladd, 1977; Bing, 1980).
The most well-known examples of all are those from Palmer (1922:1):
(44a) He doesn't lend his books to anybody.
(44b) He doesn't lend his books to "anybody.
Palmer's gloss on the meanings of the two examples is as follows: (44a)
means 'he lends his books to nobody' and (44b) means 'he is rather parti-
cular as to the persons he lends his books to; he doesn't lend them to
everybody'. One way of simplifying these glosses is to suggest that two
different meanings of anybody are involved: (a) 'absolutely everybody'
(and this meaning takes the falling tune); and (b) 'any second-rate person'
(and this meaning has the falling-rising tune). Conversely a falling-rising
tune is incompatible or disharmonious with the meaning 'absolutely every-
body' while the falling tune is incompatible with the meaning 'any sec-
ond-rate person'. We are not here involved with an absolute misfit but
rather a limited misfit in which the misfits involve a particular tune and
one of two meanings associated with a lexical item. Ladd (1977:24) sug-
gests the contrast of meanings between fall and fall-rise (or in his terms
'A-rise') as 'plain focus' versus 'focus within a given set'; the latter meaning
is obviously incompatible with the meaning 'absolutely everybody'.
This incompatibility is made even clearer if we actually use a more precise
synonym in each case, and thus produce absolute misfits:
(45a) *He doesn't lend his books to 'any old person.
(45b) He doesn't lend his books to "any old person.
(D) Superlatives
(50) He's not just "average size. Or even above average size. *He's
big.
As further exemplification, compare the following sets:
(51) It's "huge. It's "massive. It's "marvellous. It's "tiny.
(52) *It's "tall. *It's "pleasant. *It's "small. *It's "thin.
The adjectives in (52) can only occur with the extra high fall where there
is an overt comparison with their opposite, e. g.
(53) (It's only short people who are allowed in, people like Peter, Mar-
garet and Jane.) But Jane's "tall!
They certainly cannot occur as the finale of a 'culminating' set of expres-
sions as in (49) and (50). There seems to be some sort of metaphorical link
between the height of the fall and the noteworthiness of the feature described
by the adjective. Conversely there is a misfit between an extra high
fall and an adjective which is not an innate superlative.
So what is the relevance of intonational misfits, that is, intonations
which are impossible, or at least highly unlikely, in particular contexts,
chiefly linguistic contexts involving co-occurring lexis and/or grammar?
Such misfits may be limited, where a particular intonation eradicates the
choice between two possible meanings of lexis/grammar, as in the case of
certain types of negative sentence; or they may be absolute, as in the pro-
hibition on the use of rising intonations with some sentence adverbials.
Such intonational misfits may, firstly, act as operational tests to produce
delicate distinctions within grammatical classes, as, for example, in the
subdivision of frequency adverbials into the falling set which includes re-
gularly and always and the rising set which includes usually; as, too, in the
identification of adjectives which are innately superlative. Secondly, into-
nations which are actually heard in contexts where they would usually be
thought to be impossible (i. e. where they would normally be regarded as
misfits) will be interpreted as jocular or ironic, as in the use of reversed
polarity falling tags, which generally assume that the listener has certain
knowledge, and which may be interpreted as jocular in cases where the lis-
tener clearly does not have such knowledge. Lastly, intonational misfits
will be particularly revealing about the basic abstract meanings of tunes:
for example, falls are shown to be essentially reinforcing while rises are limiting.
The fact that the meaning of a particular tune is heavily dependent
on co-occurring context need not be quite as big a disadvantage as has of-
ten been thought if we focus on the small number of cases of contextual
incompatibility.
References
Allerton, D. J. and Cruttenden, A. (1979). Three reasons for accenting a definite subject.
Journal of Linguistics 15: 49-53.
Bing, J. M. (1979). Up the noun phrase: another stress rule. In Engdahl, E., and Stein, M.,
Papers presented to Emmon Bach by his students. Amherst: University of Massachusetts.
Bing, J. M. (1980). Intonation and the interpretation of negatives. Cahiers linguistiques d'Ot-
tawa, NELS 10: 13-23.
Bolinger, D. (1972). Accent is predictable (if you're a mindreader). Language 48: 633-644.
Cruse, D.A.C. (1980). Review of Hawkins, J. A. (1980), Definiteness and indefiniteness: a
study in reference and grammaticality prediction. Journal of Linguistics 16: 308-316.
Fuchs, A. (1980). Accented subjects in 'all-new' utterances. In Brettschneider, G., and Leh-
mann, C., Wege zur Universalienforschung: sprachwissenschaftliche Beiträge zum 60. Geburts-
tag von Hansjakob Seiler. Tübingen: Narr.
Halliday, M.A.K. (1967). Intonation and Grammar in British English. The Hague: Mouton.
Hawkins, J. A. (1978). Definiteness and Indefiniteness: a Study in Reference and Grammaticality
Prediction. London: Croom Helm.
Iannucci, D. and Dodd, D. (1980). The development of some aspects of quantifier negation
in children. Papers and Reports on Child Language Development, 19: 88-94. (Stanford Uni-
versity: Department of Linguistics).
Ladd, D. R., Jr. (1977). The Function of A-rise Accent in English. Bloomington: Indiana
Linguistics Club.
Ladd, D. R., Jr. (1979). Light and shadow: a study of the syntax and semantics of sentence
accent in English. In Waugh, L. R. and Coetsem, F. van (eds.), Contributions to Grammati-
cal Studies: Semantics and Syntax. Leiden: Brill.
Lee, W. R. (1956). English Intonation: a New Approach. Lingua 5: 345-371.
Liberman, M. and Sag, I. (1974). Prosodic Form and Discourse Function. Tenth Regional
Meeting of the Chicago Linguistics Society.
O'Connor, J. D. and Arnold, G. W. (1961). Intonation of Colloquial English. London: Long-
mans.
Palmer, H. E. (1922). English Intonation with Systematic Exercises. Cambridge: Heffer.
Schubiger, M. (1935). The Role of Intonation in Spoken English. Cambridge: Heffer.
Schubiger, M. (1958). English Intonation: its Form and Function. Tübingen: Max Niemeyer.
Scuffil, M. (1980). The Interpretation of English Intonation Patterns by Native Speakers and
German-speaking Learners. Unpublished Ph. D. thesis, University of Cambridge.
ANNE CUTLER
Introduction
The central thesis of this paper is that the psycholinguistic evidence on the
perception of prosodic structure in language understanding, and on the
determination of prosodic structure in language production, converge to
show two aspects of the same phenomenon. That is to say, the perceptual
and production evidence together enable us to construct a picture of our
mental representation of the role of prosody in language, and the way this
knowledge is expressed in language use.
Rather than attempt to cover all aspects of prosodic structure, and all
the relevant evidence on how each type of prosodic variation is produced
and perceived, this chapter will concentrate on two phenomena only:
stress and accent. Stress and accent each concern the relative prominence
of one syllable in comparison with others; but as defined here, stress is a
property of words, accent of sentences (or utterances).
The two are not independent in the utterance itself. For instance, accent
is usually realised on a syllable which is marked for stress (although some
exceptions to this generalisation will be discussed below). However, at a
more abstract level of linguistic processing the two phenomena are quite
distinct and are driven by totally independent processes. (For a compre-
hensive discussion of the independence yet interaction of accent and
stress, though one in a theoretical framework different from that devel-
oped in the present paper, see Jassem & Gibbon, 1980.) The evidence to
be cited below, and hence the theoretical conclusions to be drawn, refer
exclusively to English; and the mental representations of prosodic struc-
ture which are inferred are therefore presumably those of English speak-
ers only and are by no means necessarily shared by speakers of other lan-
guages in which word stress and sentence accent are differently expressed
or not expressed at all.
1 This research was supported by a grant from the Science and Engineering Research Coun-
cil, U. K., to the University of Sussex.
Stress
rived from the same root morpheme are stored together in the mental lexi-
con, and that stress errors arise when the root and the correct ending are
chosen, but the primary stress is applied to a syllable marked for stress not
in the intended derivative, but in another member of the lexical cluster.
This account also relies upon the lexical representation of stress informa-
tion. A variety of speech production evidence, therefore, converges to in-
dicate that lexical stress pattern forms part of the representation of words
in the mental lexicon.
Accordingly one would expect that identification of lexical stress pat-
tern would play a large part in word recognition during language compre-
hension. And indeed it does. For instance, when words are misheard, it is
the stress pattern and usually the nature of the stressed syllable which determines
what listeners think they hear (see e.g. Garnes & Bond, 1975).
Typical hearing errors include her peas oyster for herpes zoster, testicle for
testable, your purse for reverse, are you using paint remover? for are you
gonna paint your ruler? (all examples from Browman, 1978). In each case at
least the steady state portions of the most highly stressed syllables have
been correctly perceived - not surprisingly, since stressed syllables are
acoustically clearer than unstressed syllables. But importantly, the stressed
syllable information has been used to reconstruct a message, in which
number of syllables and stress pattern are the same as in the original, but
precious little else is preserved. Only semantic incongruity (of the per-
ceived message with the context, or of the interlocutor's response based
on an incorrect perception) alerts participants in a conversation to the fact
that a slip of the ear has occurred. One may assume that reconstruction of
an utterance on the basis of partial information in this manner is not con-
fined to those cases where it produces an incorrect result; if it did not of-
ten produce acceptable results it would presumably be abandoned as a
speech perception strategy. That is to say, it would seem that drawing
heavily on information about stress pattern and the nature of the stressed
syllable is a reasonably usual and efficient way of perceiving speech.
What happens, then, when stress pattern information is incorrect? Not
surprisingly, this often precipitates an error of interpretation. Puns, re-
ports Lagerquist (1980), fail to work when they involve a stress shift. Cut-
ler (1980) reports that a hearer who heard the word perfectionist stressed
on the first syllable, with the second syllable reduced, parsed it as a perfect
shnist, and only became aware of the error when no meaning could be
given to shnist. Bansal (1966) presented listeners with English spoken by
Indian speakers, who often applied word stress in an unorthodox manner,
and found that the listeners tended to interpret what they heard to con-
form with the stress pattern, often in conflict with the segmental informa-
tion. For example, when words with initial stress were uttered with sec-
ond-syllable stress, atmosphere was heard as must fear, yesterday as or study,
character as director, and written as retain. Similarly, when two-syllable
words with stress on the second syllable were uttered with initial stress,
hearers perceived prefer as fearful, correct as carried, and about as come out.
Robinson (1977) conducted an experimental investigation of the im-
portance of stress pattern in word recognition. Subjects heard lists of
two-syllable nonsense sequences with either initial or final stress. In a
false recognition test they were then presented with two-syllable items
which were made up of the same syllables they had already heard al-
though never in the same combinations which they had heard. Subjects
tended to accept these items (erroneously) if the stress levels of the syl-
lables were the same as they had been in the original presentation. Sim-
ilarly, interference effects in free recall of both nonsense items and short
phrases were found as a result of stress pattern similarity; in other words,
stress pattern identity can precipitate false recognition, often in defiance
of segmental evidence.
As one would suspect, it is not only the case that false stress informa-
tion leads to difficulty of word recognition; prior knowledge of stress pat-
tern can facilitate correct word recognition. Engdahl (1978) found that
when listeners were given a sentence complete but for the last word (e. g.
'Laura tried all the keys but the door wouldn't ____'), and were asked to
supply an appropriate continuation as quickly as possible, they could do
the task well on the basis of contextual cues alone, but responded signifi-
cantly faster when the stress pattern of the required word was presented
(as a pattern of tones) as well.
Even in the absence of context, stress pattern information can facilitate
lexical access. Subjects in a simple visual lexical decision experiment run
by the present author at Sussex University were presented with words and
non-words in one of two conditions: mixed randomly, or blocked uni-
formly so that a block of 40 items all had the same number of syllables
and stress pattern. Responses in the blocked condition were faster than in
the mixed condition, insignificantly so for the words but significantly for
the non-words. Interestingly, Hirst and Pynte (1978) proposed lexical
stress as one of several arbitrary features which could be incorporated in a
language and which served the function of providing 'extra' word identifi-
cation cues. One of the other such features which Hirst and Pynte pro-
posed was noun gender, and in an analogous experiment to the stress pat-
tern one just described, they presented subjects with French words and
non-words mixed or blocked as to gender; prior knowledge of gender, as
of stress pattern, facilitated performance to a significant extent in the
non-word condition, though not in the word condition. Hirst and Pynte
concluded that prior information about an arbitrary lexical feature such as
gender or stress allows a subcategorisation of the lexicon, so that only the
relevant entries need be consulted; in the case of non-words, the number
of items to be fruitlessly searched is thus significantly reduced by the appropriate
information. An alternative explanation, involving not partitioning
of the entire lexical space but access from each entry of simply the
stress pattern information alone, is also possible - such a procedure would
also greatly increase the speed with which nonwords could be checked
against potential entries. But whichever is the case, the implication is that
stress is not merely information which becomes available on access of a
word's lexical entry, but is of use to guide lexical access, i. e. enable only
those entries with appropriate stress patterns to be fully accessed. This ex-
planation again depends upon the assumption that stress is a feature of the
lexical entries for words.
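
The subcategorisation account can be made concrete with a toy model. The sketch below is an illustration only and not a reconstruction of any experiment reported here: the miniature lexicon, the "S"/"w" notation for stressed and unstressed syllables, the function names, and the assumption that search is serial and exhaustive for non-words are all hypothetical simplifications.

```python
from collections import defaultdict

# Toy lexicon: word -> stress pattern ("S" stressed, "w" unstressed syllable).
# Entries and notation are illustrative assumptions.
LEXICON = {
    "tigress": "Sw",
    "ruler": "Sw",
    "digress": "wS",
    "prefer": "wS",
    "about": "wS",
    "yesterday": "Sww",
    "character": "Sww",
    "remover": "wSw",
}


def build_index(lexicon):
    """Partition the lexicon into subsets that share a stress pattern."""
    index = defaultdict(set)
    for word, pattern in lexicon.items():
        index[pattern].add(word)
    return index


INDEX = build_index(LEXICON)


def lexical_decision(item, pattern=None):
    """Return (is_word, entries_consulted) under a serial-search assumption.

    With no prior stress information every entry is a candidate; with a
    known pattern only the matching subset is searched.
    """
    if pattern is None:
        candidates = sorted(LEXICON)
    else:
        candidates = sorted(INDEX.get(pattern, ()))
    consulted = 0
    for entry in candidates:
        consulted += 1
        if entry == item:
            return True, consulted
    return False, consulted


print(lexical_decision("tigrel"))        # non-word, no stress information
print(lexical_decision("tigrel", "Sw"))  # non-word, stress pattern known
```

In this toy model a non-word forces a search of every candidate, so narrowing the candidate set by stress pattern saves most for non-words, which is at least compatible with the asymmetry between words and non-words described above.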
It might be argued, of course, that the phenomenon of vowel reduction
in unstressed syllables makes the interpretation of much of the evidence
cited above somewhat doubtful. Since changes of stress pattern usually en-
tail changes of vowel quality, the apparent perceptual effects of lexical
stress may in fact not be exercised directly, but only indirectly via their
segmental side-effects. However, a recent experiment by Cutler and Clif-
ton (1983) demonstrates that when vowel quality is controlled, lexical
stress information itself can indeed be shown to play a part in the word re-
cognition process. The experiment was motivated by a prior study by Ga-
nong (1980), who found that the typical stop consonant identification
function for synthetic stimuli varying in voice onset time (VOT) could be
affected by lexical factors. If, for example, the same [t]-[d] continuum is
prefaced to the syllables [ik] and [ip], in one case the [t] version forms a
word (teak) while the [d] version does not, while in the other case the [d]
version is a word (deep) whereas the [t] version is not. Using many such
pairs, Ganong found that subjects characteristically shifted the crossover
point of their identification function towards the short-lag end on the
VOT continuum (i. e. reported more voiceless than voiced stops) when the
voiceless stop made a word while the voiced stop did not, and shifted it
towards the long-lag end (i. e. reported more voiced than voiceless stops)
when the voiced stop made a word but the voiceless stop made a non-
word. Cutler and Clifton prepared an analogous set of stimuli based on
the words tigress and digress, which contain identical segments from the
second segment on, but differ in stress pattern. Using synthesised versions
of these words in which the initial consonants varied in VOT, Cutler and
Clifton found a similar effect of lexical status to that found by Ganong:
when stress fell on the first syllable, subjects shifted their identification
function crossover so that they reported more [t] than [d], whereas they
reported more [d] than [t] when stress fell on the second syllable. Cutler
and Clifton interpreted this result as indicating that lexical stress alone -
independent of its effects upon vowel quality - can be of primary importance
in word recognition.
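The crossover shift described above can be made concrete with a minimal sketch. It assumes an identification function given as the proportion of voiceless responses at each VOT step, and locates the 50% crossover by linear interpolation. The function name `crossover` and all numbers are invented for illustration; they are not data from Ganong (1980) or from Cutler and Clifton (1983).

```python
def crossover(vot_ms, prop_voiceless):
    """Return the VOT (ms) at which voiceless responses reach 50%.

    Assumes vot_ms is ascending and prop_voiceless rises along the continuum.
    Uses linear interpolation between the two steps that straddle 0.5.
    """
    points = list(zip(vot_ms, prop_voiceless))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < 0.5 <= y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("identification function never crosses 0.5")


# Hypothetical response proportions (illustration only, not experimental data).
VOT_MS = [10, 20, 30, 40, 50, 60]
# Context in which the voiceless-initial item is a word (e.g. stress on the
# first syllable, favouring "tigress").
VOICELESS_IS_WORD = [0.05, 0.20, 0.60, 0.90, 0.98, 1.00]
# Context in which the voiced-initial item is a word (e.g. stress on the
# second syllable, favouring "digress").
VOICED_IS_WORD = [0.02, 0.10, 0.40, 0.80, 0.95, 1.00]

shift_ms = crossover(VOT_MS, VOICED_IS_WORD) - crossover(VOT_MS, VOICELESS_IS_WORD)
```

A positive `shift_ms` corresponds to the reported pattern: the crossover lies nearer the short-lag end, so more of the continuum is heard as voiceless, when the voiceless-initial item is the word.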
Thus the production and comprehension evidence are in full agreement:
lexical stress is an important part of word pronunciations as listed in the
mental lexicon. Moreover, it is, both directly and indirectly via its segmen-
82 A. Cutler
Accent
There are therefore three primary candidates for the most prominent syl-
lable in the sentence: the first syllable of paper; the second syllable of pre-
pared, and the word word which is the stress-marked part of the com-
pound word processor. Any of these three words can bear accent if they are
the focus of the speaker's message, i. e. if the speaker wishes to express or
imply contrast with, or lay emphasis upon, that particular part of his utter-
Stress and Accent 83
ance. Accent on paper, for instance, could imply contrast as a reply to (2),
or express emphasis as a reply to (3):
(3) What did you say was prepared on the word processor?
(4) What do you mean, the word processor speeds things up? This paper
took five days to prepare, and this paper was prepared on the word
processor.
See Ladd (1980) and Cutler and Isard (1980) for further examples of deac-
centing.
If none of the lexical items is individually marked for focus, then accent
must be assigned by default. The default case, in which no word is singled
out for particular emphasis, and contrast and deaccenting are not in-
volved, is equivalent to focus on the clause as a whole (see Ladd, 1980, for
a good discussion of this), in which case the accent falls on the rightmost
lexical item - in our example, on word, in its capacity as the stressed syl-
lable of word processor. If word processor has been marked for deaccenting,
the default accent shifts back one and falls on the verb. That is to say, the
default location for accent is the rightmost lexical item not deaccented. (It
is even possible to contrive a case in which only the word word had previ-
ously occurred in the context, not the whole compound word processor; in
this case only word by itself would be marked for deaccenting, and proces-
sor would still be available to the right of it to receive accent:
(5) This paper about stress on words was prepared on the word proces-
sor.
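The placement rule stated above (accent on a focused item if there is one, otherwise on the rightmost lexical item not deaccented) can be sketched as a small procedure. This is only an illustration under simplifying assumptions: the `Item` class, its field names and the function `accent_location` are hypothetical, at most one item is taken to be focused, and a compound such as word processor is listed either as one item or as two, depending on what has been deaccented.

```python
from dataclasses import dataclass


@dataclass
class Item:
    word: str
    focused: bool = False      # singled out for contrast or emphasis
    deaccented: bool = False   # already given in the context


def accent_location(lexical_items):
    """Return the word that receives sentence accent, or None.

    lexical_items holds only the lexical (content) items, in sentence order.
    """
    # An explicitly focused item takes the accent.
    for item in lexical_items:
        if item.focused:
            return item.word
    # Default case: rightmost lexical item that is not deaccented.
    for item in reversed(lexical_items):
        if not item.deaccented:
            return item.word
    return None


# "This paper was prepared on the word processor."
default = [Item("paper"), Item("prepared"), Item("word processor")]
compound_given = [Item("paper"), Item("prepared"),
                  Item("word processor", deaccented=True)]
# Example (5): only "word" has occurred before, so "processor" stays available.
only_word_given = [Item("paper"), Item("stress"), Item("words"), Item("prepared"),
                   Item("word", deaccented=True), Item("processor")]
```

With these inputs the sketch returns word processor in the default case (the accent being realised on its stressed syllable, word), prepared when the compound is deaccented, and processor for example (5).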
(8) She had a lot of cups but the one she gave me leaked.
(9) The only other place I've seen that kind of thing is in Rogers Park -
there's nothing like it right around where we live - where we live.
In (8), for instance, the accent should indeed have been placed on a pro-
noun, but the error consisted in assigning it to the wrong pronoun. Again,
in (9) the assignment of accent to live in 'There's nothing like it right
around where we live' is syntactically perfectly acceptable, but was seman-
tically anomalous in the context.
A corollary of this is that erroneous accent placement is only ever de-
tected by the hearer when it results in semantic anomaly. If it doesn't, the
hearer will simply assume that focus was placed where the speaker in-
tended to place it, with the accented item representing what the speaker
considers to be most important. Thus it is up to the speaker to correct ac-
cent placement if it is important to him to avoid giving a wrong impres-
sion; and correction of accent placement is in fact quite common in every-
day speech.
Again the evidence from sentence understanding complements the pro-
duction evidence: hearers pay great attention to where sentence accent
falls when they are understanding a sentence. This is clear from studies in
which the processing of sentence accent has been explicitly investigated.
For instance, experiments using the phoneme-monitoring task show that
(10) The top experts were all unable to break the code the spy had used.
The target-bearing word itself was then spliced out of each recording and
replaced in all versions by identical copies of the same word excised from
a third, 'neutral', rendition of the sentence. The result of this manipula-
tion was two versions of each sentence, each version with acoustically
identical target-bearing words, but with differing prosodic contexts; in
one case the prosodic contour surrounding the target-bearing word was
consistent with accent falling on the target-bearing word, in the other it
was consistent with the target-bearing word being unaccented. Under
these conditions the 'accented' targets still elicited faster responses than
the 'unaccented', and since the only differences between the two versions
of each sentence lay in the overall prosodic contour, it was concluded that
the listeners must have used cues in the prosody to direct their attention to
the sentence accent.
Subsequent experiments investigated the nature of these cues; the pre-
dicted accent effect was found to be unaffected by cues to stress contained
in the duration of closure of the target stop consonant (Cutler & Darwin,
1981), indicating that the effect is not localised in the tenth of a second or
so immediately preceding onset of the target word. Cutler and Darwin
also found that removing pitch cues - i. e. monotonising the sentences -
did not do away with the accent effect; in monotonised spliced sentences
like (10) the 'accented' targets were still responded to faster than the 'un-
accented' targets. In other words, fundamental frequency variation is not
a necessary component of the prosodic information pattern on which sub-
jects can base their search for sentence accent. Other cues have proven
sufficient; in the present experiment the available cues included both dura-
tional and amplitude information. As Cutler and Darwin pointed out,
however, this does not imply that either durational or amplitude cues must
necessarily be present for the accent effect to be found. There is ample
evidence from studies of prosodic cues to the perception of lexical stress
and to the perception of syntactic structure that the human speech proces-
sor can derive a given piece of information from any of a wide range of
alternative cues. For instance, the location of a syntactic boundary in a
syntactically ambiguous sentence can be identified on the basis of fundamental
frequency variation (Lea, 1973; Collier & 't Hart, 1975) or durational
variation (Lehiste, Olive & Streeter, 1976; Scott, 1980). Likewise
both fundamental frequency (e.g. Cheung, Holden & Minifie, 1977) and
duration (e.g. Nakatani & Aston, 1978) serve as effective cues for the per-
ception of lexical stress.
Thus the perception of sentence accent most probably has in common
with the perception of lexical stress and the prosodically guided location
of syntactic boundaries the fact that they can all be achieved by the use of
quite a variety of cues. The question which must be asked at this point is
why prosodic cues to sentence accent are so efficiently employed. That is
to say, we have already seen that lexical stress is an intrinsic part of the
lexical representation of words, and that stress information therefore
serves to aid word identification. Word identification is obviously the
most crucially important part of language understanding. Similarly, syntax
is a vital part of understanding a sentence, and it is not surprising that any
prosodic cues to syntactic structure should be exploited without hesita-
tion. Is sentence accent, however, in the same league as lexical identity and
syntax when it comes to intrinsic importance within the sentence, and
hence usefulness to the sentence comprehension process?
Consider the written representation of language. Each word has an or-
thographic identity and word boundaries are marked with spaces; word
identification is comparatively easy. Syntax is explicitly represented where
necessary by marks of one kind or another. Sentence accent, however, is
usually not represented at all - in rare cases, usually in more casual writ-
ing, words may be underlined or capitalised for emphasis, but there is no
explicit orthographic convention which signals the location of primary ac-
cent in the way that, for instance, a comma signals the location of a syn-
tactic boundary.
Sentence accent is a property of the spoken sentence only. In the utter-
ance it communicates information structure, as we have seen - focus or
contrast. There are other devices for expressing focus, it is true. For exam-
ple, certain syntactic constructions - e. g. clefting, pseudo-clefting, topical-
isation - serve to bring elements of the sentence into the foreground.
Their use is, however, comparatively rare. The information structure of a
sentence is also affected by the structure of the discourse in which the sen-
tence is uttered. However, the important thing to note is that other de-
vices are subordinate to prosodic focussing; accent overrides both dis-
course cues to what might be most important in the sentence, and syntac-
tic cues. To take a simple example, in the cleft sentence (11) the focus of
the sentence, and hence the location of sentence accent, would in the de-
fault case be assumed to be Sandy.
This is not a major point; it is only to emphasise that syntactic cues to fo-
cus cannot override accent - accent always determines information struc-
ture. Thus the new information in (12) is about starting the fight.
Given that accent is the primary cue to what the speaker considers the
most important part of his utterance, then, it is clear that using cues in the
prosodic contour to direct one's attention to the accented words as speed-
ily as possible would be an effective way of getting quickly to the most im-
portant information. Thus it is no surprise to find that one can mimic the
accent effect in phoneme-monitoring by manipulating information struc-
ture alone without changing the accent. Cutler and Fodor (1979) demon-
strated this by having subjects listen to sentences which were preceded by
questions focussing on one or other part of the sentence. Thus sentence
(13), for instance, could be preceded either by question (14) or (15):
Half the subjects in the experiment heard (13) preceded by (14), and half
heard it preceded by (15). The sentences were recorded only once, and no
strong accent was placed on any item. That is, the speaker who recorded
(13), for instance, endeavoured to assign approximately equal accent to
each of the six content words in the sentence. It is nevertheless the case, as
we have seen, that in English in the default case accent is stronger towards
the end of the sentence, so that it is possible that accent was perceived to
fall more strongly at the end of the sentence than elsewhere. However, the
important fact is that subjects who heard the first question and subjects
who heard the second question each heard acoustically identical versions
of the sentence which followed. Within each group half the subjects were
listening for a target in the early part of the sentence, focussed on by the
first question, while half listened for a target in the later part of the sen-
tence, focussed on by the other question. In (13), for instance, the targets
were /b/ - on bestseller - and /k/ - on congressman. Responses to the early
target were faster if subjects had heard the question which focussed on the
early part of the sentence, but responses to the late target were faster if
subjects had heard the question which focussed on the end of the sen-
tence; that is, focussed targets were responded to faster.
An overall effect of perceived accent is presumably responsible for the
further finding that responses to the later targets were generally faster
than responses to the early targets. However, the interesting result is that
focussing on a particular part of the sentence leads to faster responses to
targets at that point, just as accenting a particular part of the sentence produces
faster responses to targets on the accented words. Focus behaves analogously
to accent with respect to its effect on phoneme-monitoring
reaction time. This allows us to conjecture that the accent effect may in
fact be a focus effect; in making use of the prosody to direct their atten-
tion towards accented words, listeners are actually doing this because the
accented words are focussed words. Listeners appear to exploit whatever
cues are available - discourse cues (as in the focus experiment) where they
exist; prosodic cues (as in the accent experiments) where these are there to
be used. In other words, sentence accent perception directly decodes the
information which was encoded in the production of accent; accent re-
presents focus, and perception of accent is perception of focus.
Supporting evidence can be found in the way the ability to use prosodic
cues to accent is developed. Cutler and Swinney (1980) reported two ex-
periments investigating the perception of sentence accent and sentence fo-
cus in young children. These experiments were similar in design to the
perception experiments described above, except that the children moni-
tored for whole words rather than individual phonemes. In the first of
these experiments it was found that young children (aged between four
and six) did not respond significantly faster to accented than to unac-
cented words, although older children showed an accent effect exactly
analogous to the effect in adults. The second experiment tested only four-
to six-year-olds and found that among these, only the older children
showed a focus effect. Although the development of the accent and focus
effects has yet to be monitored in the same children, the results so far
seem to imply that the development of the accent effect is dependent on
the development of the focus effect. That is, until children have learnt that
it is a good idea to listen within sentences for the new, focussed, most im-
portant information, they are not going to be able to develop the relatively
sophisticated techniques which adults use to zero in as quickly as possible
on sentence focus - e. g. tracking the prosodic contour for the cues it gives
to where accent, and hence focus, is going to fall.
In the children, however, we do find a dissociation between the percept-
ual and productive function of accent. Interestingly, it is a dissociation in
the reverse direction to that usually found in children's (or any other) lan-
guage performance, in that for once productive abilities seem to be ahead
of perceptual. That is to say, the very children who cannot respond faster
to accented words, i. e. do not show the adult effect of directing special at-
tention to accented words and anticipating where they will come, neverthe-
less correctly assign sentence accent to new information and produce pros-
odic contours which embody all the information adults need to predict
where accent will fall.
Conclusion
The picture that emerges, then, is that the complementary role of prosodic
perception and prosodic production develops comparatively slowly. Once
adult language competence has been attained, however, the role of pros-
ody in production and in perception is reciprocal: two sides of one coin.
Word stress patterns are part of the lexical identity of words, not arbitrar-
ily assigned by rule; thus in language production lexical stress patterns are
part of the information about each word which is stored in the mental lex-
icon and retrieved when the word is looked up as a sentence is spoken.
Similarly, identification of stress pattern is part of word identification and
is used in the process of looking up a word in the mental lexicon during
the understanding of a sentence. Sentence accent, on the other hand, ex-
presses the information structure of a sentence; when a sentence is pro-
duced the speaker assigns accent according to what he considers to be the
more and less important parts of what he is saying. A listener hearing a
sentence finds it important to identify the location of accent and uses all
available cues to assist him in his search; and the reason why accent is so
keenly sought appears to be precisely because it expresses focus - thus the
perception of accent is as intimately connected with the information struc-
ture of the sentence as is accent production.
References
Bansal, R. K. (1966). The Intelligibility of Indian English. Ph. D. Thesis, London University.
Browman, C. P. (1978). Tip of the Tongue and Slip of the Ear: Implications for Language Pro-
cessing. UCLA Working Papers in Phonetics 42.
Brown, R. & D. McNeill (1966). The "tip of the tongue" phenomenon. Journal of Verbal
Learning & Verbal Behavior 5: 325-337.
1. The problem
The question of where and why casual speech phenomena are found (i. e.
their extralinguistic conditioning) is independent of the question how they
are to be described within grammar. This second question is the primary
interest of my investigation. Such a narrowly defined domain might not
satisfy many of the researchers into linguistic variation in general and
fast/casual speech in particular. I hope, however, to satisfy grammarians,
and particularly phonologists, who have been most concerned with these
phenomena. When confronted with speech other than 'normal' or An-
dante, i.e.
moderately slow, careful, but natural; typical of, for example, delivering a lecture or teaching
a class in a large hall without electronic amplification (Harris 1969: 7)
2. The data
These clear cases of assimilation which abound in fast speech are often fed
by various deletion processes which the literature on speech style differen-
tiation is also full of. Let us consider some of the examples of deletion and
deletion feeding assimilation:
Gimson, 1980:
I think they are a nice young couple, don't you?
1 A regular obligatory process in MH is the loss of a vowel in the first syllable of non-verbal
forms when a stressed suffix is appended (cf. [kajur] [kfurim] 'tied' Bolozky, 1977:
223).
94 G. Dogil
(English)
English, like Welsh, sometimes deletes the whole syllable, cf. [Nkaenik] (mechanic),
or does not delete a vowel, e.g. 'deflation', 'revised', etc.
The problem of 'is'/'has' reduction, as well as reduction of other function
words, is an area in itself. Data here is extremely rich and interesting
and I will try to present the most pertinent parts of it, as I think it provides
a very interesting battleground for conflicting phonological theories.
The often pronounced rule of a text-book grammar of English, or any
other stress-timed language for that matter, says: 'form/function words
which are not under stress can be reduced, and/or cliticised, in fast/casual
speech (they are commonly called weak-forms)'.
Examples are quoted in abundance (Zwicky, 1970; Selkirk, 1972; Selkirk,
1980; Radford, 1981; Postal & Pullum, 1978). The rule, however,
which is true of such form words as determiners, complementisers and
conjunctions 4 does not apply so clearly to other types of non-lexical cate-
2
I think Bolozky's transcription [mkaenik] is incorrect. In my opinion we have to do with
the loss of the whole syllable, and only the relic of it (autosegmentalised nasal quality, see
§ 7) is present in the form of prenasalization of [k] ([Nkaenik]).
3
I am quoting here the transcriptions as they appear in Zwicky (1972), although they seem
to be wrong. In all the words quoted accent should appear on the penultimate rather than
the final syllable. Zwicky's mistake may be attributed to the misinterpretation of the
transcription systems in classic Welsh grammars, where the accent mark follows rather than
precedes the accented syllable.
4
But see Zwicky (1975: 608): 'a radio and a television set' [. . . rejdiou ij a . . .] not
[rejdioun 3 . . .]; 'go to an ogre' [gou t3 q owgr. . .] *[gou tan owgr. . .]; 'away in a manger'
[sweij q 3 . . .] *[3wejn 3 . . .]. An explanation along the lines of Giegerich (1981) will be
offered in § 7.
Fast/Casual Speech 95
gory items with strong/weak pairs. This fact has already been noticed and
described by Henry Sweet (1908), who mentioned the following examples:
Sweet's explanation for the constraint on reduction (cf. Sweet, 1980: 31,
1908: 68) was that weak forms can occur only with words they are struc-
turally connected with. That is, string adjacency is not the only require-
ment for fast speech rules to alter and reduce phonological structure: one
also requires structural adjacency. I shall provide the relevant data under a
separate heading. While fast speech rules require statements about struc-
ture adjacency in their SD part, only a phonological theory providing for
such adjacency in its representation can be descriptively adequate.
No matter how fast one speaks, contraction (reduction) in the above mentioned
cases is impossible. It is also not possible in other cases mentioned
for auxiliaries in Zwicky (1970).
The fact that it was she will be a blow to the party. *[l]
You and I have gone there once too often. *[əv]
The police boxed in the crowd. (If 'in' not structure-adjacent with
'the crowd'.)
In the last example, which we will discuss in § 7, even in the case of structure
adjacency of 'in the crowd', where reduction of 'in' takes place, 'in' cannot
be cliticised to the verb: *[bokstn]
A very specific kind of reduction and cliticisation has been noticed by se-
manticists and syntacticians. The case concerns 'to' reduction and cliticisa-
tion, thoroughly discussed in Lakoff, 1970; Selkirk, 1972; Lightfoot,
1976; Chomsky & Lasnik, 1977; Postal & Pullum, 1978; Andrews, 1979,
among others. Here are some relevant data concerning English verbs fol-
lowed by 'to' which can be reduced in fast speech:
Notice, however, that the items undergoing this process have to be struc-
turally adjacent:
Beside these structure adjacency violations, which are visible from surface
syntax, to-contraction (adjunction) is constrained in more sophisticated
ways. Consider the following paradigm first noticed by Larry Horn:
3. Phonemics
are not free variants in any sense, but rather that there is a regular phone-
tic correspondence between these realizations. This goal, establishing
phonetic correspondence, is the weakest point of any phonemic theory,
not only in descriptions of speech styles. The only possibility for a phonemicist
is to perform analyses separately for each style, which leads to different
phonemic solutions. Styles so described are then defined as 'coexistent
phonemic systems' and the linguistic variation is accounted for as
'phonemic stability vs. instability'. Such an analysis obviously loses all the
intuitive insights about speech variation. Additionally, the very restrictive
view of phonology which is characteristic of pronouncements of phonemi-
cists, that phonological representation is unstructured and that nonphono-
logical information should not be used in phonemic analysis, forbids this
model of phonology to make any claim (observation) about the cases of
structural adjacency which I noted in § 2.2. I think this suffices to claim
that phonemic theory does not possess either necessary or sufficient ap-
paratus to describe speech styles, their variation and linguistic constraints
on their use. There is, however, one area in speech style research which is
fairly neatly covered by phonemics. The correspondence between the
choice of style and the choice of dialect (cf. Dressler 1972a: 15, 1975: 222)
is, in my opinion, sufficiently accounted for by the definitions of 'coexist-
ent phonemic systems' and 'phonemic stability vs. instability' (cf. Fries &
Pike 1949). However, the problems of regular phonetic correspondence
between styles and across styles and the structure dependence of fast
speech rules remain unaccounted for. I will argue that this is a problem of
5
Gussmann (1980 ch. 1) has convincingly argued that TGC makes impossible any signifi-
cant conception of phonological or morphological structure.
Figure 1. 'I like Edinburgh' at four tempos: slow-careful, casual-moderate, fast-allegro, fast-allegrissimo. [Transcriptions not recoverable.]
from normal speech. Under closer investigation all these constraints turn
out to be universal, rather than language or style specific. I will offer an
explanation for them in § 7. Bolozky (1977: 221) mentions the normal
speech processes and constraints that do not apply in fast speech. That
some normal speech constraints are loosened in fast speech, or even new
contrasts emerge (cf. Gussmann, 1980: 127-28), is self-evident, and can be
accounted for on universal grounds (cf. § 5, § 7 following).
The elimination of a speech process from fast speech is a more compli-
cated issue, and can be accounted for in terms of process phonology,
which the NGP is not. The only example (quoted after Rudes 1976) where
the process
V → [α long] / __ [α voice]
in English is banned from fast speech is not very convincing. 6 Besides that,
Bolozky concentrates on TEMPO as a variable in fast/casual speech thus
introducing an undefined concept which is alien to his theory. The depen-
dence of derivational constraints on tempo has to be shown using the ap-
paratus of one's own theory and not assumed. Calling some phenomenon
'TEMPO bound' is disguising but not explaining it.
Summarizing: I am very critical of NGP as a model for describing fast
speech. First of all, it is not structured enough to account for phenomena
6 The whole problem may lie in the improper phonetic interpretation that Rudes provides.
The absence of lengthening of /i/ in front of a flap in [hint] which is a fast speech variant
of [hlttt] can be explained in such a way that flapping is understood primarily as laxing rule
and not as voicing rule. The same applies for glottalization in [hi' ni:a] which is not
followed by vowel shortening (cf. Kiparsky, 1979: 437; Gussmann (1980: 127-28) provides a
much better example).
7 In Dressler's polycentristic model the idea of natural process is generalized to all other
components of grammar (cf. Dressler, 1977).
8 This correlation, which has been convincingly established in many places, may dull the at-
tention which has to be paid to phonetic and phonological detail. If this is not the case we
may be confronted with notional fallacy of describing a form by its function, which was
abolished in linguistic methodology half a century ago, but which unfortunately crops up
in some of the NPP analyses.
A language X can be said to choose among this class (of universal phonetic substitutions,
G. D.) all the processes operative in X, and adapt them in such a way that they act as lan-
guage-specific morpheme structure conditions and phonological rules respectively, produ-
cing but language-specific phonetic output forms; that is to say, the application of natural
phonological process within a particular language is controlled by language-specific con-
straints (Herok & Tonelli, 1979: 44-45).
As I understand it, there is nothing then to prevent the change of the con-
straints along with the change of chosen processes and limitations of
them. The data in § 2, however, shows clear limits on application which
do look like constraints; cf. Zwicky's Welsh and English data (§ 2.1).
I will assume then, that some form of phonotactic or similar constraints
have to be added into the description of fast speech data if such a descrip-
tion is to be sufficient (cf. § 7). Another area where the NPP seems to be
insufficient is connected with the data in § 2.2. I have provisionally men-
tioned that any adequate account of this data has to have a possibility of
making statements about structural adjacency of the items which undergo
9 To put it bluntly, why isn't [papapa . . .] or something of this sort an optimal outcome of
NPP (I am not sure about the [p] as an optimal outcome; however, the unconstrained natu-
ral application of universal substitutions for vowels does ultimately lead to [a] (cf. Done-
gan, 1979)).
changes in fast speech. This provides a real problem for the NPP analysis,
as it views phonological strings as unstructured. This proviso definitely
applies to any syntactic, morphological or semantic structuring of a pho-
netic string, which automatically leaves 'to-adjunction' phenomena men-
tioned in § 2.2 unexplained. Donegan and Stampe (1978: 27 ff.) stress ex-
plicitly the prosodic structuring of phonetic strings, but the only prosodic
concepts they have worked out are the syllable and the accentual measure.
This enables them to explain a lot of fast speech data, but the lack of
other, higher, prosodic categories leaves another stock of data unex-
plained.
Summing up: although highly sympathetic to Dressler's statement,
"Das geeignetste phonologische Modell für die Allegroregeln scheint D. Stampe's
'Natural Phonology' zu sein" [the most suitable phonological model for the allegro
rules seems to be D. Stampe's 'Natural Phonology'], I am obliged to say that
in its present form this model is not sufficient to describe the relevant
data. It seems obvious to me that the representations with which the
model works have to be enriched in structure. In view of the data in § 2.2,
the very restrictive conception of phonological representation cannot be
maintained; otherwise, very important observations have to be left unex-
plained. Leaving 'to-adjunction' and similar phenomena aside, the repre-
sentatives of the NPP have noticed themselves the relevance of grammati-
cal and lexical considerations in speech-style related areas. For instance
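Dressler (1975: 227) points out that Allegro rules can be observed in diachronic
phonology especially as: " . . . 'unregelmäßiger Lautwandel' und
Mutilationen, die sich besonders bei Auxiliarformen, Pronomina, Artikeln,
Partikeln . . . in Dokumenten vergangener Sprachepochen finden" ['irregular
sound change' and mutilations, found especially in auxiliary forms, pronouns, articles,
particles . . . in documents of past stages of the language].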
The phonology which programmatically excludes grammatical and lexical
information can make only anecdotal, theory-alien observations of such
phenomena. Dressler's extension of the NPP in the form of a polycentrist-
ic grammar is in principle able to account for such interactions between
various components; however, we are still waiting for an explicit account
of natural processes in other components, and of naturalness conflicts (for
some preliminary attempts cf. Dressler, 1977a, b; Dogil, 1980, 1981). It is
to be stressed, however, that a process phonology, an example of which is
the NPP, provides a much better account of speech variation than a taxonomic
phonology. Let us consider then another model of process phonology,
which unlike NPP has always paid very much attention to structure.
This model is Standard Generative Phonology.
ANDANTE:
[+nasal] → [α cor, β ant, γ back, δ distr] / __ [+obstr, α cor, β ant, γ back, δ distr]
ALLEGRETTO:
[+nasal] → [α cor, β ant, γ back, δ distr] / __ (#) [+obstr, α cor, β ant, γ back, δ distr]
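Read as nasal place assimilation, the ANDANTE and ALLEGRETTO rule pair above can be illustrated with a small sketch. It rests on several assumptions that are not in the text: the four place features are collapsed into a single place label, the optional (#) is taken to mean that only the allegretto rule may apply across a word boundary, and the segment inventory, the function name `assimilate` and the test strings are hypothetical.

```python
# Simplified place of articulation for a few obstruents (assumed inventory).
PLACE = {"p": "labial", "b": "labial",
         "t": "coronal", "d": "coronal",
         "k": "dorsal", "g": "dorsal"}
# The nasal that matches each place.
NASAL_AT = {"labial": "m", "coronal": "n", "dorsal": "ŋ"}
NASALS = set(NASAL_AT.values())


def assimilate(segments, tempo):
    """Apply nasal place assimilation to a list of segments.

    '#' marks a word boundary. In 'andante' the nasal must be immediately
    followed by the obstruent; in 'allegretto' one '#' may intervene.
    """
    out = list(segments)
    for i, seg in enumerate(out):
        if seg not in NASALS:
            continue
        j = i + 1
        if tempo == "allegretto" and j < len(out) and out[j] == "#":
            j += 1  # the optional (#) of the allegretto rule
        if j < len(out) and out[j] in PLACE:
            out[i] = NASAL_AT[PLACE[out[j]]]
    return out


across_words = ["t", "e", "n", "#", "p", "a"]
within_word = ["i", "n", "k"]
```

On this reading, `assimilate(across_words, "andante")` leaves the nasal unchanged, `assimilate(across_words, "allegretto")` turns it into a labial, and `within_word` assimilates at either tempo.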
move constituents around but that they leave 'traces' in the original places
out of which these constituents have been moved. 10
Selkirk (1972: 74 ff.) proposes that word boundary markers are intro-
duced in initial phrase markers, and when transformations move or delete
elements they leave the boundary markers unaffected. In this way she ex-
plains lack of copula contraction in English (cf. § 2.2), and the same
procedure can be used to explain 'to-adjunction'. 11
Consider the following pair of sentences:
10
For details cf. Radford (1981: 181-96).
11
But see Postal & Pullum (1978 p. 19ff.)
This formulation enforced by the theory clearly misses the descriptive fact
that constraints on preaccentual deletion are quite different from the constraints
on postaccentual deletion (cf. Dressler, 1975).
Still another area where the theoretical assumptions of SGP lead to un-
desirable results when fast speech data are considered is the account of
phonotactic constraints on representations. The classic SGP (cf. Chomsky
& Halle 1968; Stanley, 1967) accounts for these in the form of language
specific, linear 'morpheme structure constraints'. Thus there is a MSC in
English which prohibits two obstruents from following each other at the
beginning of a word. This constraint, and also many others, are falsified
by the data presented in § 2.1 (cf. Bolozky's and Zwicky's data).
Last but not least, one of the most important goals of SGP, the attempt
to cover the total range of morphophonological variation with the same
type of phonological rules, makes it impossible to capture the observation
that it is only the most 'natural', minimal phonetic substitutions which fi-
gure in speech variation accounts. The common procedure of saying that
it is the 'late phonological rules' which are a matter of study does not
make the issue in any way clearer, as the SGP is not in a position to define
what is meant by 'late phonological rule' and how it is to be distinguished
from 'early phonological rule'. Of the many attempts which try to do that
(Koutsoudas, 1977; Wurzel, 1970; Dinnsen & Eckman, 1977; Leben,
1977; Linell, 1977, etc.), the most promising seems to be the hypothesis of
the so-called 'strict-cyclicity' of phonological rules. This hypothesis distin-
guishes between 'cyclic' and 'post-cyclic' phonological rules. It seems to
me that most of the rules which figure in speech style variation could be
subsumed under the latter group (cf. Mascaró, 1976; Rubach, 1981).
Summing up, one is obliged to say that the SGP, although providing a
sound frame for the description of fast speech, still does not provide a
model which would define the special character of stylistic speech varia-
tion. Its representations, features and rules do not make these phenomena
follow from the way they are represented, but leave them rather as no
more than arbitrary typological observations. Let us look then at the most
recent development of generative phonology, which attempted a basic re-
vision of some of the postulates of the SGP, and see how this accounts for
the phenomena noted in § 2.
[Figure: autosegmental representation showing a tonological tier (L H H), a segmental tier, and a syllabic tier]
12
The type of constraints I have in mind could be something like the following:
"Once a certain feature is autosegmentalised, it has to be autosegmental everywhere in a
given language, i. e. there is no possibility of autosegmentalisation of features which apply
to individual contexts or parts of representations only."
or something like the constraint suggested (but not followed) by Clements (1977):
"Each autosegmental tier has to be affected independently by rules specific to it."
Syntax → I will assume here one of the versions of the so-called
T-model proposed within the Extended Standard
Theory of Generative Grammar. 13
13
Two models in particular may be taken into consideration. The first is that of Chomsky & Lasnik (1977):
Base Rules → move-α → S-Structure
The second includes a level of NP-structure:
Base Rules → move-NP → NP-structure → move-Wh → S-Structure → Filters, Deletions etc. → Phonetic Interpretation
NP-structure → LF (Binding, Indexing, Construal) → semantic interpretation
14
I assume that only natural phonological processes are sensitive to prosodic structure.
scope of a foot (Σ) (cf. Herok & Tonelli examples in § 2.1). In faster
speech, however, the scope of this rule becomes greater; in German it will
be a word (ω), in Italian a phrase (Φ). The same is true of all other assimi-
lation processes noted in § 2.
Prosodic structure, understood as a specific mapping of syntactic structure,
also explains why assimilation does not take place in certain
contexts. Let us consider the notorious 'to-adjunction' case in some detail.
The phonological process concerns a coronal consonant in a specific
context: (deletion after a nasal, deletion after another coronal, voicing as-
similation as in 'hafta', flapping, etc.). All of these natural processes,
which in English are foot bound (their scope is a foot), can in fast speech
apply within the domain of a word (ω), or a phrase (Φ). They are barred,
however, from application on the higher prosodic domains. That is why
the application of assimilation in such cases where verb and 'to' belong to
two different prosodic phrases, or even intonational phrases, is not possi-
ble. Consider the following examples from § 2.2:
((They want)_I (to all intents and purposes)_I (to destroy us)_I)_U
* They wanna . . .
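The domain restriction just described can be sketched as a small check. This is only an illustration under stated assumptions: the tempo labels and the `MAX_SCOPE` table are hypothetical names introduced here, not part of the theory, and the sketch ignores the further effect of empty categories discussed next.

```python
from enum import IntEnum


class Domain(IntEnum):
    """Prosodic categories, ordered from smallest to largest."""
    FOOT = 1
    WORD = 2
    PHRASE = 3
    INTONATIONAL_PHRASE = 4
    UTTERANCE = 5


# Assumed for illustration: the largest domain within which the English
# coronal processes may apply at each tempo (the tempo labels are hypothetical).
MAX_SCOPE = {"careful": Domain.FOOT, "fast": Domain.PHRASE}


def may_apply(shared_domain: Domain, tempo: str) -> bool:
    """True if a process can relate two adjacent items whose smallest
    common prosodic constituent is `shared_domain`."""
    return shared_domain <= MAX_SCOPE[tempo]


# 'want' and 'to' inside one prosodic phrase: contraction is possible when fast.
print(may_apply(Domain.PHRASE, "fast"))      # True
print(may_apply(Domain.PHRASE, "careful"))   # False
# 'want' and 'to' in different intonational phrases: blocked at any tempo.
print(may_apply(Domain.UTTERANCE, "fast"))   # False
```

Ordering the categories numerically makes "barred from higher domains" a single comparison, which is the only point the sketch is meant to bring out.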
Sometimes, however, even phrase internal 'V + to' structures do not con-
tract; consider the following:
This is captured within metrical theory by the assumption that the prosodic
structure is a mapping of the syntactic structure. The syntactic structure
for b) will contain an empty category 15 between a verb and 'to'; it will
look something like:
This empty category is visible to the assimilation rule, i. e. the rule cannot
apply within this context. There are certain problems with this analysis,
which have been noted very often in recent literature on syntax (cf.
Chomsky & Lasnik, 1977: 78; Postal & Pullum, 1978; Andrews, 1978;
Jaeggli, 1980; Chomsky, 1981), because [e] (also called wh-trace), is not
the only empty category figuring in the syntactic structures which are
15
For a syntactic motivation for this and other empty categories see Chomsky (1981).
Notice that the constraint does not specify whether these are the catego-
ries of NP-structure, S-Structure, Logical Form, or any other relevant re-
presentation, thus providing for prosody to be an independent module in
Generative Grammar, "filtering the sequences generated by the syntactic
(or other, G. D.) component" (cf. Ronat, this volume). 16 This constraint
also provides a neat explanation for a large portion of reduction data men-
tioned in § 2. Consider the impossibility of reduction even in the fastest
speech in cases like:
16
Such a broad formulation of the constraint on prosodic processing opens a possibility of
accounting for apparent lexical exceptions to 'to-adjunction' (cf. Postal & Pullum, 1978;
Andrews, 1979). We can claim, for instance, that lack of assimilation in cases like 'thought
to → thoughta' is due to the thematic structure of these verbs.
Under metrical theory as assumed here (cf. Selkirk 1980a: 19 ff.), non-lexical
items may be reduced in English 17 if they occupy a weak position in a
phrase. Selkirk describes it by means of the Monosyllable Rule:
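\[
\begin{array}{c} \omega_w \\ | \\ \Sigma \\ | \\ \sigma \end{array}
\;\longrightarrow\; \sigma_w, \quad \text{if } \omega \text{ dominates a non-lexical item.}
\]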
In a), the preposition 'in' does not reduce because it is the only ω within a
phrase; in b) it reduces as it is ω_w relative to ω_s within the same phrase.
This difference is explained by the difference brought up by mapping of
the S-structures.
The S-structure for a) is approximately:
the empty head of the underlined function words will be counted as a pos-
sible salience bearer. Through a prosodic restructuring, the function
words assume the strong position in their phrases, and thus become im-
mune to reduction by the monosyllable rule (for one possible form of this
restructuring see Dogil, 1980b; Ladd 1980; Ronat, this volume).
17
This seems also to be true for German and, I would expect, other stress-timed languages
as well.
In the cases presented above the reduction or its lack has been explained
by the constraints on mapping, but there are also constraints specific to
the prosodic component itself that restrict contraction. For instance, if
three function words come together in a phrase, one of them (possibly the
middle one) will be strengthened and thus may not be reduced (cf. Giege-
rich, 1981; Williams, 1980 on similar prosodic transformations).
Prosodic constraints can also explain many peculiarities of another area
typical of fast/casual speech, namely cliticization. I will not discuss any
details of cliticization here (cf. Zwicky, 1977). Let me mention, however,
some prosodic impediments which the MAP can capture. Cliticization is
usually understood as an adjunction of reduced material to the elements
immediately adjacent in the string. However, sometimes cliticization does
not follow reduction. MAP picks up those places in a fairly natural fash-
ion. One such case is when the reduced word would violate the phono-
tactic constraints on the syllable structure. Note that MAP defines the
principles on construction and prominence, as well as the syntactic domain
for each of the prosodic categories. Thus the syllable will also be defined.
The English syllable, for example, is defined as a metrical tree of the
type shown in Figure 4. 18 Now if a reduced word would in any
sense violate this template it may not get cliticized.
18
Cf. Halle & Vergnaud, 1978, 1980; Kiparsky, 1979; Mohanan, 1979; Selkirk, 1980b, for
other proposals concerning the syllable structure.
least one terminal S, with its sister to its right" we can see why English has
primarily enclitics and why cliticization goes more easily with final
stressed words and monosyllabic lexical items. (Consider from this view-
point Zwicky's data in § 2.)
Prosodic well-formedness on the level below word (ω), can also explain
some other apparent blocks to reduction. Consider, for instance, Bolozky's
and Zwicky's data on Modern Hebrew, Welsh and English mentioned
in § 2. All of the reductions and lack of reduction there can be explained
by well-formedness constraints on syllable structure (cf. Kiparsky,
1979; Lowenstamm, 1981; Kaye & Lowenstamm, 1982), and, in the cases
like 'deflation', 'revised', foot structure (the initial syllables are footed and
thus cannot be reduced; cf. Selkirk, 1980b; Dogil, forthcoming).
Summing up, we may say that MAP, with general principles assumed by
it, provides all necessary and sufficient criteria for the description of fast/
casual speech. This theory also possesses the necessary clarity and preci-
sion (cf. Vergnaud, 1977) so that the linguist who employs it is not forced
to adjust it to those features which are peculiar to the data he is examin-
ing. Quite to the contrary, the MAP model rationalises the special pro-
perties of fast/casual speech, in the sense that they follow directly from
the way the phenomena are represented. This does not presuppose that
MAP is the final stage in the research of those phenomena. Socio-, psycho-
and discourse-context features (cf. Dressler, 1975; Postal & Pullum,
1978) play a very important role, and have to be scrutinized in any full ac-
count of speech style variation; however, they may only function as an in-
dependent addition to the basic grammatical description.
References
Dogil, G. (1980a). Elementary accent systems. wiener linguistische gazette 24: 1-23. Revised
version published in Dressler et al. (eds.) (1981: 89-101).
Dogil, G. (1980b). Focus marking in Polish. Linguistic Analysis 6: 221-45.
Dogil, G. (forthcoming). Destressing or defooting in English: on deciding between linear
and non-linear phonology. Folia Linguistica Europea.
Donegan, P. (1978). On the natural phonology of vowels. Working Papers in Linguistics 23.
Ohio State University: Columbus.
Donegan, P. & D. Stampe (1978). The syllable in phonological and prosodic structure. In
Bell & Hooper (eds.) (1978: 25-35).
Drachmann, G. (ed.) (1975). Akten der 1. Salzburger Frühlingstagung für Linguistik. Tübingen:
G. Narr Verlag.
Dressler, W. (1972a). Phonologische Schnellsprechregeln in der Wiener Umgangssprache.
wiener linguistische gazette 1: 1-29.
Dressler, W. (1972b). Allegroregeln rechtfertigen Lentoregeln. Innsbruck: IBS.
Dressler, W. (1975). Methodisches zu Allegroregeln. In Dressler et al. (eds.) (1975:
219-234).
Dressler, W. (1977a). Elements of a polycentristic theory of word formation.
wiener linguistische gazette 15: 13-32.
Dressler, W. (1977b). Grundfragen der Morphonologie. Wien: Verlag der Österreichischen
Akademie der Wissenschaften.
Dressler, W. & F. Mareš (eds.) (1975). Phonologica 1972. München: Fink Verlag.
Dressler, W. & O. Pfeiffer (eds.) (1977). Phonologica 1976. Innsbruck: Innsbrucker Beiträge
zur Sprachwissenschaft.
Dressler, W., O. Pfeiffer & J. Rennison (eds.) (1981). Phonologica 1980. Innsbruck: Innsbrucker
Beiträge zur Sprachwissenschaft.
Fries, W. & K. Pike (1949). Coexistent phonemic systems. Language 25: 29-50.
Gibbon, D. (1981). A new look at intonation syntax and semantics. In A. James and P. Westney
(eds.). New linguistic impulses in foreign language teaching. Tübingen: G. Narr. 71-98.
Gibbon, D. (this volume). Intonation as an adaptive process.
Giegerich, H. (1981). On the nature and scope of metrical theory. Bloomington: IULC.
Gimson, A. (1980). An introduction to the pronunciation of English. London: Arnold.
Goldsmith, J. (1976). Autosegmental phonology. Bloomington: IULC.
Gussmann, E. (1980). Studies in abstract phonology. Cambridge, Mass.: MIT Press.
Halle, M. & J.-R. Vergnaud (1978). Metrical structure in phonology. Ms., MIT.
Halle, M. & J.-R. Vergnaud (1980). Three dimensional phonology. Journal of Linguistic Re-
search 1: 83-106.
Harris, J. W. (1969). Spanish phonology. Cambridge, Mass.: MIT Press.
Harris, J. W. (1980). Nonconcatenative morphology and Spanish plurals. Journal of Linguistic
Research 1: 15-33.
Herok, T. & L. Tonelli (1979). How to describe phonological variation. Papers and Studies in
Contrastive Linguistics X: 41-55.
Hooper, J. (1976). An introduction to Natural Generative Phonology. New York: Academic
Press.
Jaeggli, O. (1980). Remarks on To contraction. LInq. 11.1.
Joos, M. (ed.) (1963). Readings in Linguistics I. New York: American Council of Learned So-
cieties.
Kahn, D. (1976). Syllable Based Generalizations in English Phonology. Bloomington: IULC.
Kaye, J. & J. Lowenstamm (1982). Syllable structure and markedness theory. GLOW, Pisa
(proceedings to appear).
Kenstowicz, M. & Ch. Kisseberth (1977). Topics in Phonological Theory. New York: Academic
Press.
King, H. (1970). On blocking the rule for contraction in English. LInq. 1: 134-36.
Kiparsky, P. (1979). Metrical structure assignment is cyclic. LInq. 10: 421-41.
Kiparsky, P. (1981). Remarks on the metrical structure of the syllable. In Dressler et al.
(eds.) (1981: 245-257).
Koutsoudas, A. (1977). On the necessity of the morphophonemic-allophonic distinction. In
Dressler et al. (eds.) (1977: 121-127).
Ladd, R. (1980). The Structure of Intonational Meaning. Bloomington: IULC.
Lakoff, G. (1970). Global rules. Language 48: 76-87.
Leben, W. (1977). On the interpretive function of phonological rules. In Dressler et al. (eds.)
(1977: 21-29).
Lightfoot, D. (1976). Trace theory and twice moved NP's. LInq. 7: 559-82.
Linell, P. (1977). Morphonology as part of morphology. In Dressler et al. (eds.) (1977:
9-21).
Lowenstamm, J. (1981). On the maximal cluster approach to syllable structure. LInq. 12:
575-604.
Mascaró, J. (1976). Catalan Phonology and the Phonological Cycle. MIT dissertation. Distributed
by IULC.
McCarthy, J. (1979). Formal Problems in Semitic Phonology and Morphology. MIT dissertation,
unpublished.
McCawley, J. (1968). The Phonological Component of a Grammar of Japanese. The Hague:
Mouton.
Mohanan, K. (1979). On syllabicity. In Safir (ed.) (1979: 182-191).
Postal, P. & G. Pullum (1978). Traces and the description of English Complementizer Contraction.
LInq. 9: 1-29.
Postal, P. & G. Pullum (1982). The contraction debate. LInq. 13: 122-138.
Radford, A. (1981). Transformational Syntax: A Student's Guide to Chomsky's Extended Stand-
ard Theory. London: CUP.
Rennison, J. (1981). Bidialektale Phonologie. Wiesbaden: F. Steiner.
Riemsdijk, H. & E. Williams (1981). NP structure. Linguistic Review 1: 171-217.
Ronat, M. (1982). Logical form and prosodic islands. This volume.
Rubach, J. (1981). Cyclic Phonology and Palatalization in Polish and English. Warszawa: Uniwersytet
Warszawski.
Rudes, B. (1976). Lexical representation and variable rules in Natural Generative Phonology.
Glossa 10: 111-150.
Safir, K. (ed.) (1979). Papers on Syllable Structure, Metrical Structure and Harmony processes.
Cambridge, Mass.: MIT.
Selkirk, E. (1972). The Phrase Phonology of English and French. MIT dissertation, distributed
by IULC: Bloomington.
Selkirk, E. (1980a). On Prosodic Structure and its Relation to Syntactic Structure. Bloomington:
IULC.
Selkirk, E. (1980b). The role of prosodic categories in English word stress. LInq. 11:
561-605.
Selkirk, E. (1980c). Prosodic domains in phonology: Sanskrit revisited. In Aronoff & Kean
(eds.) (1980: 107-129).
Skibniewski, L. (1980). Fast speech and syllabification in English. Ms., UAM Poznan.
Stampe, D. (1969). The acquisition of phonetic representation. CLS 5: 443-54.
Stampe, D. (1972). How I Spent my Summer Vacation. Ph.D. dissertation, distributed by
IULC: Bloomington.
Stanley, R. (1967). Redundancy rules in phonology. Language 43: 393-435.
Stockwell, R., J. Bowen & I. Silva-Fuenzalida (1956). Spanish juncture and intonation. Language
32. Reprinted in Joos (ed.) (1963: 406-18).
Sweet, H. (1890, 1908). The Indispensable Foundation. Repr. by Henderson (ed.) (1971). Lon-
don: Oxford University Press.
Vergnaud, J.-R. (1977). Formal properties of phonological rules. In Butts & Hintikka (eds.)
(1977: 299-318).
Williams, E. (1980). Remarks on stress and anaphora. Journal of Linguistic Research 1.3:
1-17.
Wurzel, W. (1970). Studien zur deutschen Lautstruktur. Studia Grammatica VIII, Berlin.
Zwicky, A. (1970). Auxiliary reduction in English. LInq. 1: 323-36.
Zwicky, A. (1972). On casual speech. CLS 8: 607-15.
Zwicky, A. (1977). On clitics. In Dressler et al. (eds.) (1977: 29-41).
ANTHONY FOX
II
Quirk, 1964; Crystal, 1969) but on a different basis: for him, subordina-
tion occurs with identical tones differing in pitch range.
4) A more elaborate approach is to recognise a larger intonational en-
tity with various kinds of internal structure. Though there are hints of this
kind throughout the literature, explicit proposals are few. Schubiger
(1958) suggests a unit with a structure analogous to that of the tone-
group: a string of tone-groups where one is 'nuclear'. Several types are re-
cognised with the nuclear tone-group appearing in various positions in the
sequence with either a rising or a falling tone. A more recent attempt in a
more explicit theoretical framework has been made by the present writer
(Fox, 1973) where a 'paratone-group', with various structural relations
within it, has been postulated.
All these proposals suggest quite strongly that the tone-group is not an
isolated entity, but contracts relationships with other tone-groups in se-
quences. Clearly, any investigation of the role of intonation in discourse
must attempt to establish the nature of these relationships and to evaluate
their significance for the structure of discourse.
III
H + L, L + L, L + H, H + H
What kinds of relationships between the tone-groups in such sequences
can be established? The particular relationship that we shall investigate
here is that of dependency, i. e. we can establish 'subordinating' and 'co-or-
dinating' sequences, where the former has a dependent and an independent
tone-group, and the latter only independent tone-groups. Dependency is
in essence a co-occurrence relationship: a tone-group is subordinated to
another if its occurrence depends on the occurrence of that tone-group,
such that it cannot occur alone.
If we examine the above types of sequence from this point of view, we
find that different sequences appear to have different kinds of internal
relationships. The type H + L (where, it will be recalled, H represents any
pattern not ending in a low pitch, e.g. high rise, low rise, level, fall-rise
etc., and L represents any low-ending pattern, e.g. high fall, low fall,
rise-fall) seems to be interpretable as forming a subordinating sequence
where the H pattern is dependent on the L pattern. Thus, in an utterance
such as:
when I get 'back / I'll give you a ring
(where / indicates a tone-group division and ' and - indicate nuclear pitch
patterns) the first tone-group is subordinated to the second. That this
relationship is not a grammatical one, despite the fact that it parallels the
grammatical dependency in this example, is shown by reversing the order
of the clauses:
I'll give you a 'ring / when I get "back
Provided the same sequence of patterns is used, an H-type in the first
tone-group and an L-type in the second, the first tone-group is subordi-
nate in both cases.
We could attempt to characterise the dependency relations here in terms
of 'information distribution' or 'textual' status: the information conveyed
by the first tone-group is in some sense less significant than that conveyed
by the second. But the intention is that we should for the moment avoid
all such 'external' interpretations. The dependency relation here is in the
first instance a structural relation between tone-groups.
The relations established in the second type of sequence, L + L, appear
to be different from the ones just considered. Here the impression is not
of a dependent tone-group followed by an independent one, but rather of
two independent tone-groups in a co-ordinating relationship. An example
is:
I'll give you a "ring / when I get "back
where the falling patterns could also be rising-falling. Despite the gram-
matical subordination of the second clause, there is no intonational prior-
ity here; the two tone-groups appear as equivalent entities.
The L + H sequence is more complex, and more difficult to analyse. In
fact, the analysis appears to depend on the pattern of the second tone-group.
The H pattern may be a high rise, as, for example, in a typical 'tag
question':
(a) you're from "London / aren't you.
or a low rise, as in:
(b) I'm going to "Leeds / on -Friday
or a fall-rise, as in:
(c) we can start axgain / if you 'like
Examples (a) and (b) seem to differ in respect of the relationships be-
tween the tone-groups. In (b) it seems clear that the final tone-group is
subordinated to the first. Several interpretations of this well-known Eng-
lish pattern are found in the literature, such as 'a fall with a rise in the
tail', or a 'double tonic' with two nuclei, etc. But there is a consensus that
the final rise is in some way an appendix to the fall of subordinate status
(cf. Halliday's term 'minor information point' - Halliday, 1967). Here it is
treated as a sequence of two tone-groups, where the second is subordinate.
It can thus be seen as the reverse of the H + L sequence discussed
above.
This interpretation will not do for example (a), however, which, despite
the apparent similarity to (b), is best seen as a coordinate structure, since
the two parts seem relatively autonomous. It is thus analogous to the L +
L type, and in fact the final high rise can be replaced by a fall in another
type of tag question without affecting the structural relationships. Identity
of pattern in the two tone-groups is thus not essential for coordinate
structures. (Incidentally, the same anomaly applies in the syntactic struc-
ture of tag questions: though syntactically coordinate, these sentences are
exceptional in having a different structure - declarative and interrogative
- in the two coordinate clauses.)
Although example (c) is of a type which is not as frequently discussed as
(b), it seems to fall into the same category as (b), namely it is a subordinat-
ing sequence with a final dependent tone-group.
The remaining type of sequence is H + H. This is the least frequently
encountered of the basic sequences (Armstrong and Ward, 1926, do not
include sequences of 'tune II' among their possible combinations), but it
certainly does occur, as in the following examples:
(a) are you coming 'with me / to the 'pub?
(b) I said I "would / if he "asked me
(c) if I ask you "nicely / will you 'come?
For (a) and (b) the coordinate interpretation seems most plausible, but
(c) is more doubtful and would also be open to a subordinating interpretation,
comparable to the H + L type. Such utterances as this may well be
structurally ambiguous; they are in any case not common.
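The analysis of the four two-tone-group sequences can be summarised in a short sketch. It uses the a/b shorthand that the text applies to longer sequences, taking a as an independent and b as a dependent tone-group; the function names are hypothetical, and the L + level case, which the discussion above does not decide, is deliberately left open.

```python
from typing import Optional

# Tone inventories as given in the text for the two pattern classes.
H_TONES = {"high rise", "low rise", "level", "fall-rise"}
L_TONES = {"high fall", "low fall", "rise-fall"}


def tone_class(tone: str) -> str:
    """Return 'H' for patterns not ending low, 'L' for low-ending patterns."""
    if tone in H_TONES:
        return "H"
    if tone in L_TONES:
        return "L"
    raise ValueError(f"unknown tone: {tone}")


def structure(first: str, second: str) -> Optional[str]:
    """Classify a two-tone-group sequence: a = independent, b = dependent."""
    c1, c2 = tone_class(first), tone_class(second)
    if c1 == "H" and c2 == "L":
        return "ba"   # first tone-group subordinated to the second
    if c1 == "L" and c2 == "L":
        return "aa"   # two independent tone-groups
    if c1 == "L" and c2 == "H":
        if second == "high rise":
            return "aa"   # tag-question type, coordinate
        if second in {"low rise", "fall-rise"}:
            return "ab"   # final dependent tone-group
        return None       # L + level is not discussed in the text
    return "aa"           # H + H: coordinate reading; some cases may be ambiguous


print(structure("low rise", "high fall"))   # ba
print(structure("high fall", "low rise"))   # ab
```

The H + H branch returns only the coordinate reading, so the possible subordinating reading of example (c) is not modelled.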
Summarising the findings so far, it can be claimed that tone-groups in
sequences may be seen as falling into two basic kinds of relationship: a
Subordinating and Co-ordinating Intonation Structures 125
The sequences examined so far have been minimal, with only two tone-groups.
Longer sequences do, of course, occur, with a corresponding increase
in the possible combinations of pattern types. Again restricting ourselves
to the basic dichotomy H versus L, sequences of three tone-groups
give us eight possibilities, those of four tone-groups sixteen possibilities,
and so on. In terms of the types of structural relationship contracted by
tone-groups in these sequences, however, the possibilities are more re-
stricted, and longer sequences can be seen as extensions of the basic types.
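The counts just given follow from the two-way choice at each position, so n tone-groups allow 2^n sequences. A minimal enumeration, with the helper name `sequences` chosen here purely for illustration:

```python
from itertools import product


def sequences(n: int) -> list[str]:
    """All H/L pattern sequences for n tone-groups, e.g. 'H+L'."""
    return ["+".join(p) for p in product("HL", repeat=n)]


print(sequences(2))        # ['H+H', 'H+L', 'L+H', 'L+L']
print(len(sequences(3)))   # 8
print(len(sequences(4)))   # 16
```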
One way of expanding the basic types of sequence is to allow more than
one dependent tone-group. Thus the ba type can be extended to bba, bbba
and so on:
On "Sundays / if the "weather's good / and I haven't got too much
"work to do / I go "climbing
Similarly the ab type can be extended to abb, abbb and so on:
I go "climbing / on ,Sundays / if the .weather's good / and I haven't got
too much ,work to do
Incidentally this example shows the inappropriateness of treating the
fall + low rise combination as a single 'double nucleus' tone-group, since
the rise can be repeated indefinitely (cf. Fox, 1973). These two types of ex-
pansion can also be combined, giving bab, babb, bbabb etc. An example of
type bbabb would be:
On "Sundays / in the "summer / I go "climbing / if the /weather's good /
and I haven't got too much /work to do
Sequences may also have more than two independent tone-groups, giving
structures such as aaa, aaaa and so on:
this is "John / my "brother / from "Sheffield
are you 'John / Bill's 'brother / from 'Sheffield?
The 'tag-question' type of co-ordinating structure can also have more
than one L type pattern before the rise:
You're John "Smith / from "Sheffield / 'aren't you?
though repetition of the high rise would be unusual in such structures.
Finally, multiple subordination and co-ordination can be combined in a
single sequence to produce bbaa, baab etc. An example with the structure
bbaaabb would be:
On "Sunday / if the "weather's good / I shall go "climbing / with my
"brother / from "Sheffield / I , think / if I haven't got too much ,work to
do.
The structural principles underlying longer sequences are thus quite
easily established, and the patterns can be generalised into a formula:
\[ b_0 \, a_1 \, b_0 \]
\[ (b_0 \, a \, b_0)_1 \]
This would expand into a, ba, ab, bab, aa, baa, aab, aba, baba etc.
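The expansions can be checked mechanically on one reading of the formula, assumed here rather than taken from the text: a subscript 0 means 'zero or more' and a subscript 1 'one or more', so that a structure is one or more groups of a single a flanked by any number of b's. Under that assumption the formula corresponds to the regular expression `(b*ab*)+`. The function names are hypothetical.

```python
import re
from itertools import product

# Assumed reading of the formula: one or more groups, each an independent
# tone-group (a) with any number of dependents (b) on either side.
PATTERN = re.compile(r"(?:b*ab*)+")


def is_well_formed(seq: str) -> bool:
    """True if the a/b string is a structure licensed by the formula."""
    return PATTERN.fullmatch(seq) is not None


def structures(max_len: int) -> list[str]:
    """All licensed structures of up to max_len tone-groups."""
    return [
        "".join(p)
        for n in range(1, max_len + 1)
        for p in product("ab", repeat=n)
        if is_well_formed("".join(p))
    ]


print(structures(2))               # ['a', 'aa', 'ab', 'ba']
print(is_well_formed("bbaaabb"))   # True
print(is_well_formed("bb"))        # False
```

On this reading the only excluded sequences are those with no independent tone-group at all, which matches the definition of dependency given earlier: a dependent tone-group cannot occur alone.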
Structures such as this may seem quite complex, but by the standards of
syntactic structure they are extremely simple. In fact, intonation structures
never approach the complexity of syntax, as certain kinds of structuring
seem to be excluded. It is not possible, for example, to embed one tone-
group inside another (unlike the embedding of sentences) or to have dis-
continuity in a tone-group. Even the possibility of more than one level of
IV
'non-final' and 'final'. Such an implication is not restricted to lists, but can
occur more widely; subordinating structures, with their hierarchical order-
ing, suggest integration, while co-ordinating structures imply mere concat-
enation.
But it must be noted that it is not possible to give a specific semantic
function to the 'tones' themselves. The rising patterns have a different sta-
tus in each case, and it is not the 'tones' that determine the meaning but
rather their structural roles. Hence other 'tones' which would be structur-
ally equivalent convey exactly the same implication: the rise of the 'choice'
version could be a fall-rise or a level pattern, while any pair of co-ordinate
tone-groups (such as fall + fall, or fall-rise + fall-rise) would preserve
the implication of the 'example' version.
Furthermore, it is clear that what is chosen here is the structure as a
whole rather than individual tone-groups. It is not possible to associate a
subordinate tone-group with 'subordinate information' in a straightfor-
ward way. 'Tea' is in no sense subordinate to 'coffee'; nor is it 'given'
while coffee is 'new'. It would also make little sense to say that the
speaker is 'referring' to tea, but 'proclaiming' coffee. We must therefore
assign significance to the choice of whole structures.
This example shows how a sentence of a specific syntactic type may in-
teract with very general characteristics of intonation structure, resulting in
an apparent syntactic function for intonation. It is, of course, evident that
this is not really a syntactic function of intonation, but a more general dis-
course articulating function which can be open to more specific interpreta-
tion in conjunction with a particular type of sentence. In fact, it is clear
that certain types of sentence may be able to exploit general functions of
intonation in quite specific ways, and that certain sentence types may tend
to be regularly associated with particular intonation structures. This is not
meant to imply that intonation structures occur obligatorily with certain
sentence types; it merely follows from the uncontroversial observation
that both sentence structures and intonation structures carry significance
in discourse, and may often converge. It is thus possible, to a limited ex-
tent at least, to identify some likely correspondences between syntactic
features of utterances and types of intonation structure.
The most common type of structure for utterances containing more than
one tone-group is a subordinating one of the type ba. This structure is
general whatever the syntactic relationships contained within the utter-
ance. Certain types of utterance nevertheless seem to prefer a structure in
which the subordinate tone-group follows, i.e. ab, where the final tone-
group has a low rise or a fall-rise. Particularly susceptible to this kind of
treatment are:
(i) relative clauses (usually of the non-defining type if they have a sepa-
rate tone-group):
there's "Smith / who wrote that book on "speech acts
(ii) appositional phrases (also usually non-defining):
I'll introduce you to "John / my "brother
(iii) sentential relative clauses:
I missed the "lecture / which was a "pity
(iv) adverbial comment clauses:
he's "failed / as we ex"pected
(v) tag questions (most), with the same or different patterns in each
tone group:
he's from "Birmingham / 'isn't he?
he's from "Birmingham / "isn't he?
(vi) unlinked clauses:
I don't "know / I've for"gotten
VI
The notion of 'discourse' adopted so far has been rather narrow, being ef-
fectively limited to long utterances by a single speaker. It is interesting to
consider the possibility of widening the scope of the concepts discussed
here to include more extended discourse involving more than one partici-
pant. Can intonation structures of the kind envisaged here cross the
boundaries between the utterances of different speakers? Since several of
the 'non-final' dependent patterns can also occur in apparently independ-
ent tone-groups, one possibility might be to regard such tone-groups as
forming incomplete structures, to be completed by another speaker.
An obvious example of this kind is provided by the high rising pattern
found regularly in 'yes/no' questions. In terms of our present framework,
such patterns could be seen as independent tone-groups, but we could also
consider them to be simply the first, dependent, part of a structure which
is completed by the reply. Alternatively, and without invoking another
speaker's reply, we could interpret such questions as the first part of a
truncated alternative question, with a suppressed '. . . or not' which would
again complete the structure. This explanation for the rise in questions has
a long history; it was put forward by Coleman (1914), and has recently
been revived, in a somewhat different theoretical framework, by Pope
(1976). The 'alternative question' approach would also perhaps explain
why yes/no questions regularly have a rise, while wh-questions regularly
do not: the latter cannot be supplemented by '. . . or not'.
The fall-rise pattern could perhaps be seen in a similar way. We have al-
ready noted a relationship between the interpretation of the meaning of
Subordinating and Co-ordinating Intonation Structures 133
References
1. Introduction
Table 1
     a) expected pattern                   b) 'ill-behaved' pattern
        (rightmost placement)
1    plans to leave                        plans to leave
2    the doctor I was telling you about    the doctor I was telling you about
3    which turn should we take             which turn should we take
4    the sun's shining                     the sún's shining
     my umbrella is dirty                  my umbrella is dirty
principles (see above, fn. 1); but while her procedure is quite ad hoc
('patching operations', as Bolinger remarks), a systematically functional
analysis shows each of the different cases to be a manifestation of much
more far-reaching regularities.
1.2. Very scant attention is paid to the discourse conditions for different
accent placements by Bresnan, Lakoff, Berman & Szamosi, Stockwell. 2 If
we distinguish between the patterns possible in 'all-new' use and those
that necessarily presuppose 'givenness' of part or all of the construction,
we can account for the alternative placements in half of the cases:
- Accent on the predicate in cases like the sun's shining, my umbrella is
dirty systematically presupposes the subject's (often, both the sub-
ject's and the predicate's) being 'given' via context/situation, while
the sun's shining, my umbrella is dirty are patterns usable where no
such 'given'ness is implied. 3
- While which turn should we take? does not presuppose 'givenness' of
either turn or take, which turn should we take? explicitly does so either
for turn or for both 4 ; the same applies to the host of analogous exam-
ples adduced above all by Lakoff: cf. Bolinger 1972, Berman & Sza-
mosi, Ladd, and also Bresnan's own distinction of 'initiatory' vs. 'eli-
citory' questions (which is awkward as far as the qualifications given
2
The artificiality of their procedures is well illustrated in the introductory footnote to Ber-
man & Szamosi: "We should like to thank many professors, students, and neighbors for
acting as informants, pronouncing long (and boring) lists of sentences for us" (emphasis mine).
3
cf. Fuchs (1980) with more literature on this point. Without the persistent neglect of this
simple fact, the myth of 'normal' rightmost accentuation could not so easily have installed
itself.
4
Givenness of only the object is pragmatically improbable with this example, but a syste-
matic possibility for the construction type.
136 Α. Fuchs
for the latter are concerned - cf. Bolinger 1972: 642 - but rightly
keeps apart initiatory and non-initiatory - or 'second-instance' -
uses).5
Functionally speaking, then, the question of rightmost or prior accent
placement is of no particular importance. We may reorder our accent pat-
terns according to the criterion just used, in the way shown in Table 2.
Table 2
'all-new' use possible 6          'given'ness (of part or all) presupposed
1.3. The patterns that necessarily presuppose some 'given'ness are the
ones that will mainly occupy us in this paper. Their system, however, is re-
lated to that of the others, which will therefore have to be sketched, too.
In so doing, I shall briefly touch upon the question of the alternative pat-
terns under 1) and 2) in Table 2 (extensive treatment is to be found in
Fuchs, ms).
All the patterns listed in the left rubric correspond to what is usually
termed 'normal stress', but they are by no means the only ones possible in
'all-new' use. In fact, that they should so commonly be regarded as 'nor-
mal' is, I think, to quite an extent due to one-sided choice of examples
(discussions of accent placement overwhelmingly are concerned with
one-accent patterns - does 'the' accent go on this 'or' on that element? - ,
while in spontaneous speech pluri-accent patterns abound). In my theory,
the patterns are cases of the phenomenon I call integration (Fuchs 1976,
1980), a concept hopefully capable of providing an explication of what is
usually referred to by 'normal stress', and of reconciling 'semantic' and
'syntactic' approaches as well as showing the unity of 'normal' and 'con-
trastive' accent. I briefly summarize the main points (see also Fuchs, ms.):
5
'Second instance' is borrowed from Bolinger; see, e.g., 1952.
6
The patterns are not confined to such use, cf. below.
'Deaccenting' and 'Default Accent' 137
tionally relevant', at the given point in discourse, may receive two different treatments: each
IC may be introduced as a separate point of information and, accordingly, accented; or the
whole construction be 'integrated', treated as one 'globally new' unit, by accenting one of the
ICs only - the location of the 'integrative' accent being conventionally fixed.
I call the IC of a construction that is to receive accent in the case of integration its exponent
(the subject in subject-predicate constructions, the object in verb + object constructions,
etc.). The location, of course, often coincides with traditional 'normal stress'.
Accent on the 'exponent' only is systematically ambiguous, the emphasis (or 'focus') sig-
nalled by it has variable scope: both immediate constituents may be 'new', 'informationally
relevant', and integrated; or the emphasis may cover the accented IC only, the other refer-
ring to 'given', 'presupposed' matter. This ambiguity is systematically resolved by recourse to
the context, verbal and/or situational (including the character of the items accented). 7
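The variable scope of a single accent, as just summarized, may be illustrated by a minimal Python sketch. It rests on assumptions that are not part of the analysis itself: the function name `accent_scope`, the string labels, and the restriction to the two construction types whose exponents are named above are all hypothetical, and context is reduced to a single yes/no value for the givenness of the unaccented IC.

```python
# Exponents as named in the text; other construction types are not covered.
EXPONENT = {"subject-predicate": "subject", "verb-object": "object"}

def accent_scope(construction, accented_ic, other_ic_given):
    """Scope of one accent on a two-IC construction (illustrative only)."""
    if accented_ic != EXPONENT[construction]:
        # Accent off the exponent: some 'given'ness is necessarily presupposed.
        return "givenness presupposed"
    if other_ic_given:
        # Context resolves the ambiguity towards narrow scope.
        return "accented IC only"
    # Nothing is given: both ICs are 'new' and integrated.
    return "whole construction (integration)"
```

On this sketch, accent on the subject of the sun's shining in 'all-new' use comes out as integration, whereas the same accent with a given predicate covers the subject only.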
Thus with all cases in the left rubric above, we are in the presence of integration.
But while for the bulk of integratable syntactic constructions there is
one location of accent designated for the case of integration, the construc-
tions under 1) and 2) pose a special problem in admitting of two place-
ments each. This question has to be treated in a broader framework
(which I attempt in Fuchs, ms).
1.4. All the patterns we have been considering have one accent for two
ICs. In all those in the left rubric in Table 2 that accent may signal focus
either on the accented item only, or on the whole construction. Lack of ac-
cent thus may have two different conditions there (and the two cases must
be distinguished in any analysis; if not, much confusion arises!): the con-
stituent without accent may either be 'given' or (in the case of integration)
fall under the scope of the emphasis, be 'new' itself (part of the 'globally
new' unit). 8
In the right rubric, the situation is the reverse: the accented constituent
may either be 'new', the rest being 'given', or be itself 'given' (part of one
'globally given' unit), a case quite contrary to our current accent place-
ment 'intuitions'.
It is obvious that, to make sense of this, we have to go into the syste-
matics of the patterns that (necessarily) presuppose 'givenness', in more
detail.
2. 'Default accent'
2.0. Two authors have paid particular attention to the accent phenomena
in sentences presupposing 'givenness': Bolinger and Ladd. Both base their
7
The phenomenon of variable scope has been used to reduce the 'normal'/'contrastive', 'se-
mantic'/'syntactic' dichotomies also by Ladd. For resemblances and differences between
our approaches see Fuchs (ms).
8
cf. Ladd (1979: 111).
"In each of these examples we see the deaccenting of various items felt as 'given' or 'in the
discourse': books in (1), the measure in (2), car in (3). In each case the deaccented noun has
been somehow referred to or alluded to earlier in the discourse . . ."
"a corollary of deaccenting, which is that in order for an item to be perceived as deaccented,
the accent must fall elsewhere, and it is thus possible for a word to be accented as it were by
default. In (1) we can say that the accent is on read merely to deaccent books. This is not a fo-
cus accent but a default accent. We are not focusing on read; the positive, marked aspect of
the accentual pattern in (1) is not that read is accented, but that books is deaccented. In (2),
we are similarly unable to find a focus on denounce; the accent is there merely to signal the
deaccenting of the measure."
(b) John washed the car. I was afraid someone else would do it.
(Implication that I wanted to do it myself.)
"Someone else in (b) is deaccented to signal coreferentiality with John; the accent
falls on afraid by default."
Ladd concludes (ibid.):
"In all of these cases, then, the accent falls on an item - read, denounce, be, said, a lot of, afraid
- which provides us with little semantic justification for talking of focus; in each case, how-
ever, there are subsequent items in the sentence - books, the measure, dentists, it would be hot
all day, male insects, someone else - which seem to have some reason in the context for being
deaccented. From all this we conclude that there are two fundamentally different types of
non-neutral accent placement - narrow focus accent, as discussed in the previous section,
and default accent."
For the sake of my argumentation below, I also adduce the remaining ex-
amples from that section (Ladd, 108 f., numbering of the examples mine):
(8) A. Boy, I really have to get moving. My parents are coming today
and I've still got to finish this paper and get the place cleaned up
and get some laundry done and . . . Wow, I don't know if I'm go-
ing to make it.
a) B. Oh, yeah, I forgot about that. What time are you meeting
your parents?
b) B. Oh, yeah, I forgot about that. What time are your parents
arriving?
(9) A. Why don't you have some French Toast?
a) B. I've forgotten how to make French Toast.
b) B. There's nothing to make French Toast out of.
(10) A. I'm a linguist.
B. How many languages do you speak?
"there is no contrast between 'in' and 'out of', nor is there any special attention of any other
kind bestowed on the word in; it merely carries a sentence accent which, if it were to fall any-
where else, might be mistaken as contrastive (146)."
"part of the more general phenomenon of de-accenting elements that are repeated or presup-
posed. In I did write it - you didn't write it, the nuclear accent is not shifted to the auxiliary
merely to have it there 9, but in part also to get it off the repeated write. The accent is required
for the yes-no-emphasis, but it would be misleading if it fell on a content word."
So, although he does not use the term 'default accent', Bolinger, like
Ladd, has a conception of accents sometimes being where they are because
they had to be gotten off a repeated or presupposed element (his formulations
are somewhat hesitant, though, at least in 1971: "not . . . merely", "in
part also").
9
The procedure commonly assumed to be used for 'yes-no-emphasis', according to Bolinger.
10
In the case of the first element of phrasal verbs, for instance.
everywhere . . .". Thus, for Bolinger, the reason for the accent on take is
the common one, the element's being 'new'. 11
2. The mere negative concept of 'default' placement as a corollary of
'deaccenting' is not sufficient to account for the cases so described by Bo-
linger and Ladd. In fact, there are inconsistencies in its application with
both authors:
a) In quite a proportion of Ladd's examples, the accented word is itself
'given': one would accordingly expect it to be deaccented as well, if deac-
centing is semantically-based:
read in (1) is an exact repetition of the lexical part of the verb, as is he in (4); in the beautiful
example (10), in the non-'contrastive' reading where the speaker assumes as common know-
ledge that linguists 'speak (many) languages', speak is implied by the context. For (7) and (8),
context indications are not sufficient to determine with certainty whether the predicate ex-
pressions should be taken as 'given' or 'new': in the former case, the objection made here
would apply; in the latter, the following.
b) In all the others, on the other hand (including (7) and (8), if the
predicates are interpreted as 'new'), the accent is perfectly compatible with
the traditional semantic theory of accent as signalling what is 'new' in discourse,
without any need for a notion of placement 'by default'. Denounced
in (2), American in (3), said (5), a lot of (6), make/make out of (9)
(as well as the examples of compounds in a later section, p. 120) all illus-
trate "the accenting of new elements which is to be found everywhere".12
Although Ladd convincingly argues (97-104) that the common 'normal'/'contrastive' dicho-
tomy is too simple, he obviously succumbs to its suggestiveness here himself (which is often
in fact hard to avoid): since the patterns considered are not 'normal', but their accent cannot
be seen as somehow contrasting the item either, some 'other kind' of accenting must be in-
volved . . . (See the rhetorical "what does afraid contrast with?" p. 106, and the whole argu-
mentation esp. on that page.)
11
This is imprecise in view of the discourse conditions assumed for this pattern by Bresnan,
namely, a preceding we should take one of these turns (1972: 340). The two sentences are of
course highly improbable as a discourse sequence in direct succession; it remains that take
just as well as turn is assumed to be 'given'.
12
The accent on out (of) may surprise at first sight, but is I think analogous to that in throw
out, throw away, etc.
ally equivalent - but are they? Ladd's theory is in principle more specific
with regard to the location of 'default' accent (see below, § 3); still, he
does not take alternative possibilities into account at all (why not doesn't
read books, how many languages do you speak? which would seem very
plausible 'deaccentings' in view of the context givens, after all?).
We will have to determine the status of the alternative possibilities and
search for functional differences corresponding to them. The need for
'positive' criteria is also borne out by the fact that, e.g., the verb and the
object may be accented at the same time, in second instance (see below
§ 3.6.2.). It is impossible in such a case for the accent on the verb form to be
there 'by default', yet its semantic value is the same as in Ladd's examples.
3.1. Surely we have to leave all the cases where there is perfect justifica-
tion for the accent in traditional semantic terms - case b) above - out of a
discussion of 'default' placement. Since Ladd himself by no means treats
all deaccenting for 'given'ness as resulting in 'default' accent 13 , the amend-
ment should be in line with his thinking.
Although this eliminates a large proportion of Ladd's examples, others
remain that definitely need an explanation, for the reason that the ac-
cented element itself is 'given'.
13
cf. 1980: 111, where nineteenth century is treated as deaccented, under certain context con-
ditions, without there being any question of 'default'.
Ladd considers this ambiguity (which he nicely illustrates with how many languages do you
speak - see the comment, 109) as strong "evidence for the focus/default distinction", and the
discussion of read books in his book seems to point even more strongly in the direction just
outlined (see Ronat this vol.).
The cases here considered strongly invite such a hypothesis, and the fre-
quency in discourse of the construction type considered may seem to con-
firm it again and again (I held it for a long time). But the generalisation is
not warranted. In fact, the description would only fit constructions involv-
ing a verb as one IC, and not all of these. Besides, the second-instance ac-
cent here is always on the verb, regardless of whether the integrative pat-
tern has it on the co-constituent or on the verb itself (for cases of the
latter sort, see below § 3.5.3). This ties in with the possibility mentioned
above of accent on the verb and its co-constituent in second instance,
where evidently the accent on the verb form cannot be a result of 'deac-
centing' the co-constituent, and calls for a 'positive' explanation of the ac-
cent on verb forms in second-instance sentences - in fact, generally.
3.3. At this point, we need to sharpen our notions regarding the semantics
of accent placement: we are still left with the paradox of constituents to be
characterized at the same time as 'given' and as 'new'; and the peculiarities
connected with accentuation within predicates have to be elucidated
against a more general background.
Abbreviating modes of expression are necessary, but the terms current
in the description of accent function, although mostly indispensable at the
moment, are all unclear or misleading in some respect or other (and of
course no-one is fully satisfied with them). 'Focussing' is useful, but rather
metaphoric. 'Relevance' leaves open the question of a point of reference:
relevant to whom, to what? (it has often in fact been misconstrued in a
psychological sense, 'what the speaker feels is important'); and 'new-
ness'/'givenness' cannot be taken to be what accent signals in a literal
manner either.
1. Hearers know what is 'given' in discourse and what is not; in fact,
have to know it, to be able to competently participate: recent work has im-
pressively shown how interactants constantly have to draw on their fund
of common knowledge, in general and as regards the givens of the situa-
tion at hand, to be able to adequately produce and correctly interpret ut-
14
Cf. the complex interpretation of deictics, beautifully illustrated in Fillmore (1976), Haw-
kins (1978) for definiteness, more generally Grice (1975), Labov & Fanshel (1977) and
Keenan & Schieffelin (1976). The systematic relevance of the fact is coming to light in an-
alyses of the most diverse phenomena.
15
Cf. Halliday (1967: 204), Ladd (1979: 128).
16
I am not referring at this point to the accentlessness of a 'new' element resulting through
integration.
17
A term I borrow from Bolinger, who uses it passim (e. g. in 1952), but less technically than
I intend to.
18
For a somewhat different kind of use, see Ladd's doesn't read, trash (1979: 128).
given. While we certainly do not agree with his conclusion that accenting for 'new informa-
tion' and for 'contrast' are different in principle, the dilemma vanishes as soon as factual
newness and 'newness' as a signifié are properly distinguished.
An example: A: May I speak to Peter? B: Peter's gone (he just left half an hour ago, and . . .). The
answer clearly is not intended to contrast Peter to anyone else; what is focussed is the 'thematic'
relation ('theme' in Halliday's sense, roughly): cf. Peter's gone, where B would simply
be giving some information about the referent introduced. Cf. also: A: You look desperate. B:
My purse is gone. My purse is gone here would be rather perplexing: "Who has been speaking
of purses?".
Sgall et al. (37 f.) speak of the possible 'underlining' of 'grammatemes' (in verb forms) as
opposed to that of lexical elements. This comes quite close to our approach. However,
'grammatical' vs. 'lexical' is not the relevant distinction: for one thing, grammatical elements
often have oppositive meaning features, and focus on them must be distinguished from focus
on their relational features.
19
In discourse, possible oppositions are limited: not all the members of a syntactic category
are in virtual opposition, but only those within the universe(s) of discourse activated at
that point in the discourse.
20
This is not the only systematic possibility: in certain contexts ('contrast'), the relational
ferent placement much more in line with his deaccenting concept in leav-
ing 'given' matter accentless:
(1') John doesn't read books
(4') . . . that there are guys who want to be dentists
(8a') what time are you meeting your parents?
(8b') What time are your parents arriving?
(10') How many languages do you speak?
(for 8a' and 8b' I here assume the verb to refer to given matter).
The different placements presuppose different discourse conditions
(different 'questions of immediate concern'). Meanings vary systemati-
cally with a variety of factors; the following, however, seems one possible
paraphrase for (1) as against (1'):
(1) "What you are saying implies the possibility of John's reading
books - well, he doesn't"
(1') "You are saying that John reads books - let me tell you that such
isn't the case."21
How are such differences - slight from the point of view of denotation,
but real enough in view of possible interactional consequences - to be ac-
counted for systematically?
3.5.2. Relational features. The more complex the internal buildup of a ver-
bal expression, the more different are the relations established within it
and to elements outside it. Thus a passive predicate 'orients' the action de-
noted differently with respect to the participants than an active one, a mo-
dal auxiliary within the predicate may strongly restrict the degree to which
the action, state etc. denoted is ascribed to the subject, etc. I shall concen-
trate here on two relations that are established by any lexical predication 22
and that I term, provisionally perhaps, 1) connection, 2) ascription.
The two relations go undistinguished in most accounts of "the" func-
tion of predication. Their distinction from each other (as well as from
other semantic features of the verb form), however, clearly shows in that
they may separately constitute the domain of emphasis; cases of this sort
also allow assignment to each of them of a formal locus within the verb
form.
'Connection' corresponds more to what is sometimes called the 'nexus'
function of predication, 'ascription' more to what is called its 'statement'
function. 'Connection' is effected through the bringing together of a (po-
tential) subject and the expression of an action, state, etc., in the form of
an infinitive, a participle, or the 'bare stem' of a finite form; 'ascription' by
the use of a finite form (including imperatives), the prerequisite for insert-
ing the complex created by 'connection' into a higher-level discourse unit,
as a (relatively) independent 'move'.
A distinction between two 'fonctions verbales' beyond the expression of the lexical content is
drawn by Benveniste (1966: 154): 'fonction cohesive', 'fonction assertive'; and Searle's dis-
tinction between 'predication' and 'illocutionary acts' also is relevant here. The three dicho-
tomies have many common elements, but are not parallel. Searle's is more comprehensive
than Benveniste's in sharply dividing the illocutionary aspects from the others. His practical
(not theoretical) neglect of formal linguistic distinctions, however, leads him to disregard
other differences. From the viewpoint of linguistic form-function correlations, Searle's 'pre-
dication' has to be further subdivided (as have the 'illocutionary acts'), and this subdivision
entails a somewhat different formulation of his primary distinction. Searle's term and in
large part his description of 'predication' make it look as though the relation were tied to a
finite verb form, but in fact he also considers other syntactic frames: see the discussion 30 f.
regarding expressions like I promise to come. Searle's recourse to 'deep structure', in order to
instal a subject-predicate relation between the relevant terms here too, amounts to stating a
common semantic relation in both constructions while abstracting from the formal/func-
tional difference. If we take the difference into account as well, we have two relations within
the finite predication, the common one and the one proper to the finite form.
Illocutionary acts, too, fall into two basic categories from the viewpoint of linguistic
form-function correlations: 1. the very narrow system of first-layer, 'forced-choice' speech
actions instituted in the linguistic form: any finite utterance has to take the form of either an
affirmation or a request for action or a request for information; 2. the system of speech ac-
22
I limit the treatment to predications that contain some lexical element, and to finite predi-
cations - extension to the whole system of verbal forms may necessitate some reformula-
tion.
tions building on this first layer, the rules for which always specify a) one of the 'first-layer'
actions, b) social/pragmatic factors of the situation. (See the Labov & Fanshel model of dis-
course analysis, 1977.)
23
Under the specific conditions that I am just delineating, this will be the case most of the
time.
24
The accent goes on the 'outermost' element to be integrated (see below). - Expressions
with a predicative nominal belong here, too; their behavior in first instance as well as in
iteration shows some characteristic peculiarities, however.
(2) Iteration
The activity, state, etc., expressed by a lexically simple form or a combina-
tion of verb plus 'specifier(s)' is given.
a) Accent may go on any part of the predicate in order to 'contrastively'
focus its oppositive semantic content (provided the element has any such
content). We will not be concerned with this possibility any further.
b) Any of the relational features may become the object of focussing,
which is formally expressed by accenting that part of the expression which
contains its formal locus.
Comprehensive rules determining the formal loci of the two relations
(and of others) are not easily formulated in a few words, in view of the
possible formal diversity and complexity of predicate expressions. For a
concise formulation, we will first have to determine the structural order-
ing of the elements liable to enter into predications. For the moment, I
shall set up a very tentative, and incomplete, 'inner-outer' scale, from left
to right: expression of finiteness - passive auxiliary - verb 'theme' (partici-
ple/infinitive/'bare stem' of simple form) - . . . — specifier 1 (non-verbal
parts of 'phrasal verbs' etc.) - specifier 2 ('inner' complements) - specifier
3 ('outer' complements) - . . .
Ascription may then be said to have the 'innermost' element as its for-
mal locus, connection the next 'outer' one. Since these elements may
either be combined in one word or dissociated, depending on the internal
buildup of the predicate (and there are some differences in this respect be-
tween English and German morphology), different possible accent place-
ments and systematic ambiguities result. However, the formulations above
(which simplify but, I hope, do not distort) should be able to account for
the relevant placement patterns, in English as well as German.
To summarize: ascription is focussed by accenting the form containing
the 'innermost' element of the predicate (the expression of finiteness - but
see note 22). Connection is focussed by accenting the form containing the
next 'outer' element. Where the two elements coincide in one form, inter-
pretation of the accent as a focussing on ascription or on connection syste-
matically depends on context.
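
The placement rule just summarized can be made concrete in a small Python sketch. It is an illustration under stated assumptions only: the scale labels (`finiteness`, `verb_theme`, etc.), the function name `loci`, and the hand-annotated representation of a predicate as (form, elements) pairs are hypothetical conveniences, and the scale is the tentative and incomplete one set up above.

```python
# Tentative 'inner-outer' scale from the text, innermost element first.
# The label names are hypothetical; the text gives the scale only in prose.
SCALE = ["finiteness", "passive_aux", "verb_theme",
         "specifier_1", "specifier_2", "specifier_3"]

def loci(predicate):
    """Return the forms to accent for focus on ascription and on connection.

    predicate: list of (form, set_of_scale_elements) pairs, annotated by hand.
    """
    # Order all scale elements present in the predicate from inner to outer.
    present = sorted((SCALE.index(element), form)
                     for form, elements in predicate
                     for element in elements)
    if len(present) < 2:
        raise ValueError("need at least two scale elements")
    ascription = present[0][1]   # form containing the innermost element
    connection = present[1][1]   # form containing the next outer element
    return {"ascription": ascription,
            "connection": connection,
            # Both loci in one form: the reading depends on context.
            "ambiguous": ascription == connection}

analytic = loci([("doesn't", {"finiteness"}),
                 ("read", {"verb_theme"}),
                 ("books", {"specifier_2"})])
simple = loci([("reads", {"finiteness", "verb_theme"}),
               ("books", {"specifier_2"})])
```

For doesn't read books the sketch gives doesn't as the locus of ascription and read as that of connection; for the simple finite form reads both loci fall within one form, so that the interpretation of the accent is left to context.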
A precise account of the functional side of focus on connection or on
ascription would presuppose quite an elaborate systematics of the possible
interactions between the parts of the predicate among each other and with
the other parts of the sentence, under different context conditions. A very
rough, and partial, sketch follows (cf. also § 3.6.2).
Every predicate 'ascribes' anew. 'Ascription' is never 'pure', always in
some mode: request for action, for information, (relative) affirmation (i. e.
including negation, modalisation via auxiliaries etc.). Ascribing a given
event to a participant in focussing on connection makes connection itself the
dimension of relevance; it affords a means to specifically concentrate on
the affirming, questioning, denying etc. of the factuality of the event (un-
der the aspects/to the extent specified by the rest of the sentence) whose
possible relation with the participant is itself not news.
With focus on ascription, ascription itself is 'called up' as a dimension of
relevance. Since the event-participant relation is 'given in the discourse',
calling up this dimension of relevance results in establishing a relation to
the 'source' of that givenness: prior ascription (in some mode . . .). This
kind of focussing thus affords a metacommunicative device for specifically
referring back to a prior ascription while effecting a new one (in one of
the possible modes, again, and under the aspects/to the extent specified by
the rest of the sentence). 25
There are several quite explicit references to the phenomenon of focussing on relational
aspects of the predication (but without distinction of different relations): Halliday, 1967:
204; Sgall et al., 1973: 37f.; Worth, 1964: 699.
B. Pluri-accent patterns
a) Focus on connection and focus on ascription are combinable within
one predicate expression: see examples (38) and (39) below.
b) They are also (even jointly?) combinable with independent accenting
of 'specifiers' such as an object expression: see examples (36) and (37).
The 'specifier' may in that case be iterated itself; it may also be part of a 'new' predicate, and
bear an integrative accent expressing focus on the 'new' event. - In combination with the de-
signation of a 'new' event, the pragmatic function of focus on ascription is somewhat differ-
ent from what was said above: it may be used to imply the possibility of contradiction, e. g.,
in such cases. These uses are not treated here.
25
Metacommunicative function is present also in the iterative focussing on the 'thematic' re-
lation illustrated above § 3.3.
3.6. Illustration
3.6.1. Ladd's relevant examples illustrate focus on connection. The rules
given above under A. 2)b) should make it clear why it is only the object or
the predicate noun that goes unaccented although the lexical content of
the verb is no less given (cf. also above § 3.2 regarding Ladd's 'broad fo-
cus').
(1) John doesn't read books "there has just been a question of reading
(a) book(s) in relation to John. Well, this relation does not obtain."
(4) . . . but I'm awfully glad there are guys who want to bé dentists "I
have just spoken about being a dentist, and what it is like (for 'one',
for 'a guy') to be one. Well, I'm glad there are 'ones', 'guys' who
want it to be the case for them."
(10) How many languages do you speak? "You have just given me to understand
that you speak (many) languages. Well, of how many is it
the case?"
The alternatives proposed in 3.4. would be the patterns in order for fo-
cus on ascription (for (4'), obviously, this would require a context where
want to is given, otherwise the pattern could not be so interpreted).
In the examples with focus on connection, there are other 'new' ele-
ments beside the one focussed: we said above that 'ascription' is effected
anew with any finite form while focussed only under certain conditions;
likewise, the negation in (1), want to in (4), how many in (10) are inte-
grated with the focussed-on element, which makes for a particularly nar-
row bond, in terms of discourse articulation, between them and the 'factu-
ality' ('connection') feature. In the alternative versions, the negation and
the interrogative elements are more narrowly tied in that sense to the fea-
ture of ascription (in a certain mode, etc.) 26 .
Even out of context, the alternative versions sound more 'insistent', which is explainable by
their focus on 'ascription'. The specific character of this kind of focus becomes even more
notable when comparing the versions in question to another possibility, accent on not, how
many etc. instead of on the verbal expression. This is the pattern one naively expects in the
case of the predicate's being given as long as the possibility of focussing on relational fea-
tures is not recognized. It is in fact a systematic possibility in this case, as is accent on the ne-
gation/question word and the verb expression, but articulates the discourse in a slightly dif-
ferent manner, 'calls up' other/additional contexts of relevance in comparison to the patterns
here treated.
(22) also sehn wir uns heut abend? (well then are we meeting tonight -
interlocutor had mentioned, some time before, that she might be
kept from going to a party that night to which both were invited).
(23) Wann kommt denn der Ralf eigentlich? Say, when is Ralf coming? A
question 'out of the blue'; however, Ralf was supposed to drop in
some time that afternoon.
(24) A: wie ist das nun/findet die Versammlung statt heute abend? (now
what are we at/is the meeting tonight taking place?)
B: die findet statt/jaa (it is taking place/yes)
(25) (Child has asked his mother to make ravioli for lunch; she half-
promised. After a while he comes into the kitchen, sees her cook-
ing and says:) gelt du kochst Ravioli? (You are making ravioli, aren't
you?)
(26) A: aber der betreffende [a farmer, AF] kommt natürlich für so was
nicht aus Kanada hierher (but the fellow won't come over from
Canada for something like that)
B: ách/so'n Úrlaub/wenn man das damit verbinden kann (oh I
don't know/a holiday/if you can combine both)
A: jaa/Landwirte máchen nicht oft Urlaub (farmers don't often go
on holidays you know)
(27) jetzt hör doch auf (now do stop), jetzt sei doch still (well do be quiet):
a very frequent type - iterated imperatives.
(28) du machst mich nervös! (you do make me nervous) — exclamation of
a teen-age daughter to her mother
(29) Child: wás ist mehr/hundert Pfennig oder ne Mark? (what's more:
a hundred Pfennigs or one Mark?)
Mother: was meinst dú denn (well what do you think)
Child: hundert Pfennig (a hundred Pfennigs)
Mother: nee/hundert Pfennig sind ne Mark (no: hundred Pfennigs
are one Mark)
(30) (referring back to interlocutor's report that her baby fell off the
table) du sag mal wann ist sie denn eigentlich runtergefallen so um
wieviel Uhr/wenn das schon länger her ist. . . (actually when did
she fall down, at what time approximately. If it was some time
ago . . .). Speaker goes on to discuss symptoms of brain concus-
sion.
(31) die haben ihr dann gesagt das sei doch 'ne reine Impúlshandlung
was sie da vorhätt und sie sollt's lassen/und sie hat's dann gelassen
(and those people told her that what she was planning was a mere
impulsive action and that she had better give it up/and she did give
it up. . .)
(32) (to a child fearfully inquiring about the modalities of an imminent
vaccination; after some reassuring information) außerdem bist du
doch auch schon ein paarmal geimpft worden (and besides, you
have been vaccinated a few times before, haven't you? - scil. it's
happened to you before, it's not new to you).
(33) (regarding a man claiming to have suffered injustice at his job) ich
hab neulich auch mit dem Μ. gesprochen der das ja alles miterlebt
hat/und der hat auch gesagt der sei schlecht behandelt worden (and
I've recently been talking to Μ., who has witnessed all those things
after all, and he says X. has been treated badly - scil. as he claims).
(34) bist du denn nun in Heidelberg gewesen? (well have you been to
Heidelberg after all? - interlocutor had been planning a trip to H.
but been very unsure whether she would actually go).
(35) (in a discussion) das kannste nicht in die Waagschale werfen/das
kannste nicht in die Waagschale werfen/darauf kommt's überhaupt
nicht an (you can't throw that into the balance/you can't throw
that into the balance/that's not at all what matters)
(36) (speaking about what kinds of accent patterns to collect. Certain,
it turned out during the conversation, could be neglected for the
moment.) Gut, dann láss ich die jetzt erst mal/aber ich sámmel Iteration
(ok, so I'll leave those for the moment/but I will sample iteration).
(37) (does Christopher's pair of jeans need washing, in view of an im-
minent visit?) Jeans wäscht man doch nur einmal in der Saison/
oder? (but jeans you only wash once a season, don't you?)
(38) ja und habt ihr ihn denn dann eingeláden? (well and did you invite
him in the end?)
(39) (A and B have been waiting for the postman: they can usually notice
it when he passes. After an hour or so of unsuccessful watching:)
A: du ich geh jetzt mal an den Briefkasten. B (sharp protest):
aber 's ist doch keine Post gekommen. (A: Listen, I'm going to the
letter-box. B: But the mail hasn't come!)
(40) Grocer watching customer stow away the things bought: Geht's
so? Customer: Wenn Sie mir ne Tüte geben könnten. . . (G: will
you manage? - standard question at the time plastic bags were gen-
erously given; C: If you could give me a bag . . .)
I have refrained from supplying accents in the English versions, but
hope the reader will do so (cf. § 3.5.4).
In (20), (21), (22), the placement of accent itself would also warrant first-instance interpreta-
tion; the contextual givens only make it clear that what is involved is focus on 'connection'.
The German examples, formally, would admit interpretation in terms of focus on ascription
as well; English is less ambiguous here, we would have to have either it's spinning, I see it, are
we meeting, or it is spinning, I do see it, are we meeting. From (23) through (35), first instance
is manifestly excluded; the only one-accent patterns possible in that case would be accent on
a 'specifier' (the subject in 23, the non-verbal element of the 'phrasal verb' in 24, the object in
25, etc.). Semantically, (23) could be interpreted as focus on connection or on ascription: the
former insisting more on the moment of realisation, the second on the fact that Ralf's coming
has been an issue between the interlocutors. The English version would have to be differ-
ently accented in the two cases: when is... vs. when's... coming. Formally, the same ambi-
guity is given in (25) and (26) (not in the English versions, though), but the larger context
(perhaps not entirely graspable from the extracts) makes for focus on 'connection' ('factual-
ity'); (27), too, leaves two options: emphasis on prior ascription or on realisation (a possible
situation for the latter: interlocutor is hesitating whether to stop a certain activity or not, and
I might give the advice: hör auf). Generally, focus on connection emphasizes the uncertainty
of the event in the perspective of the situation preceding it, while focus on ascription relates to its
relevance from the point of view of the issue at hand (cf. 30!); the reader may test this on most
of the examples, in their actual form and context as well as with possible accent/context sub-
stitutions. Examples (32) (formally, accent on been would be equivalent in the English ver-
sion) and (33), accent on has, illustrate cases where the verb form contains several auxiliaries
and infinitives, the rules for which remain to be stated in detail. Example (35), where verb +
complement form a semantically indissoluble unit, shows particularly clearly how it is the
whole group, not just the complement ('specifier') that is given (in fact, the iterated version is
preceded by its verbatim first-instance equivalent). (36) and (37) show the possibility of com-
bining accent on the verb form and on the 'specifier', (38) and (39) combination of focus on
connection and on ascription - a very elegant, and perfectly common, handling of perspectives.
(Note the accent shift in the 'phrasal verb' in (39): eingeláden as against first-instance
éingeladen.) There is of course no compulsion to use this or that kind of iterative focussing in
a given context: the speaker is to a large degree free to take this or that perspective (cf. also
Ladd, 128: ". . . the speaker simply chooses to connect answer to question in a slightly differ-
ent way"); thus in (31), the speaker connects und sie hat's dann gelassen to the issue at hand,
while the context may well lead one to expect emphasis on the uncertainty of the event in the
past situation. 'Rhetorical' exploitation is frequent, particularly in politeness strategies (cf.
Brown/Levinson): in (40), the customer is asking for a plastic bag, but insinuating that the
grocer is about to give her one on his own initiative.
With those of Ladd's examples that are not explainable by common se-
mantic theory without some additional specifications, the source of the
difficulty, as I have tried to show, lies in the presence of covert 'relational'
meaning features that may be focussed on by accenting the verb form just
as the lexical content may - a source of considerable, though systematic,
ambiguity.
Bolinger's and Ladd's descriptions share the notion of a placement 'by
default'. But their perspectives are different. In fact, as we mentioned
above, Bolinger dismisses Bresnan's what turn should we take as being perfectly
in accordance with the usual semantic account (neglecting the fact
that take is not meant to be 'new' in it), while this category of cases is at
the core of Ladd's concern. In Ladd's approach, the 'default' examples are
analyzed in terms of their deviation from 'normal' and 'contrastive' accen-
tuation; what Bolinger tries to account for is non-'contrastive' accent on a
'low-content' element, a deviant case in terms of his semantic theory (this
emphasis, as we mentioned above, leads him to treat even put him to death,
threw it away etc. on that model). While those cases of Ladd's that we
have been trying to explain are definitely not describable in terms of 'ac-
cent on a low-content element', those of Bolinger's examples that do not
involve accent on a verbal form require an explanation different from the
one given so far. Bolinger rightly invokes the higher-level distribution of
accent as a reason. I should like to speak of 'higher level accent', however,
where he speaks of 'sentence accent' as a factor (a sentence does not ne-
cessarily have one accent, as Bolinger is of course well aware; the accent
on the 'low-content' element may indirectly result from accent attributed
to a constituent a level higher than itself, but lower than sentence level).
And 'dependence on a low-content element' is an overgeneralization.
To substantiate this, we need a descriptive framework that assigns ac-
cent not per sentence, but per constituent (which is in accordance with Bo-
linger's general approach), and in so doing takes exact account of the syn-
tactic hierarchy. Let me sketch such a system in a few lines:27
1) A construction with immediate constituents A and B has the following
possibilities of accent distribution: (i) A + B (ii) Á + B́ (iii) Á + B (iv)
A + B́. The first is used to make mention of the referents involved while
presupposing them as 'given' and as not constituting points of relevance at
that place in the communication; the second, to present each separately as
a point of relevance. One of the one-accent patterns (see Fuchs ms.) may
be used for integration of both ICs into one point of relevance. The integ-
rative pattern is always ambiguous, its other systematically possible use be-
ing focus on the constituent accented only. The non-integrative one-ac-
cent pattern is often limited to use for focus on the accented IC only; see,
however, section 3.4. case 2b) and the analyses below.
2) Whenever an IC that is to receive accent according to 1) is in itself
complex, for the placement of accent within it 1) again applies (with the
obvious exclusion of pattern (i)).
3) This goes on as long as there is further syntactic ramification.
4) Higher-level choices may bar the possibility of some lower-level
choices with respect either to accent placement or to its interpretation. I
mention the principle, but will not go into the question any further here.
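The recursive character of rules 1) to 3) can be made concrete in a short sketch. The Python fragment below is an illustration under stated assumptions, not part of the framework itself: the names Word, Construction, accented and distributions are hypothetical, every construction is taken to be binary, word forms are assumed to be distinct, and rule 4) is deliberately left out. It enumerates formal accent distributions only; the semantic interpretation attached to each pattern is not modelled.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class Word:
    """A terminal constituent (hypothetical helper type)."""
    form: str


@dataclass(frozen=True)
class Construction:
    """A construction with two immediate constituents (ICs), a and b."""
    a: "Node"
    b: "Node"


Node = Union[Word, Construction]


def accented(node: Node) -> List[FrozenSet[str]]:
    """Accent distributions for a constituent that is to receive accent.

    A terminal simply carries the accent. For a complex IC, rule 2)
    re-applies rule 1) without pattern (i), so at least one IC is accented.
    """
    if isinstance(node, Word):
        return [frozenset({node.form})]
    # Pattern (ii): both ICs accented, each expanded recursively (rule 3).
    both = [x | y for x, y in product(accented(node.a), accented(node.b))]
    # Patterns (iii) and (iv): only the first or only the second IC accented.
    return both + accented(node.a) + accented(node.b)


def distributions(node: Node) -> List[FrozenSet[str]]:
    """All distributions for a top-level constituent, pattern (i) included."""
    return [frozenset()] + accented(node)


if __name__ == "__main__":
    ab = Construction(Word("A"), Word("B"))
    for accents in distributions(Construction(ab, Word("C"))):
        print(sorted(accents))
```

For a construction of two simple ICs the sketch returns the four patterns of rule 1); embedding that construction under a further simple IC returns eight formal distributions. That several interpretations share one formal pattern is exactly what the enumeration cannot show.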
The following two examples (from Fuchs, 1976) illustrate the diagram-
matic representations used below:
(41) (Speaker has taken a bicycle to the repair shop because the gear
switch does not work. After explaining this) und wenn Sie sich das
Ventil am Vorderrad nochmal anschauen/ich glaube da entweicht
Luft (and if you'd please have a look at the valve on the front
wheel/I think there is air escaping [lit. there escapes air]).
27
More detail in Fuchs 1976 ch. 3, and 1976, 1980 passim. To simplify, I neglect the ques-
tion of the lexical/nonlexical character of the constituents. The concern for the syntactic
hierarchy advocated here has nothing in common of course with Ladd's notion of a hier-
archy of accentability.
(42) das ist also genau das was die Transformationsgrammatik oder
Konstituentengrammatiken durch rekursive Regeln äh erzeugen
(so this is precisely what ['that which'] [the] transformational
grammar or constituent grammars er generate with recursive
rules).
I shall not especially take account here of the treatment of such constituents
as regularly go without accent even when 'new' (ich/I, da/there etc.)
and just put them in their proper place in the diagram without further
marking. Otherwise, 'Á' marks any non-terminal constituent that, on the
respective hierarchical level, is to receive accent, either as the exponent of
an integrated construction or as a point of relevance in its own right. To
mark the former case, I shall enclose the construction in parentheses with
an index. Right indexes n and g mark 'new' and 'given', respectively.
Needless to say, the accent patterns in the examples are such as they are
in virtue of the specific meanings that were to be conveyed in the contexts
of their use. Other patterns could have been chosen, for other purposes.
The hierarchical analysis here proposed differs from other cyclic approaches
in stressing the semantic choices effected at each level.28 Note
that the trees are not intended to reflect linearity.
(43) [Tree diagram not recoverable from the source. Surviving node labels: (1) ich An; (3) da An.]
28 The 'tree' representation is thus obviously quite different in interpretation from that used
by Liberman & Prince (1977).
[Tree diagram not recoverable from the source. Surviving node labels: seldom; am; everybody A; in (Rumania)g.]
(47) I'm not sure about what to review and what not to review
[Tree diagrams not recoverable from the source. Surviving node labels: am A; (not sure)g A; about A; A and A; what A; what A; is A; (the sense)g A.]
Provided the IC analyses are basically correct, it would thus seem that we
can explain all the accents in question by stating that the 'low-content' ele-
ment is, at that hierarchical level, the only non-given, i. e. new constituent.
(I have dozens of examples of this kind from spontaneous discourse in
German; so my claims are not limited to the few examples above. Cf. also
Fuchs 1976: 309 f.)
We will thus want to keep the part in Bolinger's explanation that
amounts to saying that the accent results from the interaction of higher-
level stress assignment with the 'givenness' of the co-constituent. That the
accent should be on a 'low-content' element, on the contrary, generalizes
what is only a possible side-effect that strikes us when it occurs because it
so patently contradicts common expectations. The attempt to cover the
iteration cases by the same explanation led to unwarranted semantic classi-
fications: for why should throw in throw away be 'low-content' in compar-
ison to away? For the kind of examples we are treating in this section,
however, the notion of 'accent by default' has some truth: the constituents
accented in them would not receive accent in first-instance use (except if
they were meant to establish separate relevance relationships within the
discourse) were it not for the fact that the dominating constituent has an
accent to be realized somewhere, while their co-constituent is 'given' (and
not meant to be 'iterated') and thus cannot take it.
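The 'default' configuration just described, in which a dominating constituent has an accent to realize while one co-constituent is 'given', can be sketched procedurally. The fragment below is a minimal illustration, not a statement of the analysis itself: the names Word, Construction, is_given and accent_candidates are hypothetical, 'givenness' is reduced to a Boolean flag, and iteration (accent on a 'given' element) as well as the choice between separate focus and integration are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Word:
    form: str
    given: bool = False  # hypothetical flag: True if presented as 'given'


@dataclass(frozen=True)
class Construction:
    parts: Tuple["Node", ...]


Node = Union[Word, Construction]


def is_given(node: Node) -> bool:
    """A construction counts as given only if all of its parts are given."""
    if isinstance(node, Word):
        return node.given
    return all(is_given(part) for part in node.parts)


def accent_candidates(node: Node) -> List[str]:
    """Words that can realize an accent assigned to `node` from above.

    Given parts cannot take the accent, so it is handed down to the
    non-given parts only. With a single non-given part the result is
    unique, whatever the lexical weight of that part.
    """
    if isinstance(node, Word):
        return [node.form]
    new_parts = [part for part in node.parts if not is_given(part)]
    return [word for part in new_parts for word in accent_candidates(part)]


if __name__ == "__main__":
    # Invented example: a prepositional phrase whose complement is 'given'.
    pp = Construction((Word("with"), Word("you", given=True)))
    print(accent_candidates(pp))
```

In the invented prepositional phrase the only candidate is the preposition, although nothing in the procedure refers to 'low content': the outcome follows from the givenness of the co-constituent alone. Where more than one part is new, the sketch merely lists the candidates and leaves the choice open.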
There is no 'dependence on a low-content element' for the distribution
pattern described to occur, nor is the reason for the accent being where it
is that it "would be misleading if it fell on a content word": For one thing,
the 'given' co-constituent may itself be 'low-content', cf. am liebsten würd'
ich mit dir fahren (I wish I could go with you - to someone who has just
been talking about a projected trip); komm mal zu mir (do come to me):29
29 English: A: . . . rather an intricate próblem! B: Yes. But let's not go into it now. A published
example: O'Connor & Arnold, 1973: 226 (I render their 'tone group' notation by
simple accent marks): But hów did you manage it? - There was nothing to it. It couldn't
have been simpler.
[Tree diagram not recoverable from the source. Surviving node labels: Semesterferien A; (sind A); die A.]
It will have become clear by now that . . . in which I meant it and . . . in which I did mean it,
or . . . what to review . . . and . . . what I should review . . . are not really equivalent, nor are
any other alternative accent placements, 'low-content' or not. . . . the sense in which I meant it,
e. g., may presuppose something like "there has been a question of my meaning 'it' in some
specific sense, and now you are presenting this sense; well:. . ."; . . . in which I did mean it
specifically presupposes a questioning of the fact. Again, the differences are small from the
point of view of denotation, but real enough interactionally.
5. Conclusion
5.1 We are still at an elementary stage in the analysis of accent, and on the
whole our systems of rules for its use are based on a few simple notions,
just as the data base mostly is poor. Many patterns of accentuation and
their uses have come under attention only quite recently.
The patterns mentioned in this paper do not fit into current models of
description in that their accents are neither interpretable as a foreground-
ing ('contrastive' or otherwise) of the content of the element in the usual
sense, nor can they be described as 'normal', 'neutral' placements in view
of the current rules for those. Their deviation from current expectations
derives from several different basic principles of accent use, however,
which is one reason why attempts to account for them by some one addi-
tion to the model used by the author (Bresnan's proposal, e.g., or the 'de-
5.2 For our analysis of the 'default' cases, we have had to sharpen current
basic notions in several respects. Most importantly:
1) The terms 'given' and 'new' in the description of the function of ac-
cent do not always designate the same. We have to distinguish between
two facts: factual 'givenness'/'newness' of the element in the discourse (a
notion that does not simply reduce to that of prior mention), which does
not in any mechanical way determine accent placement; and the conven-
tional meaning associated with accent/accentlessness, for which we should
perhaps agree to settle on some other technical term.
2) The semantic function of accentuation is characterized by a high de-
gree of systematic variability with contexts: 'ambiguity' in a somewhat ne-
gative terminology, systematic adaptability to different purposes, more
positively. The choice of an accent pattern is not determined by context
conditions, since it has an autonomous semantic function (although cer-
tain accent patterns may show a high degree of affinity to certain context
types). Its interpretation, on the contrary, is systematically dependent on
context givens of all sorts (cf. Halliday, 1967: 206): word class, syntactic
relations to other elements, accentedness/accentlessness of those ele-
ments, (factual) givenness/newness in the discourse etc.
3) It is not only the semantic content of an element in the usual sense,
as defined by its opposition to other elements of its class, that can be fo-
cussed on by an accent; the 'relational' semantic features as defined by the
syntagmatic relations it stands in, too, figure importantly as possible do-
mains of focus.
4) The syntactic hierarchy of a sentence does not determine accent
choice, but plays an important role as a framework for the choices to be ef-
'Sentence accent', then, is 'sentence' accent in the sense that the informational focussing ef-
fected by it - which may be bestowed on any of its constituents, and on several at a time - is
fully interpretable only against the background of the relations the constituents stand in
within the sentence. The term must not mislead us into thinking that what an accent func-
tionally relates to is simply the sentence as a unit. If it did (in a 'delimitative' function), the
rules for its placement would in all probability be simpler than they are (for how should the
accent help delimit the unit, otherwise?), and an unalloyed 'syntactic' approach might have
been sufficient to capture them.
5.3 We have long had to contend with 'contrastive' accent as a unit pre-
sumably sui generis, when what is involved is just accent and some 'con-
trastive' use, or context (Bolinger, 1961; Gibbon, 1980). It looks as if 'de-
fault accent' were on its way to becoming hypostatized into another such
'kind of accent'. But, again, although the rules for the use of accent in in-
teraction with the local semantics, syntax, and discourse conditions are in
part rather intricate, and in spite of extensive '-etic' semantic variation,
'-emically', focussing by accent is unitary.
'Default' accent is 'focus' accent. 31
30
Four formal patterns are available, but semantically, they correspond to many more possi-
bilities: separate focus vs. integration, focus on a new element vs. iteration, etc.
31
If one were to speak this sentence, in the context here presupposed, there might well (not:
'would have to') be accents on the subject, on is, and on the predicate noun - and analysis
in terms of 'default' impossible. (Interestingly, while one quite commonly accents many
constituents in a sentence, one usually underscores but one.)
32
If the notion of 'kinds of accent' is fruitful at all, it should certainly be reserved for units
formally distinct, such as Bolinger's A-, B- and C accents. Still, mutatis mutandis the same
objection may apply; the differences between A, Β and C ought probably to be described
in the framework of some other more general intonational dimension (cf. Knowles this
volume).
References
Bearth, T. (1980). "Is there a universal correlation between pitch and information value?" In:
Brettschneider, G. & Lehmann, Ch. (eds.). Wege zur Universalienforschung. Tübingen:
Gunter Narr. 124-130.
Benveniste, E. (1966). "La phrase nominale". In: Benveniste, E., Problèmes de linguistique
générale. Paris: Gallimard. 151-167.
Berman, A. & Szamosi, M. (1972). "Observations on sentential stress". Language 48:
304-328.
Bolinger, D. (1952). "Linear Modification". PMLA 67: 1117-44.
Bolinger, D. (1958). "A theory of pitch accent in English". Word 14: 109-149.
Bolinger, D. (1961). "Contrastive accent and contrastive stress". Language 37: 83-96.
Bolinger, D. (1971). The phrasal verb in English. Cambridge, Mass.: MIT Press.
Bolinger, D. (1972). "Accent is predictable (if you're a mind-reader)". Language 48: 633-644.
Bresnan, J. (1971). "Sentence stress and syntactic transformations". Language 47: 257-281.
Bresnan, J. (1972). "Stress and syntax: a reply". Language 48: 326-42.
Brown, P. & Levinson, S. (1978). "Universals in language use: politeness phenomena". In:
Goody, E. N. (ed.). Questions and politeness. Cambridge: Cambridge University Press.
56-289.
Chafe, W. L. (1974). "Language and consciousness". Language 50: 111-33.
Fillmore, C. J. (1976). "Pragmatics and the Description of Discourse". In: Schmidt, S. J.
(ed.). Pragmatik/Pragmatics. Vol. 2. München: Wilhelm Fink. 147-174.
Fuchs, A. (1976). "'Normaler' und 'kontrastiver' Akzent". Lingua 38: 293-312.
Fuchs, A. (1980). "Accented Subjects in 'All-New' Utterances". In: Brettschneider, G. &
Lehmann, Ch. (eds.). Wege zur Universalienforschung. Tübingen: Gunter Narr. 449-461.
Fuchs, A. (ms.) 'Instructions to leave': on integrative ('normal') accent placement.
Gibbon, D. (1980). "A new look at intonation syntax and semantics". In: James, A. &
Westney, P. (eds.). New Linguistic Impulses in Language Teaching. Tübingen: Gunter Narr.
71-98.
33
I thank Harald Fricke, Richard Geiger and Dafydd Gibbon for close critical reading of
the manuscript.
Grice, H. P. (1975). "Logic and conversation". In: Cole, P. & Morgan, J. (eds.). Syntax and
Semantics. Vol. III. New York etc.: Academic Press. 43-59.
Halliday, M. A. K. (1967). "Notes on transitivity and theme in English". Part 2. Journal of
Linguistics 3: 199-244.
Hawkins, J. A. (1978). Definiteness and Indefiniteness. A Study in Reference and Grammaticality
Prediction. London: Croom Helm.
Keenan, E. O. & Schieffelin, B. (1976). "Topic as a discourse notion: a study of topic in the
conversation of children and adults". In: Li, C. N. (ed.). Subject and Topic. New York etc.:
Academic Press. 337-384.
Labov, W. & Fanshel, D. (1977). Therapeutic Discourse. New York etc.: Academic Press.
Ladd, D. R. jr. (1979). "Light and shadow. A study of the syntax and semantics of sentence
accent in English". In: Waugh, L. & van Coetsem, F. (eds.). Contributions to Grammatical
Studies: Semantics and Syntax. Leiden: E. J. Brill. 93-131.
Lakoff, G. (1972). "The global nature of the nuclear stress rule". Language 48: 285-303.
Liberman, M. & Prince, A. (1977). "On Stress and Linguistic Rhythm". Linguistic Inquiry 8:
249-336.
Martinet, A. (1968). "Accent et tons". In: Martinet, A., La Linguistique synchronique, études et
recherches par A. Martinet. Paris: Presses Universitaires de France. 141-161.
O'Connor, J. D. & Arnold, G. F. (1973). Intonation of colloquial English. 2nd, rev. ed. London:
Longman.
Schubiger, M. (1958). English Intonation. Its Form and Function. Tübingen: Niemeyer.
Searle, J. R. (1969). Speech Acts. An Essay in the Philosophy of Language. Cambridge:
University Press.
Sgall, P., Hajičová, E. and Benešová, E. (1973). Topic, Focus and Generative Semantics.
Kronberg: Scriptor.
Stockwell, R. P. (1972). "The role of intonation: reconsiderations and other considerations".
In: Bolinger, D. (ed.). Intonation. Harmondsworth: Penguin Books. 87-109.
Worth, D. S. (1964). "Suprasyntactics". Proceedings of the Ninth International Congress of Lin-
guists. Den Haag - Paris: Mouton. 698-704.
DAFYDD GIBBON
This paper describes a procedural model for English intonation, and ap-
plies it to complex dialogue data. In the first section, descriptive (§ 1.1.)
and heuristic (§ 1.2.) assumptions are discussed. A set of descriptive cate-
gories for English intonation is presented in § 2, followed by an analysis
of complex intonation data in § 3 and an outline of a procedural syntax
for English intonation in § 4. In § 5, some extensions of the notion of
'process' to include 'adaptation to context' are proposed, and in § 6 the
main properties of the theory are briefly evaluated in terms of a dynamic
speaker-hearer model.
The main points developed in the paper concern
(1) the metalocutionary hypothesis (§ 1.2.);
(2) the status of discourse tokens (§ 1.2., § 3);
(3) the articulatory bases of intonation and their relation to perceptual
patterns (§ 2, § 5);
(4) the pulse accent theory (§ 2);
(5) iterative and recursive processes in discourse phonology, organised in
prosodic frames (§ 4);
(6) the adaptation of intonation processes to context, described as a feed-
back system (§ 5);
(7) consequences of this procedural approach with regard to the method-
ological problem of 'fuzziness' in intonation description (§ 2.3., § 6).
A major aim of this approach is to provide conceptual bridges between
phonetic, structural and functional aspects of intonation, and for this rea-
son concepts are adapted from a fairly wide selection of disparate fields,
from the physiology of speech to discourse analysis. Emphasis is laid on
analysing 'data in use' rather than intuitively constructed 'data'.
most major authors from Pike (1945) through Jassem (1952) and Crystal
(1969). I have called this view, in an explicit form, the 'metalocutionary
hypothesis', since intonations mark, inter alia, structural properties of lo-
cutions and their functions in dialogue, acting as a 'suprasegmental' or, se-
mantically speaking, 'metalocutionary' system (Gibbon, 1980).
In addition to the classical component of this view, the following hypo-
thesis is maintained: the metalocutionary interpretation function presup-
poses a level-selection or domain-switching function which selects the
relevant level of locutionary structuring for intonational marking. This
level-selection function defines stylistic or functional variation in intona-
tion, and is sensitive to the discourse context. Locutionary structure is
more complex than the intonational means available for indexing it, and
includes at least the following kinds of structure as values of the level-se-
lection function and thus as domains for prosodic indexing:
(1) Word and syllable structure.
(2) Phrase structure: relations between syncategories and categories, specifiers
and heads of phrases (e. g. in an X̄ syntax), early vs. late linear
position (each paired with weaker and stronger accents, respectively);
constituent boundary indexing.
(3) Semantic structure: operator/operand scope, as with negation, quanti-
fication, degree and focus adverbs (stronger accent on the operand).
(4) Topical (semantic frame) structure: anaphora/contrast, other given/
new relations in topic development (also weak/strong).
(5) Speaker attitudes of modal (knowledge, belief, obligation) and apprai-
sive types (emphasis, pejoration, amelioration).
(6) Discourse organisation: turn-taking processes, speech act sequencing,
indexing of completion/noncompletion of dialogue constituents.
(7) Discourse type (style, genre, register) indexing: both via the selection
function itself and via specific genre markers (e.g. the 'chroma' fea-
ture of § 2.1., § 3).
For instance, in read-aloud citations, level (2) will tend to dominate; in
spoken narrative, (4) is more relevant, while in small-talk, (6) may be
dominant. A case of domain selection between levels (2) and (5) or (6) is
discussed by Bing (this volume), Boves & al. (this volume).
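The level-selection function can be pictured as a mapping from discourse type to the locutionary level that tends to dominate prosodic indexing. The following Python sketch is only a schematic rendering of the three tendencies just named: the identifiers Level, DOMINANT_LEVEL and select_level and the string labels are hypothetical, and a simple lookup table stands in for what the text describes as a context-sensitive function.

```python
from enum import IntEnum
from typing import Dict, Optional


class Level(IntEnum):
    """The seven kinds of locutionary structure listed above."""
    WORD_AND_SYLLABLE = 1
    PHRASE = 2
    SEMANTIC = 3
    TOPICAL = 4
    SPEAKER_ATTITUDE = 5
    DISCOURSE_ORGANISATION = 6
    DISCOURSE_TYPE = 7


# Tendencies named in the text; the keys are invented labels.
DOMINANT_LEVEL: Dict[str, Level] = {
    "read_aloud_citation": Level.PHRASE,
    "spoken_narrative": Level.TOPICAL,
    "small_talk": Level.DISCOURSE_ORGANISATION,
}


def select_level(discourse_type: str) -> Optional[Level]:
    """Return the level that tends to dominate, or None if none is stated."""
    return DOMINANT_LEVEL.get(discourse_type)


if __name__ == "__main__":
    print(select_level("spoken_narrative").name)
```

A table of this kind captures tendencies only; it does not represent cases such as the selection between levels (2) and (5) or (6), where more than one level is in play.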
This descriptive background is stated here in nuce (cf. also Gibbon,
1982) since, although the body of the paper is more concerned with the
formal organisation of intonation processes, it presupposes this back-
ground.
and functions of such data in the direction of a more precise formal de-
scription; an important feature of the strategy is that it is necessary to pro-
ceed along a broad front, taking form and function description along
roughly at the same pace.
It has proved heuristically useful to distinguish between two data types
and to try and combine use of these types in the descriptive cycle. These
two types centre around concepts such as paradigmatic, static, structural,
classificatory on the one hand, and syntagmatic, dynamic, procedural, raw on
the other. Some of these words may seem odd partners, and there is syste-
matic overlap between the groups, but the data types, which I shall refer
to here as 'P-data' and 'S-data', are real enough in practice. P-data are ed-
ited productions of sentences or sentence constituents, preplanned, read as
citation forms (i.e. as unimaginatively as possible), and paired with para-
digmatic metalinguistic judgments of constituency type ('sentencehood',
'intonation group') or more complex equivalences ('ambiguity', 'substitutability',
'paraphrase'); these subsentential paradigmatic judgments are often
supplemented by sentential functional, syntagmatic judgments on anaphora,
subjecthood, 'focus-of'. S-data are discourse tokens with context
specifications ('context' in the sense of 'cotext' as well as the more general
sense), from a variety of registers along scales such as 'most spontaneous'
to 'most rehearsed'; the context specifications are syntagmatic metalinguistic
judgments on function in discourse, and paradigmatic judgments on
constituent types are regarded as derived from these (cf. also Gibbon,
1976a: 44f.).
In developing precise descriptions of selected points, P-data are indis-
pensable but they embody certain heuristic assumptions:
(1) The style/register of pre-planned utterances read and judged as cita-
tion forms is representative of all speech and allows access to the un-
derlying 'language'.
(2) The 'clear case' technique based on straightforward P-data judgments
allows quasi-inductive 'predictions' to be made about less clear cases
on the basis of an exact theory, providing a suitable strategy for dis-
covering the underlying 'language'.
Both of these assumptions are, however, questionable. To avoid the
restrictions which they bring, a theory, whether developed using P-data or
S-data or both, must be tested continually on S-data in order to falsify ex-
isting descriptions by extending the data base. One valuable contrary tech-
nique to (2) is, in fact, the description of easily isolated, conspicuous, but
by no means 'clear' cases; continued interest in 'call contours' (cf. Gibbon,
1976: Ch. 4; Ladd, 1978), at first sight a peripheral issue, is a case in point.
This technique is, no doubt for psychological rather than logical reasons,
widespread but unsung in all branches of linguistics. A reason for using
S-data, counter to (1), is to account for 'functional' properties of speech:
'functionalistic' need not mean 'seeking teleological explanations', it can
168 D. Gibbon
2.1. Primitives
The primitives are local (short-scope; roughly speaking: word-oriented)
and global (long-scope; roughly speaking: phrase/sentence/text-
oriented).
The local primitives are accent pulses and their modifications (some
modifications being reflexes of global features determined by construc-
tions or strategies).
(1) Accent pulses. Accents are conceived as laryngeal pulses. The larynx
is regarded as a complex elastic body whose steady state with respect to
vocal cord tension is altered by 'stretching' ('Cricothyroid') and 'com-
pressing' ('Sternohyoid') pulses. After a pulse, the larynx returns (other
things being equal) to its pre-pulse state. The auditory reflexes of these
modifications are a pitch change in one direction on the leading flank of
the pulse, followed by a change in the reverse direction after the pulse.
The articulatory 'correlates' are conceptually simpler than the auditory
patterning, and are presupposed by the latter. They are more abstract,
however, from an empirical point of view. Aspects of the pulse theory of
accent are shown pictorially in Figure 1.
The two phonological accent types are (note that the names are not in-
tended to imply a simple association with single muscles):
i. Cricothyroid pulse ( = upward pitch movement), denoted by '↑';
ii. Sternohyoid pulse ( = downward pitch movement), denoted by '↓'.
The ↑ pulse can be felt with a finger as a narrowing of the intercartilage
ridge at the front of the larynx, the ↓ pulse sometimes as a lowering of the
entire larynx. Not all short-range pitch movements are functions of laryn-
geal pulses. Pitch perturbations of supraglottal origin, the use of subglot-
tal mechanisms, and other factors, affect vocal cord stretching and com-
pression. Bolinger's Accents A and C could be partially explicated in these
terms as ↑ and ↓ respectively; the ensuing fall or rise is a natural auditory
reflex of the post-pulse relaxation predicted for unmodified cases by the
laryngeal pulse theory. Other more complex forms of accent are due to
pulse modifications. Similarities with Lieberman's theory (1967) are evi-
dent, at least for the \ pulse.
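The pulse conception described above can be made concrete with a small numerical sketch. It is an illustration only: the linear leading flank, the exponential relaxation, the function name `pulse_contour` and all parameter values are assumptions introduced here, not part of the theory as stated. Positive polarity stands for a Cricothyroid (rise) pulse, negative polarity for a Sternohyoid (fall) pulse.

```python
import math

def pulse_contour(polarity, amplitude, rise_time, relax_time, duration, step=0.01):
    """Return (time, deviation) samples for one hypothetical accent pulse.

    polarity:  +1 for a Cricothyroid (rise) pulse, -1 for a Sternohyoid (fall) pulse
    amplitude: peak deviation from the steady state, in arbitrary pitch units
    The shapes of both flanks are assumptions made for illustration.
    """
    samples = []
    n = int(round(duration / step))
    for i in range(n + 1):
        t = i * step
        if t <= rise_time:
            # leading flank: linear movement away from the steady state
            dev = amplitude * t / rise_time
        else:
            # post-pulse relaxation: exponential return towards the steady state
            dev = amplitude * math.exp(-(t - rise_time) / relax_time)
        samples.append((t, polarity * dev))
    return samples

rise = pulse_contour(+1, 4.0, 0.1, 0.15, 1.0)
fall = pulse_contour(-1, 4.0, 0.1, 0.15, 1.0)
```

In this sketch the two pulse types are mirror images of each other, and the deviation has almost vanished by the end of the sampled interval, which corresponds to the return to the pre-pulse state 'other things being equal'.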
(2) Accent modifications. The central accent modification is pulse
amplitude ( = degree of pitch change, = Δf0 (i.e. change in fundamental
frequency), = prominence), which realises accentual gradation. A second set
of accent modifications affects pulse timing relative to syllable structure:
Figure 1: Schematic outline of pulse types and modifications (baseline modifications not
shown); comments apply mainly to rise pulses. [Panel label: '. . . plus delayed
triggering and slow rise time (cf. 'emphatic rise-prefixes')'.]
[Figure: two π-frames, each with PEAKLINE and BASELINE traces. The first frame,
specified [peakline <p baseline], carries "young hedgehog trundled along through the
leaves in the greenstuff in the wood . ."; the second, specified [peakline >p baseline],
carries "he'd never been outside of the wood before". Lower tiers: 2. Accentual
sequencing; 3. Syllabic sequencing.]
some theories they are postulated as the only type, enumerable in an in-
tonation lexicon (Liberman, 1978). They will not be considered here.
Iterative processes have frequently been postulated for the description
of accent sequences in more or less explicit fashion for many decades. In a
previous critical survey of research I represented one iterative property of
'Tone Groups' in the diagrammatic form of Figure 3; in Fox (this volume)
and Halliday (1967), this iterative property is made explicit. Simple tran-
sition networks formulated as finite state machines may be used to de-
scribe this property of intonation patterning. Since 1978 I have been using
transition networks for instruction purposes to synthesise 'stylisations' of
pitch accent sequences in English (in a sense approximating to the usage
of 't Hart) with a microcomputer (cf. Gibbon, 1981b); networks and tones
are interactively definable by the operator. Several approaches have used
similar notions (Reich, 1969; 't Hart & Cohen, 1976; Gibbon, 1981a;
Pierrehumbert, 1980). It is evident that a formal explication of rhythm,
among other things, will have to rely heavily on an iterative principle at
some point, whether this is understood, in real terms, as a cerebral 'clock'
frequency or as a perceptual scanning and pattern-recognition principle
(cf. Neisser, 1967).
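The iterative property discussed in this paragraph can be sketched as a small finite state transition network. The states, arc labels and the names `ARCS`, `FINAL`, `accepts` and `generate` below are hypothetical and chosen only for illustration; they do not reproduce the networks of Figure 3 or those used in the instructional program. The point of the sketch is the single looping arc, which allows any number of accents between Head and Nucleus.

```python
# Hypothetical network: arcs map (state, label) to the next state.
ARCS = {
    ("start", "prehead"): "onset",
    ("start", "head"): "body",
    ("onset", "head"): "body",
    ("body", "accent"): "body",      # the iterative arc: any number of accents
    ("body", "nucleus"): "tail",
    ("tail", "tail"): "end",
}
FINAL = {"tail", "end"}

def accepts(labels):
    """Return True if the label sequence is a complete path through the network."""
    state = "start"
    for label in labels:
        key = (state, label)
        if key not in ARCS:
            return False
        state = ARCS[key]
    return state in FINAL

def generate(max_len):
    """Enumerate all accepted label sequences of at most max_len symbols."""
    results = []
    stack = [("start", [])]
    while stack:
        state, path = stack.pop()
        if state in FINAL and path:
            results.append(path)
        if len(path) == max_len:
            continue
        for (src, label), dst in ARCS.items():
            if src == state:
                stack.append((dst, path + [label]))
    return sorted(results, key=lambda p: (len(p), p))
```

Because the network is finite and the loop is the only source of unbounded length, a length limit is enough to enumerate the patterns; replacing the labels by tone definitions would be one way of approaching the 'stylisations' mentioned above.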
In a previous study I have shown that in German at least two levels of
iterative process can be identified in complex expository discourse (Gib-
bon, 1981a). At the lowest prominence level, there are sequences such as r,
r r, rr r, . . . with various explainable 'exceptions', where V symbolizes a [
accent and V'symbolises a more prominent [ accent (i. e. with higher pulse
amplitude or following cricothyroid tensing). At the next prominence lev-
Figure 4: Outlines of augmented transition networks for the German data from Gibbon
(1981).
Figure 5: A (top). Iterative network equivalent to Σ2 and Σ3 of Figure 3. Higher values of α
are interpreted here as higher pulse prominence (a function of pulse amplitude and
polarity).
Β (bottom). Generalisation of the above by including recursion. The current value
of β, a pulse prominence increment value, is passed to the network as a parameter
on each call. Lower values of β are interpreted as higher pulse prominence.
Intonation as an Adaptive Process 177
2.3. Strategies
The preceding discussion has shown several points at which similar
processes appear to be in operation at different levels of structure. One
relatively trivial set of examples encompasses 'relaxation' after a ↑ pulse
and the overall downslope or downdrift of the peakline; in this set, various
articulatory mechanisms contribute towards producing an auditory impression
of falling pitch. A perceptually or functionally oriented principle applying
here might be a principle of cooperation of means. Another case could be
the use of laryngeal and subglottal gestures with timing modifications to
intensify a particular auditory effect such as a pitch rise. On the other
hand, the use of a low Prehead to modify a high Head, producing a complex
initial boundary tone, or of wide pitch bandwidth (peakline minus
baseline), or of alternating ↑ and ↓ pulses, or of pulse pairs to accelerate
a pitch fall at terminal boundaries, points to the operation of another
perceptual principle: a principle of sequential pitch contrast. Finally -
contrasting with the last-named principle - in level tones, in
sustained-peakline speech, and in stereotypic speech involving
[peakline =p baseline] specifications, a principle of equilibration appears
to be in operation, with the two laryngeal functions operating
antagonistically and under the stabilising influence of negatively phased
auditory feedback: a pitch drop triggers the ↑ mechanism, and a pitch rise
triggers the ↓ mechanism (and/or the respective subglottal mechanisms). The
effect is to counteract accentual overshoot (for ↑) or undershoot (for ↓),
as well as to work against relaxation and downdrift effects. In a previous
study of 'call contours' (1976a: Chap. 4.3.) I called this functional
feature of discourse phonology 'chroma', taking the term from Bachem (1950);
see also § 5 below.
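The principle of equilibration can be illustrated as a simple negative feedback loop. The sketch below is a deliberately crude assumption: a proportional correction with a hypothetical `gain` parameter and invented pitch values in Hz. It shows only the direction of the corrections (a drop triggers raising, a rise triggers lowering), not a model of the laryngeal mechanisms.

```python
def equilibrate(target, pitch, gain=0.5, steps=10):
    """Negatively phased feedback: each step counteracts the current deviation.

    A pitch below target triggers a raising correction, a pitch above
    target a lowering correction. gain is a hypothetical proportion
    (0 < gain <= 1) of the deviation corrected per step.
    """
    trace = [pitch]
    for _ in range(steps):
        error = target - pitch          # positive when pitch has dropped
        pitch = pitch + gain * error    # raise on a drop, lower on a rise
        trace.append(pitch)
    return trace

after_overshoot = equilibrate(target=200.0, pitch=240.0)
after_drift = equilibrate(target=200.0, pitch=180.0)
```

Both traces converge on the target level, the first from above (counteracting overshoot) and the second from below (counteracting relaxation or downdrift), which is the level-contour effect described for 'chroma'.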
It is not clear how the notion of strategy is to be integrated into the
overall description; the ideas behind the notion are simple, however. First,
strategies are intended to represent the complementary functional status
of perceptual as against articulatory processes within a dynamic feedback
cycle. Second, they represent an attempt to generalise over processes
which are phonologically different in structure but related in function.
Third, if considered without a hypothesis on underlying articulatory pro-
cesses, they are likely to lead to indeterminacy and uncontrolled variety in
descriptions, since arbitrary decisions have then to be made in choosing
between perceptual models. That there is indeed little clarity in this field is
perhaps informal support for the necessity of a dynamic production-per-
ception relation such as that postulated here.
And one pig, it was erm a young pig, about that size, you know, mid-
dling, anderm it was dead, and it was lying there. I'd never seen a dead
pig before. Absolutely stiff.
B. Di- the children saw it, did they?
A. Oh they were engrossed, you know.
C. Oh yes!
A. It was marvellous, erm they thought this was wonderful. And erm they
asked why it was dead, and er the farmer apparently didn't want his
wife to know, because he'd overfed them before and she'd been furi-
ous - and of course he was trying to keep it from her. But all the kids
were agog about this dead pig, and he was telling them not to tell the
farmer's wife
D. Yeah.
A. and all this . . . So this pig was absolutely dead, so they put it on - they
have a sort of smouldering heap that smoulders all the time - so they
went to burn the pig, and all the kids (laughter) were hanging over the
gate watching this pig. And they were very taken that the pig had died
because it had eaten too much, you know.
D. What a marvellous death!
B. A moral in that somewhere!
This anecdote elaborates a semantic frame with the structure
"(p because q) & r", where:
p stands for "one pig died",
q stands for "it ate too much",
r stands for an appraisal (perhaps to be glossed as 'fascinated
pejoration') of "p because q" or its constituents.
Topical development within this frame is as follows:
Stage 1 (statement of the frame): "one pig died because it ate too much".
Stage 2 (elaboration of the frame as a function of the elaboration of its consti-
tuents, in the following order):
Ei(p): "and one pig it was erm a young pig . . ."
E(r): "oh they were engrossed you know . . ."
E(q): "they asked why it was dead . . ."
Ei(p): "so this pig was absolutely dead . . .".
Stage 3 (restatement of the frame):
r: "and they were very taken that"
p: "the pig had died" because
q: "it had eaten too much".
This development is marked in several ways. First, it is very noticeable
that turn-taking is highly restricted, limited to a clarification and a few in-
terjections; this contrasts strongly with the following section of the dia-
logue, in which a new semantic frame is negotiated. Second, topical devel-
opment in the anecdote is marked by conjunction patterns. Of particular
interest are the largely asyndetic sequences associated with r, coordination
[Figure 7 labels. PROSODIC FRAMES: π-frame with [p-line >p b-line], high init. pitch,
Δp = wide, downslope; two γ-frames, each with downslope and convergence.]
Figure 7: Prosodic frames and selected specifications for Ei(p) from the Crystal & Davy
corpus extract, showing 'warm' restart ('warm' since existing top-level π-frame
specifications are used). Mapping of prosodic categories on to the locution is
symbolised by parentheses and dotted lines; forced closure of open γ-frames after the
restart is symbolised by a large square bracket.
At the risk of being accused of reading too much into a limited set of
examples, I shall restrict attention to "p because q" in Stage 1 and Ei(p) of
Stage 2 in the dialogue concerned. It would be scarcely possible to cover
more detail in the space of a single article. The prosodic structures of the
relevant extracts are represented in Figures 6 and 7. The main aspects of
an appropriate syntax for such structures are shown in Figure 8.
The nodes π and γ are interpreted as organisational frames; the
processes which operate within these frames have access to features denoted
by the 'multilabels' (Eikmeyer, 1980) which are attached to nodes in the fi-
gures.
The overall picture conveyed by the figures is relatively complex;
however, at each level organisation is relatively simple, as Figure 8 shows. The
principles underlying the interrelationships between levels are also simple,
though the combinatory possibilities are large, leading to complex
structures in actual use. The apparent complexity of specific structures is
perhaps an artefact of the analysis, a kind of 'optical illusion' to which all
analytic description is prone. Conditions on mapping between specifications
at different levels of embedding cannot be discussed here.
π-frames
Observable features associated with π-frames are:
(1) long-scope baseline feature specification, in particular
[peakline >p baseline], downslope, peakline-baseline convergence.
(2) High initial peakline value (ultimately manifested at βi); cf. the
'initial boundary features' of § 2.1. above. By this criterion, π corresponds
approximately to Fox's "paratone group" (1973) or Brazil's "pitch
sequence" (1978).
(3) In Figure 6, peakline modifications can be observed in two cases; in
each case, the category with altered peakline is marked as a π-frame.
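The long-scope features listed under (1) and (2) can be sketched numerically. In the following illustration the peakline and baseline are assumed to be straight lines, and the function name `frame_lines`, the slope values and the pitch values are all invented for the example; the text itself makes no claim about the shape of the lines. Pitch bandwidth is taken as peakline minus baseline, as in § 2.3.

```python
def frame_lines(peak_start, base_start, peak_slope, base_slope, duration, n=5):
    """Sample a linear peakline and baseline over one hypothetical π-frame.

    Slopes are in pitch units per second; negative values give downslope.
    Convergence arises when the peakline falls faster than the baseline.
    """
    rows = []
    for i in range(n):
        t = duration * i / (n - 1)
        peak = peak_start + peak_slope * t
        base = base_start + base_slope * t
        rows.append((t, peak, base, peak - base))   # last field: pitch bandwidth
    return rows

rows = frame_lines(peak_start=220.0, base_start=120.0,
                   peak_slope=-20.0, base_slope=-5.0, duration=4.0)
```

With these invented values the peakline stays above the baseline throughout, both lines slope downwards, and the bandwidth narrows from 100 to 40 units, which is one way of picturing a high initial peakline value followed by peakline-baseline convergence.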
γ-frames
Observable criteria for γ-frames are chiefly boundary criteria (see below),
though the global π-frame specifications (e.g. downslope) are also ac-
cessed in the internal organisation of γ-frames; these also show internal
downslope (to a lesser extent than the longer γ-frame sequences within
π-frames). In different terminology, the higher-level specifications
'percolate' down to the lower level. The superordinate π-frame peakline may be
determined heuristically from the heights of Heads of successive
γ-frames.
Figure 8: Networks for description of structures represented in Figures 5 and 6. Note the
two kinds of recursion of π-frames: right-branching, in Σπ, and centre-embedding,
in Σα (cf. "narrative tag" and parenthesis marking, respectively). Application to
other data types will necessitate specifying more end states, inter alia. Details of
ATN tests and actions based on § 2 are not included; a full phonetic interpretation
requires such specifications.
present the levels concerned. The highest level system, Σπ, first defines the
global specification and then calls the next system, Σγ. On return to the
highest level, recursion to Σπ itself is possible, yielding 'narrative tags' or
similar marking structures. The Σγ system generates initial boundary
specifications, transfers control to the accentual rhythm system, Σα, and on
return generates terminal boundary specifications; it has the option of
iterating (looping). The lowest level system, Σα, iteratively generates
accentual pulses or, alternatively, recurs to Σπ to generate parenthesis
markings. Accent iteration explicates the notion of 'subjective rhythm'.
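The division of labour between the three systems can be sketched as three mutually calling procedures. This is a minimal illustration under stated assumptions: the input is a hand-built nested plan (a hypothetical dictionary format with the keys `gammas` and `tag`), the output is a flat trace of event names, and the ATN tests and actions, which the networks would need for a phonetic interpretation, are left out entirely.

```python
def sigma_pi(frame, depth=0, out=None):
    """Highest level: global specification first, then one or more γ-frames."""
    out = [] if out is None else out
    out.append(f"pi-spec(depth={depth})")
    for gamma in frame["gammas"]:               # the γ system may iterate
        sigma_gamma(gamma, depth, out)
    if frame.get("tag"):                        # right-branching recursion
        sigma_pi(frame["tag"], depth, out)
    return out

def sigma_gamma(gamma, depth, out):
    """Middle level: initial boundary, accentual rhythm, terminal boundary."""
    out.append("beta_i")
    sigma_alpha(gamma, depth, out)
    out.append("beta_t")

def sigma_alpha(items, depth, out):
    """Lowest level: iterate over pulses, or recur to the π system."""
    for item in items:
        if item == "accent":
            out.append("pulse")
        else:
            sigma_pi(item, depth + 1, out)      # centre-embedded parenthesis

# Hypothetical plan: two γ-frames, a parenthesis inside the second, and a tag.
utterance = {
    "gammas": [
        ["accent", "accent"],
        ["accent", {"gammas": [["accent"]]}, "accent"],
    ],
    "tag": {"gammas": [["accent"]]},
}
trace = sigma_pi(utterance)
```

In the resulting trace the embedded π-frame appears between two pulses of the same γ-frame, whereas the tag follows the last terminal boundary, which mirrors the distinction between centre-embedding and right-branching recursion.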
Boundaries
The boundaries of a γ-frame are designated βi and βt. The initial
boundary of a γ-frame is, in the illustrations, marked by a ↑ pulse of high
amplitude; if the γ-frame is, in turn, π-frame-initial, then the pulse is higher still
in amplitude.
A criterion for the initial boundary of a γ-frame is, in a large number of
cases, Anacrusis ('A' in the figures), i. e. a sequence of unstressed syllables
spoken more rapidly than elsewhere and not in the general rhythmic pat-
tern (Jassem & Gibbon, 1980; Jassem, this volume). Anacrusis is taken to
mark the initial unstressed segment of a γ-frame before the Head (the first
accented syllable of a γ-frame). The initial boundary is complex, not
correlating with a single boundary tone, but consisting of both Anacrusis
(where present) and Head. The unmarked βi has a mid-pitch Anacrusis
and a ↑ pulse ( = high Head, Onset). In one fairly common marked case
(cf. the fixed parenthetic expression you know in Ei(p)), Anacrusis is high,
followed by a ↓ pulse ( = low Head), showing the Principle of Sequential
Pitch Contrast. Anacrusis is an important category, both heuristically and
theoretically. Its phonetic properties, as a fast, arhythmic series of un-
stressed syllables (Jassem) and separated by a pitch level change from the
preceding group (Crystal), signal the start of a new γ-frame. Three special
properties of Anacrusis are illustrated in the present corpus extract:
(1) It operates together with the category Head to form a complex
boundary signal βi; the neutral case is lax, i.e. baseline-level Anacrusis
with ↑ Head. Tense ( = high pitch) Anacrusis also occurs, either with ↑,
forming an overall tense βi, or with ↓ (as in you know), creating a
conspicuous boundary effect by means of the Principle of Sequential Pitch Contrast
(§ 2.3.).
(2) It is not necessarily separated from the preceding γ-frame by a
pause; in several cases, it follows the preceding βt extremely rapidly
(marked as [fast] in the figures).
(3) It is also independent of, and may be interrupted by, minor hesita-
tion stretches, filled or unfilled. A criterion for continuity of the Anacrusis
in such cases is continuity of pitch contour near baseline level. It has
already been noted that the hesitation type anderm has different discourse
status.
The γ-frame terminal boundary βt has different properties from βi; the
traditional terminology of Nucleus and Tail is adopted here for exposition,
but not used in the figures. The neutral case of a βt is simply a ↑ pulse
followed by a fade at all levels: π, γ and pulse level. The fade constitutes
an unmarked Tail. Marked cases are of several kinds:
(1) An [α Accent] [−α Accent] sequence on Nucleus and Tail, respectively
( = steep falls and steep rises).
(2) [tense] or [tensing] Tail.
(3) Tail with laryngeal equilibration ( = 'chroma', 'stylization'). Note
that (1) and (3) may combine. If so, double-fall or double-rise 'call
contours' are produced; (3) alone results in a level contour. In the former
case, the [−α Accent] contributes to the Sequential Pitch Contrast
'conspiracy' which in the neutral case applies automatically through the
cumulative effect of 'fade' at all levels.
Nucleus/Tonic
Recent investigations (Brown et al., 1980) have cast doubt on the validity
of the traditional notion of 'Nucleus' or 'Tonic', with results which
indicate that impressions of 'nuclear prominence' are a function of
lexico-contextual information (including contrastive contexts) and of sequence-final
position per se, and are not elementary prosodic judgments. Investigations
of Head status have tended to point in the other direction, which may
mean that the so-called Nucleus is not as well-defined a category as Head
in prosodic frames - despite traditional descriptive priorities. The γ-
frames are frequently interrupted and re-started in 'real-time speech'; a
well-defined initial boundary is, perhaps for this reason alone, a more
important point of orientation than a final boundary. When γ-frames are
concluded after a recursion, either the peakline before the recursion is re-
covered, or a new Head is established. In either case, the Head is the de-
termining category, not the Nucleus.
Accentual pulses
The π and γ systems illustrated in the figures have a variety of interesting
properties. The pulse accent sequences themselves belong to a system of
rhythm generation which includes secondary accents. The pulses are
adapted to the current context in three main ways:
(1) position in the peakline-baseline vector defined in the dominating
π-frame or frames, and by the boundaries βi and βt in the dominating
γ-frame(s);
(2) semantic conditions determined by the current locutionary constitu-
ent, contrastive contexts being associated with increased pulse amplitude
( = higher/lower deviation from the baseline) and syllable lengthening,
and appraisive contexts with longer leading flank rise time together with
syllable lengthening, yielding certain kinds of 'emphatic' 'rise prefix' in-
cluding delayed rises, rise-falls, 'homosyllabic preheads';
(3) in other styles such as reading aloud, syntactic structure (which pro-
vides constraints of detail in other styles, too) may be the main available
context, in which case pulse amplitude is mainly determined by locution-
ary specifier-head and other operator-operand relations, approximately as
postulated in conventional P-data-based 'sentence stress' descriptions (cf.
also Brazil, this volume).
Although the modifications to accents occur locally, they are deter-
mined by global context specifications which are accessible to pulse rou-
tines; the pulses themselves do not necessarily have to be specified even
for polarity, if the global [baseline >/<p peakline] feature is accessed. Local
specifications such as (1), (2) and (3) above can override the global specifi-
cations, but local specifications are marked specifications. This applies in
particular to pulses in non-Head and non-Nucleus environments, which
do not have any direct affiliation to frame categories in the figures.
These conditions on pulse accent sequences may be restated in the fol-
lowing terms: this approach implies that there is no actual accent type
Head or Nucleus; Head and Nucleus are positions in γ-frames, i.e.
functional terms analogous to 'subject' and 'object', etc., in sentence syntax;
they are associated with 'syncategorematic' position-marking systems βi
and βt, whose phonetic correlates (boundary tones, pulse modifications)
may be said to have a theoretical status within γ-frames analogous to case
inflexions within sentences.
tion. On the other hand, the strategy concept also tends to make any in-
tonation hypothesis based on articulatory features difficult to falsify or
confirm. A further implication of the auditorily based overriding strate-
gies, in particular that of larynx equilibration under the stabilising influ-
ence of negatively phased auditory feedback, is that they control a func-
tional adaptation cycle which is sensitive to situational and locutionary
context, but is also operative within the phonological processes of
intonation as a dynamic interaction of articulatory and auditory factors. This
interactive cycle is outlined in the network of Figure 9. The indices
symbolise stages in temporal operation. The full cycle has a planning stage, an
execution stage, as well as checking and (as required) correcting stages. In
this aspect of the present approach lies the motivation for suggesting in § 2
above that the fundamental units of intonation organisation, and possibly
of all phonological organisation, are (articulation, percept) pairs. An
advantage of this conception is that it systematically rejects a 'speaker/
hearer' idealisation for discourse phonology, preferring a dynamic
combination of speaker and hearer perspectives.
To summarise: a process-oriented model of intonation for English has
been proposed, which incorporates a recursive category, the π-frame, an
iterative category, the γ-frame, and a conception of accents as articulatory
pulses. These were applied to the description of discourse data, and for-
mulated as a system of transition networks which take contextual feed-
back at phonological and functional levels into account. Much is still spec-
ulative, there are gaps in the theory, and new kinds of empirical evidence
will be needed in order to test the claims. But if it suggests ways of com-
bining the various lines of analysis required in the study of discourse
phonology, even if it does not turn out to be viable in detail, the theory
will have served a useful purpose.
Bibliography
Jassem, W. and Gibbon, D. (1980). Re-defining English stress. Journal of the International
Phonetic Association 10: pp. 2-16.
Jassem, W., Hill, D. R., Witten, I. H. (this volume). Isochrony in English speech: Its statisti-
cal validity and linguistic relevance.
Klinghardt, H. and Klemm, G. (1920). Übungen im englischen Tonfall. Cöthen.
Ladd, D. R. (1978). The Structure of Intonational Meaning. Diss., Cornell University.
Lehiste, I. (1975). The phonetic structure of paragraphs. In Nooteboom, S. G. and Cohen,
A. (eds.), pp. 195-203.
Liberman, M. Y. (1978). The Intonational System of English. Bloomington, IULC.
Lieberman, P. (1967). Intonation, Perception, and Language. Cambridge, Mass., M.I.T. Press.
Neisser, U. (1967). Cognitive Psychology. Englewood Cliffs, N. J., Prentice-Hall.
Nooteboom, S. G. and Cohen, A. (eds.) (1975). Structure and Process in Speech Perception.
Heidelberg/New York, Springer.
Ohala, J. J. (1978). Production of Tone. In Fromkin, V. A. (ed.), pp. 5-40.
Ozga, J. (1980). A functional analysis of some pause and pitch step-up combinations. In
Dechert, H. W. and Raupach, M. (eds.), pp. 221-225.
Pierrehumbert, J. (1980). The Phonology and Phonetics of English Intonation. Diss., M.I.T.
Pike, K. L. (1945). The Intonation of American English. Ann Arbor, The University of Michi-
gan Press.
Rauh, G., ed. (1982). Essays on Deixis. Tübingen, Narr.
Reich, P. A. (1969). The finiteness of natural language. Language 45: 831-843.
Trim, J. L. M. (1964). Tonetic stress-marks for German. In Abercrombie, D. et al. (eds.), pp.
374-383.
J. 'T HART
Introduction
An important assumption
But how can it be done? It is obvious that it is not enough to draw some
smoothed line that would constitute only visually a satisfactory solution:
we shall have to prove that it is representative of the perceived pitch con-
tour. The result of the stylization must therefore be made audible. This is
possible with a special instrument, the Intonator (Willems, 1966), by
means of which the original course of F0 is replaced by an artificial
contour, which is constructed by the experimenter. Nowadays, a computer
program, which is more flexible, has superseded the hardware Intonator
(Vogten and Willems, 1977). The construction of the artificial contour is
guided by two criteria: it should contain the smallest number of straight
line segments (in the plane of the logarithm of F0 as a function of time),
but it should sound indistinguishable from the original. This is achieved
by trial and error. One could, for instance, start with only one horizontal
line, at the average of F0 over the utterance. The resultant sound is
ridiculous and by no means the same as the original. It is somewhat better to give
this line a slight tilt downwards from beginning to end, in order to
account for the slowly falling F0 observable in the measurement, the
declination. Obviously, this is still not an auditorily acceptable approximation of
the original: the artificial contour clearly lacks a number of major move-
ments up from and down to the lower declination line. If these are added
and carefully timed, the resulting contour will eventually become indistin-
guishable from the original, at least to the ear of the experimenter. The
perceptual equality is later checked in a listening experiment with several
tens of native listeners. See Fig. 2.
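The geometry of the stylization procedure can be illustrated with a short sketch. It rests on several assumptions that are not part of the method described above: the measured values are invented, the function names are hypothetical, and the largest deviation in semitones is used merely as a numerical stand-in for a comparison that, in the actual procedure, is made by ear and checked in a listening experiment. Straight lines are drawn in the plane of log F0 against time, as stated in the text.

```python
import math

def semitones(f, ref):
    """Interval between two frequencies, in semitones."""
    return 12.0 * math.log2(f / ref)

def stylized_f0(breakpoints, t):
    """F0 at time t on straight lines drawn in the (time, log F0) plane."""
    for (t0, f0), (t1, f1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return math.exp((1 - w) * math.log(f0) + w * math.log(f1))
    raise ValueError("t lies outside the stylized contour")

def max_deviation(measured, breakpoints):
    """Largest absolute difference in semitones between measurement and stylization."""
    return max(abs(semitones(f, stylized_f0(breakpoints, t))) for t, f in measured)

# Invented (time in s, F0 in Hz) samples standing in for a measured contour.
measured = [(0.0, 121), (0.25, 117), (0.5, 116), (0.6, 137), (0.7, 158),
            (1.0, 156), (1.1, 128), (1.2, 106), (1.6, 99), (2.0, 96)]

# One tilted line only: the declination without any pitch movements.
declination_only = [(0.0, 120), (2.0, 95)]

# Declination plus one rise and one fall, timed by hand.
with_movements = [(0.0, 120), (0.5, 115), (0.7, 160), (1.0, 155),
                  (1.2, 105), (2.0, 95)]
```

With these invented data the single tilted line misses the measured contour by several semitones, whereas adding one rise and one fall brings the largest deviation down to a fraction of a semitone. Whether such a residue is audible is exactly what the numerical measure cannot decide; that remains a matter for the listening test.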
Figure 2: Solid line: result of a stylization in such a way that no difference is audible between
the stylized contour and the original (dotted curve). [Vertical axis: F0 in Hz, ticks at 50,
100, 200, 300, 400; horizontal axis: time in s, ticks at .5, 1, 1.5, 2.]
Results
Functions of intonation
The phonetic approach described has resulted in the first place in a purely
melodic description of intonational phenomena and was not primarily
meant to be used for studying possible functions of intonation.
Nevertheless a number of functions have been encountered, and some
of them have been verified experimentally.
Theoretical implications
The main aim of the phonetic approach to intonation has been to give an
explicit, and if possible, exhaustive description of the various melodic pos-
sibilities in the given language. Attempts to develop from this a theoretical
framework would not have been essential. Meanwhile, however, the anal-
ysis of such a vast amount of material over a long period of time almost
A Phonetic Approach to Intonation 199
"P2 (a) The nature and the order of all pitch movements in an utterance
are determined by the intonation pattern.
(b) Among the pitch movements of any intonation pattern there is
at least one which possesses such phonetic properties as are ne-
cessary for bringing about a pitch accent.
(c) The location of the accent-lending pitch movement(s) is deter-
mined by the position of the words that carry sentence stress,
and, more specifically, by the position of the lexically accented
syllable in each of these words"
is shown to find empirical support, so that P1 must be rejected.
Physiological evidence in support of the assumption that the perceptu-
ally relevant pitch movements are related to corresponding activities on
the part of the speaker is provided by measurements of the electromyogra-
phic signal from the musculus cricothyroideus during the production of
various different intonation patterns, with pitch accents on varying
syllables within the utterances (Collier, 1975a). The contraction and
relaxation of this muscle are thus clearly visible as discrete events and are shown
to be the main cause of the changes of fundamental frequency upward and
downward respectively.
As was implied in the Introduction, one should not be very optimistic
about the relationship between these experimentally found phonetic re-
sults and impressionistically based linguistic conceptions, let alone that the
former would constitute the 'phonetic correlates' of the latter. Collier
(1974) is no longer alone in this criticism, witness recent investigations by
Brown, Currie and others in Edinburgh (Brown et al., 1980, Currie, 1980,
1981). Strangely enough, no-one as yet has even taken the trouble, as far
as I know, to react to Collier's sharp and provocative paper, which he
deliberately published in a well-known linguistic journal.
It has become clear to us that attempting to find the relationship be-
tween the abstract mental categories which are the intonation patterns and
their concrete realizations in the form of F0 curves cannot be successful if
the attempt is made in one stroke ('t Hart and Collier, 1975). Therefore,
we have considered that a 'perceptual detour' must be made in three steps:
(1) from the concrete and global F0 curve to the concrete and atomistic
pitch movements, in an attempt to design a proper coding system;
(2) from these to the stylized pitch contours (via the grammar), again
global and concrete; (3) since these are perceptually equivalent to the F0
curves, the attempt to find the relation of the latter with the intonation
pattern can be replaced by an attempt to find it between the stylized pitch
contours and the abstract, global intonation patterns. And this problem
can be tackled in an experimental way.
The theoretical implication is that it enables us to make explicit what
belongs to the intonational competence shared by speakers and listeners.
There is no reason to suspect that application of the stylization method is
restricted to only a few languages. It has already been shown to work for
other languages than Dutch, also independent of the "Dutch school"
(Vaissière, 1971, for French; Fujisaki and Sudo, 1971, for Japanese).
The basic assumption about the link between the intonational gestures
on the part of the speaker and the extraction of the perceptually relevant
pitch movements by the listener, which is based on their shared implicit
knowledge of the intonation system of their language, must be considered
valid for users of all languages.
References
Brown, G., K. L. Currie and J. Kenworthy (1980). Questions of Intonation. Croom Helm.
Cohen, A. and J. 't Hart (1967). On the anatomy of intonation. Lingua 19/2: 177-192.
Collier, R. (1970). The optimum position of prominence-lending pitch rises. IPO Annual
Progress Report 5: 81-85.
Collier, R. (1972). From pitch to intonation. Unpubl. doct. diss. University of Louvain.
Collier, R. (1974). Intonation from a structural linguistic viewpoint: a criticism. Linguistics
129: 5-28.
Collier, R. (1975a). Physiological correlates of intonation patterns. J. Acoust. Soc. Am. 58/1:
249-255.
Collier, R. (1975b). Perceptual and linguistic tolerance in intonation. IRAL XIII/4: 293-308.
Collier, R. and J. 't Hart (1972). Perceptual experiments on Dutch intonation. Proc. VIIth
Int. Congr. Phon. Sci., 1971, Montreal, The Hague, Paris: Mouton.
Collier, R. and J. 't Hart (1975). The role of intonation in speech perception. In A. Cohen
and S. G. Nooteboom (eds.). Structure and Process in Speech Perception. Berlin, Heidelberg,
New York: 107-123.
Currie, Karen L. (1980). An initial "Search for Tonics". Language and Speech, 23/4: 329-350.
Currie, Karen L. (1981). Further experiments in the "Search for Tonics". Language and
Speech, 24/1: 1-28.
Fujisaki, H. and H. Sudo (1971). Synthesis by rule of prosodic features of connected
Japanese. Proc. VIIth Int. Congr. Acoust. Budapest 1971: 133-136.
't Hart, J. (1972). Intonational Rhyme. In Romportl, M. and P. Janota (eds.). Acta
Universitatis Carolinae, Philologica 1. Phonetica Pragensia III. 105-109.
't Hart, J. (1976). Psychoacoustic backgrounds of pitch contour stylization. IPO Annual
Progress Report 11: 11-19.
't Hart, J. (1977). Vers une base psychophonétique de la stylisation intonative. Actes des 8èmes
Journées d'études sur la parole, Aix-en-Provence. Vol. 1: 167-173.
't Hart, J. (1980). Synthese stilisierter Tonhöhenkonturen als Methode zur Analyse der
Intonation. Linguistische Arbeiten und Berichte der FU Berlin.
't Hart, J. (1981). Differential sensitivity to pitch distance, particularly in speech. J. Acoust.
Soc. Am. 69/3: 811-821.
't Hart, J. and R. Collier (1975). Integrating different levels of intonation analysis. J. of
Phonetics 3: 235-255.
't Hart, J. and R. Collier (1979). On the interaction of accentuation and intonation in Dutch.
Proc. IXth Int. Congr. Phon. Sci., Copenhagen. Vol. II: 395-402.
Van Katwijk, A. (1974). Accentuation in Dutch, an experimental linguistic study. Doct. diss.
University of Utrecht, Van Gorcum and Comp.: Assen.
Van Katwijk, A. and G. A. Govaert (1967). Prominence as a function of the location of pitch
movement. IPO Annual Progress Report 2: 115-117.
Klatt, D. H. (1980). Real-time speech synthesis by rule. J. Acoust. Soc. Am. Suppl. 1, Vol. 68,
S18 (A).
Maeda, S. (1976). A characterization of American English intonation. Ph. D. Thesis: M.I.T.
Nooteboom, S. G., T. Kruyt and J. Terken (1980). What speakers and listeners do with pitch
accents: some explorations. Nordic Prosody II. Proc. Second Symposium on Prosody in the
Nordic Languages. Trondheim: Tapir.
de Pijper, J. R. (1980). A melodical model of British English intonation. IPO Annual Progress
Report 15: 54-58.
de Rooij, J. J. (1979). Speech punctuation, an acoustic and perceptual study of some aspects of
speech prosody in Dutch. Unpubl. doct. diss: University of Utrecht.
Vaissière, J. (1971). Contribution à la Synthèse par Règle du Français. Thèse de 3ème cycle,
Univ. de Grenoble.
Vogten, L. L. M. and L. F. Willems (1977). The Formator: a speech analysis-synthesis system
based on formant extraction from linear prediction coefficients. IPO Annual Progress
Report 12: 47-54.
Willems, L. F. (1966). The Intonator. IPO Annual Progress Report 1: 123-125.
W. JASSEM, D. R. HILL, I. H. WITTEN
1. Introduction
2. The Theory
11. What are the relations, if any, between the rhythm units and syntax
or morphology?
The investigation reported on below proposes a method which may give
partial answers to points 2 and 4. It also assumes a hypothesis related to
point 7 and tests it statistically, leading to a possible answer to 10 a and b.
Reference will also be made to point 11.
Two specific theories of the rhythm of spoken English have been pro-
posed. The one, which will here be referred to as (A), was first put for-
ward by Abercrombie (1964, 1973), and the other, referred to as (B), by
Jassem (1952), (slightly modified in Jassem, 1980 and 1981).2
Apart from the fairly obvious postulate that both theories submit, viz.
one of a tendency towards equality of interstress intervals, they have one
important premise in common. They do not start off with any higher-or-
der syntactic units of which the rhythm units would be constituents. In
fact, the 'beats', or 'bars', or 'feet', etc. of either theory, though possibly
correlated with syntactic entities, are independent of them.3
Abercrombie's theory was further developed by Halliday (1970) and
Witten (1977). The two theories may be summarized as follows:
(A) Abercrombie
1. The rhythm unit, called FOOT, always begins with a stressed syllable;
consequently, any unstressed syllable follows a stressed one within
the same Foot. All unstressed syllables may therefore be described as
post-accentual (or postictic).
2. If any utterance begins with an unstressed syllable, a silent stress is
posited, this being an abstraction manifested as zero sound, i. e. not mate-
rialized objectively, though real psychologically (subjectively).
3. A disyllabic foot is triple-timed and may be represented by one of
the following structures: υ-(short-long), η η (medium-medium), or
2
Jassem 1980 and 1981 are recent editions of earlier works first published in 1954 and 1962
respectively. The modifications of the original 1952 version of (B) contained in these
works were proposed long before the results of the present investigation were available,
though they are largely borne out by it.
3
In this, as in other aspects, the positions of both authors are drastically different from those
assumed by the proponents of any variety of Generative Phonology. But it may be
interesting to note that Jassem's 'rhythm units' seem to coincide with Chomsky and Halle's
'phonological words'. The locution (utterance, tone-group, sentence, etc.) The book was in an
unlikely place is analysed into three 'phonological words' (Chomsky and Halle 1968:
367-368): the-book was-in-an-unlikely place. Exactly the same division is obtained by
applying the principles expounded in Jassem 1952. Abercrombie's interpretation would be
entirely different: the | book-was-in-an-un- | likely | place.
— u (long-short). The original version of the theory did not discuss feet of
more than two syllables, and this part of the theory was supplied by Wit-
ten (1977). This mora-based structure of rhythm was not insisted on by
Halliday (1970).
4. The internal rhythmic structure of a Foot is inherently related to its
segmental structure, e.g. (C)V₁CV(C)⁴ and (C)V₂(C)V(C) both produce
η η, etc. Abercrombie adds, however, that "the phonematic structure of
the syllable may ... at times be quite irrelevant" (1964: 217).
5. There are certain relations between rhythm and syntax. For instance,
"... the quantities depend on the presence of a word boundary" (ibid.,
p. 219). There is a rhythmic difference between a one-syllable word
followed by an unstressed syllable of a word which is not directly related
syntactically, e.g., (take) Grey to (London) as opposed to Greater (London),
and a word followed by an enclitic, e.g. take it, tell him. It is pointed out
that enclitic treatment of monosyllables is "not entirely clear" (p. 221),
some other cases of "rhythmic linking" being piece of, may there (be), etc.5
6. Stress is assumed (see, e.g., Abercrombie, 1967: 35; Ladefoged,
1975: 222) to be increased effort (or energy). The notion of stress is pri-
mary in relation to the notion of rhythm.
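Points 1 and 2 of theory (A) amount to a simple segmentation procedure, which may be sketched as follows. The sketch is ours and only illustrative: the function name, the representation of an utterance as (syllable, stressed) pairs, and the caret used for the silent stress are assumptions made for the example, and the internal timing of the Foot (points 3 and 4) is not modelled.

```python
# Hypothetical marker for a "silent stress"; the symbol is an assumption.
SILENT_STRESS = "^"


def split_into_feet(syllables):
    """Group (text, stressed) pairs into Feet as described under theory (A).

    Every stressed syllable opens a new Foot and every unstressed syllable
    joins the Foot of the preceding stress. An utterance-initial unstressed
    syllable is attached to a posited silent stress.
    """
    feet = []
    for text, stressed in syllables:
        if stressed:
            feet.append([text])                  # a stress always begins a Foot
        elif not feet:
            feet.append([SILENT_STRESS, text])   # posit a silent stress
        else:
            feet[-1].append(text)                # post-accentual syllable
    return feet


# "The book was in an unlikely place", with stresses on book, like-, place.
utterance = [
    ("the", False), ("book", True), ("was", False), ("in", False),
    ("an", False), ("un", False), ("like", True), ("ly", False),
    ("place", True),
]
feet = split_into_feet(utterance)
print(" | ".join("-".join(foot) for foot in feet))
```

Apart from the explicit silent stress, the division obtained for this utterance is the one attributed to Abercrombie in footnote 3.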
(B) Jassem
1. English speech consists of two kinds of rhythm units: (a) Narrow
Rhythm Units (NRU) and (b) Anacruses (ANA). For a given tempo, the
length of a narrow rhythm unit depends on the number of syllables. This
length is a constant for a monosyllabic rhythm unit and a given tempo,
and may be denoted by Y. As the number of syllables in a narrow rhythm
unit increases, the length of the narrow rhythm unit (NRU) also increases,
but not proportionately. A two-syllable NRU is longer than a monosyllabic
one, but it is distinctly less than 2Y. A three-syllable NRU is also longer
than a two-syllable one, but its length is significantly less than 3Y. The
length of longer rhythm units is determined analogously.
2. Individual syllables within a multisyllable NRU tend to be of equal
length, i.e., the complete length of a polysyllabic NRU tends to be
somewhat equally divided among the constituent syllables.6
4
V₁ = short vowel, V₂ = long vowel or diphthong, C = consonant.
5
As the phonematic structure of the Foot is only sometimes relevant to its internal rhythmi-
cal structure, and the relations between syntax and rhythm are not always clear, it is not
possible, within the framework of theory (A), to deduce the rhythm of an utterance from
its phonemic transcription. Nor does it seem to be possible to make simple additions to a
transcription so as to indicate the internal rhythmical structure of the Foot.
6
It follows from (1) and (2) that the relative lengths of NRUs and their constituent syllables
may be graphically represented like this:
Isochrony in English Speech 207
[Diagram: relative lengths of NRUs of 1, 2, 3 and 4 syllables, etc., each divided into
constituent syllables of equal length.]
7
It is assumed that accent is perceived in longer utterances due to the durational variation of
syllables, e.g., in David's fighting him now [rhythm diagram]. But one might justifiably
ask how the (rhythmic) accent can be determined, and perceived, in an utterance like
Dinner's ready [rhythm diagram], where all the syllables tend to be of equal length. The
answer is that accent, like all phonetic, phonological and other linguistic entities, is
recognized, in the speech signal, by reference to internalized (memorized) patterns. It is
assumed that patterns like those shown in fn. 6 above are remembered, together with a
quasi-absolute 'beat' length, typical for the given speech tempo. (Mental traces for various
quasi-absolute beat lengths are necessary elements of a conductor's musical memory.) Thus,
dinner's ready [rhythm diagram] is two NRUs of two syllables each, rather than four
one-syllable NRUs, because these would be almost twice as long, as in Jack bought four dogs
[rhythm diagram]. It should also be noted that speech perception is 'heterarchical', with
parallel signal processing mechanisms active at several levels, and with feedback. These
complex processes concur in the resolution of possible ambiguities.
8
Cf. Ladefoged's examples: speed, speedy, speedily (1975, p. 103).
9
Cf. such unconventional spellings as kinder (kind of), sorta (sort of), cupper (cup of), etc.
10
Further examples of this type of transcription may be found in Jassem 1949, 1952, 1980
and 1981, and also in O'Connor 1967.
11
According to yet another theory of English speech rhythm proposed by Allen (1968), one
unstressed syllable following a stressed one does not affect the length of the interstress
interval, but a further increase of the number of unstressed syllables results in an increase of
the total length of that interval. The isochrony effect is maintained by a "negative
correlation between the length of a given unstressed syllable and the number of syllables in the
interstress interval" (p. 52); "... as we add more unstressed syllables, the interval gets
longer, but the longer it gets, the more it resists any further increase in length" (p. 53).
However: "Pre-clitics are generally shorter than post-clitics and they may undergo
different kinds of changes because of these intrinsic differences" (ibid.). Lehiste (1972) also
noted the fact that an unstressed syllable may "shorten" a preceding stressed syllable, or
very nearly fail to do so, according to whether it belongs to the same word or not. The
loss of the 'shortening effect' is particularly noticeable between subject and predicate.
4. Syllable or phone?
12
Syllables consisting of between 3 and 9 phones were embedded in a fixed frame "Take ___
Park", e.g., 'Take /ses/ Park', etc.
lable division" (Hockett, 1955: 52) because two peaks are joined by an
interlude. A syllabification like /smɔl-ə/ (smaller) is morphophonemic,
not phonetic or phonemic. China and finer are perfect rhymes in Standard
British and there is no phonological ground for treating them differently
at the level of phone-(phoneme)-to-syllable synthesis. In fact /n/ is
an interlude in both words. Such cases make it very difficult, if at all
possible, to measure the length of English syllables in connected speech. But
the boundaries between segments (and phones) can, at least in principle,
always be located in spectrograms. It is therefore preferable to examine
rhythm units, such as Feet or NRUs, or whatever, as sequences of segments
rather than sequences of syllables. Following O'Connor's train of
thought, it seems appropriate to investigate the relations between the
length of a rhythm unit and that of the constituent phones.
5. The Experiment
It will have been gathered from the preceding sections that much more re-
search is needed, both at the speech wave and the perception levels, before
a completely reliable description of English speech rhythm and the under-
lying principle of isochrony can be formulated. The main aim of the pres-
ent experiment is (a) to test a hypothesis on the reality of isochrony in the
speech signal on the basis of some reasonably representative material, and
(b) to see whether there is any statistically justifiable reason for preferring
one of the two specific theories presented in Sec. 3 over the other. Interre-
lations between rhythm and syntax in the light of the two theories will
also come under discussion.
Considering the complexity of the entire problem of rhythm and iso-
chrony in English, outlined above in Sec. 2, the present study can only
hope to be one step in the direction of a complete solution.
Study Units Nos. 30 and 39 of M. A. K. Halliday: A Course in Spoken
English: Intonation (1970) have been selected because the recorded materi-
als are readily available commercially, so that - if necessary - measure-
ments or other experiments may be made on the same material by other
interested specialists. The pronunciation of the texts seemed to the present
authors to be a good compromise between careful and deliberate, and cas-
ual and natural. Since a reliable automatic method of segmentation has
not yet been devised, it was necessary to make the measurements visually
from conventional spectrograms. The duration of individual phonetic seg-
ments was measured with an accuracy of about 5 ms and all the measure-
ments were double-checked by at least two of the authors. The incidence
of 'stress' is marked in the printed texts, but was checked by the authors in
careful, though informal, listening tests and was confirmed, except for
one or two cases.
After extensive preliminary statistical testing, the phones were divided into
18 classes, as follows:
F - flaps and initial voiceless lenis stops,
D - the weak-friction lenis fricatives /ð, v/,
G - the non-syllabic vocoids /w, j, r/,
E - the checked non-open vowels /e, i, ə, ʊ, ʌ/,
B - non-initial lenis stops,
N - non-syllabic nasals,
H - the aspirate and the initial voiceless fortis aspirated stops,
K - fortis unaspirated stops,
Z - the heavy-friction lenis fricatives /z, ʒ/,
SC - syllabic contoids,
KH - aspirated unaccented fortis stops,
S - fortis fricatives,
AFV - the lenis affricates /dʒ, dr/,
O - the close unchecked and the open checked vowels /i:, u:, æ(:), ɒ/,
KHA - accented fortis stops,
AF - the fortis affricates /tʃ, tr/,
A - the mid and open unchecked monophthongs /ɜ:, a:, o:/ and the diphthongs,
FTH - aspirated final fortis stops.
It will be noted that the classes include types of phones rather than types
of phonemes. Preliminary statistical testing revealed that there are syste-
matic durational differences between allophones of one phoneme, whilst
phones belonging to different phonemes may be of the same duration.
Initially, a simplifying assumption was made that the mean durations of
the phone types may be calculated irrespective of the two kinds of rhythm
units posited under theory (B).
Table 1 presents the mean durations, with the variances, standard devia-
tions and coefficients of variability.
The differences between the means need not necessarily be all statisti-
cally significant. An analysis of variance showed that at least some of the
means were different at α = .01, as shown in Table 2.
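The analysis of variance referred to here tests the null hypothesis that all class means are equal by comparing the between-class and within-class mean squares. A minimal sketch of the calculation is given below; it is not the authors' program, the function name is ours, and the durations are invented solely to make the example runnable. The resulting F must still be compared with the critical value for the chosen α and the two degrees of freedom, which is taken from tables.

```python
def one_way_anova(groups):
    """Return SS, df, MS and F for a one-way analysis of variance.

    `groups` is a list of lists, one list of measured durations per class.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS": (ss_between, ss_within),
        "df": (df_between, df_within),
        "MS": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


# Invented durations (ms) for three hypothetical phone classes.
example = [[40, 45, 50], [60, 62, 70], [95, 100, 110]]
result = one_way_anova(example)
print(result["df"], round(result["F"], 2))
```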
The null hypothesis on equality of all means being rejected at α = .01,
a means-clustering analysis as proposed by Gabriel (1964) was performed.
This analysis groups together those of the means that do not differ signifi-
cantly. It is based on the principle of minimum within-group variance with
maximum between-group variance. This test led (after the 5th step) to the
following grouping:
(F) (D) (G E B N) (H K Z SC) (KH S AFV O) (KHA AF A FTH)
The differences between the means within groups have not been shown
[Analysis-of-variance table; columns: source of variance, SS, df, MS, F calculated, F.05, F.01]
Table 3: Mean durations of phone classes in rhythm units (ms). Size of class in parentheses.
AF 1.000 —
Α .959 .717
FTH 1.010 -
(1) With but a few exceptions, the ranking of the means is the same
within each of the three sets of data (and the same as that in Table 2).
(2) Most of the shifts of ranking occur in the ANA column, in which
the means are less reliable because of smaller sizes of the classes.
(3) With very few exceptions, whenever the three means are available,
the ANA mean is the smallest and the NRU mean the largest. Table 4
shows the relative means assuming that for each phone class the duration
in NRU is equal to unity.
Table 5 includes mean values that are critical for the problem at issue
here.
7. Rhythm Models
The functions in Fig. 1 have physical sense only for n = {real positive
integer}. The upper limit of n is not known.
In reality, there is no functional relationship between d and n, at least
because (a) there are systematic differences in the duration of phones, and
rhythm units of a given size may consist of different phone classes, thus
differing in duration, and (b) there is a certain measure of random
variation even if the rhythm units are of the same size and have the same
structure in terms of classes to which the constituent phones belong. There are
no doubt other sources of variation, e.g., the position in the utterance
(utterance-final rhythm units may tend to be longer). Consequently, a
realistic model of isochrony is a regression model in which d = a + bn, i.e.,
one in which the duration of the rhythm unit is estimated from its size on
the basis of a best-fitting analytical relationship between the two variables,
Figure 2: A regression model of the relationship between d, the duration of a rhythm unit,
and n, the number of phones in the unit.
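The linear model d = a + bn can be fitted by ordinary least squares. The sketch below shows one way of obtaining a, b and the correlation coefficient r from paired observations; the function name and the sample values are ours and purely illustrative, and no claim is made that this reproduces the computations actually carried out for the present study.

```python
from math import sqrt


def fit_line(ns, ds):
    """Least-squares fit of d = a + b*n; returns (a, b, r)."""
    count = len(ns)
    mean_n = sum(ns) / count
    mean_d = sum(ds) / count

    # Sums of squares and cross-products about the means.
    s_nn = sum((n - mean_n) ** 2 for n in ns)
    s_dd = sum((d - mean_d) ** 2 for d in ds)
    s_nd = sum((n - mean_n) * (d - mean_d) for n, d in zip(ns, ds))

    b = s_nd / s_nn              # slope
    a = mean_d - b * mean_n      # intercept
    r = s_nd / sqrt(s_nn * s_dd)
    return a, b, r


# Invented example: unit sizes (phones) and durations (arbitrary units).
sizes = [1, 2, 2, 3, 4, 5, 6]
durations = [1.4, 2.1, 2.6, 3.0, 3.9, 4.3, 5.2]
a, b, r = fit_line(sizes, durations)
print(f"a = {a:.3f}, b = {b:.3f}, r^2 = {100 * r * r:.1f}%")
```

A slope b of 1 would mean that each added phone lengthens the unit by a full average phone, i.e. no compression at all, while a slope of 0 would correspond to perfect isochrony.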
8. Analysis of Regression
Table 6
       mean     mean         σ(n-n̄)   σ((d-d̄)/P)   r      r²·100%   a         b       arctg b
       (n-n̄)    ((d-d̄)/P)
FOOT   0.00     0.00         2.21     1.98         0.72   51.7      -0.0026   0.644   32.8°
NRU    0.00     0.00         1.63     1.47         0.62   39.0      -0.0018   0.561   29.3°
ANA    0.00     0.00         1.53     1.68         0.83   69.6      -0.0036   0.916   42.5°
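Two columns of Table 6 are derived from others: arctg b is the angle of the regression line, and r²·100% is the coefficient of determination. The short check below recomputes both from the printed values of b and r. The variable and function names are ours; small discrepancies are to be expected because the printed r and b are themselves rounded.

```python
from math import atan, degrees

# Printed values from Table 6: (r, r^2 * 100%, b, arctg b in degrees).
TABLE_6 = {
    "FOOT": (0.72, 51.7, 0.644, 32.8),
    "NRU": (0.62, 39.0, 0.561, 29.3),
    "ANA": (0.83, 69.6, 0.916, 42.5),
}


def slope_angle_degrees(b):
    """Angle of the regression line with the n axis, in degrees."""
    return degrees(atan(b))


def determination_percent(r):
    """Coefficient of determination expressed as a percentage."""
    return 100.0 * r * r


for unit, (r, r2_printed, b, angle_printed) in TABLE_6.items():
    print(
        f"{unit}: arctg b = {slope_angle_degrees(b):.1f} (printed {angle_printed}), "
        f"r^2 = {determination_percent(r):.1f}% (printed {r2_printed}%)"
    )
```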
13
Cf. above, Sec. 6, on the relative mean durations of NRU and ANA.
ably this factor is the distinction between final and nonfinal position. The
ANA can, by definition, only stand in a nonfinal position. The coefficient
of determination for the FOOT is intermediate between the other two and
indicates that theory (A) is not, in a statistical sense, unacceptable. But it is
shown to obliterate a distinction which is statistically very highly significant
(viz. that between ANAs and NRUs). The isochrony effect is shown in
Fig. 5.
Table 7
      mean      σ[(n-n̄)²]   r[(n-n̄),   r[(n-n̄),d]   r[(n-n̄)²,d]   a       b       c        R²·100%
      (n-n̄)²                (n-n̄)²]
ANA   2.34      3.81        0.55       0.83         0.56          0.117   0.844   0.0515   70.3
It can be seen from Table 7 that the coefficients of determination for all
three types of rhythm unit are only marginally better than in the linear model,
which is more directly interpretable. Therefore, there is very little, if
anything, to be gained from a quadratic regression model.
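The size of the gain can be illustrated from the ANA row of Table 7. For a regression on two predictors, here (n-n̄) and (n-n̄)², the coefficient of determination follows from the three pairwise correlations. The sketch below applies that standard formula; it assumes that the three correlations in the ANA row are, in order, r between the two predictors (0.55), r between (n-n̄) and d (0.83), and r between (n-n̄)² and d (0.56). That reading of the column headings is our assumption, and the function name is ours.

```python
def multiple_r_squared(r_x1_y, r_x2_y, r_x1_x2):
    """R^2 of a two-predictor linear regression from pairwise correlations."""
    numerator = r_x1_y ** 2 + r_x2_y ** 2 - 2.0 * r_x1_y * r_x2_y * r_x1_x2
    return numerator / (1.0 - r_x1_x2 ** 2)


# ANA row of Table 7 (assumed column reading, see above).
r_linear = 0.83        # r[(n - mean n), d]
r_square_term = 0.56   # r[(n - mean n)^2, d]
r_predictors = 0.55    # r[(n - mean n), (n - mean n)^2]

quadratic = 100.0 * multiple_r_squared(r_linear, r_square_term, r_predictors)
linear = 100.0 * r_linear ** 2
print(f"linear: {linear:.1f}%  quadratic: {quadratic:.1f}%")
```

With the rounded correlations the quadratic model accounts for less than two percentage points more variance than the linear one, which agrees with the conclusion drawn above.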
Table 8
       mean d/P   σ(d/P)   r      r²·100%   a       b       arctg b
FOOT   5.74       2.09     0.80   63.8      -4.35   0.758   37.1°
NRU    4.25       1.48     0.73   52.8      -3.07   0.724   35.9°
ANA    2.92       1.58     0.90   80.4      -2.78   0.951   43.6°
Table 9
d d ,d d.
η σ(η) 4 r(n,i) r(nA a b c r 2 · 100 %
Ρ Ρ Ρ Ρ Ρ
FOOT 5.76 5.76 5.74 2.21 2.09 0.93 0.72 0.80 1.33 -0.187 0.942 64.4
NRU 4.25 4.25 4.25 1.63 1.48 0.91 0.62 0.73 1.11 -0.221 0.950 53.8
ANA 2.94 2.92 2.92 1.53 1.58 0.96 0.83 0.90 0.17 -0.378 1.302 81.3
first tone group includes the subordinate clause plus the subject and part
of the predicate of the main clause, the remainder of which forms the sec-
ond tone group. Many peculiar tone group boundaries may be found in
Halliday 1970. Here are some more examples: //he was grey and he was
woolly and his//pride was inordinate, he danced on a sandbank in the//middle
of Australia and he//went to the Big God Ngong// (p. 121). //on the Isle
of Man you can//still ride in a horse-drawn tram// (p. 117). Such tone
group boundaries, strange from the syntactical point of view, result from
the assumption that unstressed (unaccented) syllables always belong to the
same rhythm unit as the preceding stressed (accented) syllables and from
the assumption that silence (pause) is a marker of a tone-group
boundary.14 Model (B) of English rhythm has a simpler relation to syntax and
does not result in such disconcerting discrepancies between the phonological
and the syntactical structure of running speech.
14
On a distributional view of the tone-group, see Jassem 1978.
Acknowledgements
The authors wish to express their appreciation of a grant from the Na-
tional Research Council of Canada to the Department of Computer
Science, University of Calgary, Alberta which enabled WJ to work there
during an extended visit to Canada, and to thank the British Council for
supporting a shorter working visit by WJ to the University of Essex,
Colchester, and one by IHW to Poznań. The co-operation of Dr. M. Krzysko
and Mr. P. Stolarski of the Computer Centre of Mickiewicz University,
Poznań, in the computing labour is also gratefully appreciated.
References
Abercrombie, D. (1964). Syllable quantity and enclitics in English. In D. Abercrombie & al.
(eds.), In Honour of Daniel Jones. Longmans: London. 216-222.
Abercrombie, D. (1967). Elements of General Phonetics. Edinburgh University Press: Edin-
burgh.
Abercrombie, D. (1973). A phonetician's view of verse structure. In Phonetics in Linguistics.
Longman: London. 6-13.
Adams, C. (1979). English Speech Rhythm and the Foreign Learner. Mouton: The Hague.
Allen, G. D. (1968). On testing for certain stress-timing effects. UCLA Working Papers in
Phonetics 10: 47-59.
Bolinger, D. L. (1965). Pitch accent and sentence rhythm. In D. L. Bolinger, Forms of
English. Hokuou Publ. Co.: Tokyo. 139-180.
Gabriel, K. R. (1964). A procedure for testing the homogeneity of all sets of means in
analysis of variance. Biometrics 20 (3): 459-477.
Halliday, M. A. K. (1970). A Course in Spoken English: Intonation. Oxford University Press:
Oxford.
Hockett, C. F. (1955). A Manual of Phonology. Indiana University Publications in
Anthropology and Linguistics, Memoir 11.
Jassem, W. (1949). Indication of rhythm in the transcription of Educated Southern English.
Le Maître phonétique III/92: 22-24.
segmenting pitch patterns in some way we can arrive at the irreducible un-
its of intonation. On the contrary: we need to take several other factors
into account, including rhythm, gradient, voice quality, and the way vow-
els and consonants fit the pitch contour. In this paper I shall retain as
much as possible of the traditional approach, including terminology and
symbols; but the effect of going outside RP is that a quite different net-
work of relationships among familiar patterns emerges, and many of the
assumptions on which traditional analyses are based are shown to be un-
tenable.
2. Accent
The simplest job that intonation has to do is to draw the hearer's attention
to a part of a locution, and the gesture involved is an upward obtrusion in
pitch. When this gesture is used to highlight a single syllable, it is conven-
tionally called an accent: the pitch rises to a peak on the accented syllable
and then returns to a lower pitch. The notion of accent is often associated
with Bolinger (1958) but it is in fact one of the oldest ideas in linguistics. It
can be traced from the Vedic tradition through several ancient traditions
to comparative philology, where it proved of central importance from the
time of Verner on. Bolinger's particular formulation of accent is not with-
out its problems, and his accents A, B, C are prosodically complex and
open to the same sort of objections as British tones. Although accents A,
B, C have been used as a universal basis of comparison (Bolinger, 1978),
their description does not seem to me to be sufficiently general to
apply even to English outside the standardised varieties.
2.2. Tails
The term tail is used by Kingdon (1958) to refer to any syllables following
the last accented syllable of a tone-unit. The term can more usefully be
used to refer to the stretch following the change of gradient in the accent
contour: that is, the part that levels out at low pitch in the RP type, or the
part that falls sharply and levels out in the Northern type.
The item that begins the tail is the highest in a hierarchy of the following
kind: (1) a suppressed accent, e.g. rocking ·horse (where (·) marks the
tail), (2) an unreduced trailing syllable, e.g. tele·phone, (3) a reduced
trailing syllable, e.g. secreta·ry /sekrə·tri/, (4) a syllable coda, e.g. well,
(5) the accented vowel itself, e.g. no /nə·ʊ/, far /fɑ·ɑ/.
Types 1 and 2 are sometimes described as 'stressed' syllables, but this is
a different use of the term from its use for an accented syllable. Types 4
and 5 are not called stress, presumably because the whole syllable is
already described as stressed. Type 3 is a positive embarrassment to stress
theories, as the syllable is simultaneously stressed and unstressed accord-
ing to the definition of stress being applied at the time. Vowel reduction
shows them to be unstressed, but the change of pitch gradient gives them
a prominence that cannot be disregarded. They are good candidates for
the isochronous type of stress, and can even count for ictus in verse and
form the rhyme, as in this piece of doggerel:
They came to have a picnic and to see
The campus of the university.
shorthand; they may be longer in the case of an actor reciting his lines, or
a phonetician illustrating constructed examples.
3.1. Sandhi
When several accented items are grouped together in a single tone unit,
non-final accent contours are modified by sandhi rules. These are of two
kinds, pitch and rhythm. The pitch rule, which operates in most dialects of
England, levels the tone of the non-final accent contour, and the rhythmi-
cal rule cancels the tempo change at the accent point.
Consider a word like telephonic which has two accents, which would be
grouped into one tone unit in all but the most extraordinary cases. The
pitch rises rapidly to a peak on 'te - the 'head', marked (') - and then stays
fairly level until `pho - the 'nucleus', marked (`) - at which point it falls to
low. The effect of the tone sandhi rule is to spread the accent contour
over two accents. Since the non-final accent is no longer nuclear, it does
not have the rhythm of a nuclear contour, and the change of tempo at the
accent point is cancelled, with consequent shortening of the first two syl-
lables.
In the stock RP-type h n (i.e. head plus falling nucleus), it is assumed
that both these rules are carried out. In fact they are independent. For in-
stance, if I give my address as ' Farmdale Road I have to make the proper
noun clear as the hearer has to rely on phonological cues to interpret it; in
order to make it clear I may draw it out, even though it is in the head. The
resulting 'Färmdale - where the double macron (") indicates the lengthen-
ing — is in between the traditional 'head' and 'nucleus' and may be one of
the things called a level nucleus. In Scouse, the tone sandhi and rhythm
rules do not apply in sequence, but as alternatives. By far the commonest
type of head simply shortens pre-nuclear contours, so that after the ac-
cented syllable the pitch begins to return to low, e.g.:
a re connaissance 'aircraft over from "Germany
has what would be described as a sequence of 'falls' for RP. This kind of
head is very common outside England, and in order to recognise it, we
need to apply rhythmical as well as pitch hypotheses. Level heads with
long drawn out accent contours do occur in Scouse, e.g.:
I can't stand that "language any `longer
which had the illocutionary force of removing a foul-mouthed customer
from a Liverpool pub.
Linear grouping is used for single words, short phrases, or simple sen-
tences which present no structural problems for the hearer. The sandhi
rules apply cyclically and in the same way for each pre-nuclear accent, and
to indicate that they form a single group, each accent is pitched a little
lower than the preceding one. This produces a 'stepping head' for RP, or
a sequence of peaks for the Scouse type, e.g. 'Proto-'Indo-'Euro`pean. If
the rhythm sandhi rule has applied, this can be followed by an accent
suppression rule, which removes all accents between the first and the last,
thus 'Proto-Indo-Euro`pean. (These suppressed accents are sometimes
marked with a raised dot or small circle in British notation.) This suppres-
sion rule, incidentally, explains why the 'stepping head' can appear so nor-
mal through introspection, and yet rather odd in real texts.
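The accent suppression rule for linear grouping can be stated as a small procedure. The sketch below is only an illustration: the function names, the representation of a tone unit as a list of syllables with accent flags, and the plain-text marks (') for a pre-nuclear accent and (`) for the nucleus are assumptions made for the example. It presupposes that the rhythm sandhi rule has already applied, which is the condition given above for suppression.

```python
def suppress_medial_accents(accented):
    """Remove every accent between the first and the last in a linear group."""
    positions = [i for i, flag in enumerate(accented) if flag]
    if len(positions) <= 2:
        return list(accented)        # nothing lies between first and last
    kept = {positions[0], positions[-1]}
    return [i in kept for i in range(len(accented))]


def render(syllables, accented):
    """Mark pre-nuclear accents with ' and the final (nuclear) accent with `."""
    positions = [i for i, flag in enumerate(accented) if flag]
    nucleus = positions[-1] if positions else None
    marked = []
    for i, syllable in enumerate(syllables):
        if i == nucleus:
            marked.append("`" + syllable)
        elif accented[i]:
            marked.append("'" + syllable)
        else:
            marked.append(syllable)
    return "".join(marked)


syllables = ["Pro", "to", "In", "do", "Eu", "ro", "pe", "an"]
accents = [True, False, True, False, True, False, True, False]

print(render(syllables, accents))
print(render(syllables, suppress_medial_accents(accents)))
```

The first line printed corresponds to the fully accented form with a sequence of heads, the second to the form in which only the first head and the nucleus survive.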
Hierarchical grouping preserves the accentuation of its parts, and as it
were conflates potentially separate tone-units into one. For instance, a list
containing a Ford Escort and a 'Morris "Minor can be conflated into a Ford
' Escort and a Morris Minor: in each potential tone-unit pre-nuclear accents
are suppressed, and the nucleus of the first becomes the head of the con-
flated unit. At the upper limit, this rule can conflate whole sentences, as is
sometimes done by comedians, e.g. She said "What would you like?" he
said "What are you "offering?", where the head begins on like and the nu-
cleus falls on offering which are the potential nuclei of the sentences spo-
ken as separate tone-units. At the lower limit, hierarchical grouping over-
laps with linear grouping. Thus if we take the university and of "Lancaster
as a single tone-unit, we obtain either the ' university of "Lancaster by the
linear rule, or the university of "Lancaster by the hierarchical rule.
next. There is such a signal and it involves modifying the relations be-
tween rhythm and pitch at the boundary. In RP, the trough and rise is
brought forward on to the tail of the first tone-unit, and since this is an
area of slow tempo, the auditory effect is of an upward glide. This is the
'rising tone' of non-final sense groups of the text-books. However, the
rise as opposed to the final fall is not necessarily the diagnostic feature,
because for the corresponding pattern in Scouse, only the trough is
brought forward.
Now it might be argued that a fall + trough is much the same thing as a
fall alone, but this follows from a shortcoming of the tone analysis. The
effect of the trough is to change the gradient of the accent contour.
Whereas the plain accent has a 'convex' fall, this modified version has rel-
atively level pitch up to the tail, at which point it drops rapidly, thus pro-
ducing a 'convex-concave' fall, which we shall symbolise ( \ ) . In certain
cases, when the accented vowel is very short and followed immediately by
the tail, this 'convex-concave' fall is indistinguishable from the RP-type
concave fall, e.g. one fellow got killed, where the peak is reached on /i/
and immediately slumps to low on /l/.
At this point let us consider a Scouse example like The 'Roundie was the
Ro tunda 'Theatre. The first accent differs from the third in gradient, and
the second differs from the third in rhythm. All three would be called
'falls' in the 'tone' analysis. It follows that Scouse cannot be analysed in
terms of 'tones'. Listening to RP data suggests that rhythm and gradient
are relevant also in RP, although in slightly different ways, and that a
'tone' transcription consequently misses much relevant information.
4. Relative Pitch
The importance of relative pitch has been increasingly recognised over re-
cent years. What has not been so clearly perceived is that there are at least
three different problems entwined here. First there is the problem of
identifying the placing of pitch pattern in the total span of which an indi-
vidual is capable (see, e.g. Brown, 1977: 127-134). Secondly there is the
pitch relation between successive tone-units, and thirdly there is the height
of any individual accent contour relative to other accents. Brazil's notion
of key (e.g. Brazil et al., 1980: 26) is defined as a property of the tone-unit
but is in many of the examples a property of a single accent contour. The
analysis of relative pitch is further complicated by different interpretations
of the notion of pitch raising. In one case, pitch movements may simply be
amplified so that top pitch is raised, and the bottom line stays where it is.
In another case, the voice could be re-tuned like a musical instrument, so
that as top pitch is raised, bottom pitch is raised in some proportion. (The
term key suggests the latter case.)
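The two interpretations of pitch raising can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the contour is a list of pitch values in Hz, the function names `amplify` and `retune` are hypothetical, and treating 're-tuning' as multiplication by a constant factor is just one possible reading of 'raised in some proportion'.

```python
def amplify(contour_hz, bottom_hz, factor):
    """Widen pitch movements above a fixed bottom line (assumed reading 1).

    The bottom line stays where it is; only the distance above it is scaled.
    """
    return [bottom_hz + factor * (f - bottom_hz) for f in contour_hz]


def retune(contour_hz, factor):
    """Shift the whole contour upwards (assumed reading 2).

    Every value is multiplied by the same factor, so top and bottom both rise.
    """
    return [factor * f for f in contour_hz]


contour = [100.0, 150.0, 200.0, 100.0]   # hypothetical accent contour in Hz
print(amplify(contour, bottom_hz=100.0, factor=1.5))
print(retune(contour, factor=1.25))
```

Both calls raise the top pitch from 200 to 250 Hz, but only the second also raises the bottom of the contour.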
When adults talk to infants, and when people ask questions, the pitch is
often observed to rise within the total span. In either case the raising may
be accompanied by an appropriate change of voice quality, and by a facial
gesture involving an upwards movement of the eyebrows. Question signals
of this kind are observed in many languages other than English, and can
be very marked in some varieties of English, but they are rather neglected
in descriptions of RP.
The relative pitch of tone-units and of individual accent contours is governed
by the use of an upward obtrusion in pitch as an attention device. If
there is no reason to call attention to an item, it is pitched slightly lower than
the previous item, so that successive tone-units step down, and if there are
several accents in one tone-unit, they step down in pitch. There are all sorts
of possible reasons for drawing attention to an item: highlighting a new tech-
nical term or the name of a book, finding an equivalent for bold type or in-
verted commas when reading from a printed text, or even an away win in a
football score. But one of the standard uses of the attention device is to group
items: a relative rise in pitch marks the beginning of a new group, and step-
ping down indicates continuation of an old group. Hence, ceteris paribus, we
would expect tone-units in the same locution, and accents in the same tone-
units to step down. Stepping down rarely needs to be explained; the question
is usually why the speaker steps up.
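The grouping function of stepping down and stepping up can be sketched as a simple procedure. The sketch assumes that each item (tone-unit or accent) has already been reduced to a single relative pitch height, and it treats every step up as a group boundary, which ignores the other uses of the attention device mentioned above. The function name `group_by_step_up` is hypothetical.

```python
def group_by_step_up(heights):
    """Split a sequence of relative pitch heights into groups.

    A new group starts at the first item and wherever an item is pitched
    higher than the previous one; stepping down continues the old group.
    """
    groups = []
    for index, height in enumerate(heights):
        if index == 0 or height > heights[index - 1]:
            groups.append([height])      # step up: beginning of a new group
        else:
            groups[-1].append(height)    # step down: continuation
    return groups


# Hypothetical heights on an arbitrary scale
print(group_by_step_up([10, 9, 8, 11, 10, 12, 11, 10]))
```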
5. Syncopated Contours
point. What we in fact described was the contour which is neutral with re-
spect to another variable. In fact, the tempo change can be anticipated or
delayed on the pitch contour, and the accent point typically follows.
If the accent point is brought forward, it is followed by the rise and the
fall, and this is what is known as the 'rise-fall tone'. The result is a synco-
pated contour, with the pitch peak just after the accent point. How soon af-
ter it comes is a sociolinguistic variable. For RP, it is usually shown on the
syllable following (e.g. O'Connor and Arnold, 1973: 17). But in Northern
English, and possibly outside RP generally, the peak is reached on the ac-
cented syllable itself, after the accent point; this pattern is in fact listed by
O'Connor and Arnold (pp. 9, 11) as a second possibility. In the case of the
RP type on a word like * excellent, the peak is reached on the second syl-
lable; unless the pattern is correctly interpreted, the accent (or 'stress')
might appear to have shifted to the second syllable. (In this connection it
is interesting to note that the English pronunciation of Calcutta might
well be a misinterpretation of the Bengali "Calcutta. When accent contours
get complicated, one really needs to know the language to interpret them
and recognise the accented syllable.) The effect of a delayed accent point
is to put it after the peak, and towards the end of the fall; this is probably
one of the things classed as a 'low fall' (.n).
The use of syncopation has to do with the relationship between the con-
tent of the tone-unit and the assumed content of the addressee's memory.
The neutral contour announces the new content irrespective of what is al-
ready there; the rise-fall indicates that the new content exceeds or extends
existing content, and the low fall that it does not (i.e. the low fall suggests
something like 'you already know this'). For a good illustration of a rise-
fall consider:
S1: You're fond of 'dahlias, aren't you?
S2: I'm Very fond of dahlias.
The nucleus falls on very as the variable in the frame (S2) is fond of dahlias,
and because very exceeds this, it takes a rise-fall.
normal position before the verb. This fall-plus-rise can be very useful in
avoiding awkward or impossible syntax, e.g.:
GK: . . . for when we go camping. REK: I don't want a .tent.
Here a tent links the locution to camping, but a tent isn't wanted by me or
a tent I don't want would be rather odd, and the speaker (then aged 3.3)
avoided the problem by putting a tent in its syntactically preferred posi-
tion, marking it as the link with the rise. Many kinds of linking item - and
sentence adverbs, vocatives, subjects, time and place phrases, if-clauses etc.
- can be postponed and marked in this way.
Scouse normally uses the same pattern here, but the very inverse is
sometimes found, with a rise (") that stays up until the nucleus of the fol-
lowing tone-unit is reached, e.g. He be' longs to Liverpool, Bamber, Gas-
goigne, which I would intuitively translate as He be 'longs to Liverpool, Bam-
ber Gasgoigne. This is a very familiar pattern in Irish intonation. Here is
an example from a Liverpool Irishman:
. . . when I liked it was going away was mostly on the ' fishing trawlers
.mostly. . .
If we disregard the problem of analysing the syntax here, the RP equiv-
alent would be . . . on the "fishing trawlers, mostly.
Links only work if they are stored in the hearer's memory. Locutions
which as a whole are directed at some item in the addressee's memory of
the preceding context will often be found to have a rise. An example of
this is the yes/no question. These are used for a variety of purposes in real
life, but if one sits and thinks about them, typical constructed examples di-
rect a proposition to the hearer's memory to check its truth. This is a
candidate for the rise, and add to this the pitch raising for a question, and
the result is the stereotype yes/no question intonation.
These derived rises can combine with other patterns to form difficult
puzzles for the analyst. For instance, it may appear superficially that the
rise is a contradiction contour. Real examples are easily found:
GK: I'll get some muesli and some cocoa from the Co-op.
RK: We don't get muesli from the ,Co-op.
To interpret this, we need to observe that in most cases, positive and ne-
gative operators do not count as variables for the purposes of the parallel-
ism rule, and consequently there may be nothing in a contradictory locu-
tion to take the falling nucleus. By its very nature, a contradiction is a
link, and qualifies for the rise. Note that rises of this kind can be very
economical in answering questions with unacceptable presuppositions, of
the kind Have you stopped beating your "wife? or Are you completely stu-
pid? Here is a real example:
RK: Have we got any per"oxide?
GK: Why, are you going to do your "hair again?
RK: ,No.
The answer no on its own would leave unchallenged the presupposition
238 G. Knowles
6. Gradient
stant, it would be an easy matter to decide which was primary and which
was dependent: if width was primary, the longer the contour the greater
the width. But since gradient is clearly not constant, its relationship to
width remains an open question. For the present, I shall assume gradient
to be primary.
The falling part of the accent contour can be steep or gentle; the details
of the slope depend on whether it is basically of the 'concave' or 'convex'
type. In Scouse, the gradient can be made gentle to the point of being vir-
tually level (~n), e.g. And we're getting 'left here, where the tail on here is
not significantly lower than the accent point of left. The fact that it is a
gentle fall emerges with a very long tail, as in:
Was it "Yorkshire Hilda went to?
where went is audibly lower than York.
We argued above that the fall plus trough is equivalent to the traditional
rising tone, a kind of Irish rise that goes down. If we take the rise-fall and
level out the fall, we get an Irish fall that goes up (,"). This contour, which
steps up from one level to another, is extremely common in Scouse, e.g.
I'm interested in 'bingo; What 'time is it please?; it's ,~all kinds of people go
to it. Again, long tails will fall audibly. Almost identical patterns are found
in Irish intonation, especially South Ulster, in Glasgow, and perhaps New-
castle and Birmingham.
Irish falls show up a major weakness in the British tone analysis.
(When I first transcribed them, my ear told me they were rises, and my
brain told me they were falls, and it was a long time before I could recon-
cile the two.) It is simply not good enough to trace the pitch movement; it
is essential to see how the segments fit the contour, and take account of
the rhythm and gradient. Now in segmental phonology it is a matter of
elementary knowledge that segments we class as 'voiced' are not always
produced with a full vibration of the vocal folds, and the parts played by
segment duration, voice onset time, etc. are well understood. What I am
suggesting here is that we cannot expect a one to one match between in-
tonation patterns and up and down pitch movements. The fact that a
Northern English final /d/ is more voiced than an RP one is a minor
phonetic detail, and is of no consequence when we come to the phonological
relationship between /d/ and /t/. Similarly the fact that a gentle rise-fall
comes down a bit more in RP than in the Irish type is a mere detail: if
the British tone analysis insists that one is a 'rise' and the other a 'rise-fall'
then that analysis is fundamentally misconceived. The Irish fall is called a
'rise' by Jarman and Cruttenden (1976), which may be true as far as pitch
movement alone is concerned, but leads to totally unacceptable conclu-
sions at the level of universals. The suggestion that the Irish fall has 'no
special value' or used to have a value which has now been 'lost' (Bolinger,
1978: 510) is simply wrong. True, RP speakers and perhaps Americans are
not sensitive to its value, but that is an entirely different matter. If glosso-
whereas the calling contour ends rather higher. It is noteworthy that call-
ing contours are appropriate in the same situations as gentle gradients,
and would be inappropriate in situations where steep gradients are used
(cf Ladd, pp. 175-6); more particularly, calling contours are used in the
same situations as gentle fall-rises. Now Ladd, following a common-sense
taxonomy based on pitch direction, refers to the calling contour as a styl-
ized fall: I would suggest that it is in fact a stylized fall-rise.
7. Conclusion
The view of English intonation which has been sketched in outline above
began as a set of crude hypotheses which contained an element of truth:
statements have falls, contradictions have fall-rises, and so on. These hy-
potheses have been continuously refined over a number of years in the
light of detailed study of data from several varieties of English, and in-
deed other languages. This sketch is restricted to what I judge to be the
central patterns of intonation; it is inevitable that as further data is ex-
amined, and more peripheral patterns included, many of the present sug-
gestions will have to be revised. It goes without saying that nobody yet
knows enough about intonation to give a correct or definitive account of
any part of it.
However, unless the diasystemic approach can be shown to be funda-
mentally wrong in principle, it does render a number of orthodox assump-
tions out of date. When linguists set up systems of contrasting phonemes,
it is reasonable to look for a set of contrasting 'tones' (Halliday, 1967;
Brazil et al., 1980), which enter into 'tunes' or 'tone-units' which in turn
have an arbitrary relationship with their meaning. It is difficult to argue
for or against this view, because it is irrelevant, unless one makes it rele-
vant by forcing intonation patterns into prearranged categories in an en-
tirely Procrustean fashion. Tones are not a discovery but an invention. It
follows that there is no point in looking for the meaning of any tone, or
set of tones: it is a waste of time looking for a common meaning for the
set of rising tones.
In order to get at the meaning of intonation, we must investigate the
different patterns that modify the accent contour. Having identified syn-
copation, we can ask what the rise-fall and rise-fall-rise do as a class as
opposed to the low fall and low rise. The notion of gradient establishes
the most unlikely link between the Irish fall, varieties of the fall-rise, and
silent stress, and we can then ask how they are related in meaning. It is
difficult to see how the tone analysis can even lead to the right questions,
let alone find an answer. If the diasystemic approach does little more than
improve the quality of the questions, it will have been worth while.
References
1. Introduction
hearing extracts loudness and timbre impressions from the acoustic signal,
as well as the suppression or masking and the dominance of certain fre-
quency components.
The temporal structure of acoustic events plays an important role in the
evaluation of sound by human beings. Unlike the technical acoustic ana-
lyzers which have been developed so far and which do not change their
properties during signal input, i.e. are time invariant, hearing is a time
variable analytic system with a storage facility.
That is, the result of the analysis depends on the history of and internal
relationships between the acoustic events. Depending on the duration and
development of the events, a human being is apparently able to analyse
their temporal and spectral structure more precisely by concentrating at-
tention on particular aspects; in this way he can apparently overcome the
theoretical limits of technical systems, which are expressed in an indeterminacy
relation Δf · Δt = 1. By these means, pitch perception is already
fully developed after a few periods of the fundamental frequency; recog-
nition time is between 4 and 10 msec.
Technical pitch analysers require longer measuring intervals, between
about 20 and 50 msec. Because of the averaging process small pitch varia-
tions are lost. These small variations, however, are evaluated by the sense
of hearing as individual voice characteristics. Depending on the standards
required, differing degrees of technical effort are needed in order to capture
the details.1
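The consequence of the indeterminacy relation for the measuring intervals just mentioned can be worked out in a few lines. This is merely an arithmetic sketch of Δf · Δt = 1; the function name is hypothetical, and the interval lengths are the ones quoted above for hearing (4 to 10 msec) and for technical analysers (20 to 50 msec).

```python
def frequency_resolution_hz(interval_ms):
    """Frequency uncertainty implied by df * dt = 1 for a measuring interval in msec."""
    return 1000.0 / interval_ms


for interval_ms in (4, 10, 20, 50):
    resolution = frequency_resolution_hz(interval_ms)
    print(f"{interval_ms:>2} msec interval: df = {resolution:.0f} Hz")
```

On this reckoning a technical system restricted to a 4 msec interval could not resolve frequency differences smaller than 250 Hz, which makes clear why longer measuring intervals are chosen, and why the resulting averaging loses small pitch variations.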
As analogue and digital signal processing techniques have been refined,
a variety of pitch measurement methods have been developed. These cover
both time domain methods which process the signal envelope shape, and
frequency domain methods, which process the signal spectrum.
It is evident that digital systems are attracting the greatest attention at
present. Digital signal processing offers considerable advantages over
analogue processing. The most important of these is that the acoustic data
can be stored for arbitrary lengths of time without degradation and used
for subsequent numeric and comparison operations without random interference
from noise (e.g. from tapes or amplifiers). Since digitally represented
data are in strict temporal relationship by virtue of the sampling fre-
quency, which is lost in analogue-mechanical reproduction from tape, for
example, because of wow and flutter, methods in which signal values have
1 The low precision of pitch identification has proved to be an obstacle
in using high quality speech synthesis for speech output systems (reading devices,
information systems).
Pitch Extraction 245
2.1. Autocorrelation
In autocorrelation analysis (Dubnowski, 1976), a single frame of the
speech signal of duration Δt, which lies between 10 and 50 msec, is compared
with itself, whereby in effect a copy of the frame is systematically
delayed with respect to the original by a time Δτ. For each value of Δτ, the
autocorrelation value is calculated from the sum of the products of the re-
spective sample values of the original and the copy.
The highest value obtains when original and copy are co-extensive, i.e.
when the temporal delay is Δτ = 0. The envelopes of voiced sounds differ
little from one period to the next, of course, so that a maximum re-occurs
when the delay factor Δτ corresponds to one fundamental frequency pe-
riod. This fact is utilised in pitch extraction.
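The procedure described above can be sketched as follows. This is a minimal illustration and not the hardware detector of Dubnowski et al.; it assumes NumPy, a frame held as an array of samples, and a search range of 60 to 400 Hz, none of which comes from the text. The function names are hypothetical.

```python
import numpy as np


def autocorrelation(frame):
    """Sum of products of the frame and a delayed copy, for every delay.

    The result is normalised so that the value at zero delay is 1.
    """
    n = len(frame)
    acf = np.array([np.dot(frame[: n - lag], frame[lag:]) for lag in range(n)])
    return acf / acf[0]


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Take the delay of the highest maximum in the search range as one period."""
    acf = autocorrelation(frame)
    lag_min = int(sample_rate / f0_max)   # shortest period considered
    lag_max = int(sample_rate / f0_min)   # longest period considered
    lag = lag_min + int(np.argmax(acf[lag_min : lag_max + 1]))
    return sample_rate / lag


# Synthetic 40 msec frame: 125 Hz fundamental plus two harmonics, sampled at 8 kHz
sample_rate = 8000
t = np.arange(320)
w = 2 * np.pi * 125 / sample_rate
frame = np.sin(w * t) + 0.5 * np.sin(2 * w * t) + 0.25 * np.sin(3 * w * t)
print(estimate_f0(frame, sample_rate))
```

The search range excludes zero delay, where the absolute maximum always lies, so that the maximum found corresponds to a delay of one or more periods.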
However, some signal envelopes produce side maxima in the autocorrelation
function, which lead to errors in pitch measurement. These are normally
the same errors as those which occur in older methods, namely octave
leaps during sounds with low frequency formants.
In more recent pitch extraction methods the attempt is therefore made
to suppress harmonics in the vicinity of the fundamental frequency by pre-
processing the signal envelope. Centre-clipping, in which low amplitude
values are suppressed, has proved to be a usable technique. If the signal is
simultaneously limited to the values +1 and −1, which amounts to peak-clipping,
calculations are considerably simplified. In this way, processing
speed can be increased to make real time analysis possible.
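The combined effect of centre-clipping and peak-clipping can be sketched as below. The sketch assumes NumPy and a clipping threshold set at 30 % of the largest amplitude in the frame; this ratio is an assumption made for illustration and is not given in the text. The function names are hypothetical.

```python
import numpy as np


def centre_and_peak_clip(frame, threshold_ratio=0.3):
    """Reduce a frame to the three values +1, 0 and -1.

    Samples whose magnitude lies below the threshold are suppressed
    (centre-clipping); the remaining samples are limited to +1 or -1
    (peak-clipping).
    """
    frame = np.asarray(frame, dtype=float)
    threshold = threshold_ratio * np.max(np.abs(frame))   # assumed threshold rule
    clipped = np.zeros(len(frame), dtype=np.int32)
    clipped[frame > threshold] = 1
    clipped[frame < -threshold] = -1
    return clipped


def clipped_autocorrelation(clipped, lag):
    """Autocorrelation value at one delay; every product is +1, 0 or -1."""
    n = len(clipped)
    return int(np.dot(clipped[: n - lag], clipped[lag:]))


frame = [0.1, 0.9, -0.2, -1.0, 0.31]
clipped = centre_and_peak_clip(frame)
print(clipped, clipped_autocorrelation(clipped, 2))
```

Since the products can only take the values +1, 0 and −1, the autocorrelation sum reduces to counting, which is where the gain in processing speed comes from.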
2 In digital signal processing the continuous waveform is converted periodically into sample
values. The individual sample values are then converted into a numerical value, usually bi-
nary. The sampling frequency must be at least twice as high as the highest required fre-
quency component in the signal. The number of binary places (bits) required for sample
value conversion depends on the dynamic range required. One binary place is required for
each 6 dB of dynamic range. In speech processing, it is often sufficient to use an 8 kHz
sampling frequency and 8 bits. For high-fidelity music transmission, for instance on digital
audio disks, about 40 kHz and 14 to 16 bits are necessary.
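The two rules of thumb in this note can be checked with a short calculation. The function names are hypothetical; the rules themselves (a sampling frequency of at least twice the highest required component, and one binary place per 6 dB of dynamic range) are the ones stated in the note.

```python
def min_sampling_frequency_hz(highest_component_hz):
    """Sampling frequency must be at least twice the highest required component."""
    return 2.0 * highest_component_hz


def dynamic_range_db(bits):
    """One binary place is required for each 6 dB of dynamic range."""
    return 6.0 * bits


print(min_sampling_frequency_hz(4000))   # speech band up to 4 kHz
for bits in (8, 14, 16):
    print(bits, "bits:", dynamic_range_db(bits), "dB")
```

An 8 kHz sampling frequency thus covers components up to 4 kHz, and 8 bits correspond to a dynamic range of 48 dB.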
246 M. Krause
[Figure: Illustration of the autocorrelation calculation. With the copy undelayed, the products of the sample values sum to 22, so that ACF(τ = 0) = 22/22 = 1; with the copy delayed, the products sum to 6.]
3 There are currently very efficient algorithms for spectrum calculation in the form of the
Fast Fourier Transform (FFT), which can be executed on large computers or on special
processors. There are now processors which, despite the enormous complexity of calcula-
tions, which increases with frequency resolution requirements, perform the calculation at
approximately real time speeds.
Figure 2: Outline of centre-clipping (a) and subsequent peak-clipping (b). The result is the
data sequence (c).
* Linear prediction is now one of the most important digital vocoder methods.
[Figure: The signal (over time t), the signal spectrum (over frequency), and the cepstrum, i.e. the spectrum of the spectrum (over 'fictive frequency' or quefrency). The cepstrum shows components from the formant structure and a component from the harmonic series.]
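The idea of the cepstrum as a 'spectrum of the spectrum' can be illustrated with a short sketch. It assumes NumPy, a Hamming window, and a search range of 80 to 400 Hz, all of which are choices made here for illustration; it is not a reconstruction of any particular published procedure, and the function name is hypothetical.

```python
import numpy as np


def cepstrum_f0(frame, sample_rate, f0_min=80.0, f0_max=400.0):
    """Estimate the fundamental frequency from the cepstral peak.

    The harmonic series appears in the log spectrum as a regular ripple;
    transforming the log spectrum once more turns this ripple into a single
    component at the quefrency of one pitch period.
    """
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    log_spectrum = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_spectrum)
    q_min = int(sample_rate / f0_max)   # low quefrencies hold the formant structure
    q_max = int(sample_rate / f0_min)
    peak = q_min + int(np.argmax(cepstrum[q_min : q_max + 1]))
    return sample_rate / peak


# Synthetic test signal: pulse train with a period of 64 samples at 8 kHz
sample_rate = 8000
frame = np.zeros(512)
frame[::64] = 1.0
print(cepstrum_f0(frame, sample_rate))
```

Restricting the search to quefrencies above `q_min` is what separates the component of the harmonic series from the components due to formant structure.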
For this reason, the error function contains periodicities due to glottal ex-
citation which can be evaluated by analogue or digital means. Uncertain-
ties can occur in pitch extraction, since omission of nasal tract properties
means that not only fundamental frequency components are present.
Centre-clipping and peak-clipping allow more favourable results to be
obtained with this method, too. By restricting calculations to a small num-
ber of prediction coefficients, usually 10-15, calculations can be kept rea-
sonably manageable, allowing real time operation.
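How periodicities in the prediction error can be evaluated is sketched below. The sketch assumes NumPy and uses the autocorrelation formulation of linear prediction, solving the normal equations directly; this is one standard formulation chosen for brevity and is not claimed to be the variant discussed here. All function names are hypothetical.

```python
import numpy as np


def lpc_coefficients(frame, order=10):
    """Predictor coefficients a[1..order] with x[n] ~ sum of a[k] * x[n-k]."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz matrix of autocorrelation values (normal equations)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])


def prediction_error(frame, coeffs):
    """Difference between each sample and its prediction from preceding samples."""
    order = len(coeffs)
    error = np.zeros(len(frame))
    for n in range(order, len(frame)):
        past = frame[n - order : n][::-1]        # x[n-1], x[n-2], ...
        error[n] = frame[n] - np.dot(coeffs, past)
    return error


def residual_period(error, lag_min, lag_max):
    """Delay (in samples) at which the error function correlates best with itself."""
    scores = [np.dot(error[:-lag], error[lag:]) for lag in range(lag_min, lag_max + 1)]
    return lag_min + int(np.argmax(scores))
```

With `order` kept in the range of 10 to 15, the matrix to be solved remains small, in keeping with the restriction on the number of prediction coefficients noted above.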
[Figure: prediction errors.]
cally varied (Wise et al., 1976). Deviations within the frame are registered
and statistically evaluated. The test waveform with lowest deviation is
taken to be the most probable, and its fundamental frequency is taken to
be that of the tested speech waveform. The advantage of the method lies
in the relatively simple calculations which allow fast evaluation, although
on occasion a large number of test waveforms have to be processed.
Since the fundamental frequency of speech sounds only changes relatively
slowly, pitch variation can be tracked with a restricted number of
test waveforms once analysis has been "locked" at the beginning of a
voiced sequence, thereby increasing processing speed.
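The comparison with test waveforms, and the subsequent tracking with a restricted number of them, can be sketched as follows. This is a simplified stand-in and not the algorithm of Wise et al. as published: it assumes NumPy, candidate periods expressed in whole samples, and a test waveform obtained by averaging the frame over the candidate period. The function names and the search width of five samples are hypothetical.

```python
import numpy as np


def deviation(frame, period):
    """Mean squared deviation between the frame and a test waveform of this period."""
    n_periods = len(frame) // period
    if n_periods < 2:
        return float("inf")              # too few periods in the frame to compare
    segments = np.asarray(frame[: n_periods * period]).reshape(n_periods, period)
    test_waveform = segments.mean(axis=0)   # assumed construction of the test waveform
    return float(np.mean((segments - test_waveform) ** 2))


def best_period(frame, candidates):
    """Candidate period whose test waveform shows the lowest deviation."""
    return min(candidates, key=lambda period: deviation(frame, period))


def track(frames, first_candidates, search_width=5):
    """Search widely in the first frame, then only near the previous result."""
    periods = []
    candidates = first_candidates
    for frame in frames:
        period = best_period(frame, candidates)
        periods.append(period)
        candidates = range(max(2, period - search_width), period + search_width + 1)
    return periods
```

After the first frame only eleven test waveforms are evaluated per frame instead of the full initial set. The initial candidate range should not include multiples of the expected period, since a waveform of twice the true period fits equally well.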
It follows from the preceding discussion that the last word has not yet
been spoken in the area of research into fundamental frequency measure-
ment in speech. Most digital methods are not yet suited to real time opera-
tion, in which results are continuously displayed with minimal delay due
to the calculation process. Special devices using microcomputers will be
available for this purpose in the near future. Assuming that no new
method is discovered for error-free pitch extraction (and research is con-
tinuing on this), error correction procedures based on other properties of
the speech signal such as formant or amplitude configurations will cer-
tainly be developed. Partial results in this area have already been obtained
(Kretschmar, 1980).
Many methods of pitch extraction require the speech signal to be of
high quality, that is, the fundamental frequency must be present in the signal.
This is not the case, for example, with telephone transmission. Environmental
noise must not exceed a given level either, a condition which
cannot be met for many speech analysis purposes. This area requires com-
prehensive research in order to overcome these obstacles.
In conclusion, it may be said that there are several techniques which are
well suited to pitch extraction for investigations in the area of intonation.
It is necessary for the researcher in this area to be able to judge the quality
of the methods as far as possible in order to put them to optimal use for
his own purposes. At the present state of technological development he
can assume that "the" method does not exist, so that it is desirable to use
more than one method and to make his own analysis of doubtful cases, if
necessary, by observing the oscilloscope trace of the waveform, particu-
larly in the area of the transients between voiceless and voiced sounds.
References
Dubnowski, J. J. et al. (1976). Real time digital hardware pitch detector. IEEE Trans. Acoust.
Speech and Signal Proc. ASSP-24: 2-8.
Erb, H. J. (1974). Ein Verfahren zur Bestimmung der Sprachgrundfrequenz in Echtzeit.
Frequenz 28: 23-28.
Frekjaer-Jensen (n.d.). F-J Electronics ApS. Prospectus for the Pitch Computer Type PC
1400.
Hettwer, G. & K. R. Fellbaum (1981). Ein modifiziertes Sprachgrundfrequenz-Analysever-
fahren für lineare Prädiktionsvocoder. Fortschritte der Akustik 1981: 633-636.
Kretschmar, J. (1980). Untersuchungen zur Natürlichkeit synthetisierter Satzmelodien. Fort-
schritte der Akustik, DAGA 1980: 699-701.
Markel, J. D. & A. H. Gray (1976). Linear prediction of speech. Springer: New York.
Noll, A. M. (1967). Cepstrum pitch determination. J. Acoust. Soc. Am. 41: 293-309.
Rabiner, L. R., et al. (1976). A comparative performance study of several pitch detection al-
gorithms. IEEE Trans. Acoust. Speech and Signal Proc. ASSP-24: 399-418.
Ross, M. J. et al. (1974). Average magnitude difference function pitch extractor. IEEE Trans.
Acoust. Speech and Signal Proc. ASSP-22: 353-362.
Terhardt, E. (1979). Calculating virtual pitch. Hearing Research 1: 155-182.
Wise, J. D. et al. (1976). Maximum likelihood pitch estimation. IEEE Trans. Acoust. Speech
and Signal Proc. ASSP-24: 418-423.
D. ROBERT LADD
The treatment of cases like stéel warehouse vs. steel wárehouse under this
analysis is somewhat obscure, since both seem to be noun-noun com-
pounds; here, however, reference is often made to deep syntactic differ-
ences - i. e. 'warehouse made of steel' vs. 'warehouse for storing steel' -
and, though details of such an analysis have never actually been worked
out, the assumption continues to be held that ultimately the whole pheno-
menon will be shown to depend on syntax at one level or another.
The tenacity of this assumption is quite remarkable in view of the exist-
ence of large numbers of problems such as those shown in (3), distinctions
which, in the words of Chomsky & Halle, are 'widely maintained but syn-
tactically unmotivated' (1968: 156). In general, analysts seem content to
This paper was presented at the 11th NELS meeting at Cornell University in November
1980. A number of colleagues at Cornell and Bucknell read an earlier version and gave me
much useful criticism, while both Ruta Noreika and the Wednesday night intonation seminar
at Penn (Fall 1980) gave me their reactions as the present version took shape. To all, my
thanks; blame only me.
254 D. R. Ladd
write off the exceptions to lexical arbitrariness - Chomsky and Halle sug-
gest the possibility of treating them in the 'readjustment component' - or,
in short, to take the syntax-based analysis as far as it will go and then fix
up the rest of the data ad hoc.
As long as the number of the leftovers is not overwhelming, the basic hy-
pothesis about the relation of syntax and prosody is effectively unfalsifi-
able.
My goal in this paper is not to try to patch up the syntactic analysis, but
simply to abandon it and present an explanation of a different kind. As I
will show, this explanation predicts the existence of the exceptions to the
syntactic treatment and accounts for the types of cases in which they oc-
cur. The paper is divided into two parts: first it shows how compound
stress is not just a footnote to the normal stress rules, but part of the
English Compound Stress 255
The stress patterns in Speaker B's replies in these exchanges would often
be called 'contrastive'. Yet it is obvious that the meaning of B's reply in
(4a) is not something like 'John doesn't read books, he burns them' - that
is, it is not contrastive in any explicit sense. Instead, the point of the stress
pattern is to move the stress off books, to deaccent it and refer it to the
context.
The Liberman-Prince theory makes it possible to represent deaccenting
very elegantly as the simple reversal of the s and w assigned to a given pair
of sister nodes in the rhythmic structure. Thus, the normal stress on B's
reply in (4a) (John doesn't read books) would be represented as shown in
(5a). For the deaccented version (John doesn't read books), we simply re-
verse the circled nodes in order to put the w on books; if the contrastive
version were intended, (John doesn't read books, he burns them) we would,
in effect, reverse the circled nodes in order to put the s on read. What the
Liberman-Prince representation makes plain is that there is only a single
phenomenon of marked stress, with contrastive and deaccenting as two
different functions (5b).2 (Notice the Hallidayan viewpoint at work in the
notion of 'functions' of a stress pattern.)
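The reversal of s and w on a pair of sister nodes can be illustrated with a small sketch. The binary tree assumed here for John doesn't read books is a guess made for illustration only, since the diagrams in (5) are not reproduced, and the names `Pair`, `main_stress` and `reverse` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Pair:
    """Two sister nodes; 'strong' records which sister is s (the other is w)."""
    left: Union["Pair", str]
    right: Union["Pair", str]
    strong: str = "right"


def main_stress(node):
    """Follow the s branches down from the root to the most prominent word."""
    while isinstance(node, Pair):
        node = node.left if node.strong == "left" else node.right
    return node


def reverse(pair):
    """Marked stress: exchange s and w on one pair of sister nodes."""
    flipped = "left" if pair.strong == "right" else "right"
    return Pair(pair.left, pair.right, flipped)


read_books = Pair("read", "books")                        # w s
normal = Pair("John", Pair("doesn't", read_books))
marked = Pair("John", Pair("doesn't", reverse(read_books)))
print(main_stress(normal), main_stress(marked))
```

The same single operation yields the marked pattern whether it is used to deaccent books or to contrast read, which is the point the representation makes plain.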
1 A number of caveats must be entered into the record at this point. First: I will be discussing
only compounds whose head is a noun (complex nominals, in Levi's term), but the ap-
proach, if not the specific analysis, can be extended to cover other cases as well. Like Levi,
I find no difference relevant to my concerns here between compounds where the first mem-
ber (henceforth the 'attribute') is also a noun and those where it is an adjective. Second: I
am ignoring differences in the weaker or less-stressed half of the compound, differences
often analyzed as distinctions between 'secondary' and 'tertiary' stress, e.g. long island vs.
Long Island (Trager and Smith, 1951: 69) or butter cup 'cup for butter' vs. buttercup 'type of
flower' (Kingdon, 1958: 195). This decision is based in part on the implicit claims of the
Liberman-Price stress analysis, but it also follows most earlier studies of compounds. Ulti-
(5) [Tree diagrams not reproduced.]
a. 'normal': John doesn't read books.
b. 'marked' (deaccented or contrastive): John doesn't read books.
The problem that Schmerling points out is that both of these are in some
sense 'normal stress'; out of context (6a) seems 'normal', but (6b) seems
just as normal in the context of a hospital or a medical convention. Con-
fined as she is to the Trager-Smith-Chomsky-Halle view of normal stress
as a merely automatic consequence of the syntax of a sentence, Schmer-
ling is prepared to use examples such as these as the basis for abandoning
the notion of normal stress altogether.
But to do that would be to throw away a valuable concept. Indeed, the
first step toward treating this puzzle is to take 'normal stress' in the Hallidayan
sense of the stress pattern that signals an unmarked focus.3 This
makes it possible to speak of both stress patterns as normal, in the sense
that both convey the focus this is NP. This focus is reflected in the rhyth-
mic structure by the fact that at a higher level in the tree both versions, as
mately, of course, an explanation will have to be given for these distinctions as well. Third:
It is well known that there is a certain amount of individual and dialect difference in as-
signing stress patterns to compounds. The data here reflect my own speech, but I have
checked with other informants to avoid basing my statements on some idiosyncratic usage.
In particular, I have checked not only individual terms, but, in accordance with the analysis
presented here, pairs and groups of items as well (e.g. I have checked chocolate cake and apple
cake together and find that many speakers make the distinction noted here). Question
marks next to individual items in the data tables indicate those items in which there seems
to be considerable disagreement about the stress pattern.
2 Any discrepancies between standard Liberman-Prince trees and these are intentional, but
cannot be justified here.
3 For stress and focus see Halliday, 1967; Chomsky, 1971; Jackendoff, 1972; Wilson &
Sperber, 1979; Ladd, 1980.
English Compound Stress 257
expected by the normal stress rules, have the s assigned to the rightmost
NP:
(7) [tree diagram not recoverable]
(8) a. [tree diagram not recoverable]
[w s] They live in a green house. ('normal')
(10) [s w] They live in a green house, not a grey one. ('marked': contrastive)
(11) [s w] I grew them in a greenhouse. ('marked': deaccented)
As I just showed, the deaccenting can apply within the compound without
affecting the focus information conveyed at a higher level in the rhythmic
structure of the sentence; that is, compound stress can be treated as
marked or non-normal without in any way implying that it is thereby im-
possible for it to occur in a sentence with 'normal stress'.
This meshes very well with recent work on the semantics of compounds
by Downing (1977) 4, Kay and Zimmer (1976), and Dowty (1979). What
distinguishes these writers from earlier generative work on compounds
(notably that of Lees [1960, 1970], Levi [1978], and Motsch [1970]) is that
they do not seek to explain the specific relationships seen in compounds
by positing some sort of underlying predicate relation between the two
parts of the compound. (For instance, steel warehouse is not represented as
being underlyingly 'warehouse for steel', nor apple tree as derived from 'tree
with apples'.) Instead, they posit a single general compounding relation-
ship that leaves the specific relation to be inferred on the basis of the indi-
vidual lexical items involved. To put it another way, the compound con-
struction does not convey an explicit meaning that fully determines the
4
While Downing's experimental study was primarily concerned with the creation of novel
compounds, she found little support for the underlying-predicate approach to compound
semantics; I do not feel that I distort her findings by including them here.
tions to the traditional rule exhibit some kind of mismatch between what
the compound relation and the stress pattern convey. If my explanation is
correct, then compounds with phrasal stress ought to be cases where the
information conveyed by the deaccenting would be somehow inappropri-
ate - say, cases where any subcategorizing effect of the attribute is rela-
tively small. I will discuss three groups of cases which I think show this
quite clearly.
3.2 The first set involves place names like those shown in (12). We
might predict that these would take phrasal stress, since the head (Avenue,
Road, etc.) is in no sense subcategorized by the attribute: Madison Avenue
does not name a particular type of avenue, Olin Library does not denote a
special category of library, the Golden Gate Bridge is a bridge, etc. As the
data in (12) show, the prediction of phrasal stress on these is largely borne
out. There are, however, a few nouns that are deaccented in such com-
pounds: street, house, town, land, and perhaps a few others. Considering
these each in its own general semantic group, though, one can see that
they are always the least specific or least marked. In city thoroughfare
names, for example, we get at least vague expectations about the nature of
the thoroughfare being named from most of the possible head nouns - we
would expect an Avenue or Boulevard to be wide or important; a Road
probably leads out of town; a Place or a Crescent is probably residential;
and so on. Street, however, gives us no such information. It could be State
Street, in the heart of downtown, or it could be Dogwood Street in some
quiet suburb. There is, in other words, a real sense in which we do get less
information about the category of things being named from Street than
from any of the others, and hence more from the attribute; this is more
typical of ordinary compounds, and is exactly what is signalled by the
stress pattern. 5
Comparable observations can also be made about the cases in (13), in
which the head is the proper name of the inventor or discoverer of the en-
tity or category named by the compound. The case of disease names is
typical here: the relatively vague Syndrome and Disease (like Street) are
deaccented but more specific words like Chorea and Palsy are not. While I
5 Quite some time after presenting this paper, I discovered that both this phenomenon and
its explanation have been noted by non-linguist native speakers, as can be seen from the
following passage:
'Why, in speaking of thoroughfares,' asked a correspondent of John o' London's Weekly in
1936, 'is it the custom to accent the proper name only in the case of a street? It is always
Fleet street, Southampton Street, but Shoe lane, Farrington road, Fetter lane.' The paper's
lexicographer, Jackdaw, answered: 'In a town the great majority of thoroughfares are streets;
street, therefore the expected word, needs no emphasis, and the stress goes on the street's
name. Lanes and roads, being much less common, these words are naturally given at least
equal stress with their distinctive names; convenience begets habit.'
('The Street and the Stress', John o' London's Weekly, April 18, 1936, cited by Mencken,
1948.)
If this seems too facile, there is a simple pragmatic test that seems to
suggest that the distinction between flavors and categories is a real one. If
the head of such a compound can be inserted into the frame 'Do you want
a ___?' or 'Do you want some ___?' without misleading the ad-
dressee about what is being offered, then the attribute is a flavor. For in-
stance, 'Do you want a sandwich?' is fine even if all the speaker really has
available is, say, a cheese sandwich. On the other hand, if both the attri-
bute and the head must be included in order not to mislead the addressee,
then a separate category is involved; Do you want some bread? is decidedly
infelicitous if what the speaker has in mind to offer the addressee is ba-
nana bread. The reader is invited to try this test on the data in (14); while
the results are not 100% consistent with the stress patterns, the correlation
is quite considerable.
3.4 The final group of cases is provided by expressions where the head
names an artifact of some sort, and the attribute names the material of
which it is made. In general, these also have phrasal stress, as shown in
(15). This suggests that in these cases, as in those involving culinary fla-
vors, the category named by the compound is essentially the category
named by the head alone. To put
it another way: the material of which an artifact is made is generally not
relevant for classifying or categorizing it.
There is independent evidence for this in Downing's study of the crea-
tion of new compounds. She suggested that 'naturally existing entities
(plants, animals and natural objects) are typically classified . . . on the ba-
sis of inherent characteristics; but synthetic objects are categorized in
terms of the uses to which they may be put. This would seem to correlate
with the fact that synthetic objects are typically created with some goal in
mind, while natural entities generally are not' (Downing, 1977: 831). In
those few cases of (15) which do have compound stress, it seems for the
most part - e. g. glassware, leather goods, gingerbread man - that the ma-
terial really is relevant for specifying the category being named.
syntactic relation between the two. The relevant factor is whether the at-
tribute categorizes or merely describes the head; to determine that, we
may have to consider individual cases against the background of other
possible attributes or other possible heads. Both apple cake and steel ware-
house represent B made of A, but in the case of cake, the fact that it is made
of apple categorizes it, when compared to other possibilities, whereas for
warehouse, the fact that it is made of steel only describes, especially when
compared to other possible relations between the two lexical items ware-
house and steel.
4. The foregoing analysis of stress patterns in compounds has several
points of interest. First, it explains rather than merely describes the rough
correlation between compound syntax and so-called compound stress.
Second, it makes the description of English simpler, by removing com-
pound stress from the cases to be covered under 'normal stress' and sub-
suming it under the independently needed rubric of deaccenting. Third, it
tends to provide independent confirmation of analyses of compounds like
Dowty's which have a relatively impoverished semantics and a richer prag-
matics, and gives no support to generative models like those of Levi and
Lees. Finally, it may be possible to turn the analysis around - as in the case
of 'flavors' vs. 'categories' - and use it as a tool for investigating taxono-
mies and markedness relations in the structure of the lexicon. For all these
reasons I think it provides some genuine new insight into an intractable
old problem. 6
6 Limitations of space make it impossible for me to do more than mention the existence of
two complicating factors. First is the likelihood that any treatment of the semantics of
compounds must distinguish between the 'ordinary' semantic opacity in a compound like,
say, greenhouse, and the semantic opacity involved in what may best be described as idioms,
such as white elephant, French letter 'condom' (so also a number of other expressions involv-
ing ethnic slurs), swan song, wallflower, etc. (Note that both stress patterns are found in
these.) Levi (1978: 11-12) argues for just such a distinction in connection with the semantic
opacity of compounds. The implications of this for the analysis presented here are not en-
tirely clear.
The second complication is that purely phonological factors are sometimes involved to at
least some extent in determining compound stress patterns. At least two types of cases
come to mind. First, there is a tendency to stress very long compounds farther to the right
than might otherwise be expected (e. g. travel expense reimbursement voucher, not travel ex-
pense reimbursement voucher, or maple syrup container distributor, not maple syrup container
distributor). Second, it is likely that the leftward shift in short, common compounds such as
oatmeal and ice cream (which are still pronounced oatmeal and ice cream by conservative
speakers) is related to the general leftward shift in nouns in general (e. g. cigarette, still pro-
nounced cigarette by conservative speakers). One might say that such cases are being
treated in effect as non-compounds. This explanation is entirely consistent with the fact
that many monomorphemic words in present-day English are known to have arisen from
earlier compounds (e. g. daisy < day's + eye, hussy < house + wife, sheriff < shire + reeve).
References
0 Introduction
It has always been a vexed question in the study of syntactic accents ('sen-
tence stress', 'contrastive accent' etc.) how to find a method by which the
semantic effects of accents can be established beyond vague generalities.
The adequacy of any method can be judged only after the following ques-
tion has been answered: What ARE the semantic effects of syntactic ac-
cents? In a recent study on 'Accent and Meaning' (forthcoming) I adopt
the following answer:
I have no space here to defend or further explain this view. Given this
conception, I was confronted with the problem of finding a method by
which the attitude/content pairs associated with accent occurrences could
be established in a systematic way. I developed the method characterized
in this paper, the METHOD OF DIALOGUE SCHEMATA, whose basic
features are as follows.
Expression (2) denotes a dialogue schema, or rather, set of dialogue
schemata:1
3
I will use variables of various kinds, all of them italicized letters with or without numerical
subscripts. To keep the degree of formality as low as possible I will interpret the variables
only by informal hints in the text. For the same reason concepts will be introduced by de-
finitions that are formally as unassuming as possible.
Syntactic Accents 269
Condition (c). There is exactly one function e of the following kind.
(i) The arguments of e are parts f2 of syntactic units such that each f2 is
an argument of e if and only if for some (f,s,u) in A or some (f1,s1,v) in B, f2
is a 'primitive' constituent of f (of f1) relative to the syntactic structure s
(s1). (A primitive constituent is one that doesn't contain any other consti-
tuents.)
(ii) If f2 is an argument of e, e(f2) is a lexical meaning of f2 in S.
(iii) If f2 and f3 are arguments of e and f2 and f3 are 'occurrences' of the
same word or words, e(f2) = e(f3).
(iv) For each (f,s,u) in A, if e1 is the subfunction of e whose arguments
are parts of f, then u is a meaning of f relative to f, s, e1, and S.
(v) For each (f1,s1,v) in B, if e1 is the subfunction of e whose arguments
are parts of f1, then there is a meaning u1 of f1 relative to f1, s1, e1, and S.
Intuitively, function e assigns lexical meanings everywhere in the dia-
logue schema where lexical meanings can be assigned. (On our conception
there are no 'meaningless' primitive constituents but some constituents may
have an 'empty' meaning.) The same word in different places is assigned
the same lexical meaning (condition [iii]); this guarantees constancy of lexi-
cal meanings throughout the entire dialogue schema. The meaning of a
sentence in A is based on the assignment of lexical meanings, and for the
partly interpreted sentences in B it is possible to obtain sentence meanings
based on the assignment of lexical meanings. (Conditions [iv] and [v]; the
concepts of sentence meaning and meaning base are understood as in Lieb
forthcoming: [100] and [112].)
Condition (e). There is exactly one (f,s,v) in B such that: (i) there is at
most one accent occurrence in f given structure s, and (ii) every other ele-
ment (f1,s1,v1) of B is a PERMISSIBLE EXPANSION of (f,s,v) in S in the
following sense.
There is a SPISU (f2,s2,v2) of S such that (f1,s1) = (f,s) 'preceded by' or
'followed by' (f2,s2); v1 is v 'plus' v2, where "plus" denotes a certain seman-
tic operation on meaning bases; and for any meaning u, if v1 is a base for u
and (f1,s1,u) is uttered, the speaker expresses a propositional attitude
whose content is partly formulated by the (f2,s2,v2)-part of (f1,s1,u) and
that involves the addressee of the utterance only if the addressee is expli-
citly referred to. (The concept of permissible expansion is exemplified by
[2] and [1] in [2B]).
(For the concept of normal utterance, cf. Lieb 1979: Sec. 2).
'Proper' dialogue schemata are, very roughly, those schemata in which
every completion of each element of the second component is response
compatible with each element of the first component, regardless of idio-
lect system:
2.1 Steps 1 to 5
Step 4. The linguist presents the informants with the pairs of sentences
((Bl), (A)), ((B2), (A)) of each schema (cf. [2]) and elicits for each pair
judgments on the response compatibility of the first sentence with the sec-
ond. This step requires great care.
The linguist may have taped normal utterances of each sentence by a
(the) speaker of idiolect system S*, produced in succession for the sen-
tences of each pair. This serves only for identification of the sentences
(taping may already be required by Step 3); it must not be mistaken as
production of a dialogue, which it is clearly not.
If orthographic representation is chosen, it is CLASSES of sentence
pairs that are presented (cf. Sec. 1.2); in this way 'irrelevant detail' can be
abstracted from right at the beginning. (On the other hand, the linguist
cannot be certain that no relevant detail is lost in the abstraction process.)
Elicitation of judgments. There are two different kinds of judgments that
may be elicited. In the first case, each informant is made to pass judg-
ments on the following question: Is every completion of the version of
the/a (B)-sentence in the German idiolect systems of the informant re-
sponse compatible with the version of the/an (A)-sentence in every Ger-
man idiolect system he knows of or can imagine? In the second case, the
question is generalized: Is every completion of the version of the/a (B)-
sentence IN EVERY German idiolect system he knows of or can imagine
compatible with the version of the/an (A)-sentence in every such system?
Naturally, the judgments cannot be elicited by asking such questions. In
the first case, an appropriate question might have a form such as: 'Imagine
you are having a conversation with another German. He tells you at one
point when this is appropriate: [replaying of recording of (A)-sentence, or
joint characterization of all (A)-sentences]. Would it be just normal for
you to continue by [replaying of recording of selected (B)-sentence, or
joint characterization of all selected (B)-sentences], provided that this is
what you believe and what you see fit to tell the other person, and tell him
in this tone of voice?'4
In the second case, an appropriate question might be of the following
form: 'Imagine two Germans are having a conversation. One tells the
other: [as before, for (A)-sentence]. Would it be just normal for the other
4
The final qualification is intended to make the informant abstract from intonational differ-
ences between the (A)-sentences and (B)-sentences that are unrelated to accent manifesta-
tions.
5
We assume a meaning of die frau such that the speaker refers to exactly one object by die
frau in any normal utterance of the sentence.
We have not yet shown that the attitude content pairs (8) - if accepted -
should be taken as semantic effects of the accent occurrence on mann. For
this, an additional step is required.
Step 7. The pairs identified in Step 6 are compared with the pairs obtained
by applying Steps 1 to 6 to some set B1 of triples (f,s,v) such that: f = der
mann ist gekommen; s is a structure identical with a structure that f has in
B* except that given s, either mann has an accent other than downward
contrastive accent or a word other than mann has downward contrastive
accent; and v is as in B*.
Put in a nutshell, we change either the accent or the accent place and
see what happens. If different attitude/content pairs are obtained, both
the old and the new pairs are tentatively taken as semantic effects of the
relevant accent occurrences. Otherwise, Step 7 is repeated.
2.3 Comments
the CENTRE: I have not been able to completely exclude the possibility
of expansions that introduce 'extraneous' attitude/content pairs which as
a matter of fact should not be included in meanings of the centre.
(i) An attitude/content pair is NOT associated with the set of centres if
it is ruled out by the following test: For any completion of a centre, any
utterance of the completion by the speaker, and any response by the
HEARER to the utterance, if the hearer assumes in his utterance that the
speaker has the attitude towards the content, then the HEARER's utter-
ance is not a proper response to the speaker's utterance.
(ii) An attitude/content pair IS associated with the set of centres if it
passes the following test: For any completion of the centre, and any utter-
ance of the completion by the speaker, there is a proper response by the
HEARER to the utterance in which the hearer assumes that the speaker
has the attitude towards the content. (Both conditions [i] and [ii] are suffi-
cient ones. The two conditions extend, so to speak, dialogue schemata by
allowing not just for a simple exchange A-B but for an exchange A-B-A.)
Comment 4. The method does not entail the view that the semantic effects
of syntactic accents are exclusively 'discourse phenomena'. On the con-
trary, it accepts view (1) by which the semantic effects of an accent occur-
rence essentially consist in contributing attitude/content pairs to sentence
meanings. The method exploits the possibility of discourses of a special
type. A speaker uses a sentence with a meaning by which the speaker must
have a certain belief concerning a hearer attitude, and he uses this sen-
tence AFTER an explicit statement by the hearer that he (the hearer) actu-
ally has this attitude. We may thus get at necessary speaker assumptions
by means of a discourse. However, the assumptions are required by the
meaning of the speaker's sentence; they are not required by the hearer's
previous utterance. Put differently, we study phenomena in the domain
of sentence meaning by means of discourses; these phenomena do not be-
come discourse phenomena by the fact that they can be so studied. True
enough, they are extremely important for the structuring of discourse but
this does not force us to exclude them from sentence meaning. I take the
position that discourse structure should be partly explained by sentence
meaning; my conception of sentence meaning is construed accordingly.
References
Lieb, H. (forthcoming). Accent and meaning. A study of syntactic accents, stress, and rhythm,
with special reference to German.
Lieb, H. (1979). The universal speech function. A functional account of the relation between
language and speech. In: Ezawa, K., Rensch, K. H.: Sprache und Sprechen. Festschrift für
Eberhard Zwirner zum 80. Geburtstag. Tübingen: Niemeyer. 185-194.
HELMUT RICHTER
The present pilot study has its main interest in the applicability to intensity
of methods such as the trend technique, and in their descriptive power.
This technique consists in finding a straight line which is optimal accord-
ing to Legendre's criterion of least squares 1. It is most closely related to
the theory of Bravais-Pearson's correlation coefficient 2 and therefore in-
volves all the problems of adequacy in cases of a curvilinear relation be-
tween two variables, well known in behavioural statistics. The focus of the
study will be on the empirical data rather than on reflexions on statistical
methodology. Here, however, it must be said that
(1) the incomplete fitting of a mathematical function is suspect less be-
cause it may suggest a structure where there is none than because it is
liable to fail to grasp a structure that is there;
(2) the relevant acoustic information about intonation may be "coded" in
features of the speech signal which lie beyond uni-directional processes
(involving either an increase or a decrease of one parameter).
I can now add that fitting straight trend-lines to intonation curves really
does seem to do a good job for purposes of automatic phonetic analysis
(Rietveld and Boves, 1979, Takefuta, 1979). This illustrates the first point
above, for there is no original straightness in, for example, the F0-curve.
Moreover, it can be said that intensity is (re-)gaining considerable interest
in intonation research. As to the above point (2), inspection of the mate-
rial available to me soon led me to the question whether the relation be-
tween the slopes of the trend-lines preceding and succeeding an intensity
peak might be indicative of the dynamics of speech.
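The trend technique applied to the two sides of an intensity peak can be sketched as follows. This is only an illustration under stated assumptions: the function names `fit_trend_line` and `branch_trends` and the sample values are invented here, the curve is assumed to have a single absolute maximum with at least two samples on either side, and nothing is implied about how the resulting slopes map onto the asymmetry types discussed in this paper.

```python
def fit_trend_line(xs, ys):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, r), where r is the Bravais-Pearson
    correlation coefficient between xs and ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r is undefined for a perfectly flat curve; report 0.0 in that case.
    r = sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0
    return intercept, slope, r


def branch_trends(times, intensities):
    """Split the curve at its absolute maximum and fit one line per branch."""
    peak = max(range(len(intensities)), key=intensities.__getitem__)
    first = fit_trend_line(times[: peak + 1], intensities[: peak + 1])
    second = fit_trend_line(times[peak:], intensities[peak:])
    return peak, first, second


# Invented sample: time in steps of 10 ms, intensity in dB.
times = [0, 1, 2, 3, 4, 5, 6, 7, 8]
intensities = [10, 16, 22, 28, 34, 31, 28, 25, 22]

peak, first, second = branch_trends(times, intensities)
print("peak at sample", peak)
print("first branch slope:", first[1], "r:", first[2])
print("second branch slope:", second[1], "r:", second[2])
```

The peak sample is shared by both branches in this sketch; whether it should belong to one branch only is a design choice the text leaves open.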
First branch is introduced to refer to the part of the intensity curve
which precedes an absolute maximum (peak), the curve being the graph of
a grosso modo monotonically rising function, or, on the other hand, to the
straight line adapted to that curve section, and second branch to refer to
1
The present applications of the technique have a classical forerunner in 'phonometry', a
statistically oriented phonetics of the thirties (Zwirner and Zwirner, 196611, p. 186-188).
2
For a detailed discussion of this theory with respect to phonometrical applications see
Richter, 1974.
284 Η. Richter
[Figure 1. Panels: Left-asymmetry; Right-asymmetry]
3
In fact my wife (a mathematician who segmented and measured registrations and wrote
transcriptions for Zwirner's institutes for years) and myself were these judges.
Intensity as a Predictable Feature 285
culiar function, or that if fitting a straight line to a curve does not obscure
a structure, reacting to the whole configuration will not have this effect
either.
              Judge 2
Judge 1       l      r      n      total
l             201    2      25     228
r             3      77     22     102
n             22     9      79     110
total         226    88     126    440 (number of curves to be judged)
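The marginal totals of the judges' table, and the share of curves on which both judges gave the same label, can be recomputed from the nine inner cells. The sketch below assumes that the row and column labels are l, r and n in that order; the variable names are illustrative only.

```python
LABELS = ["l", "r", "n"]

# Rows: judge 1; columns: judge 2 (inner cells of the judges' table).
TABLE = [
    [201, 2, 25],
    [3, 77, 22],
    [22, 9, 79],
]

row_totals = [sum(row) for row in TABLE]
col_totals = [sum(col) for col in zip(*TABLE)]
total = sum(row_totals)

# Curves on the diagonal received the same label from both judges.
agreeing = sum(TABLE[i][i] for i in range(len(LABELS)))
agreement = agreeing / total

print("judge 1 totals:", dict(zip(LABELS, row_totals)))
print("judge 2 totals:", dict(zip(LABELS, col_totals)))
print(f"identical judgements: {agreeing} of {total} ({agreement:.1%})")
```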
4
In a more detailed investigation special attention would have to be paid to judge-specific
tendencies. Obviously judge 1 was polarizing more sharply than judge 2. Another critical
point is the obvious violation, as a result of our convention, of a consistent order
# 1) # n) # r in the diagonal and in both the marginal column and marginal line of Table 1
notwithstanding the difference in individual tendencies. At present, however, the only
technical alternative to giving the non-neutral judgements priority would have consisted in
haphazardly splitting the mixed judgement pairs such as (l,n).
5
I did the experimental work at the Institut für Kommunikationsforschung und Phonetik
(IKP), University of Bonn, together with Rainer Seidel and Dirk Wegner. Another col-
league of the IKP, Bruno Fritsche, provided us with pitch and intensity registrations of the
yes-no answers. The experiment, not primarily undertaken for intonation purposes, was
sponsored by the Deutsche Forschungsgemeinschaft, Bonn-Bad Godesberg.
Intensity was analyzed with an integration time of 10 ms and an input resistance of
500 Ohm, and registered with a speed of 5 cm/s on a logarithmic scale ranging from 0 to
40 dB.
6 In both experiments there was a second run prescribing nein where the subject had origi-
nally chosen ja, and ja where he or she had originally chosen nein. The data in the present
paper are from the first run only. Experiments I and II in a sense complete the design of
Richter, 1967 and Seidel, this design being
A's question ← B's answer (ja and nein intonationally specified) → A's comment
formula indicates how many components and which ones are intended to
change:
1U  intended alteration concerning the environmental event (Umgebungsausschnitt, first components),
1K  intended alteration concerning the communicator (Kommunikator; second components);
2   intended alteration of both components.
The formulations were generated by means of an explicit ad-hoc-grammar;
the event is going to the cinema (or not), the personal concern being to-
gether (or not)7 (cf. Table 2).
As to experiment II, the comments were construed so as to paraphrase
explicitly B's ja or nein in A's perspective, i. e. in terms of the presupposed
and intended situations plus B's inner situation resulting after A's question
(third pair in the characteristics of Table 3; (ao, aa, ao) is to mean that B
has remained attached to A though A intended to change the presupposed
o into a). Omitting comments on paradoxical results (with unintended al-
terations), 8 we obtain the 31 comments listed in Table 3 in the order of
their occurrence in the experiment, plus the comment
ALSO DU WILLST DIR TROTZDEM DEN FILM ANSEHEN UND
IMMER NOCH MIT MIR ZUSAMMEN SEIN (characteristic (oo, aa,
oo), alteration formula 2(0)).
This should have been number 1 of the corpus in the first half of the com-
ments, but was erroneously made equal to number 17, second half, by in-
serting NICHT between NOCH and MIT. 9 So number 17 was applied
with all 20 subjects. The alteration formula is now
number of changes intended (number of changes resulting);
(ao, aa, ao) thus has the alteration formula 1K(0). The formulations are as
with experiment I mutatis mutandis.
In terms of L, O, and R the overall results are shown in Table 4. It can
easily be seen that the right-asymmetrical intensity curves are concen-
trated in occurrences of ja. But though the sound patterns [fricative or
semi-vowel + vowel] and [nasal + diphthong + nasal] are sufficiently
different to seem to account for a typological difference such as that be-
tween R, O, or L, the asymmetry-type of the intensity curve cannot be
7 For the present purposes we can leave open taxonomical questions of whether semantic,
pragmatic, communicative and/or psycholinguistic properties are involved in our stimuli.
Another question open for discussion is, whether the question is the means of changing B's
interest or rather the final attempt to make sure that previous interest-changing activities
had been successful.
8 In one of the experiments in the project use was made of the paradoxical comments (see
note 7).
9 Similarly (though by far not so incisively) the formulation of question 11 deviates in that
according to our stimulus-grammar it should have been . . . AUCH WENN WIR
DANN NICHT ZUSAMMEN SEIN KÖNNTEN?.
Experiment I          ja               nein
Item number      L    O    R      L    O    R
1                1    2    3      4    -    -
2                2    3    -      5    -    -
3                3    4    1      2    -    -
4                3    1    -      6    -    -
5                3    2    3      2    -    -
6                2    1    2      4    1    -
7                1    1    4      3    1    -
8                2    2    1      4    -    1
9                -    1    -      6    2    1
10               4    2    2      2    -    -
11               -    2    2      6    -    -
12               1    2    3      4    -    -
Total           22   23   21     48    4    2    (Total: 120)
Experiment II         ja               nein
Item number      L    O    R      L    O    R
2                -    2    1      3    3    1
3                3    1    4      1    -    1
4                4    3    3      -    -    -
5                2    -    5      2    -    1
6                2    -    2      3    2    1
7                1    2    2      4    -    1
8                -    1    3      4    2    -
10               5    1    1      2    1    -
11               4    1    1      4    -    -
12               3    1    5      -    -    1
13               3    2    2      3    -    -
14               4    -    1      4    1    -
15               4    1    1      3    -    1
16               4    3    -      2    1    -
17               4    1    5      7    1    2    (Total: 20)
18               1    2    2      2    2    1
19               3    1    1      3    1    1
20               2    1    2      4    -    1
21               2    1    2      4    1    -
22               1    1    -      7    -    1
23               2    1    6      -    1    -
24               2    -    2      4    1    1
25               3    1    1      4    -    1
26               2    2    2      3    1    -
27               3    2    1      4    -    -
28               4    1    3      2    -    -
29               2    1    2      3    1    1
30               4    -    2      3    1    -
31               4    3    3      -    -    -
32               1    -    3      6    -    -
Total           84   37   69     94   20   16    (Total: 320)
n being the number of cases (120 or 320, respectively) and h and v being
the number of lines and columns. 10
A question to be answered before direct reference is made to the a/o-
characteristics and alteration formulae seems to be: can the asymmetry-type
be predicted better when one knows to what degree the item in question
tends to make the subjects choose ja rather than nein in their simulated di-
alogues?
a) Asymmetry-type and ja-affinity
The items were trichotomized for each of the experiments (m: number
of subjects responding with ja) into
ja-affinity weak        m ≤ 4
ja-affinity medium 11   4 < m < 8
ja-affinity strong      m ≥ 8
These criteria have the result that 'weak' and 'strong' comprise the ex-
treme quartiles in experiment I and the extreme 6 items ( = 19.4 percent
each) in experiment II; thus 'medium' comprises the middle half of ranked
items in experiment I and 61.3 percent in the 'middle' of the ranking order
of items of experiment II.
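The trichotomy can be checked against the item counts of experiment I. In the sketch below the ja counts per item are the sums of the L, O and R entries for ja in the item table; the treatment of the boundary values (m = 4 counted as weak, m = 8 as strong) is an assumption, adopted because it reproduces the column totals 9, 33 and 24 of the ja table for experiment I. The `scale` parameter is a hypothetical way of expressing the doubled criterion mentioned in note 11.

```python
# Number of ja responses per item in experiment I (L + O + R for ja).
JA_COUNTS = {1: 6, 2: 5, 3: 8, 4: 4, 5: 8, 6: 5,
             7: 6, 8: 5, 9: 1, 10: 8, 11: 4, 12: 6}


def ja_affinity(m, scale=1):
    """Classify an item by its number m of ja responses.

    scale=2 doubles the criterion values, as for an item answered
    by 20 rather than 10 subjects.
    """
    if m <= 4 * scale:
        return "weak"
    if m >= 8 * scale:
        return "strong"
    return "medium"


items_by_level = {"weak": [], "medium": [], "strong": []}
ja_by_level = {"weak": 0, "medium": 0, "strong": 0}
for item, m in JA_COUNTS.items():
    level = ja_affinity(m)
    items_by_level[level].append(item)
    ja_by_level[level] += m

print(items_by_level)
print(ja_by_level)
```

With these boundaries the weak and strong classes each contain three of the twelve items, in line with the statement that they comprise the extreme quartiles in experiment I.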
Table 5 combines the four contingency tables to be distinguished (two
experiments, ja and nein). Each contingency table gives the distribution of
ja- or nein-occurrences over pairs of asymmetry-type and level of ja-affin-
ity. Percentages adding up to 100 per column are given in parentheses.
Experiment I

ja       weak     medium   strong   total
R        2        13       6        21
         (22.2)   (39.4)   (25.0)
O        4        11       8        23
         (44.4)   (33.3)   (33.3)
L        3        9        10       22
         (33.3)   (27.3)   (41.7)
total    9        33       24       66
         (99.9)   (100.0)  (100.0)

nein     weak     medium   strong   total
R        1        1        -        2
         (4.8)    (3.7)    (0)
O        2        2        -        4
         (9.5)    (7.4)    (0)
L        18       24       6        48
         (85.7)   (88.9)   (100.0)
total    21       27       6        54
         (100.0)  (100.0)  (100.0)
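The percentages in parentheses are column percentages. A short sketch for the ja table of experiment I follows; the function name `column_percentages` is illustrative. It also shows why one column adds up to 99.9 rather than 100.0: each cell is rounded to one decimal before summing.

```python
# ja table of experiment I: rows R, O, L; columns weak, medium, strong.
JA_TABLE = {
    "R": [2, 13, 6],
    "O": [4, 11, 8],
    "L": [3, 9, 10],
}


def column_percentages(table):
    """Express every cell as a percentage of its column total (one decimal)."""
    col_totals = [sum(col) for col in zip(*table.values())]
    return {
        row: [round(100 * cell / tot, 1) for cell, tot in zip(cells, col_totals)]
        for row, cells in table.items()
    }


percentages = column_percentages(JA_TABLE)
for row, values in percentages.items():
    print(row, values)

# Sum of the rounded percentages in the 'weak' column.
weak_sum = round(sum(values[0] for values in percentages.values()), 1)
print("weak column adds up to", weak_sum)
```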
10
The reader is assumed to be familiar with the χ²-procedure applied to contingency tables.
Concerning the correlation coefficient as a measure of the goodness of estimating or pre-
dicting values of one variable on the basis of given values of another variable, see e.g.
Richter, 1967, p. 334/5.
11
The double criterion value was applied to item 17 of the comments (20 subjects).
Experiment II

ja       weak     medium   strong   total
R        11       34       24       69
         (52.4)   (29.6)   (44.4)
O        4        23       10       37
         (19.0)   (20.0)   (18.5)
L        6        58       20       84
         (28.6)   (50.4)   (37.0)
total    21       115      54       190
         (100.0)  (100.0)  (99.9)

nein     weak     medium   strong   total
R        4        10       2        16
         (10.3)   (11.8)   (33.3)
O        8        11       1        20
         (20.5)   (12.9)   (16.7)
L        27       64       3        94
         (69.2)   (75.3)   (50.0)
total    39       85       6        130
         (100.0)  (100.0)  (100.0)

Figure 2
12
None of the chi-squares is significant, even though the tables for nein give rise to rather
small 'expected values' in some cells which might lead to overestimations of the respective
deviation from the 'observed values'. According to Lorenz's rule of thumb for evaluating
single chi-square summands (comparing the summand with a criterion obtained by divid-
ing the threshold value for significance, here for Ρ = .05: 9.49, by the number of (inner)
cells of the table, here: 9), in the table for ja, experiment II,
(R, weak) can be said to be overrepresented more than accidentally,
(R, medium) and
(L, weak) can be said to be underrepresented more than accidentally;
(R, strong) and (L, medium) (overrepresentation) and (L, strong) (underrepresentation)
come close to fulfilling the criterion:
R 32 58 90
Ο 26 34 60
L 38 68 106
96 160 256
|r| = .09 (χ² = 1.1385, df = 2; n.s.). A "factual" correlation of .09 would not
significantly differ from 0 for df = 254.
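The rule of thumb described in note 12 can be sketched as follows. This is a minimal illustration, not the author's own computation: it takes the ja counts of experiment II from Table 5, derives the expected values from the marginal sums, and compares every chi-square summand with the per-cell criterion 9.49 / 9. The variable names are assumptions introduced for the example.

```python
# Observed ja-occurrences, experiment II (Table 5).
# Rows: asymmetry-type R, O, L; columns: weak, medium, strong ja-affinity.
observed = {
    "R": [11, 34, 24],
    "O": [4, 23, 10],
    "L": [6, 58, 20],
}
levels = ["weak", "medium", "strong"]

row_sums = {row: sum(counts) for row, counts in observed.items()}
col_sums = [sum(observed[row][j] for row in observed) for j in range(len(levels))]
n = sum(row_sums.values())

# Per-cell criterion: 5-percent threshold (9.49) divided by the number of inner cells (9).
criterion = 9.49 / 9

summands = {}
for row, counts in observed.items():
    for j, obs in enumerate(counts):
        expected = row_sums[row] * col_sums[j] / n
        summands[(row, levels[j])] = (obs - expected) ** 2 / expected

chi_square = sum(summands.values())
flagged = sorted(cell for cell, value in summands.items() if value > criterion)

print(f"chi-square = {chi_square:.4f}, criterion per cell = {criterion:.3f}")
for cell in flagged:
    print(cell, round(summands[cell], 3))
```

Under these assumptions the three cells that exceed the criterion are (R, weak), (R, medium) and (L, weak), while the overall chi-square stays below 9.49, in line with the note.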
Table 6
Experiment I
lu IK 2
R 10 9 2 21
Ο 6 10 7 23
L 8 9 5 22
24 28 14 66
Experiment II
13
This is in nice agreement with the reciprocal reconstruction tasks ('simulation a fronte' vs.
'simulation a tergo', see Richter, 1982), though no simple explanation seems feasible.
timated correlation for questions equal, or even raises it, but lowers it for
comments, if compared with b:
'presupposed', experiment I    .42 (χ² = 8.8472, df = 6; n.s.)
'intended', experiment I       .29 (χ² = 4.0319, df = 6; n.s.)
'presupposed', experiment II   .01 (χ² = 1.6518, df = 6; n.s.)
'intended', experiment II      .13 (χ² = 2.3823, df = 6; n.s.)
'resulting', experiment II     .15 (χ² = 2.9728, df = 6; n.s.)
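This passage does not restate how the correlation estimate is obtained from χ². One reading, offered here purely as an assumption, is the contingency coefficient C = sqrt(χ² / (χ² + N)) divided by an upper bound C_max that depends on the numbers of rows and columns. The sketch below uses that reading; the function name and the particular form of C_max are hypothetical, chosen because they reproduce the printed values .29, .13 and .15 for the 3 x 4 tables, and they are not confirmed by the source.

```python
import math


def corrected_contingency(chi_square, n, rows, cols):
    """Contingency coefficient divided by an assumed upper bound C_max."""
    c = math.sqrt(chi_square / (chi_square + n))
    # Assumed correction term; not stated in the text.
    c_max = ((rows - 1) / rows * (cols - 1) / cols) ** 0.25
    return c / c_max


# (label, chi-square, N, rows, columns) as printed for Table 7.
cases = [
    ("'intended', experiment I", 4.0319, 66, 3, 4),
    ("'intended', experiment II", 2.3823, 190, 3, 4),
    ("'resulting', experiment II", 2.9728, 190, 3, 4),
]
for label, chi2, n, rows, cols in cases:
    print(label, round(corrected_contingency(chi2, n, rows, cols), 2))
```

Whatever the exact estimator, the point of the comparison is unchanged: with N = 190 and df = 6, chi-squares of this size yield only small coefficients.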
The low values for experiment II are not surprising when one realizes
that the classification, levelling the differentiation of the stimuli in two
components, results in an underspecification when compared with experi-
ment I, where only the differentiation of the stimuli in one component is
neglected. Indeed when leaving only one component of the comments un-
specified, we arrive at correlation estimates of a size comparable to those
of Table 7, experiment I (cf. Table 8).
'presupposed', 'intended'    |r| = .05 (χ² = 8.9449, df = 14; n.s.)
'presupposed', 'resulting'   |r| = .29 (χ² = 11.3795, df = 12; n.s.)
'intended', 'resulting'      |r| = .27 (χ² = 11.1086, df = 16; n.s.)
The unions of classes were formed in order to have enough cases in the
cells of the contingency tables (compare note 12). That it was possible to
do this on non-arbitrary grounds is due to our combinatorics, which had
Table 7
Experiment I
asymmetry-type asymmetry-type
vs. 'presupposed' vs. 'intended'
oo oa ao aa oo oa ao aa
R 4 5 9 3 21 R 7 4 5 5 21
Ο 7 6 5 5 23 Ο 4 8 7 4 23
L 6 5 2 9 22 L 5 5 9 3 22
17 16 16 17 66 16 17 21 12 66
Experiment II
asymmetry-type
vs. 'presupposed' vs. 'intended' vs. 'resulting'
oo oa ao aa oo oa ao aa oo oa ao aa
R 16 16 17 20 69 R 21 15 17 16 69 R 19 18 15 17 69
Ο 9 11 9 8 37 Ο 8 9 13 7 37 Ο 10 9 10 8 37
L 22 23 21 18 84 L 21 22 22 19 84 L 22 23 26 13 84
47 50 47 46 190 50 46 52 42 190 51 50 51 38 190
Table 8
Experiment II
asymmetry-type vs.

'presupposed'   oo     oo     oa     oa     ao     ao     aa     aa
'intended'      aa   oa/ao    ao   oo/aa    oa   oo/aa    oo   oa/ao    sum
R               10      6     11      5      8      9     13      7      69
O                3      6      5      6      5      4      2      6      37
L               11     11     13     10     11     10      9      9      84
sum             24     23     29     21     24     23     24     22     190

'presupposed'   oo   oo   oo   oa   oa   oa   ao   ao   ao   aa   aa   aa   others*
'resulting'     oo   oa   ao   oo   oa   aa   oo   ao   aa   oa   ao   aa             sum
R/O              5    8    6    6   10    7   11    7    4    5    8    8     21      106
L                8    5    6    6   11    2    6    8    6    6    8    2     10       84
sum             13   13   12   12   21    9   17   15   10   11   16   10     31      190

'intended'      oo     oo     oa     oa     ao     ao     aa     aa   others**
'resulting'     oo   oa/ao    oa   oo/aa    ao   oo/aa    aa   oa/ao             sum
R               11      7      5     10      8      4      8      8      8        69
O                6      2      4      2      6      6      4      3      4        37
L                9     11      7     11      9      9      5     14      9        84
sum             26     20     16     23     23     19     17     25     21       190
the following effects.¹⁴ As to the first of the matrices in Table 8, there are
just as many complete inversions ((oo, aa, ?), (oa, ao, ?), etc.) as partial
inversions taken jointly. Similarly in the third matrix there is a relative
preponderance of complete agreement of the type (?, oo, oo), (?, ao, ao). The
second matrix shares with the third a weak representation of complete inversions
(types (?, oo, aa) and (oo, ?, aa)), but shows no marked preponderance of
total vs. partial agreement (so the union had mainly to be made here in the
asymmetry-type dimension).
14 The combinatorially expected column sums are (18.4, 24.5, 24.5, 24.5, 24.5, 24.5, 24.5,
24.5) for the first matrix (χ² [test vs. observed numbers:] = 3.4901, df = 7; n.s.); (12.3,
12.3, 12.3, 12.3, 18.4, 12.3, 12.3, 18.4, 12.3, 12.3, 12.3, 18.4, 24.5) for the second matrix
(χ² [test vs. observed numbers:] = 11.0110, df = 12; n.s.); and (18.4, 24.5, 18.4, 24.5,
18.4, 24.5, 18.4, 24.5, 18.4) for the third matrix (χ² [test vs. observed numbers:] = 7.2394,
df = 8; n.s.).
(N being the number of items, m_μ, m_ξ the arithmetical means, s_μ, s_ξ the
standard deviations of μ and ξ), in order to approach the relevant degree
of predictability.
Experimer It I
'presup posed' 'intei ided' sum of ph
aa 00 oa ao ao oa 00 aa line
sum of
column -1.20 - . 2 5 - . 0 7 + 1.33 - . 6 5 + .05 + .08 + .33
pv 1 2 4 1 3 4
Experit nen it II
*presu| posed* 'inter ided' 'resu ting' sum of ph
oa 00 ao aa ao oa aa 00 ao oa 00 aa line
2(2) -.13 + .22 + .20 + .44 - . 1 3 + .20 + .22 + .44 - . 1 3 + .20 + .44 + .22 + .73> 8
l„0u> + .33 + .20 - . 1 0 -.50 + .20 - . 5 0 + .33 - . 1 0 + .20 - . 5 0 -.10 + .33 -.07' 6
s
1„<°) -.50 -.33 0 + .75 - . 3 3 + .75 - . 5 0 0 0 -.50 -.33 + .75 -,08 5
2(1*) -.40 + .13 - . 6 0 -.33 -.40 -.60 + .13 - . 3 3 - . 3 3 + .13 - . 4 0 -.60 -1.20' 2
'K(°> -.40 -.57 0 -.50 -.50 -.57 0 -.40 0 -.40 -.57 -.50 -1.47' 1
pv 1 2 3 4 1 2 3 4 1 2 3 4
15
Item 1 was evaluated by m_μ = -.08.
rank numbers ξ_h and ξ_v, and ξ was defined as the sum of these numbers
for a cell:
ξ_hv := ξ_h + ξ_v (4)
Some correlation between μ and ξ must lead to a concentration of negative
μ in the lower left area (low ξ_h, ξ_v combining to low ξ) and a concentration
of positive μ in the upper right area (high ξ_h, ξ_v combining to high
ξ) of the matrices. Cells with negative μ are marked by small triangles; so
it can be seen empirically to what degree the concentration takes place.¹⁶
As a result, we have a considerable further increase of coefficients:
'presupposed', experiment I    r_μξ = +.80 (s. at 1-percent level, df = 10)
'intended', experiment I       r_μξ = +.40 (n.s.)
(N = 12; m_ξ = 4.5, s_ξ = 1.3844; m_μ = -.016, s_μ = .3654)
'presupposed', experiment II   r_μξ = +.46 (s. at 1-percent level, df = 29)
(N = 31; m_ξ = 6.936, s_ξ = 2.5645; m_μ = -.079, s_μ = .3566)
'intended', experiment II      r_μξ = +.46 (s. at 1-percent level, df = 29)
'resulting', experiment II     r_μξ = +.51 (s. at 1-percent level, df = 29)
(N = 31; m_ξ = 6.903, s_ξ = 2.5318 [different location of the empty cell
for item number 1], m_μ and s_μ as above)¹⁷
What has been observed then, concerning the predictability of intensity
patterns of ja on the basis of our interest-changing questions and comments
on answers to such questions, is a general tendency for quantitative
expressions of predictability (correlation coefficients |r| and r_μξ) to
increase when the number of non-specified components of the vectorially
16
The correlation indeed depends on an order in the data, which can be shown with the
following possible matrix where the technique would obviously fail:
-1 +1  0
+1 -1  0
 0  0  0
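The failure case in this note can be checked numerically. The following is a minimal sketch, assuming that the horizontal and vertical rank numbers simply run from 1 to 3 (the note does not state them); ξ is formed per cell as in formula (4), and a plain product-moment correlation between μ and ξ is computed. The helper name `pearson` is introduced for the example only.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Matrix of mu values from the note; rank numbers 1..3 are an assumption.
mu = [
    [-1, +1, 0],
    [+1, -1, 0],
    [0, 0, 0],
]

mus, xis = [], []
for h, row in enumerate(mu, start=1):
    for v, value in enumerate(row, start=1):
        mus.append(value)
        xis.append(h + v)  # xi_hv = xi_h + xi_v, formula (4)

r = pearson(mus, xis)
print(r)
```

With this arrangement the positive and negative μ cancel along the rank sums, so the coefficient vanishes although the matrix is clearly patterned; reversing the order of the rows leaves that result unchanged.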
17
Since the numbers of responses per item are small, μ might have come out pseudo-exact.
Therefore also correlations r_ξμ' were calculated, μ' being round values attributed to the
elements of the equivalence classes of positive μ (μ' = +1), negative μ (μ' = -1); for μ = 0,
μ' = 0. The outcome is very similar:
'presupposed', experiment I    r_ξμ' = +.73
'intended', experiment I       r_ξμ' = +.45
'presupposed', experiment II   r_ξμ' = +.42
'intended', experiment II      r_ξμ' = +.43
'resulting', experiment II     r_ξμ' = +.50
(significances like those for r_ξμ, with the exception of .42 and .43 which are significant at
the 5-percent level for df = 29).
[Figure 3: maximal coefficients obtained (ordinate, 0 to 1.0) plotted against the
number of non-specified components (abscissa), with curves for questions,
comments and ja-affinity. Key: a = alteration formula, b = a/o-characteristic,
c = combined.]
18
In this sense also a grouping according to 'type of transition' (sc. of change in equal posi-
tions of a pair in different components) is less effective than the groupings according to
alteration formula and characteristic (questions):
For similar reasons I tend to be cautious about explaining the lesser de-
gree of predictability for the comments in terms of phenomena such as
difficulties on the subjects' side in coping with a more complex organiza-
tion compared with the questions. In experiment II the verbal reactions to
the comments were highly specific, so that missing a correlation as high as
.80 may well be due to the research worker's difficulties in finding an ef-
fective or appropriate stimulus metric.
This leads to another ground for caution. When at the beginning of this
text auditory perception was mentioned, any formulation was avoided
which might suggest a direct perception of right-asymmetry vs. left-
asymmetry. I shrink from interpretations in terms of, say, hesitating vs.
precipitating sound gestures. These must be possible with nein as concom-
itant too; nein, however, turned out almost exclusively to be left-asymmet-
rical. Strictly speaking, the predictable entities are, as has been seen in the
case of both the contingency tables and μ, proportions of responses rather
than shades of the individual's reaction.
e) Asymmetry-type and F0
Intonation is carried by more parameters than intensity alone, and the
present text was not intended as an argument "against F0 (or pitch)". In a
forthcoming article19 I will, instead, study the covariation of situational
characteristics, stated in terms of 'open' vs. 'closed', with F0 in greater
detail than was possible in the present observation regarding the
asymmetries of intensity. There is one question left, however, to be
answered immediately: to what degree is there a correlation between
intensity and F0? Were this degree a high one, the observed regularities
might eventually have to be attributed to F0 rather than to intensity.
18
(continued)
<J* a - er** ο —• a***
a ο
R 9 10 1 21
Ο 10 8 5 23
L 9 11 2 22
28 29 9 66
* items 1, 3, 5, 6, 9
** items 4, 7, 8, 10, 12
*** items 2, 11
|r| = .23 (χ² = 2.4725, df = 4; n.s.), the amount of the coefficient coming close to the
5-percent threshold for factual correlations with df = 64 but not providing any gain
relative to the estimated correlation for ja-affinity.
19
See Richter, in Winkler (ed.), 1983.
Experiment I
             f    lf    rf    fl     l    rl    fr    lr     r    sum
R   ja       1     5     2     -     2     -     5     6     -     21
    nein     -     -     1     -     1     -     -     -     -      2
O   ja       -     2     9     -     -     -     4     8     -     23
    nein     -     -     1     -     -     -     3     -     -      4
L   ja       2     2    11     -     2     -     3     2     -     22
    nein     2    14    18     1     9     1     1     1     1     48
sum ja       3     9    22     -     4     -    12    16     -     66
    nein     2    14    20     1    10     1     4     1     1     54
20
Two judges as above were in accordance with 94.8 percent of the occurrences (417 out of
120 + 320 cases). The 23 cases in which the judgements diverged were decided about in
overt discussion. Three ternary configurations having been discovered by the two judges
independently, were subsumed under binary types by omitting the
non-initial/non-terminal segment: flr: fr, rfr: r, frf: f (compare, in this connection,
Table 11 and the relevant computations).
Perceptually, most of the configurations will, as a consequence of low stimulus duration,
result in pitch gestalts in which no "falling", "level", or "rising" states o r movements are
experienced. Following Richter, 1967: 360, these auditory results could tentatively be la-
belled as follows:
hang (Hang) ← f      jump (Sprung) ← lf     lift (Hub) ← rf
bend (Knick) ← fl    jerk (Ruck) ← l        fly (Schwung) ← rl
turn (Dreh) ← fr     push (Schub) ← lr      stand (Stand) ← r
It is a different question, how untrained subjects will label the phenomena that are given
them in their auditory experience. Nomenclatoric traditions from music may function
here in a way similar to the functioning of orthographic traditions in segmental phonetics,
and the gestalt of 'stand', to take one example, was assumed to be the response to the sig-
nal property of short rise not only on behalf of our own impression labelled with relatively
little innocence, but also as an accounting for many innocent people's labelling their
impression evoked by short rise in the signal as "level". So when we take holism seriously,
this does not lead us to a phenomenalist methodology but makes us favour the
discrimination experiment. Note, by the way, that jumps or jerks in perception are not by
necessity evoked on the (sole) basis of F0.
Experiment II
             f    lf    rf    fl     l    rl    fr    lr     r    sum
R   ja      10     4    23     2     2     -    13    12     3     69
    nein     2     4     6     -     1     -     2     -     1     16
O   ja       3     6    11     -     1     1     9     5     1     37
    nein     5     4     4     -     1     -     3     2     1     20
L   ja      20     9    18     4     9     -    15     5     4     84
    nein    16    21    34     1    10     -     5     4     3     94
sum ja      33    19    52     6    12     1    37    22     8    190
    nein    23    29    44     1    12     -    10     6     5    130
21
As can be seen from the df, χ² was allowed to exceed the conventional tolerances (see
note 12) in order not to enforce too small correlation estimates.
22
For a more detailed discussion of r as a parameter of R see e.g. Richter, 1967: 334/5.
It is evident that at least the order of these numbers can be given consistent
meaning in the two series (sign of F0-slope and of the difference between
intensity peak abscissa and interval centre, respectively). After technically
extending f to ff, l to ll, and r to rr, we accordingly get the
correlation diagrams of Table 11 where figures on the upper left are again
for ja, figures on the lower right for nein; i meaning 1st F0-component, t
meaning 2nd F0-component, A meaning asymmetry-type.
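How a product-moment correlation is read off such a coded diagram can be sketched as follows. The example is illustrative only: it assumes the coding f = -1, l = 0, r = +1 for the F0-components and L = -1, O = 0, R = +1 for asymmetry-type, and it uses the ja counts of experiment I as they can be read from the correlation diagrams. The function name `table_correlation` is introduced for the example.

```python
import math


def table_correlation(counts, row_codes, col_codes):
    """Product-moment correlation of two coded variables given cell counts."""
    n = sum(sum(row) for row in counts)
    sx = sy = sxx = syy = sxy = 0.0
    for x, row in zip(row_codes, counts):
        for y, k in zip(col_codes, row):
            sx += k * x
            sy += k * y
            sxx += k * x * x
            syy += k * y * y
            sxy += k * x * y
    cov = sxy / n - (sx / n) * (sy / n)
    var_x = sxx / n - (sx / n) ** 2
    var_y = syy / n - (sy / n) ** 2
    return cov / math.sqrt(var_x * var_y)


# ja, experiment I. Rows i = +1 (r), 0 (l), -1 (f); columns t = -1 (f), 0 (l), +1 (r).
i_by_t = [[22, 0, 0], [9, 4, 16], [3, 0, 12]]
# Rows t = +1 (r), 0 (l), -1 (f); columns A = -1 (L), 0 (O), +1 (R).
t_by_a = [[5, 12, 11], [2, 0, 2], [15, 11, 8]]

r_it = table_correlation(i_by_t, [1, 0, -1], [-1, 0, 1])
r_ta = table_correlation(t_by_a, [1, 0, -1], [-1, 0, 1])
print(round(r_it, 2), round(r_ta, 2))
```

The large rf and fr cells ('lifts' and 'turns') pair a high value of one F0-component with a low value of the other, which is what drives the correlation between i and t below zero.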
These combinations of data allow for the calculation of product-moment
correlations (compare formula (3)) the underlying structure of
which is clearly brought about by extracting the one common factor for
which loadings g of variables can approximately be determined with the
'centroid method' of factor analysis (see e.g. Faverge, 1980).23 As can be seen in
Table 12, some variation as to the two experiments and as to the two sentence
adverbs occurs, but the pattern is a general one:
- Correlations r_it between the two F0 variables are negative; their amounts
are greater than those of r_tA and of r_iA; the negative sign is due to the high
proportion of signals resulting in 'lifts' and 'turns'.
- Correlations r_tA between the second F0-component and asymmetry-type
are positive; as a rule, their amount comes second. Correlations r_iA
(asymmetry-type vs. first component) centre around zero.
- Factor loadings g are negative for i, positive for t and for A; since they
express the correlation between the empirical variables and the factor, the
factor itself appears to "push" terminal F0 upward, the intensity peak to
Table 11
Experiment I (upper figures: ja; lower figures: nein)

i \ t      -1 (f)    0 (l)   +1 (r)    sum
+1 (r)       22        -        -       22
             20        1        1       22
 0 (l)        9        4       16       29
             14       10        1       25
-1 (f)        3        -       12       15
              2        1        4        7
sum          34        4       28       66
             36       12        6       54

i \ A      -1 (L)    0 (O)   +1 (R)    sum
+1 (r)       11        9        2       22
             20        1        1       22
 0 (l)        6       10       13       29
             24        -        1       25
-1 (f)        5        4        6       15
              4        3        -        7
sum          22       23       21       66
             48        4        2       54
23 That |r_iA| and |r_tA| are smaller than |r| estimated on the basis of Table 10 might be due
to some non-linear relation in Table 11; our basing the following argument on a factor
analysis will to a certain extent compensate for not having tried to increase the correlation by
transforming i, t, or A.
t \ A      -1 (L)    0 (O)   +1 (R)    sum
+1 (r)        5       12       11       28
              3        3        -        6
 0 (l)        2        -        2        4
             11        -        1       12
-1 (f)       15       11        8       34
             34        1        1       36
sum          22       23       21       66
             48        4        2       54
Experiment II (upper figures: ja; lower figures: nein)

i \ t      -1 (f)    0 (l)   +1 (r)    sum
+1 (r)       52        1        8       61
             44        -        5       49
 0 (l)       19       12       22       53
             29       12        6       47
-1 (f)       33        6       37       76
             23        1       10       34
sum         104       19       67      190
             96       13       21      130

i \ A      -1 (L)    0 (O)   +1 (R)    sum
+1 (r)       22       13       26       61
             37        5        7       49
 0 (l)       23       12       18       53
             35        7        5       47
-1 (f)       39       12       25       76
             22        8        4       34
sum          84       37       69      190
             94       20       16      130

t \ A      -1 (L)    0 (O)   +1 (R)    sum
+1 (r)       24       15       28       67
             12        6        3       21
 0 (l)       13        2        4       19
             11        1        1       13
-1 (f)       47       20       37      104
             71       13       12       96
sum          84       37       69      190
             94       20       16      130
Table 12
                                                                            100 g² for:
                                r_it   r_iA   r_tA    g_i    g_t    g_A      i      t      A
Experiments I & II, ja          -.41   +.04   +.12   -.56   +.68   +.14   31.7   46.0    2.1
                    nein        -.29   -.08   +.11   -.51   +.54   +.23   26.4   28.9    5.5
Experiment I,       ja          -.64   -.25   +.25   -.78   +.78   +.38   61.4   61.4   14.8
                    nein        -.51   -.14   +.27   -.66   +.73   +.38   43.0   53.2   14.8
                    ja & nein   -.60   -.24   +.35   -.73   +.78   +.47   52.8   61.1   22.5
Experiment II,      ja          -.34   +.12   +.07   -.48   +.64   +.06   22.7   40.8     .4
                    nein        -.22   -.04   +.07   -.44   +.47   +.17   19.7   22.2    2.8
                    ja & nein   -.32   +.03   +.13   -.48   +.61   +.18   23.1   36.8    3.3
Experiments I & II, ja & nein   -.38   -.04   +.18   -.55   +.64   +.27   29.9   41.3    7.5
the right, and initial F0 downward, when increasing, and to "push" terminal
F0 downward, initial F0 upward, and the intensity peak to the left,
when decreasing.24
- What counts most with regard to the covariation of F0 with intensity:
the 'explanatory power', measured as the square (multiplied by 100) of its
loading for a variable, is by far greater for t and for i than for A. In a
majority of the cases distinguished, more than 90 percent of the variance in A
cannot be attributed to a factor which, on the other hand, accounts for
about one third of the variance in each F0-component.
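The extraction of one common factor by the centroid method can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's worked procedure: the diagonal of the correlation matrix is filled with the largest absolute correlation of each variable, variable i is sign-reflected before summing so that the column sums are positive, and each loading is the column sum divided by the square root of the grand total. The function name and its arguments are hypothetical.

```python
import math


def centroid_loadings(names, corr, reflect):
    """Loadings on the first centroid factor of a small correlation matrix.

    corr maps a pair of variable names to their correlation; reflect holds the
    variables whose sign is reversed before summing and restored afterwards.
    """
    sign = {name: (-1 if name in reflect else 1) for name in names}

    def r(a, b):
        value = corr[(a, b)] if (a, b) in corr else corr[(b, a)]
        return sign[a] * sign[b] * value

    col_sums = {}
    for a in names:
        others = [r(a, b) for b in names if b != a]
        communality = max(abs(x) for x in others)  # assumed diagonal estimate
        col_sums[a] = communality + sum(others)
    total = sum(col_sums.values())
    return {a: sign[a] * col_sums[a] / math.sqrt(total) for a in names}


# Experiment I, ja: r_it = -.64, r_iA = -.25, r_tA = +.25
corr = {("i", "t"): -0.64, ("i", "A"): -0.25, ("t", "A"): 0.25}
g = centroid_loadings(["i", "t", "A"], corr, reflect={"i"})
for name, loading in g.items():
    # Loading and its 'explanatory power' 100 * g squared.
    print(name, round(loading, 2), round(100 * loading ** 2, 1))
```

Under these assumptions the loadings for experiment I, ja, round to -.78, +.78 and +.38. Reflecting t instead of i gives the alternative solution with the opposite orientation of the factor.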
So determinants other than those relevant for F0 will occasion the relative
location of the intensity peak. The sophisticated instrumental phonetician
will not have overlooked an early hint at segmental influences likely
to play a role; it is the bias of the occurrences of nein to left-asymmetry
that caused us to confine the subsequent tests to ja, but it is the
asymmetry-variation in ja that had rendered the status of bias to a uniformity
which would otherwise not have surprised us. To sum up: the shaping of
intensity is not sufficiently explained by reference to F0-configurations
24
It should be noted that for experiment II, ja, the solution in terms of something like an
"open cadence & intensity deferment factor" just fails to be the optimal one: a solution in
terms of a "closed cadence & intensity deferment factor" (g_i = +.64, g_t = -.49, g_A =
+.14) is slightly better than the one given in Table 12 for reasons of conformity.
(Cumulating the squares of loadings, cf. the last three columns of Table 12, we get the ratio
.659 : .639 = 1.03 [= 3 percent gain by the alternative solution] as a rough criterion.)
Since for experiment I, ja, the solution given is the one with roughly the highest overall
explanatory power, and as the sums of 100 g² for experiment II singled out (lines 6-8)
irrespective of their optimality (with the faint exception of line 6) amount to only one half of
the sums of 100 g² for experiment I (lines 3-5), one is tempted to think of a mapping,
inexplicable yet, of 'a fronte' vs. 'a tergo' into the numerical data. (Compare the reflected
image traits of Figure 2, commented in note 13.)
and to the basic segmental make-up of the types in our sample, and its sta-
tistical linkage with situational characteristics can not be reduced accord-
ingly. It is an open question what mechanisms of phonetic substance medi-
ate the observed influence.
References
Faverge, J.-M. (1980). Mathematisch-statistische Methoden in der Psychologie. Huber, Bern etc.
Hofstätter, P. R. (1953). Einführung in die quantitativen Methoden der Psychologie. Barth,
München.
Richter, H. (1967). Zur Intonation der Bejahung und Verneinung im Hochdeutschen; in:
Eggers, H., et al. (eds.), Satz und Wort im heutigen Deutsch. Schwann, Düsseldorf.
Richter, H. (1974). Eine anschauliche Interpretation des Korrelationskoeffizienten nach
Bravais-Pearson. In Richter, H., K. H. Rensch, M. Sperlbaum and E. Knetschke (eds.),
Spezielle methodische Untersuchungen sprachlicher Phänomene. Niemeyer, Tübingen 1974
[Phonai, suppl. 2].
Richter, H. (1980). Commenting on question-answer pairs. Grazer Linguistische Studien 17.
Richter, H. (1983). F0 variation dimensionally reconstructed; in: Winkler, P. (ed.),
Investigations of the speech process. Brockmeyer, Bochum 1983.
Rietveld, A.C.M., and L. Boves (1979). Automatic detection of prominence in the Dutch
language. Abstract in: Proceedings of the Ninth International Congress of Phonetic Sciences,
vol. I. Institute of Phonetics, University of Copenhagen, Copenhagen.
Seidel, R. (n.d.). Zur sozialen Variabilität bei Frage-Antwort-Interaktionen. M.A. thesis,
University of Bonn.
Takefuta, Y. (1979). Some experiments in the digital extraction of American intonation
patterns. Abstract in: Proceedings of the Ninth International Congress of Phonetic Sciences, vol. I.
Institute of Phonetics, University of Copenhagen, Copenhagen 1979.
Zwirner, E., und K. Zwirner (1966). Grundfragen der Phonometrie. Karger, Basel/New York,
2nd ed.
MITSOU RONAT
First I would like to thank my colleague and friend Jacqueline Gueron for her very important
criticisms and comments. Thanks also to Barry Schein and Thelma Sowley for having read
and criticized the present version.
1 For those readers who may be not familiar with the current framework in generative gram-
I The Data
the meaning of B's reply in (1) is not the explicit contrast: John doesn't READ books, he
WRITES (REVIEWS, COLLECTS, BURNS) them. Rather, the point of the accentual pat-
tern is that books is deaccented; the focus is broad, but the accent falls on read by default.
2
Dafydd Gibbon communicated his own criticisms on Ladd's hypothesis to me in this con-
text.
Logical Form and Prosodic Islands 313
( )
^" FOCUS"'
(10) a. [[^ ][ ]]
b- [[χ- •][ ]]
3
In Ronat (in preparation), I propose a general analysis for the various phonological, syn-
tactic and semantic properties of prosodic binding.
4
See Adjemian (1978).
II The Problem
(22) (Aux Beaux-Arts, les étudiants font le portrait les uns des autres)
a. Moi le garçon que j'ai vu (e) [peindre un portrait], c'est lui.
b. Moi le garçon que j'ai vu [PROARB peindre (e) aux Beaux-Arts],
c'est lui.
([At the School of Fine Arts, students paint each other's portraits]
a. In my opinion, the boy who I saw paint a portrait, it's him.
b. In my opinion, the boy who I saw painted at the School of Fine
Arts, it's him.)
(22.a) is good, contrary to (15.b), because the empty element is not inside
the prosodic island; (22.b) is bad because the empty category is inside
the prosodic island. Consequently, (23) is not interpretable at all:
(23) (. . . = 22) Moi le tableau que j'ai vu [peindre aux Beaux Arts],
c'est celui-ci.
(In my opinion, the painting which I saw painted at the school of
Fine Arts is this one)
(23) is out, either because of the prosodic island constraint if tableau is
understood as the underlying object of peindre, or else because tableau is
not, semantically, a correct subject for this verb, which requires an ani-
mate one.
Quantifier-Movement is subject to the same constraint: quantifiers can-
not be separated from their sources by a prosodic island boundary:
(24) (Dans ce magasin, les objets soldés coûtent dix francs).
a. Paul aurait bien tout voulu acheter (e) [pour dix francs]
b. Paul aurait bien voulu [tout acheter (e) pour dix francs]
c. *Paul aurait bien tout voulu [acheter (e) pour dix francs]
([In this store, objects on sale cost ten francs] Paul would indeed
have liked to buy everything for ten francs.)
(25) (. . . = 22)
a. Combien as-tu vu peindre (e) [de tableaux]?
b. *Combien as-tu vu [peindre (e) de tableaux]?
(How many paintings have you seen painted?)
Interestingly, the prosodic island constraint functions only for the ele-
ments embedded in the deaccented constituent. That is, the empty-cate-
gory/antecedent relation is not blocked if the deaccented element precedes
or follows, but does not contain, the empty category. In (26.a) the relation
is blocked, in (26.b) it is not:
(26) a. * [ X (e)]
b. [(X) (e)]
Since the deaccented element always coincides with the last constituent of
the sentence, the difference between (26a) and (26b) is difficult to test. Yet
the intuitive judgements are clear. Consider example (27): it is ambiguous;
it can mean either that Mary appreciates the beauty of Paris or that she
proved to be sensitive when she was in Paris.
(27) Marie s'est montrée sensible à Paris
namely PRO and NP-trace. One may ask whether or not their relation to
their antecedent is affected by prosodic binding. The answer is no:
(33) (A: Veux-tu venir au bord de la mer?)
B: Avec plaisir; j'aime beaucoup [PRO nager], c'est Jean qui m'a
appris [à PRO nager]
(A: Would you like to come to the beach?
B: With pleasure, I like swimming very much; it's John who
taught me how to swim.)
(34) (L'enseignement actuel ne favorise pas les vocations d'écrivains.)
Le fils de Jacques [voulait PRO écrire]
(Modern teaching does not favour the writer's vocation. James'
son wanted to write.)
(35) (A: Tu sais la nouvelle? Paul a battu Marie)
B: Jean a été [battu (e) par Paul]. Rien d'étonnant à ça.
(A: Have you heard the news? Paul beat Mary.
B: John was beaten by Paul. Nothing surprising about that.)
(36) (A: Robert ne s'intéresse pas aux gens bien portants)
B: Le fils de Jean-Jacques [semble (e) être malade]. Présente-le-lui!
(A: Robert is not interested in healthy people.
B: Jean-Jacques' son seems to be sick. Introduce him to him!)
Consequently, examples (33)-(36) illustrate a new distinction between
NP-traces and PRO on the one hand, and Ā-bound traces on the other.
(For the importance of the opposition A-bound/Ā-bound, see
Chomsky, 1981). This is not surprising, since other distinctions
established on independent grounds are now well-known, particularly the
"invisibility" of PRO and NP-traces to phonological rules, e.g. the wanna
contraction (see Chomsky, 1981: Ch. 3.2.2., Jaeggli, 1980).
After presenting the basic data concerning prosodic binding and empty
categories, I will turn to a more interesting question: how is it possible to
explain such phenomena in the current framework?
We immediately have two candidates, namely the ECP, the empty cate-
gory principle proposed by Chomsky (1981) or, in another version, by
Kayne (1981a), and the CCC, the complete constituent constraint, pro-
posed by Gueron (1980). Roughly, the former is a "syntactic" hypothesis
(if LF is to be considered as syntax), and the latter is an interpretive one.
These conditions are each given in (37), (38), and (39), respectively:
(37) (Chomsky (1981)):
If α is an empty category, then
(i) α is PRO if and only if it is ungoverned
(ii) α is trace if and only if it is properly governed
(iii) α is a variable only if it is Case-marked
(38) (Kayne (1982)):
An empty category β must have an antecedent α such that
(1) α governs β or
(2) α c-commands β and there exists a lexical category X such that
X governs β and α is contained in some percolation projection of
X.
(39) (Guéron (1980)):
A complete constituent X^i may not contain a variable not bound in
X^i.
(A complete constituent is an X^i in which X^(i-1) is governed by a
logical operator; examples: NP as names, S as propositions, VP as
properties).
(The relation (X^i, X^(i-1)) is that of 'immediate domination' in a constituent
structure tree, where both X^i and X^(i-1) are heads of constructions in the
sense of X-bar theory. X usually ranges over {S, NP}, and i over {1, 2}, e.g.
S^0 = S; S^1 or S̄ immediately dominates S, S^2 or S̿ immediately dominates
S̄, etc. Ed.)
Chomsky's hypothesis, ECP, essentially requires that the empty element
be properly governed; Kayne's insists on the relation between the anteced-
ent and the empty element: the chain of 'superscripts' must not be broken;
Gueron's, CCC, is relevant in cases where a variable is free in the 'presup-
posed' part of the sentence.
Within the ECP framework, the natural claim is that prosodic binding
creates an abstract governing category (an abstract S?) which functions
as an absolute barrier for government and prevents empty categories from
being linked to their antecedents:
(40) Prosodic binding creates an abstract governing category.7
Thus, (40), if true, might explain observation (18). In this case, one must
note that although structure building rules for LF may not be desirable
7
One can explain why the relation between an A-bound category and its antecedent is not
affected by the 'governing category' created by prosodic binding, in suggesting that in this
case the coindexing takes place before the matching of syntax and prosody.
8
Anaphors must receive special consideration. They are problematic in the context of this
paper, for two reasons. First, it is difficult to find agreement among native speakers on
such data, since it is difficult to construct natural discourse involving prosodic binding and
anaphors; informants generally tend to reconstruct contrastive situations. Secondly, the ac-
ceptability (or not) of such data may be important for the model itself. See Appendix 2 to
this article for further discussion.
9 Riny Huybregts (unpublished work) explores the hypothesis that the clitic is an A-binder.
Similarly, Aoun, in his GLOW paper delivered at Göttingen (1981), explores this hypothesis.
IV Theoretical Considerations
On the basis of the data available so far, it seems very difficult to choose
between the two hypotheses, extended ECP or CCC, the former having
the advantage of empirical adequacy, and the latter the advantage of ex-
plaining the contrast (15.b)/(41). One can speculate that a solution may be
found in a third hypothesis, which would subsume the advantages of the
preceding ones.
This solution will depend crucially on the place attributed to intonation
in core grammar. Previous studies either rejected intonation as being out-
side of the domain of competence (and it is true that emotions, for in-
stance, are also expressed by means of intonation), or they included it in-
side the grammar, but as a part of the phonological component, very close
to stress phenomena. See Liberman's work (op. cit.). Selkirk's (1978)
proposals, however, constitute a first departure from the standard posi-
tion: they establish that prosody must be represented by a set of metrical
trees, independent of syntax, which expand autonomous prosodic catego-
ries. These prosodic categories are supposed to be matched with syntactic
trees, at the level of surface structure, on the left side of the model (cf. Ap-
pendix 1 to this article. Ed.)
But the data presented in this paper show that intonation must look at
S-Structure, at Logical Form and perhaps at deep structure, too: obvi-
ously, it is not sufficient to propose a matching between prosodic trees
and surface structure. The idea which seems natural, then, is that intona-
tion may function as a component which is autonomous from but related
to syntax, filtering sequences generated by the syntactic component from a
'third' dimension, to refer to a concept used by Vergnaud in another con-
text.
The third solution might adopt a more abstract point of view than syntax
and prosody, and say that Universal Grammar may define what counts
as a governing category in each component. The rhythmic notion of
'repetition', for instance, could subsume the recursive nature of syntactic
governing categories (NP and S),10 and the nature of discourse binding. Then
(40) and (42) could be tentatively restated as (45):
(45) An Ā-bound category must be bound within all the governing categories
in which it is embedded.
Needless to say, much work remains to be done in this area. In any case
the evidence presented in this paper strongly supports a theory which pos-
tulates the existence of empty categories over one which does not, and
strongly supports the linguistic reality of the antecedent/empty category
relation, since it can be 'heard', indirectly. Moreover, the same evidence
indicates that an important part of intonation must be treated as part of
linguistic competence; it would be strange to postulate that rules of per-
formance could be based on the PRO/WH-trace distinction.
              BASE
  (Lexicon + Deep syntactic structure)
                ↓
      SYNTACTIC TRANSFORMATIONS
                ↓
   S (= Surface syntactic)-structure
      ↓                      ↓
  PHONOLOGICAL           SEMANTIC
  INTERPRETATION         INTERPRETATION
      ↓                      ↓
  Surface structure      Logical Form
10 Notice that for the purpose of our discussion, the syntactic definition of governing cate-
gory does not include the notion of 'accessible subject'. In fact, the examples presented in
this paper show that the constituent under prosodic binding must contain the governor of
the empty category. This must be taken into account in case (29).
Reference
1. Aims
[Figure 1: Acoustic parameters of an emphatic utterance (female speaker).
Tracks shown: AKM = maxima of autocorrelation function; PGL = intensity;
WGL = speed; RH0 = zero crossings; XTR = minimax approximation.]
replacements of the standard curve types by other curve types. The new
configurations are not totally 'new' patterns, but they start from the
phonematically predetermined constellation. The change has to produce a
contrast effect in order to be audible as non-linguistic marking. The re-
dundancy will be used to interpolate further information; this can be done
by intensification, interruptions, supplements etc., in respect of one or
more parameters. This results in changing the combination of curve types
within a segment or segment-boundaries. It also follows from this that
neither particular curve types nor particular parameters can carry the in-
formation alone, but rather the configuration as a whole. The resulting
new combination may perhaps be prototypical for some types of affect, si-
tuation and so on. This will be ignored in the following analysis; the pur-
pose is only to list and calculate the combination patterns and to look for
prototypical differences between neutral and emphatic sequences.
To this end, neutral and emphatic utterances of two speakers were se-
lected from a dyadic live conversation. The sequences were classified into
seven curve types of four acoustic parameters at the segmental level for
computing the probabilistic combinations and comparing the two sorts of
utterances. Only the dynamic characteristics of parameters will be consid-
ered (such as curve direction: increasing or decreasing, etc.), not absolute
frequency (Hz) or intensity (dB). The results describe the combinatory
possibilities of F0 with other parameters in a comparison of neutral and
emphatic segments.
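The idea of classifying a parameter track by its dynamic shape alone, and then counting which F0 type co-occurs with which type of another parameter, can be sketched roughly as follows. This is only an illustration: the function `curve_type`, its tolerance `tol`, the five labels it returns and the sample tracks are all assumptions made here, not the procedure used in the study, and the two abrupt (initial/terminal) types are left out because the text does not define them.

```python
from collections import Counter

def curve_type(track, tol=0.0):
    """Label a short parameter track by direction of change only.

    Absolute level (Hz, dB) is ignored; steps no larger than `tol`
    count as level. Only the first two movements are inspected.
    """
    moves = []
    for prev, cur in zip(track, track[1:]):
        step = cur - prev
        if abs(step) <= tol:
            continue  # treat small changes as level
        sign = 1 if step > 0 else -1
        if not moves or moves[-1] != sign:
            moves.append(sign)  # record each change of direction once
    if not moves:
        return "level"
    if len(moves) == 1:
        return "rising" if moves[0] > 0 else "falling"
    return "rise-fall" if moves[0] > 0 else "fall-rise"

# Hypothetical segments: (F0 track, intensity track), arbitrary units.
segments = [
    ([100, 120, 140], [60, 62, 65]),
    ([140, 120, 100], [65, 66, 60]),
    ([100, 140, 100], [60, 66, 61]),
]

# Count how often each F0 type combines with each intensity type.
combinations = Counter(
    (curve_type(f0), curve_type(intensity)) for f0, intensity in segments
)
```

Counting such pairs separately for neutral and emphatic segments would give the kind of combination table that the comparison below relies on.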
3. Results
The 244 neutral segments are, in detail: 48 open vowels, 43 closed vowels,
25 voiced plosives, 9 unvoiced plosives, 20 voiced and 52 unvoiced frica-
tives, 32 nasals and 15 liquids. The usual phonetic classification of the em-
Table 1: Combinations of parameters and curve types, emphatic (e) vs. neutral (n) speech
[Symbol entries of the individual cells not legible; summary columns only.]

type          Σ      x̄       S      KS     r      χ²
F0     e     245    35      24.4    .34    .87    17.89
       n     244    34.9    32.2    .48
/      e      28     4       3.3    .17    .76     8.25
       n      19     2.7     3.8    .35
→      e      12     1.7     1.9    .25    .57     8.97
       n       5     0.7     1.0    .42
\      e      71    10.1     9.1    .25    .82     4.74
       n      29     4.1     3.8    .23
Λ      e      90    12.9    10.2    .14    .90    11.66
       n     161    23      22.5    .20
V      e      19     2.7     1.1    .19    .51     4.58
       n      18     2.6     2.4    .19
↓      e      22     3.1     2.7    .36    .40    18.95
       n       5     0.7     1.3    .51
↑      e       3     0.4     0.5    .34    .48    17.54
       n       7     1       1.3    .50

F0 type       /       →       \       Λ       V       ↓       ↑
Σ      e      8      76      52      20      11      43      35
       n     10     104      38      18      18      19      37
x̄      e      1.1    10.9     7.4     2.9     1.4     6.1     5
       n      1.4    14.9     5.4     2.6     2.6     2.7     5.3
S      e      1.1    11.54    8.4     2.8     1.5     6.4     4.9
       n      2.9    25.4     9.4     5.2     5.6     2.8     6.3
KS     e      .15     .17     .20     .13     .28     .19     .30
       n      .26     .07     .11     .18     .24     .37     .34
r             .77     .90     .62     .90     .51     .72     .82
χ²          13.21   17.29   15.34   11.4    24.8     9.35   10.03
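As a plausibility check on the summary columns, the F0 counts over the seven curve types printed at the foot of the table (8, 76, 52, 20, 11, 43, 35 for emphatic and 10, 104, 38, 18, 18, 19, 37 for neutral speech) can be recomputed. The sketch below assumes that x̄ is the mean count per type, that S is the sample standard deviation, and that χ² is a Pearson test of homogeneity on the 2 × 7 table of counts. These readings are inferred from the numbers and are not stated in the text, but under them the printed F0 values (245/244, 35/34.9, 24.4/32.2, 17.89) are reproduced.

```python
from statistics import mean, stdev

# F0 curve-type counts over the seven types, as printed in the table.
emphatic = [8, 76, 52, 20, 11, 43, 35]
neutral = [10, 104, 38, 18, 18, 19, 37]

def chi_square_homogeneity(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k table of counts."""
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    chi2 = 0.0
    for a, b in zip(row_a, row_b):
        column = a + b
        expected_a = column * total_a / grand
        expected_b = column * total_b / grand
        chi2 += (a - expected_a) ** 2 / expected_a
        chi2 += (b - expected_b) ** 2 / expected_b
    return chi2

summary = {
    "sum": (sum(emphatic), sum(neutral)),                        # (245, 244)
    "mean": (round(mean(emphatic), 1), round(mean(neutral), 1)),
    "S": (round(stdev(emphatic), 1), round(stdev(neutral), 1)),  # sample std. dev.
    "chi2": round(chi_square_homogeneity(emphatic, neutral), 2),
}
```

The same three formulas could be applied to any other row or column once its cell counts are known.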
Table 3: Combinations of F0-types and speed-types (WGL), emphatic (e) vs. neutral (n) speech
[Symbol entries of the individual cells not legible; summary columns only.]

WGL-type      Σ      x̄       S      KS     r      χ²
/      e      28     4       2.2    .28    .57     7.27
       n      16     2.3     2.5    .33
→      e      51     7.3     6.1    .10    .85     6.97
       n      26     3.7     4.7    .21
\      e      46     6.6     5.5    .12    .72     6.5
       n      28     4       3.3    .25
Λ      e      86    12.3     9.3    .09    .73    21.25
       n     154    22      21.3    .13
V      e      13     1.9     1.2    .28    .42     2.65
       n      10     1.4     0.8    .19
↓      e      13     1.9     1.4    .23    .37    35.69
       n       3     0.4     1.1    .58
↑      e       8     1.1     1.9    .36    .35    18.3
       n       7     1       1.3    .50

F0-type / RH0-type
F0 type       /       →       \       Λ       V       ↓       ↑
S      e      1.4     9.4     7.6     2.7     1.6     4.7     2.1
       n      2.9    20.4    10.1     4.1     4.4     3.2     3.1
KS     e      .22     .14     .15     .34     .22     .14     .23
       n      .49     .18     .36     .18     .28     .41     .40
r             .57     .93     .69     .70     .71     .63     .70
χ²          15.24   20.03   15.97   13.92   14.01    8.57    2.15
4. Discussion
a single difference = F0 →); the correlation of RH0 with all F0 types is the
lowest one. F0 Λ in emphatic sequences is attached more strongly to RH0
types than in neutral segments; this is a deviation from the general
tendency that emphatic combinations are more varied.
F0 and intensity are combined distinctively only in the abrupt initial/
terminal feature. The concentration KS of the seven F0 types in relation to
intensity in neutral segments is fixed onto few types. F0 → and
F0 V in emphatic segments are distributed differently over the seven intensity
types than in the neutral segments. In this case, this is not caused by the
concentration KS; there must be another structural attribute which is not
discernible with the methods used. The concentration of F0 ↓ for the intensity
types 1-7 is different, but Chi-square indicates no difference. It is clear
that Chi-square is a global, equalizing parameter which obscures the
internal structural differences.
The combination of F0 + speed is different only with the curve types Λ,
↑ and ↓. The difference with speed Λ is not caused by the concentration KS.
F0 → shows a difference in the Chi-square parameter between emphatic
and neutral segments, but it is not clear for what reason (neither
concentration nor statistical differences occurred). The case of F0 V in relation to
all speed types showed the only negative correlation; in addition, the
concentration KS is very different.
5. Conclusion
The curve types of the fundamental frequencies were selected in the present
study as a reference point because this parameter can be set as the
independent variable. Certain other parameters are related to aspects of F0:
lower periodicity and greater loudness or speed at voiceless consonants.
The hypothesis was that these combinations can be destroyed in order to
mark paraphonetic information. The analysis of the combination suggests
that paraphonetic information is not a function of a special parameter, a
special curve type or - in our case — of additional elements (timbre ex-
cluded). It is a function of dynamically changing combinations of parame-
ter-curve types. The structural displacement which occurs in phoneme-re-
lated sound structure is fluent and brief. In a statistical sense, in emphatic
sequences the combinations will be used freely, the neutral structures will
be infringed, the concentration of curve types and parameters diminished
and the special links between curve type/parameter or parameter/curve
type changed or removed. Whether particular combinations are related to
certain sorts of meaning, affect, situation or personal characteristic was
not treated in the present paper.
The purpose of the analysis was to describe structural attributes with
which in everyday communication 'conspicuous' or emphatic utterances
could be identified. The results also describe the criteria for scientific-
phonetic procedures of the impressionistic type for distinguishing neutral
and 'other' material. The phonetician judges on the basis of his experience
of 'plausible' and 'implausible' curve types (a) whether the phonetic material
is speech, (b) whether it contains emphatic sequences or noise, (c) whether it
is analyzed 'correctly' by machines and computers, and (d) how speech synthesis
rules must be formulated.
The statistical classification reported in this paper is also a description
of parts of intersubjective (phonetic) knowledge: observational assess-
ment, checking and selecting of certain types of phonetic material is based
on the intuitively known probabilistic and combinatory structure of
speech stimuli. With criteria such as these, the linguistic phonetician elimi-
nates material which does not satisfy the neutral condition typical of pho-
nemes. Linguistically oriented phonetic research has developed technolo-
gies for smoothing emphatic structures, which is rather difficult because
pure elimination of parameters is senseless (for this reason the often
criticized phonetic 'laboratory pronunciation' is used). Researchers in speech
synthesis and speech recognition often apply the plausibility criterion as a
heuristic step in constructing a 'natural' synthesis procedure, or in imple-
menting statistical decision rules for automatic speech recognition sys-
tems. Haggard (1979) describes his algorithm as a tool for discarding
". . . large families of implausible shapes including those in which
constriction changes sharply from point to point . . ." (p. 264). Or, to take a
further example, the same underlying heuristic decision algorithm is used by Jassem
(1979) for classifying short-time spectra (which results in logicomathemat-
ical formulae). Jassem chose for the purpose of finding criteria for statisti-
cal attributes in automatic speech recognition (a) shape and position along
the frequency axis, (b) the area contained by the spectral envelope, (c) as a
material object of uniform thickness with top and bottom surfaces (p. 82).
These classification features were further transformed into binary codes
for the classical distinctive features of fricative phonemes.
Evident knowledge - based on a common sense understanding of what
is 'conspicuous' - is, from a methodological point of view, the more or
less hidden resource of scientific activity. To discover some of these struc-
tures of knowledge was one of the implicit aims of the present paper.
References
Fónagy, I. and Bérard, E. (1972). "Il est huit heures": Contribution à l'analyse sémantique de
la vive voix. Phonetica 26, 157-192.
Haggard, M. (1979). Experience and perspectives in articulatory synthesis. In Lindblom, B.
and Öhman, S. (Eds.): Frontiers of speech communication research. London, New York, San
Francisco, pp. 259-274.
't Hart, J. and Collier, R. (1975). Integrating different levels of intonation analysis. Journal of
Phonetics 3, 235-255.
Jakobson, R., Fant, G. and Halle, M. (1965). Preliminaries to speech analysis. Cambridge,
Mass., 6. printing.
Jassem, W. (1979). Classification of fricative spectra using statistical discriminant functions.
In Lindblom, B. and Öhman, S. (Eds.): Frontiers of speech communication research. London,
New York, San Francisco, pp. 77-91.
Lieberman, Ph. (1974). A study of prosodic features. In Sebeok, Th. A. (Ed.): Current trends
in linguistics, Vol. 12. The Hague, Paris, pp. 2419-2449.
Lieberman, Ph. and Michaels, S. B. (1962). Some Aspects of Fundamental Frequency and
Envelope Amplitude as Related to the Emotional Content of Speech. Journal of the Acousti-
cal Society of America 34, 922-927.
Nakatani, L. H. and Dukes, K. D. (1973). A sensitive test of speech communication. Journal
of the Acoustical Society of America 53, 1083-1092.
Martin, H. (1981). The prosodic components of speech melody. The Quarterly Journal of
Speech 67, 81-99.
Summerfield, O., Bailey, P. J., Seton, J. and Dorman, F. (1981). Fricative envelope
parameters and silent intervals in 'slit' and 'split'. Phonetica 38, 181-192.