Intonation Accent and Rhythm


Intonation, Accent and Rhythm

Research in Text Theory


Untersuchungen zur Texttheorie
Editor
János S. Petőfi, Bielefeld

Advisory Board
Irena Bellert, Montreal
Maria-Elisabeth Conte, Pavia
Teun A. van Dijk, Amsterdam
Wolfgang U. Dressler, Wien
Peter Hartmann, Konstanz
Robert E. Longacre, Dallas
Roland Posner, Berlin
Hannes Rieser, Bielefeld

Volume 8

Walter de Gruyter · Berlin · New York
1984
Intonation, Accent
and Rhythm
Studies in Discourse Phonology

Edited by
Dafydd Gibbon and Helmut Richter

Walter de Gruyter · Berlin · New York
1984
Library of Congress Cataloging in Publication Data
Main entry under title:

Intonation, accent, and rhythm.

(Research in text theory = Untersuchungen zur


Texttheorie; v. 8)
Includes indexes.
1. Prosodic analysis (Linguistics) — Addresses, essays,
lectures. 2. Discourse analysis — Addresses, essays,
lectures. I. Gibbon, Dafydd. II. Richter, Helmut,
1935- . III. Series: Research in text theory; v. 8.
P224.I5 1984 414 84-3212

CIP-Kurztitelaufnahme der Deutschen Bibliothek

Intonation, accent and rhythm: studies in discourse


phonology / ed. by Dafydd Gibbon and Helmut Richter.
-Berlin; New York: de Gruyter, 1984.
(Research in text theory; Vol. 8)
ISBN 3-11-009832-6
NE: Gibbon, Dafydd [Hrsg.]; GT

© Copyright 1984 by Walter de Gruyter & Co., Berlin 30.


Printed in Germany
All rights of reprinting, photomechanical reproduction
and photocopying, including of excerpts, reserved.
Typesetting: Dörlemann-Satz GmbH & Co. KG, Lemförde
Printing: Rotaprint-Druck W. Hildebrand, Berlin
Binding: Lüderitz & Bauer, Berlin
CONTENTS

The Authors and their Affiliations

Dafydd Gibbon and Helmut Richter
Phonology and Discourse: a Variety of Approaches
Janet Bing
A Discourse Domain Identified by Intonation
L. Boves, B. L. ten Have, W. H. Vieregge
Automatic Transcription of Intonation in Dutch
David Brazil
The Intonation of Sentences Read Aloud
Alan Cruttenden
The Relevance of Intonational Misfits
Anne Cutler
Stress and Accent in Language Production and Understanding
Grzegorz Dogil
Grammatical Prerequisites to the Analysis of Speech Style: Fast/Casual Speech
Anthony Fox
Subordinating and Co-ordinating Intonation Structures in the Articulation of Discourse
Anna Fuchs
'Deaccenting' and 'Default Accent'
Dafydd Gibbon
Intonation as an Adaptive Process
J. 't Hart
A Phonetic Approach to Intonation: from Pitch Contours to Intonation Patterns
W. Jassem, D. R. Hill, I. H. Witten
Isochrony in English Speech: its Statistical Validity and Linguistic Relevance
Gerald Knowles
Variable Strategies in Intonation
Manfred Krause
Recent Developments in Speech Signal Pitch Extraction
D. Robert Ladd
English Compound Stress
Hans-Heinrich Lieb
A Method for the Semantic Study of Syntactic Accents
Helmut Richter
An Observation Concerning Intensity as a Predictable Feature of Intonation
Mitsou Ronat
Logical Form and Prosodic Islands
Peter Winkler
Interrelations Between Fundamental Frequency and Other Acoustic Parameters of Emphatic Segments
Name Index
Subject Index
The Authors and their Affiliations

Janet Bing
Department of English,
Old Dominion University,
Norfolk, Virginia,
USA
L. Boves
Instituut voor Fonetiek,
Katholieke Universiteit,
Nijmegen,
Netherlands
David Brazil
Dept. of English Language
and Literature,
University of Birmingham,
England
Alan Cruttenden
Dept. of General Linguistics,
University of Manchester,
England
Anne Cutler
Medical Research Council
Applied Psychology Unit,
Cambridge,
England
Grzegorz Dogil
Fakultät für Linguistik
und Literaturwissenschaft,
Universität Bielefeld,
Federal Republic of Germany
Anthony Fox
Dept. of Linguistics
& Phonetics,
University of Leeds,
England
Anna Fuchs
Seminar für Deutsche Philologie,
Universität Göttingen,
Federal Republic of Germany

Dafydd Gibbon
Fakultät für Linguistik
und Literaturwissenschaft,
Universität Bielefeld,
Federal Republic of Germany

Johan 't Hart
Instituut voor Perceptie Onderzoek,
Eindhoven,
Netherlands

B. L. ten Have
Instituut voor Fonetiek,
Katholieke Universiteit,
Nijmegen,
Netherlands

D. R. Hill
Department of Acoustics,
University of Calgary,
Canada

Wiktor Jassem
Acoustic Phonetics Research Unit,
Polish Academy of Science,
Poznań, Poland

Gerald Knowles
School of English,
University of Lancaster,
England

Manfred Krause
Technische Universität Berlin,
Federal Republic of Germany

D. Robert Ladd
Department of Experimental Psychology,
University of Sussex,
Brighton,
England
Hans-Heinrich Lieb
Fachbereich Germanistik,
Freie Universität Berlin,
Federal Republic of Germany
Helmut Richter
Fachbereich Germanistik,
Freie Universität Berlin,
Federal Republic of Germany
Mitsou Ronat
Centre National de Recherche Scientifique,
Paris,
France
Wilhelm H. Vieregge
Instituut voor Fonetiek,
Katholieke Universiteit,
Nijmegen,
Netherlands
Peter Winkler
Sozialwissenschaftliche Fakultät,
Universität Konstanz,
Federal Republic of Germany
I. H. Witten
Department of Acoustics,
University of Calgary,
Canada
DAFYDD GIBBON AND HELMUT RICHTER

Phonology and Discourse: a Variety of Approaches

Both 'phonology' and 'discourse' tend to arouse strong feelings among
those of differing linguistic persuasions; experience has shown that the
combination 'discourse phonology' raises as many problems as its consti-
tuents. However, the field which the contributions to this volume treat is
reasonably clear: that of 'intonation' and related phenomena in relation to
the constituents of discourse. It is to the credit of the contributors that
they have found it possible to overcome some initial hesitation and take
part in this cooperative venture.
'Discourse' is used here with both methodological and substantive implications.
On the methodological side, it is not intended to imply a particular
approach (such as 'discourse analysis'); rather, it indicates a general
tendency to select from certain data types:
i. Natural data (Cutler, Fuchs, Gibbon, Jassem, Knowles, Winkler), as op-
posed to paradigmatically elicited or generated test data;
ii. Discourse response data (Bing, Cruttenden, Fuchs, Gibbon, Knowles,
Ladd, Lieb, Richter, Ronat), associated with terms such as 'adjacency
pair', 'natural response', 'proper response relation' (Lieb), as opposed to
single sentences;
iii. Discourse oriented sentential data (Bing, Cruttenden, Fox, Knowles)
such as vocatives, expletives, 'epithets', quotations and verba dicendi, sub-
jective adverbs, 'tags', negation, appraisive 'superlatives', as opposed to
sentential units (Dogil, Boves, 't Hart, Jassem);
iv. Data selected in the context of discourse types (or styles, speech 'regis-
ters'), such as reading aloud (Brazil, Boves, 't Hart, Jassem) or fast versus
slow speech styles (Dogil).
On the substantive side, those descriptive categories for discourse which
were used to characterise these data types are clearly in the forefront of
attention in practically all the papers. It is the sound-oriented, phonologi-
cal and phonetic aspects which are the main substantive concern, however:
intonation forms and structures, both in their own right and in relation to
textual locutions; accent and 'stress' and their placement in words and sen-
tences under specific discourse conditions such as repetition, anaphora,
speech timing and its phonological implications. A further dimension is
the conception of these categories as processes; this conception is inherent
in the signal processing approach (Krause) and in experimental phonetics
in general, but other papers (cf. Cutler, Gibbon) use this aspect to extend
the descriptive power of linguistic descriptions. Indeed, a concern with
temporal organisation (e. g. pitch contours as a function of time, rhythm,
tempo) and therefore, implicitly or explicitly, with 'processes' might be
considered an a priori condition for any treatment of suprasegmental,
prosodic, discourse phonetic or discourse phonological matters.
The contributors (except Cutler and Gibbon) are mainly concerned with
the 'post-production' phases of phonetics and phonology: signal process-
ing and the experimental phonetics papers on the one hand, and interpret-
ative linguistic analysis on the other. Issues in articulatory phonetics are
not dealt with; on the signal processing side, an introductory overview
(Krause) is included in view of the increasing methodological importance
of computer supported acoustic analysis of pitch.
There are three issues which stand out particularly, in the set of contrib-
utions taken as a whole, as being of lasting concern in this field:
i. The nature of intonational meanings, with a number of different ap-
proaches crystallising out. Two of the main lines might be thought of as
the 'basic meaning' approach in various forms (cf. Bing, Cruttenden), with
a 'system relative' version proposed by its critics (cf. Knowles), on the one
hand, and the 'configurational' approaches on the other, including a 'pat-
tern indexing' approach (cf. Fox, Gibbon) and a 'cohesion marking' ap-
proach with categories of 'focus', 'anaphora' and the like (Fuchs, Ladd,
Ronat). A different framework is provided by Lieb in his formal recon-
struction of 'speaker attitudes'.
ii. The concept of 'normal intonation', 'normal accentuation' (especially
Fuchs, Winkler; also Ladd, Ronat, Lieb), in particular with respect to the
position of an utterance within discourse (e.g. initial; 'first instance' vs.
'second instance') or in a specific discourse type (e. g. quotation; reading
aloud). Clarification of this notion takes place here by careful study of,
among other things, a range of different 'non-normal' forms: contrastive
accent, the hitherto poorly understood 'default accent' and other types
(cf. Fuchs, Ronat); emphasis (Winkler).
iii. The autonomy of discourse phonological systems relative to specific locu-
tionary domains such as the sentence (Gibbon, Knowles, Ronat; most
others, implicite et passim). While not all would agree on details, Fuchs'
statement would probably not be contested too hotly by the contributors:
"The syntactic hierarchy of a sentence does not determine accent choice,
but plays an important role as a framework for the choices to be effected."
Generative phonology, which applied a non-autonomy hypothesis to pros-
odic features in the first two decades of its existence, has also adopted var-
ious versions of the autonomy hypothesis during the past decade (Dogil,
Ronat).
It is perhaps the last of these issues which will ultimately provide a key
to the solution of the first two. The question whether intonation structures
can be reduced to constituent structures at other levels, or whether
intonational meanings can be reduced to meaning categories applicable in
the lexicon of the language concerned, or whether a given intonation or
accentuation is 'normal' in terms of congruence with other language sys-
tems can only be answered satisfactorily when close attention is given to
the 'autonomy of levels' issue. The following paragraphs are therefore
concerned with isolating some of the concepts involved.
The central notion is that of 'level', a term which has been variously
conceived in different linguistic methodologies as 'level of analysis', in
method-conscious approaches, or as 'level of representation', in ap-
proaches concerned with formalised descriptions.
A 'level of description' may be thought of as uniting both aspects. It is a
function of a method of analysis, M; in the minimal case, M = (O, D, C,
S), a quadruple consisting of the linguistic observer, his data, his descriptive
categories and the structure into which these enter.¹ The data are the
observed events (utterance tokens); empirical judgments or measurements
are a function of the triple (O, D, C), and determine a universe of discourse.
The roles of D and C need no further comment; the role of O is
also evident: a judgment or measurement is a contingent fact relative to an
observer with certain 'observational properties' (e.g. 'aided by technical
equipment') and cannot be automatically generalised to other observers.
For exposition here, we can concentrate on C and S as the major de-
terminants of linguistic levels. A few examples will clarify their role. In
perceptual phonetic descriptions, C may be the categories of the IPA ma-
trix and S a 'bead on a string' linear model of structure; in a phonemic de-
scription, C is supplemented by a functional category of contrast, and S by
a structural principle of complementary distribution (as well as metatheore-
tical principles such as simplicity). In morphology, a criterion of meaning
similarity or identity is central. The levels of analysis in pre-1970 transfor-
mational grammars may be explained similarly: surface structure was de-
termined by morphological similarity and the grouping criteria of traditional
IC analysis; deep structure was determined in part by S (generalisations
over discontinuous constituents), in part by C (both by ambiguity types and
by paradigmatic similarity relations between sentences, based mainly on
generalisations over valency properties of verbs - subcategorial and selec-
tional restrictions). Later developments introduced anaphoric relations
into C, then the 'natural response relation' mentioned above, which made
questions of discourse phonology more accessible to this framework (cf.
Ronat). These examples are particularly clear cases; the 'factual content'
of other approaches may be similarly described.
¹ Gibbon, D. Perspectives of Intonation Analysis. Bern, Lang, 1976, p. 91.

It is clear that some levels of description are more closely related to
each other than others; levels which have similar C or similar S may be
said to be 'compatible'; a strong postulate would be that if levels $L_i$ and $L_j$
are compatible, and $L_j$ and $L_k$ are compatible ($i \neq j$, $j \neq k$, $i \neq k$), then $L_i$ and
$L_k$ are compatible, i.e. compatibility is transitive. The details of this relation
will not be discussed here.
A 'coherent' linguistic description may then be said to consist of a set of
compatible levels; these levels share a loose family resemblance by virtue of
the transitivity of this relation. A semantic level, a level of intonation de-
scription, a perceptual phonetic level and others may therefore belong to
the same coherent linguistic description.
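
The transitivity postulate and the resulting notion of a coherent description can be made concrete in a small sketch. Everything in it is illustrative rather than part of the original argument: the function names, the level labels, and above all the assumption that compatibility is symmetric, which the text does not state.

```python
from itertools import permutations


def symmetric_closure(pairs):
    """Assumption: compatibility is symmetric, so (a, b) implies (b, a)."""
    return set(pairs) | {(b, a) for a, b in pairs}


def is_transitive(levels, compatible):
    """Check the strong postulate over pairwise distinct levels i, j, k."""
    return all(
        (a, c) in compatible
        for a, b, c in permutations(levels, 3)
        if (a, b) in compatible and (b, c) in compatible
    )


def is_coherent(levels, compatible):
    """A description is coherent here if all of its levels are mutually compatible."""
    return all((a, b) in compatible for a, b in permutations(levels, 2))


# Hypothetical levels, named after the examples in the text.
LEVELS = ["semantic", "intonation", "perceptual phonetic"]

# A chain of two compatibilities, without the third pair ...
chain = symmetric_closure(
    {("semantic", "intonation"), ("intonation", "perceptual phonetic")}
)
# ... and the same relation once transitivity has been enforced.
closed = symmetric_closure(chain | {("semantic", "perceptual phonetic")})
```

Under these assumptions the `chain` relation violates the postulate, while `closed` satisfies it and makes the three levels a coherent description in the sense just defined.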
A subset of levels in such a description will contain phonetic as well as
functional categories in C; let us say that C = (PC, FC). Forms occurring
at levels $L_i$ and $L_j$ may enter into a 'concomitance' relation² if $PC_i$ and $PC_j$
are disjunct, and different $FC_i$ or $FC_j$ apply to the forms when $PC_i$ and
$PC_j$ occur together than when $PC_i$ or $PC_j$ do not occur together. For example,
a rise-fall pitch contour $PC_i$ in English may be interpreted as positive
appraisal $FC_{i1}$ when it occurs with a word like 'lovely' ($PC_{j1}$, which has
a similar positive connotation, $FC_{j1}$) or with an interjection 'mm' ($PC_{j2}$,
with no specific connotation of any kind, $FC_{j2}$). However, with a word
like 'John' $PC_{j3}$ used as a proper name in a vocative context $FC_{j3}$, $PC_i$ may
be interpreted as negative appraisal $FC_{i2}$; if 'John' is in a context like
'John's going to do it' $FC_{j4}$, then $PC_i$ may receive an interpretation $FC_{i3}$
which may be different from either $FC_{i1}$ or $FC_{i2}$ or both.
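
The rise-fall example can be restated as a lookup table, which makes the point of 'concomitance' visible: the reading of one and the same contour is not fixed but depends on the form it occurs with. The table and the function names below are hypothetical; only the three pairings are taken from the example above.

```python
# Hypothetical table: (pitch contour, concomitant form) -> reading of the contour.
READINGS = {
    ("rise-fall", "lovely"): "positive appraisal",
    ("rise-fall", "mm"): "positive appraisal",
    ("rise-fall", "John, vocative"): "negative appraisal",
}


def readings_of(contour, table):
    """Collect every reading the contour receives across its concomitant forms."""
    return {reading for (c, _form), reading in table.items() if c == contour}


def depends_on_concomitant(contour, table):
    """True if the contour's reading varies with the form it occurs with."""
    return len(readings_of(contour, table)) > 1
```

A contour with a single reading in every context would not need the concomitance relation at all; the rise-fall, with two readings in this table, does.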
An entirely different set of cases arises when the use of different methods
leads to 'incompatible' levels, in particular when $PC_i$ and $PC_j$ are disjunct
and $O_i$ and $O_j$ are not identical (i.e. when the observational properties
of $O_i$ and $O_j$ differ). A typical case is a perceptual phonetic
'transcription' as $PC_i$ vs. an experimental phonetic 'registration' as $PC_j$. In
these cases, an extension of the observer's empirical faculties by means of
material tools and their supporting theories makes the phonetic descriptive
categories at each level quite incommensurable; the levels may then be
said to describe different domains of discourse.
Between these types of incompatible level, assuming that $FC_i$ and $FC_j$ are
not distinct, it is possible to establish a relation of 'correlation'.
This is the basis of the idea of 'articulatory correlates', 'acoustic
correlates', 'auditory correlates' of intonations and other forms which are
established as mappings between empirically independent domains. It is
conceivable (and explicitly postulated in the paper by Gibbon) that the articulatory
and auditory levels may, in a discourse context, be analysed as
compatible, not just correlated; the same assumption underlies the 'stylization'
method used by 't Hart and Boves & al.

² Richter, H., & D. Wegner. "Die wechselseitige Ersetzbarkeit sprachlicher und nichtsprachlicher
Zeichensysteme", in: Posner, R. und H.-P. Reinecke (eds.), Zeichensysteme, Wiesbaden,
Athenaion, 1977.
The following sections are a contribution toward an up-to-date explication
of the notion of 'correlate' (in the sense underlying approaches like
those of Jassem, Richter, and Winkler), and of the changing conception of
'parameter' based on the $C_i$ and $C_j$ of the correlated levels.
Given a linguistic structure $\mathfrak{S}$ of a sentence of a language such as German
or English, it will be a reasonable assumption that this structure consists
of a finite number $s$ of linearly ordered elements $e$. So the structure
can be represented as an ordered $s$-tuple

\[ \mathfrak{S} = (e_1, e_2, \ldots, e_i, \ldots, e_s) \qquad (1) \]

If the structure is to possess acoustic correlates element by element, it must
at least be the case that some set derived from acoustic data belonging
to utterances of the sentence having $\mathfrak{S}$ as its structure, is mapped into and
onto $\mathfrak{S}$. So we have to assume a surjection (i.e. every value $e$ is assigned to
at least one argument)

\[ \mathit{Corr}: \{\mathfrak{G}_1, \mathfrak{G}_2, \ldots, \mathfrak{G}_h\} \to \mathfrak{S} \quad (h, s \in \mathbb{N}) \qquad (2) \]

with the co-ordination $\mathfrak{G}_j \mapsto e_i$
of elements of $\{\mathfrak{G}_j\}$ to elements of $\mathfrak{S}$.

Note that our condition for a linguistic structure to have acoustic correlates
is not a strong one. A surjection is postulated (since none of the elements
of a structure which is said to possess correlates may be allowed to
have no inverse image). However, it is not required that the order of the
structural elements $e_i$ be pre-established in the organisation of the $\mathfrak{G}_j$. Nor
is it required that Corr be an injection (where every value $e_i$ is assigned to
only one argument $\mathfrak{G}_j$), because the realisation of a structural element can
also be conceived as the joint effect of several $\mathfrak{G}_j$ (provided that these are
discernible as its correlates, i.e. that there is a mapping relation at all).
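
The two conditions, surjection required and injection not required, can be checked mechanically once a candidate mapping is written down. The sketch below assumes that Corr is given as a finite dictionary from acoustic units to structural elements; the names `G1` … `G4` and `e1` … `e3` are placeholders, not data from any of the studies.

```python
def is_surjection(corr, structure):
    """Every structural element has at least one acoustic argument mapped to it."""
    return set(corr.values()) == set(structure)


def is_injection(corr):
    """No structural element is the value of more than one argument."""
    values = list(corr.values())
    return len(values) == len(set(values))


# Hypothetical structure with s = 3 elements and h = 4 acoustic units.
structure = ("e1", "e2", "e3")
corr = {"G1": "e1", "G2": "e1", "G3": "e2", "G4": "e3"}

# Dropping G4 leaves e3 without an inverse image.
partial = {"G1": "e1", "G2": "e1", "G3": "e2"}
```

Here `corr` is a surjection but not an injection, since `e1` is realised jointly by `G1` and `G2`, which is exactly the case the condition is meant to admit; `partial` fails the condition because `e3` has no correlate.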
Notwithstanding a fairly liberal understanding of 'having acoustic
correlates', it would appear to be grossly and ineffectively simplistic to interpret
the $\mathfrak{G}_j$ in terms of the raw output of the equipment (which is itself
derived rather than raw). With regard both to the complexities of 'etic-emic'
relationships and to the evolving computer-aided practice of processing
the speech signal to phonetic ends, the $\mathfrak{G}_j$ seem instead to presuppose
several steps of derivation.
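Figure 1

\[
\mathfrak{M} =
\begin{array}{c|ccccc}
        & T_1     & \cdots & T_J     & \cdots & T_N     \\ \hline
\pi_1   & m_{1,1} & \cdots & m_{1,J} & \cdots & m_{1,N} \\
\vdots  & \vdots  &        & \vdots  &        & \vdots  \\
\pi_I   & m_{I,1} & \cdots & m_{I,J} & \cdots & m_{I,N} \\
\vdots  & \vdots  &        & \vdots  &        & \vdots  \\
\pi_M   & m_{M,1} & \cdots & m_{M,J} & \cdots & m_{M,N}
\end{array}
\qquad (I, J; M, N \in \mathbb{N})
\]

Let Figure 1 be the matrix $\mathfrak{M}$ of acoustic measurements of an utterance.
The measurements

\[ m_{I,J} = \pi_I(T_J) \]

result from procedures $\pi_I$ at temporal sampling points $T_J$ selected according
to their natural succession ($J_1 < J_2$, if $T_{J_1}$ precedes $T_{J_2}$). Given an arithmetic
(digital or analog) connection '&' of elements of $\mathfrak{M}$ in neighbouring
lines or columns:

\[
\begin{aligned}
m_{I,J} \,\&\, \ldots \,\&\, m_{I,J+x}   &= v_{I,J+x}   \quad \text{or} \\
m_{I,J} \,\&\, \ldots \,\&\, m_{I+y,J}   &= v_{I+y,J}   \quad \text{or} \\
m_{I,J} \,\&\, \ldots \,\&\, m_{I+y,J+x} &= v_{I+y,J+x}
\end{aligned}
\]

($x, y > 0$), we can mark the case that the set of scalars $v$ can be partitioned
into $k$ subsets containing $n$ elements each and that the $v$ can be arranged
(cf. Figure 2) into a matrix $\mathfrak{V}$ with elements $v_{ij}$ such that

Figure 2

\[
\mathfrak{V} =
\begin{array}{c|ccccc}
        & t_1    & \cdots & t_j    & \cdots & t_n    \\ \hline
P_1     & v_{11} & \cdots & v_{1j} & \cdots & v_{1n} \\
\vdots  & \vdots &        & \vdots &        & \vdots \\
P_i     & v_{i1} & \cdots & v_{ij} & \cdots & v_{in} \\
\vdots  & \vdots &        & \vdots &        & \vdots \\
P_k     & v_{k1} & \cdots & v_{kj} & \cdots & v_{kn}
\end{array}
\]

$v_{i_1 j}$ above $v_{i_2 j}$, if $v_{i_1 j} = v_{I,J}$ and $v_{i_2 j} = v_{I+y,J}$, and
$v_{i j_1}$ before $v_{i j_2}$, if $v_{i j_1} = v_{I,J}$ and $v_{i j_2} = v_{I,J+x}$.

Let us call $\mathfrak{V}$ the matrix of acoustic valuations of an utterance. The valuations

\[ v_{ij} = f(P_i, t_j) \]

can be said to be values of parameters $P_i$ at valuation times $t_j$. While the entries
of $\mathfrak{M}$ are to be understood as the results of procedures such as $F_0$-extraction
or intensity measurement, $\mathfrak{V}$ represents a variable accounting for
the speech signal being in one respect or another problem-oriented. With
the exception of a trivial transition from $\mathfrak{M}$ to $\mathfrak{V}$, this implies that parameters
are no longer the dimensions of measurement themselves but varying bundles
of these activated with varying 'density' along the temporal axis (cf. Figure 3).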
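
A minimal sketch may help to fix the step from measurements to valuations. It assumes the simplest possible reading: the connection '&' is an arithmetic mean, and it is applied to fixed-size blocks of neighbouring cells, whereas the construction above explicitly allows the connection and the 'density' to vary. The function names and the numbers are invented for illustration.

```python
def average(cells):
    """One possible connection '&': the arithmetic mean of the connected cells."""
    return sum(cells) / len(cells)


def valuation_matrix(m, block_rows, block_cols, connect=average):
    """Connect neighbouring lines and columns of m block by block.

    Each block of block_rows x block_cols measurements yields one valuation;
    the order of rows (above) and columns (before) is preserved.
    """
    n_rows, n_cols = len(m), len(m[0])
    v = []
    for top in range(0, n_rows - block_rows + 1, block_rows):
        row = []
        for left in range(0, n_cols - block_cols + 1, block_cols):
            cells = [
                m[r][c]
                for r in range(top, top + block_rows)
                for c in range(left, left + block_cols)
            ]
            row.append(connect(cells))
        v.append(row)
    return v


# Hypothetical measurements: 2 procedures at 6 sampling points.
m = [
    [100, 110, 120, 130, 120, 110],  # e.g. F0 values in Hz
    [60, 62, 64, 70, 68, 66],        # e.g. intensity values in dB
]

# Connecting three neighbouring columns per row gives k = 2, n = 2.
v = valuation_matrix(m, block_rows=1, block_cols=3)
```

With `block_rows=1` each parameter still corresponds to one measurement dimension, which is close to the 'trivial transition' mentioned above; setting `block_rows` above 1 bundles several dimensions into one parameter, for which a plain mean would usually not be a sensible connection.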

Figure 3

Variation, however, is not arbitrary even though the connection '&' was
not required to remain constant. What count are the topological constraints
introduced with the transition from $\mathfrak{M}$ to $\mathfrak{V}$ as a formulation of
the notion that parameters are expected to form characteristics which are
both essential and not biased by looseness due to their partially accidental
basis.
Whether or not the above construction will succeed as an up-to-date explication
of the concept of parameter in intonation research need not
concern us too much. (Of course the scalars $v_{ij}$ can figure in equations as
the values of parameters usually do.) The point is that the correlates $\mathfrak{G}_j$ one
is in search of can, in our view, be obtained only when one starts from $\mathfrak{V}$.
Statements about which measurement dimension provides the relevant intonatory
cues seem to be outdated.
So let us introduce parameter combinations $\mathfrak{G}_j$ depending on the valuation
time $t_j$; let $\mathfrak{G}_j$ be a non-empty subset of valuations at $t_j$:

\[ \{v_{1j}, v_{2j}, \ldots, v_{ij}, \ldots, v_{kj}\} \supseteq \mathfrak{G}_j \quad (j \text{ fixed}) \qquad (3) \]

When for each $j$ exactly one $\mathfrak{G}_j$ is compiled, then it seems rewarding to
look for the surjection Corr. Under adequate conditions both towards the
'emic' and towards the 'etic' poles of discourse phonology it can be legitimate
to expect that combinations of few but highly complex parameters
will prove to be correlates of structural elements realized in utterances.
Within the indicated frame of textual correlates one might, in particular,
expect that a (set of) structure(s) can be shown to have a unique acoustic
manifestation. Obviously a reproduction of token variation in terms of a
variety of relations Corr would be of little interest. But it seems reasonable
to introduce another surjection, between the Cartesian product of the set
of relevant structures $\mathfrak{S}$ and the set $\{C\}$ of conditions $C$ of their occurrence,
on the one hand, and the set $\{\mathit{Corr}\}$ of relevant mapping relations
Corr on the other:

\[ \{\mathfrak{S}\} \times \{C\} \to \{\mathit{Corr}\} \qquad (4) \]

\[ (\mathfrak{S}, C) \mapsto \mathit{Corr} \]

Note, with this condition of unique manifestation, the reverse direction of
uniqueness. In (2) we admitted the interaction of different parameter combinations
in the realization of one element of the structure. It is admitted
by (4) that different pairs formed by structure and condition of its occurrence
are associated with the same correlate relation. While (4) allows realistically
expected ambiguities, in terms of the tolerances furnished by our
avoidance of injection, it still meets explanatory scientific requirements in
the required 'right-uniqueness' (mapping).
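
The asymmetry in (4) can be illustrated with a short check: several (structure, condition) pairs may share one correlate relation, but no pair may be assigned two. The sketch assumes the assignment is available as a finite list of pairs; the labels `S1`, `C1`, `Corr_a` and so on are hypothetical.

```python
def is_right_unique(assignments):
    """True if no (structure, condition) pair is assigned two different Corr."""
    seen = {}
    for argument, corr in assignments:
        # setdefault stores the first Corr seen for this argument and returns it.
        if seen.setdefault(argument, corr) != corr:
            return False
    return True


# Admitted by (4): two different pairs share the same correlate relation.
admitted = [
    (("S1", "C1"), "Corr_a"),
    (("S2", "C2"), "Corr_a"),
]

# Excluded by (4): one pair with two different correlate relations.
excluded = admitted + [(("S1", "C1"), "Corr_b")]
```

The first list is ambiguous in the tolerated direction, from correlate relation back to structure and condition; the second would reproduce token variation as a variety of relations Corr, which is what the condition rules out.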
The preceding metatheoretical discussion is meant to provide a point of
orientation for comparing the formal and empirical claims made by different
theories based on different observer properties, different categories of
analysis, and different notions of structure. That this is necessary is clearly
shown in the wide variety of methodologies used in the studies in this vol-
ume; there is little consensus about theoretical underpinnings or descrip-
tive categories even among 'observers' with similar 'properties': for exam-
ple, it is more than doubtful whether it is meaningful to speak of such
entities as a 'British School' of intonation analysis in view of the differ-
ences among the papers by Cruttenden, Brazil, Fox, Knowles, or 'an' ex-
perimental phonetic approach in view of the Boves & al., 't Hart, Jassem,
Richter and Winkler papers.
A high degree of methodological awareness characterises most of these
papers in 'discourse phonology' (perhaps clearest in Boves & al., Cutler, 't
Hart, Fuchs, Gibbon, Lieb, Richter, Winkler). That metatheory is not a
mere 'Ersatz' for genuine description is shown by all the papers. However,
there remains a quite formidable task of coordination and cooperation,
one where problems of 'compatibility' and 'concomitance' can be solved
within specific approaches to the study of language, but where problems
of 'correlation' require considerable interdisciplinary endeavour if the pat-
terns, the processes and the functions involved in 'discourse phonology'
are to be explained.
It was in this spirit that the present editors organised colloquia in Berlin
(1980) and Bielefeld (1979, 1980) and later decided to pool the efforts of
both in the form of the present publication, adding a number of further
studies in the field by scholars not represented at those colloquia. The vol-
ume was thus conceived at several stages in a spirit of joint effort; if the
significance of the whole is greater than the significance of its parts, this is
due to the willingness of the contributors to look for ideas beyond their
own familiar methodologies and to contribute themselves to such a search.
JANET MUELLER BING

A Discourse Domain Identified by Intonation¹

It is obvious that the ambiguity of sentence (1) occurs only in the written
language.

(1) This is my sister, Eunice.

When the sentence is spoken, the ambiguity disappears because the into-
nation contour on the final word will indicate whether Eunice is being
spoken to or about. When sentence (2) is spoken or read, the interpreta-
tion is that Eunice is my sister.

(2) This is my sister, Eunice. ( = This is my "sister | "Eunice.)²

When (3) is read, the interpretation is that the sentence is being addressed
to someone named Eunice.

I will propose in this paper that the contour found on the final word of (3)
has a special discourse function. This contour indicates that certain parts
of the sentence do not contribute to the truth conditions of the sentence,
but deal instead with speaker-listener relationships. The contour, which I
will label the D contour, has in the past usually been identified either as a
continuation of a previous contour or as the low rising contour, a contour
which is also found on questions and non-final intonation phrases.

1
I would like to thank Dwight Bolinger, Alan Cruttenden, Bruce Downing, Dafydd Gib-
bon, Kathleen Houlihan, and Michael Kac for helpful comments on an earlier version of
this paper. The problems which remain are solely my responsibility.
2
I have used Pike's (1945) notation for marking intonation rather than the more commonly
used notation which is in parentheses after examples (2)-(7). The notation commonly used
by British linguists does not distinguish between the D contour and the C contour, as I
have defined them in this paper. The contours are simplified, and usually show only the
nuclear and sentence-final rises and falls.
Identifying the D Contour

As Ladd (1978) has made clear, there are a limited number of intonation
contours in English which are recognised by everyone who studies intona-
tion. These 'consensus contours' include those on sentences (4)-(7). The
intonation patterns marked on these and subsequent examples are simpli-
fied, and show only nuclear and phrase-final rises and falls.

(4) He has the money. ( = He has the money.) A contour³

(5) He has the money. ( = He has the "money.) A-rise contour⁴

(6) He has the money? ( = He has the 'money?) B contour

(7) He has the money? ( = He has the money?) C contour

These four 'consensus contours' can be characterized in terms of changes


in fundamental frequency adjacent to (before, during or after) the nuclear
syllable and at phrase boundaries. Schematically, the rises and falls of
(4)-(7) can be represented as (4')-(7').

(4') He has the mon↓ey.↓ A contour: nuclear fall, phrase-final fall.

(5') He has the mon↓ey.↑ A-rise contour: nuclear fall, phrase-final rise.

(6') He has the ↑money?↑ B contour: nuclear rise, phrase-final rise.

(7') He has the ↓money?↑ C contour: nuclear fall, phrase-final rise.⁵

3
The names of the contours are based on the labels given to pitch accents by Bolinger
(1957).
4
For readers who find it difficult to distinguish this contour from the others, the following
suggestions might help. This contour sometimes is used on sentences where there seems to
be some type of implication. For example, sentence (5) might have the implication, "but he
doesn't have something else." The A-rise contour is sometimes used on sentences in order
to communicate a "just any" interpretation:

We won't invite anyone. (Meaning: We won't invite just anyone, only certain people.)
5
The nuclear rise refers to the last rise or fall adjacent to or on the nuclear syllable. Ameri-
can linguists usually refer to the nuclear syllable as the syllable with sentence stress. The
fact that the A-rise contour and the C contour have the same configuration is not accidental.
In Bing (1979) I argue that these two contours do not contrast, although the common
assumption is that they do.
In Bing (1979) I argue that the meaningful contours in English, those
which contrast categorically, can be defined as configurations of nuclear
tones and boundary tones (such as the phrase-final tones indicated in
(4')-(7')). I am assuming this hypothesis in the remarks which follow.
When one examines the intonation on the word Eunice in (3), it be-
comes clear that the contour on the vocative is different from that of any
of the consensus contours, including the C contour. Although the C con-
tour ends in a phrase-final rise, it always has a fall before the nuclear syl-
lable. The D contour does not have a prenuclear fall; the pitch on the
word following the phrase boundary is the same as the pitch on the final
word preceding the phrase boundary. When a vocative is added to sen-
tences (4)-(6), the pitch at the end of the word money is the same as the
pitch on the first syllable of the word Eunice.

(8) He has the money, Eunice. A + D contours

(9) He has the money, Eunice. A-rise + D contours

(10) He has the money, Eunice? B + D contours

One advantage of defining the D contour as a configuration rather than


either a high rise or a low rise is that this definition makes it possible to
explain why three apparently different contours, the low rise in (8), the
mid rise in (9) and the high rise in (10), actually have the same function,
that of indicating the vocative. Although perceptually the contours on Eunice
in (8) and (10) are quite different, both can be defined as the D contour,
which has no nuclear rise or fall, and an optional phrase-final rise.
The rises and falls on (8) and (10) can be represented as (8') and (10') respectively.
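
The configurational definition lends itself to a simple table: each contour is a pair of a nuclear movement and a phrase-final movement, and the D contour is the configuration with no nuclear movement and an optional final rise. The sketch below is only an illustration of that idea; the data structure and the function `classify` are assumptions, not part of the analysis itself.

```python
from typing import NamedTuple, Optional


class Contour(NamedTuple):
    nuclear: Optional[str]        # "fall", "rise", or None for no nuclear movement
    phrase_final: Optional[str]   # "fall", "rise", or None for no final movement


# The four consensus contours as configurations.
CONTOURS = {
    "A": Contour("fall", "fall"),
    "A-rise": Contour("fall", "rise"),
    "B": Contour("rise", "rise"),
    "C": Contour("fall", "rise"),
}


def classify(nuclear, phrase_final):
    """Return the names of all contours matching a configuration."""
    # D contour: no nuclear rise or fall, with an optional phrase-final rise.
    if nuclear is None and phrase_final in (None, "rise"):
        return ["D"]
    observed = Contour(nuclear, phrase_final)
    return [name for name, contour in CONTOURS.items() if contour == observed]
```

On this representation the low, mid and high rises on the vocative all come out as the same D configuration, since pitch height is not part of the definition, and the A-rise and C contours are returned together because their configurations coincide.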

(8') He has the mon↓ey↓ Eunice↑

(10') He has the ↑money↑ Eunice↑

Notice that if the C contour occurs on the vocative after (6), the resulting
sentence is ill-formed.

(11) *He has the money, Eunice.⁶ B + C contours

⁶ Dwight Bolinger suggests that (11) might be normal as a protest in the following context:
i. He has the money, Eunice! If we kick him out now, how are we going to pay the tab?
For me, this is one of the possible interpretations for (9), which can also be pronounced:

ii. He has the money, Eunice.


The D contour is the only one which can occur on vocatives in sentence-
medial and sentence-final position, as the following sentences illustrate:

(12) This, Eunice, is my/sister. (A, D, A contours)

(13) Eunice is my/sister. (A-rise, A-rise, A contours)

(14) This is my/sister, Eunice. (A-rise, B contours)

Eunice can occur with the A contour, as in (2), or with the A-rise contour,
but in these cases it is not interpreted as a vocative.
If the D contour is a continuation of the previous contour, why con-
sider it an independent contour at all? Why not simply assume that it is a
'tail' or continuation of the contour which precedes it? The strongest ar-
gument for considering the D contour an independent entity is the fact
that it is clearly set off by phrase boundaries. Although there is not always
an actual pause before a phrase which has a D contour, there is always
lengthening of the syllable before it. Because of this and other rhythmic
differences,7 listeners can reliably distinguish between sentences which
have the same fundamental frequency but different phrase boundaries as
in the pairs of examples which follow.

(15) a) John struck/out\my friend. (A contour)

b) John struck/out, my friend. 8 (A + D contours)

(16) a) I've/noticed John. (A-rise contour)

b) I've/noticed, John. (A + D contours)

However, if one consciously supplies a real pause between the sentence and the vocative, it
is possible to get this interpretation on (iii), which is a variant of (ii) and (9), but not possi-
ble to get this interpretation on (iv).

iii. He has the/money, Eunice./

iv. *He has the/money, Eunice./

7 Liberman (1975: 292-293) provides very specific data about the intonation and timing of
the sentences in (15).
8 Notice that there is no phrase-final rise on (15b) or on (26). In sentence-final position, the
rise is always optional. My informal observation is that men tend to omit the rise more than
women do in American English.

The fact that the D contour is always set off by phrase boundaries is re-
flected in English orthography by the fact that the phrases which have the
D contour are always set off by commas. In both of the (b) examples in
(15)—(16) there is a potential for pause at the place marked by a comma.
This is in contrast to the (a) examples in which pause is impossible be-
tween the corresponding pairs of words.9

The Discourse Function of the D Contour

As I have defined it, the D contour consists of a separate intonation
phrase which can be characterized as having no nuclear tones and an optional
phrase-final rising boundary tone. I would now like to show how
expressions marked with the D contour have a special function in sen-
tences. These expressions do not add to the truth conditions of the sen-
tence.10 Rather, they are concerned with the relationship between the
speaker and listener.
Sentences (2) and (3) offer a good contrast between a phrase which con-
tributes to the truth conditions of a sentence and one which does not.
When determining the truth conditions of sentence (2), the word Eunice,
with the A or falling contour helps determine the truth or falsity of the
sentence. If I say sentence (2) and the person I am introducing is both my
sister and named Eunice, then the sentence is true. However, if I say (2)
and the person is my sister, but is named Mary, then the sentence is false.
With sentence (3), however, the word Eunice may be changed without af-
fecting the truth value of the sentence. If I am addressing someone named
Mary and say (3), the truth value of the sentence has not been changed as
long as the person being introduced is my sister and the sentence is true. If
I mistakenly call the person I am addressing Mary I have not made a false
statement, but merely a social faux pas.
Vocatives are not the only expressions which commonly occur with the
D contour. Like vocatives, expletives and epithets always have the D con-
tour in medial and final position. One need not even hear the words on
the epithets in (17) and (18) to realize that the speaker is expressing an at-
titude, usually a negative one.

(17) John wouldn't/loan me his car, the stupid bastard.

9 Alan Cruttenden disagrees with my judgement that there is always potential pause before
the D contour. Specifically, pause is impossible for him on sentence (10). For me it is not.
The lack of agreement may arise from the fact that I am considering pre-boundary length-
ening as one of the cues to the presence or absence of intonation boundaries.
10 I am using the phrase 'truth conditions' informally here to mean those conditions in the
real world which are necessary and sufficient to determine whether a sentence is true or
false.

(18) John, the stupid bastard, wouldn't loan me his/car.

One can even make a fairly innocuous phrase into an epithet by using the
D contour.

(19) John, the semanticist, wouldn't/loan me his car.

Unless followed by a long pause and interpreted as a separate utterance,
expletives and epithets cannot occur with any contour other than the D
contour in sentence-medial and sentence-final position.

(20) I'll/have to take the bus, confound it. (A + D contour)

(21) *I'll/have to take the bus, confound it. (A-rise + A-rise contour)

(22) *I'll/have to take the bus, confound it. (A + A contour)

The D contour on (23) contrasts quite clearly with the A-rise contour on
(24).

(23) My neighbors, the finks, have drunk all my beer. 11

(24) My neighbors, the/Finks, have drunk all my beer.

In (24) the sentence is true only if the neighbors who drank the beer were
the Finks and not the Steins. In (23) the sentence can be true if the neigh-
bors drank all the beer regardless of whether they were the Finks or the
Steins, and regardless of whether they were 'finks' or wonderful people.
Another class of expressions which always have the D contour in sen-
tence-medial and sentence-final position are the expressions which occur
with direct quotations and include 'verbs of saying.'

(25) "This/party," she whispered, "is a/bore."

(26) "I'm enjoying it," he replied.

11 Dwight Bolinger has provided a context in which (23) can be ambiguous:
(i) My neighbors, the finks/Finks, and my friends, the finks/Finks, have drunk all my
beer.
Similarly, (3) could be ambiguous in the context given in (ii).
(ii) This is my sister, Eunice, and this is my cousin, Eunice.
In spite of these counterexamples, in most contexts, one tends to get unambiguous readings
for both (23) and (3). In addition, it is possible to get a sentence-final rise on (ii) only
with the vocative reading. There also seem to be rhythmic differences between the two
readings, although they are subtle.

Notice that in (25) and (26) the truth conditions for the whole sentence
are quite different from the truth conditions for the part which is quoted.
Compare the truth conditions for (25) and (27).
(27) This party is a bore.
Sentence (27) is true if there is a party, and it is a bore, and false other-
wise. This assertion in (25) is not affected by the phrase, she whispered,
and its truth value is not changed if, in fact, the speaker shouted. The
truth conditions for (25) are different. If the party is a bore, but the
speaker shouted, the sentence is false.
Although the expressions which introduce quotations are different from
vocatives, expletives and epithets in the way in which they affect the truth
value of the sentence, the D contour functions in a similar way in all three
cases. It marks part of the sentence as a domain independent of the rest.
This domain, which I will call a 'discourse domain' is speaker/listener-
oriented rather than message-oriented.
Allerton and Cruttenden (1974) note that certain adverbials such as
frankly are speaker/listener oriented. They also claim that frankly can oc-
cur only with a low rising contour. The following example is theirs:

(28) a. Richard only just failed | frankly. 12

b. Richard only just failed frankly.

In cases where an adverbial is spoken with the D contour it expresses the
speaker's attitude towards the sentence. The following pairs of sentences
illustrate some frequently observed contrasts.

(29) a. I intend to/talk\to him truthfully.

b. I intend to/talk\to him, truthfully.

(30) a. She/married him happily.

b. She/married him, happily.

(31) a. They preserve/their\fruit/naturally.

b. They preserve/their\fruit, naturally.

In each of the (b) sentences the adverbial with the D contour expresses the
speaker's attitude, and does not contribute to the truth value of the sen-
tence. In each of the (a) sentences the adverbial does contribute to the

12 Allerton and Cruttenden (1974: 51).

truth value. For example, in (30a) the sentence is true if she married him
and was happy; it is false if she married him and was not happy. In (30b)
happily may or may not correctly represent the speaker's attitude towards
the wedding, but this is not relevant to the truth or falsity of the statement.13
If, as I am claiming, at least one function of the D contour is to mark a
domain separate from the sentence, a domain concerned with speaker/listener
relationships, one would expect to get this interpretation consistently
even when some ambiguity is possible. This seems to be the case, as
(32) and (33) illustrate. In (32) the phrase the poor child is not a description
of Jenny; in (33) it is.

In (32) the message of the sentence is only that someone is named Jenny
and that Jenny fell down. The phrase, the poor child, expresses the
speaker's attitude towards her. Since the primary message of the phrase is
one of expressing sympathy, it is not necessary for Jenny to be poor and a
child for the sentence to be true, as it would be for (33).
Based on the hypothesis I am proposing, one might expect that any ex-
pressions which deal only with speaker-listener relationships would occur
with the D contour. This seems to be the case. Polite expressions let the
listener know that the speaker is sensitive to his feelings, but add nothing
substantial to the message. These expressions always occur with the D
contour in medial and final position.

(34) I'll have some, thank you.

(35) I'd like a seat, please, near the door.

One group of expressions might at first seem to offer counter-examples to
the hypothesis that expressions which have the D contour do not affect
the truth value of the remainder of the sentence. These expressions, con-

13 It is also possible to find examples of adverbial expressions which occur with the D con-
tour and do affect the truth value of a sentence. For example:

He drinks a bit more, frequently, than I do.


This sentence would be false if he seldom drinks a bit more than I do. What examples like
this indicate is that, although the function of identifying a 'discourse domain' may be one
possible function of the D contour, it is not the only function. Since none of the consen-
sus contours has a single function, this is not surprising.

taining the epistemic verbs such as think, know, and suppose, seem to re-
flect a judgment about the truth or falsity of a statement.

(36) Claude went to the party, I think.

(37) Claude went to the party, I know.

(38) He drank a lot, I suppose.

In fact, the opinion of the speaker about the probability that the sentence
is true does not in any way affect the truth conditions for the sentences.
Sentences (36) and (37) are true if Claude went to the party and false if he
did not, regardless of the opinion of the speaker. This is also true for sen-
tences with adverbials such as possibly and probably.

(39) He was the last one to leave, probably.

An obvious question which must be asked is why the kinds of expressions
I have argued are marked as a special discourse domain by the D contour
can sometimes occur in sentence-initial position with other contours. For
example:

(40) Eunice, this is my/sister.

(41) Happily, she/married him.

In sentence-initial position, these expressions are usually not ambiguous.
That is, sentence (40) is unambiguously synonymous with (3) and not (2)
and sentence (41) can only mean (30b) and never (30a). In a connected
discourse the difference between sentence-initial and sentence-medial con-
tours can be an important cue. For example in the following pairs of sen-
tences, the intonation on the phrase she whispered is undoubtedly one of
the cues which tells the listener which sentence was whispered.

(42) "I'm/leaving now," she whispered. "You/must take your med\cine."

(43) "I'm/leaving now." She/whispered, "You/must take your med\cine."

Within a connected discourse it is apparently helpful to distinguish between
sentence-initial and sentence-medial expressions. Apparently, the D
contour is one of the cues which helps listeners do so.
In summary, I have proposed that there is a class of expressions which
can occur with a contour which I have called the D contour. Expressions
with this contour have a special discourse function in which feelings, opinions,
and speaker-listener relationships are dealt with. The phrases which
have the D contour are interpreted independently of the rest of the sen-
tence and do not contribute to the truth value of the sentence.

References

Allerton, D. J. and Cruttenden, A. (1974). English sentence adverbials: their syntax and their
intonation in British English. Lingua 34: 1-30.
Bing, J. Mueller (1979). Aspects of English Prosody. Ph. D. dissertation. University of Massa-
chusetts.
Bolinger, D. (1957). A Theory of Pitch Accent in English. Word 14: 109-149.
Ladd, D. Robert (1978). The Structure of Intonational Meaning. Bloomington: Indiana Uni-
versity Press.
Liberman, M. (1975). The Intonational System of English. Ph. D. dissertation. M.I.T.
Pike, K. (1945). The Intonation of American English. Ann Arbor: University of Michigan
Press.
L. BOVES, B. L. TEN HAVE, W. H. VIEREGGE

Automatic Transcription of Intonation in Dutch

1. Introduction

The intonation of an utterance is a complex phenomenon which is constituted
by a number of independent factors: pitch, loudness, and the temporal
organisation of the utterance (also called tempo or rhythm by some
authors) are traditionally regarded as factors constituting intonation.
Things are further complicated by the fact that all factors mentioned vary
with time. Such very complex phenomena are, of course, very difficult to
investigate and to transcribe, the more so if all contributing factors are
equally important.
Fortunately, however, there are good reasons to believe that pitch varia-
tions are far more important in the sensation of intonation than the other
factors (Crystal, 1969). More often than not the meaning of the word 'in-
tonation' is restricted to denote pitch movements. One can use this insight
in order to simplify the research on intonation by concentrating the atten-
tion on pitch alone. Such simplifications are widely accepted in most em-
pirical research, because they enable the systematic examination of very
complicated phenomena. There is, however, a real danger in a research
strategy that concentrates fully on a single factor at the expense of all
others: initial success in the research into the functioning of that factor
may give rise to the fixation of all research onto this factor and the eventual
denial of the role of the remaining, less popular ones. The fact that
many authors confine the meaning of the term 'intonation' to pitch move-
ments without any references whatsoever to other factors can be taken as
a proof of the existence of this danger.
The present research also concentrates on pitch movements; neverthe-
less we find it important to state that in our opinion at least the temporal
structure of the utterance will eventually prove to be indispensable for a
comprehensive description of intonation.
It is very difficult to make an adequate description of the pitch variation
in an utterance. This difficulty is in part a result of the fact that the exact
meaning of 'adequate' is primarily determined by the aim with which the
description is made. Obvious aims are the discussion and, hopefully, the
correction of errors made by second language learners or by handicapped
mother tongue learners and - quite differently - the specification
of the intonation of sentences which are to be generated by speech synthesizers
in the automatic voice response information systems of the near
future.
Traditionally, there have been two starting points in making a descrip-
tion of the pitch movements in an utterance. The first starting point is
purely auditory in nature: trained subjects (who are preferably musical)
transcribe the pitch movements in an utterance in staff notation. This is a
very time-consuming procedure and the results must be expected to be too
detailed to be adequate in most applications. A simplification of this
procedure, e.g. by reducing the full musical scale to four or so pitch levels,
runs into great problems in actual practice. The reader is referred to Crys-
tal (1969) for a comprehensive review of auditorily based transcription
systems that have been proposed in the literature. The alternative starting
point is purely physical in nature: one registers the time variation of the
fundamental frequency (F0) in the utterance by means of any F0 extractor
and next one tries to describe this registration as compactly as possible.
This approach encounters great problems too. Not only is the construction
of a dependable F0 extractor still one of the major, only partly
solved, problems of acoustic phonetics,1 but even if one can produce F0
registrations, these prove to be so capricious as to resist almost any attempt
to describe them in simple terms.
A group of Dutch researchers must be credited with the design of a research
strategy that overcomes some of the major problems of both approaches.
We will give a brief description of this strategy; for a more comprehensive
review the reader is referred to the literature (Cohen & 't Hart,
1967; 't Hart & Cohen, 1973; 't Hart & Collier, 1975). Basic to this research
strategy is the Intonator (Willems, 1966; Vogten & Willems,
1977), an instrument by means of which the F0 movements in an utterance
can be changed. More precisely, it is a speech analysis/resynthesis system
in which, upon resynthesis, the information about the original F0
movements in the input speech is discarded and replaced by a synthetic,
controlled F0 contour. Although in principle all kinds of functions could
have been used in specifying the synthetic F0 contours, in actual practice
they are limited to a concatenation of straight lines. Independent research
has shown that the use of more complex quadratic functions is not likely
to give better results (Romportl, 1974). A straight line segment may
represent an F0 rise, an F0 fall or a level F0; it soon appeared that the F0 in
the absence of pitch rises or falls should not stay at the same, constant
value but should fall slowly, thus replacing level F0 by the so-called 'declination'.
The said research strategy consists of trying to produce a synthetic
utterance by means of the Intonator, using a minimal number of straight
line segments to specify the intonation, which nevertheless is 'perceptually
equivalent' to the original utterance. The somewhat vague term 'perceptual
equivalence' should be taken to mean that the synthesized utterance -
although one can possibly hear that it differs from the original - must be
considered to have the same intonation as the original utterance, similarity
or dissimilarity being established on the basis of auditory perception. This
means that the F0 contour of the 'perceptually equivalent' synthetic utterance
may differ considerably from the one in its original counterpart.

1 Cf. Krause, this volume. Eds.

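
The stylization just described, a concatenation of straight-line rises and falls superimposed on a slowly falling declination line, can be illustrated with a short Python sketch. It is not the Intonator's actual procedure: the names Segment and stylized_f0, the use of a semitone scale, the 120 Hz starting value and the declination rate of -1.5 semitones per second are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One straight-line pitch movement (hypothetical representation)."""
    start: float      # onset time in seconds
    end: float        # offset time in seconds, must exceed start
    excursion: float  # size in semitones: positive = rise, negative = fall


def stylized_f0(segments, t, f_start=120.0, declination=-1.5):
    """Return the stylized F0 in Hz at time t (seconds).

    The contour is a sum of straight lines on a semitone scale: a slowly
    falling declination line plus the part of each rise or fall that has
    been completed at time t. All default values are illustrative only.
    """
    offset = declination * t  # declination replaces a truly level F0
    for seg in segments:
        if seg.end <= seg.start:
            raise ValueError("segment must have positive duration")
        # Fraction of this movement already completed, clamped to [0, 1].
        done = min(1.0, max(0.0, (t - seg.start) / (seg.end - seg.start)))
        offset += seg.excursion * done
    return f_start * 2.0 ** (offset / 12.0)


# A rise followed by a fall of equal size (a 'hat'-like shape).
hat = [Segment(0.2, 0.3, 6.0), Segment(0.8, 0.9, -6.0)]
```

Evaluating stylized_f0(hat, t) at a series of time points gives a contour specified by only two movements; after the fall the contour ends below its starting value because of the declination.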
Research using the Intonator has led to the conclusion that a limited
number of F0 movements (viz. exactly 12) suffices to synthesize the intonation
of almost all utterances in Dutch. These F0 movements correspond
to perceptually discriminable pitch movements. Thus it is possible - at the
cost of extensive training - to learn to transcribe the pitch contour of both
read and spontaneous utterances in Dutch in terms of the said pitch move-
ments.

Symbol  Movement          Place in syllable                           Prominence-lending

1       rise              early                                       yes
2       rise              very late                                   no
3       rise              late                                        yes
4       gradual rise      extends over several consecutive syllables  no
5       half rise         early                                       enhances prominence-lending capacity
                                                                      of following fall of type A
A       fall              late                                        yes
B       fall              very early or in between two syllables      no
C       fall              very late                                   no
D       gradual fall      extends over several consecutive syllables  no
E       half fall         rather late                                 depends on surrounding pitch contour
0       low declination   extends over any number of syllables        no
Ø       high declination  extends over any number of syllables        no

Table 1: Inventory of perceptually relevant pitch movements in Dutch. The position of the
movements in the syllable is defined with respect to the vowel onset. For a further
explanation of the column 'prominence lending' see text.
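
For readers who want to work with the inventory computationally, Table 1 can be written down as a simple lookup. The following Python sketch is only one possible encoding: the names PITCH_MOVEMENTS, always_prominent and stressed_syllables are hypothetical, the glyph used for the high declination is an assumption, and the example transcription is invented rather than taken from any corpus.

```python
# Table 1 as a lookup: symbol -> (movement, place in syllable, prominence-lending).
# The two declination symbols are written "0" (low) and "Ø" (high) here so that
# the keys stay distinct; treat the second glyph as an assumption of this sketch.
PITCH_MOVEMENTS = {
    "1": ("rise", "early", "yes"),
    "2": ("rise", "very late", "no"),
    "3": ("rise", "late", "yes"),
    "4": ("gradual rise", "several consecutive syllables", "no"),
    "5": ("half rise", "early", "enhances a following fall of type A"),
    "A": ("fall", "late", "yes"),
    "B": ("fall", "very early or between two syllables", "no"),
    "C": ("fall", "very late", "no"),
    "D": ("gradual fall", "several consecutive syllables", "no"),
    "E": ("half fall", "rather late", "depends on surrounding contour"),
    "0": ("low declination", "any number of syllables", "no"),
    "Ø": ("high declination", "any number of syllables", "no"),
}


def always_prominent():
    """Symbols whose prominence-lending entry is an unconditional 'yes'."""
    return {s for s, (_, _, prom) in PITCH_MOVEMENTS.items() if prom == "yes"}


def stressed_syllables(transcription):
    """Binary stress marking from a list of (syllable, symbol) pairs.

    Types 5 and E are context-dependent in Table 1 and are deliberately
    left unmarked by this simplified rule.
    """
    prominent = always_prominent()
    return [syl for syl, symbol in transcription if symbol in prominent]


# Hypothetical transcription, not taken from the corpus.
example = [("s1", "0"), ("s2", "1"), ("s3", "Ø"), ("s4", "A"), ("s5", "0")]
```

Calling stressed_syllables(example) marks the syllables carrying movements 1 and A as stressed and leaves all others unstressed, which mirrors the strictly binary treatment of stress in the transcription system.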

Analysis of the transcriptions of the pitch contours in a rather large corpus
has shown that the 12 pitch movements do not occur in random order;
it is, instead, possible to describe the admissible combinations of the pitch
movements by means of a generative grammar ('t Hart & Collier, 1975).
As might already have been suggested by the fact that perceptually equivalent
synthetic utterances are generated using a minimal number of straight
line segments for the F0 contour, the 12 pitch movements are not only
perceptually discriminable but also perceptually relevant: that is to say, the
omission (or addition) of one of the movements from (to) a synthetic F0
curve would destroy the perceptual equivalence. Equally, the omission
(or addition) of such a movement from (to) a transcription of the
pitch contour of a given utterance would render this transcription defective.
The 12 perceptually relevant pitch movements that are necessary and
sufficient to transcribe the pitch contours of Dutch utterances are sum-
marized in Table 1. With regard to the definitions in the table some re-
marks are in order.
a) Apparently, the pitch contours of Dutch utterances can be described
as mainly two pitch levels and a limited number of ways to go from one
level to the other. Only rises of types 2 and 5 may exceed the high level;
only a fall of type E can bring the pitch to an intermediate level between
low and high. It is crucially important that the pitch contours are described
in terms of movements, not levels. This requirement renders descriptions
of the underlying F0 curves that state only a single 'mean' level
or 'peak' level per syllable virtually meaningless. What we need are descriptions
which state at least the range and the slope of line segments that
approximate the original F0 curves and their positions with respect to the
beginning and end of a syllable.
b) The perceptually relevant pitch movements can be divided into two
groups. The first group comprises the prominence-lending movements
(types 1, 3, A and Ε in specific contexts); the second group contains the
pitch movements that do not lend prominence to a syllable. The designa-
tion 'prominence-lending' is chosen deliberately, for if listeners consider a
syllable stressed, these stress judgments almost invariably prove to be
caused by pitch movements (Cohen & 't Hart 1967). Additional parame-
ters like the duration of the syllable and its loudness may give rise to stress
perception as well, but are clearly less effective than pitch movements (Van
Katwijk, 1974). In the system for the transcription of the pitch contours of
Dutch sentences devised by 't Hart, Cohen and Collier, intonation and
pitch accents are intricately connected with each other: every (pitch) ac-
cent involves a prominence lending pitch movement. The system will not
cover those syllables that carry - probably secondary — stresses caused by
other parameters. If, however, a syllable is assigned a prominence lending
pitch movement in an intonation transcription it must be perceived as being
stressed. Equally important: in its present form, the transcription system
does not provide any means for distinguishing primary (pitch) accents
from secondary stresses; as far as the transcription system is concerned,
stress is a strictly binary feature - a syllable is stressed (carries a pitch ac-
cent) or it is unstressed; there are no half- or weak-stressed syllables.
c) The physical properties of the F0 movements underlying the perceptually
relevant pitch movements are defined rather precisely, the more so if
compared with other transcription systems known from the literature.
This precision, which would seem to make objective verification possible,
has been a major impetus for the present investigations.
d) The twelve perceptually relevant pitch movements do not have equal
probabilities of occurrence. The movements of type 1, 2, A, B, 0 and Ø occur
more frequently than the movements of type 3, 4, C and D; movements
of type 5 and E seem to be rather rare. The order in which the
movements have been discovered corresponds to the frequency of their
usage; our knowledge of both the applicability and the properties of the
underlying F0 movements is accordingly greater for the 'older' movements.

2. Purpose of the Investigations

Although the properties of the perceptually relevant pitch movements are
described in a way that seems to allow physical measurements, the applications
of the system have so far been based mainly on auditory perception.
Extensive training is needed in order to use the system in making reliable
transcriptions. We know of no formal test of the consistency within
and between subjects trained to use the system; the major reason to expect
that the outcome of such a test would be more positive than Lieberman's
results on the Trager-Smith system (Lieberman 1964) is the fact that the
system of 't Hart and co-workers is explicitly void of any reference to syntactic
structure. The main purpose of our investigations is to see to what
extent one can shift the emphasis from auditory perception to a visual
(and eventually perhaps even automatic) interpretation of the recordings
of a small number of physical parameters of speech signals.
Before we started a detailed investigation into the physical properties of
perceptually relevant pitch movements we wanted to get some more in-
sight into their global functioning and their salience. With this aim in
mind experiment I on the perceptual scaling of pitch contours was de-
signed.
Experiment II had a double motivation. In the first place it aimed at
scrutinizing a unique feature of the 't Hart-Collier-Cohen transcription
system, viz. the differentiation between pitch movements on the basis of
their position within a syllable. At the same time, we wanted to examine
the applicability of a specific technique for the acquisition of utterances
having a known intonation contour, viz. the imitation of previously transcribed
sentences.
Experiment III aimed at the design of a technique for deriving the tran-
scription of the intonation of texts which are likely to contain only pitch
movements of the types 1, 2, A, B, 0 and Ø directly from recordings of the
physical parameters relating to pitch, loudness, and temporal and syllabic
structure of the utterances.

3. Experiment I: Perceptual Scaling of Intonation Contours

In spite of the restrictions on the ways in which perceptually relevant pitch
movements can be combined into intonation contours ('t Hart & Collier,
1975), it remains possible to realize even simple sentences with a great
number of 'grammatical intonations'. Fig. 1 displays schematically fifteen
different contours that can be used on the Dutch sentence Heleen wil die
kleren meenemen (Heleen wants to carry those clothes). This set of possible
intonations is not exhaustive; there are, for instance, no intonations
that put a pitch accent onto the pronoun die, although such an accent
would be perfectly natural. The fifteen different intonations, originally
designed for a separate research project (Collier, 1975), were recorded
onto magnetic tape by R. Collier. Next, a stimulus tape containing all 105
possible pairs of different intonations (15 × 14/2) in random order was
prepared. Pairs were separated by pauses of approximately 10 seconds and
preceded by alerting tones. This tape was presented to ten subjects who
were asked to indicate their estimates as to the similarity of the intonations
in each pair on a six-point scale. No subject had any special knowledge
of intonation, nor any experience with the transcription system under
study.

[Fig. 1: fifteen schematic pitch contours, each drawn over the syllables he len wil di kle ra me ne ma.]
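
The size of the stimulus set follows directly from the design: 15 contours yield 15 × 14/2 = 105 unordered pairs. A minimal Python sketch of how such a randomized pair list could be drawn up is given below. The function name make_stimulus_pairs and the fixed seed are assumptions for the example, and the sketch says nothing about which member of a pair was played first on the actual tape.

```python
import itertools
import random


def make_stimulus_pairs(n_contours=15, seed=0):
    """Return all unordered pairs of distinct contours in random order."""
    pairs = list(itertools.combinations(range(1, n_contours + 1), 2))
    random.Random(seed).shuffle(pairs)  # seed is arbitrary, for repeatability
    return pairs


pairs = make_stimulus_pairs()
```

Each contour number from 1 to 15 then occurs in exactly 14 of the pairs, so every intonation is compared once with every other.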
The scores were at first processed by means of the multidimensional
scaling technique MRSCAL due to E. Roskam. This approach did not
yield interpretable results, because the number of dimensions needed to
represent the distances between the stimuli was too high. Therefore we
must suppose that the similarity judgments of our subjects were not based
upon a small number of simple underlying parameters. This should not be
taken as a proof that the parameters pitch, loudness, and temporal struc-
ture do not suffice to explain similarities and differences between intona-
tion contours. The results do suggest, however, that these parameters be-
have differently from essentially linear and orthogonal parameters as for
instance length, width, and height in estimating the similarity of solid ob-
jects. This conclusion is not too surprising, as length, width, and height of
solids are constant in time, whereas pitch, loudness, and tempo change
with time, every parameter changing in its own way, independently of the
others.
Hierarchical cluster analysis (Johnson 1967) is another means of analys-
ing the structure of a set of similarity judgments. The technique provides
two methods for defining clusters, known as the minimum and maximum
methods, respectively. If the set of similarity scores exhibits a clear and
consistent structure, the minimum and the maximum method will yield
identical results. As might be expected from the failure of the multidimen-
sional scaling technique, our data do not show this ideal structure. The re-
sults of both methods are, however, sufficiently similar to warrant a cau-
tious interpretation, the more so because both methods show a reasonably
high correlation (.7 in both cases) between the raw similarity scores and
the ultrametric distances.
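
The relation between the two methods can be made concrete with a small Python sketch. Johnson's minimum and maximum methods correspond to what are now usually called single-linkage and complete-linkage clustering. The function names ultrametric and correlation and the toy distances below are invented for illustration, and the sketch assumes that similarity scores have already been converted into dissimilarities; it is not a reconstruction of the analysis actually run on our data.

```python
import itertools
import math


def ultrametric(dist, method="min"):
    """Agglomerative clustering; returns the merge level for every pair.

    dist maps pairs (i, j) with i < j to a dissimilarity. method="min" joins
    clusters by their closest members, method="max" by their farthest.
    """
    link = min if method == "min" else max

    def d(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    def cluster_distance(c1, c2):
        return link(d(a, b) for a in c1 for b in c2)

    clusters = [frozenset([x]) for x in sorted({x for p in dist for x in p})]
    levels = {}
    while len(clusters) > 1:
        # Merge the two clusters that are closest under the chosen method.
        c1, c2 = min(itertools.combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        level = cluster_distance(c1, c2)
        for a in c1:
            for b in c2:
                levels[(min(a, b), max(a, b))] = level
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return levels


def correlation(raw, levels):
    """Pearson correlation between raw and ultrametric distances."""
    keys = sorted(raw)
    xs = [raw[k] for k in keys]
    ys = [levels[k] for k in keys]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs)
                           * sum((y - my) ** 2 for y in ys))


# Toy data (invented): 'clean' is already ultrametric, 'noisy' is not.
clean = {(1, 2): 1.0, (1, 3): 3.0, (2, 3): 3.0}
noisy = {(1, 2): 1.0, (2, 3): 2.0, (1, 3): 4.0}
```

For the 'clean' data both methods return the input distances unchanged, whereas for the 'noisy' data they assign different merge levels; the correlation between raw and ultrametric distances then indicates how much the tree distorts the original scores.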
The maximum method puts the contours 10, 11, 13 and 15 together;
this must be caused by the fact that these contours end with a high declination
(the fall of type C occurs mostly so late in the last syllable that it
does not really disable the perception of a high-ending pitch). The minimum
method, on the other hand, shows a very great distance between the
cluster formed by the contours 10 and 13 and the contours 11 and 15.
This suggests that the prominence lending pitch rises of type 1 and type 3
are perceptually different indeed.
The close association in both solutions between the contours 4 and 9 is
very remarkable if one realizes that the addition of the non-prominence-
lending rise of type 2 in the last syllable turns contour 9 into a question,
whereas contour 4 is a neutral declaration. This supports the reasonableness
of the decision by 't Hart and his co-workers not to use syntactic and
semantic categories in the design of their transcription system.
Another striking result is the dissociation between plain and dented
hat-patterns (1 and 2, 3 and 4, 5 and 6), all being neutral declarative into-
nations, especially the distance between contours 5 and 6. This can only be
explained by supposing that not only the number and place of pitch ac-
cents count, but also the manner in which they are realized. Accentuation
by means of a pitch fall seems to be clearly different from an accent
caused by a combined rise and fall on one syllable. This seems to under-
score the perceptual importance of the otherwise inconspicuous movement
of type B. The close association in both methods between the contours 5
and 14 should also be pointed out; it suggests that under certain circumstances
the sequence of two falls of type E might easily be mistaken for a
single fall of type A.
For the rest, the results seem to resist any simple and consistent expla-
nation. The number and placement of pitch accents seem to be of influ-
ence, but it is not quite clear exactly how. Nevertheless, the results have
enhanced our confidence in the reality of the perceptually relevant pitch
movements as defined by 't Hart et al.

4. Experiment II: Close Examination of the Physical Differences Between Two Pitch Rises

The description of the perceptually relevant pitch movements in Table I
gives the properties of the movements in their canonical forms or, perhaps
even more precisely, in the form they take upon resynthesis in the Intona-
tor. It is unlikely that these descriptions apply equally well if we have to
deal with pitch movements in natural speech. If the transcription system
is to be applied to recordings of physical parameters extracted from
natural speech, we need to know how much variation to expect along the
dimensions used in the description. One such dimension, which seems to be
unique to this system, is
the position within a syllable where a pitch movement occurs. The most
interesting case is the difference between the pitch rises of types 1 and 3;
both are prominence-lending, but rises of type 1 are said to occur early in
the syllable whereas rises of type 3 occur late. There are more instances of
movements that differ primarily in their position within the syllable, but
then one is prominence-lending and the other is not. Therefore we
thought that a proof of the possibility of discriminating between instances
of rises of type 1 and rises of type 3 solely on the basis of acoustic registra-
tions would strongly support our hypothesis that it will eventually be fea-
sible to base the entire system on an interpretation of physical registra-
tions.
Obviously in natural speech two sources of variation in F0 exist, viz.
inter-speaker and intra-speaker variations. In order to estimate whether inter-
and/or intra-speaker differences in each type of pitch rise would be so
great as to exceed the differences between the movements, we needed a
great number of 'sure' tokens of both movements produced by a number
of speakers.
We decided to try to collect our material by asking subjects to imitate
simple sentences containing clear examples of the movements under inves-
tigation.
Four sentences, each spoken with five different intonation contours,
were recorded onto magnetic tape by J. 't Hart. Three sentences and three
intonation contours were meant as distractor items, the remaining sen-
tence with two different intonation contours served as test items. Ten sti-
mulus tapes were constructed by copying the five tokens of each sentence
in random order, but blocked per sentence. All ten stimulus tapes were im-
itated once by two subjects in one session. Next, nine stimulus tapes were
imitated by the same two subjects one tape a day on nine successive work-
ing days. This design enables the comparison of relatively short-term fluc-
tuations in the properties of pitch movements (fluctuations within a single
recording session) with long-term fluctuations, e. g. fluctuations between
recording sessions on different days.
As test sentence, the almost meaningless but at the same time almost en-
tirely voiced clause Wie naait moet een naald gebruiken (Who sews has to
use a needle) was used. The intonation contours under test are fairly well
established, viz. a hat pattern and a cap pattern ('t Hart & Collier, 1975)
and are shown in Fig. 2. The hat pattern has a prominence-lending rise of
type 1 on the second syllable, where the cap pattern has an equally
prominence-lending rise of type 3 instead. As will be clear from Fig. 2,
the patterns show considerable difference in the second part, the hat pat-
tern having a prominence-lending fall of type A on the fifth syllable,
whereas the cap pattern remains on the high declination until the fall of
type C at the very end of the utterance. The hat pattern occurs extremely
often and gives an utterance the character of a neutral declaration. The
cap pattern occurs somewhat less frequently and is also less neutral, giving
more or less the impression of a surprised exclamation.
Figure 2: Two intonation patterns for the Dutch sentence 'Wie naait moet een naald
gebruiken'.
A: Hat pattern with prominence-lending rise of type 1 on second syllable
B: Cap pattern with prominence-lending rise of type 3 on second syllable

From all imitations and from the two example utterances simultaneous
registrations of F0, intensity and the oscillogram were made, using the
same apparatus as described in experiment III below. On the basis of the
oscillogram and the intensity curve the onset of the vowel in the second,
stressed syllable (naait) was established. The vowel onset was defined as
the moment t = 0, and F0 measurements were taken at the points t =
-100, -80, ..., 0, 20, ..., 200 ms.

Figure 3a-e: F0 contours in the first two syllables of the sentence "Wie naait moet een naald
gebruiken".
The upper panels contain averages from 10 imitations during a single session;
the middle panels show averages for 9 imitations on 9 consecutive days. The
leftmost column pertains to speaker IS, the rightmost to speaker TR.
Vertical bars denote standard deviations. The arrows mark syllable boundaries.
The lower panel shows the F0 contours in the corresponding syllables in the
example utterance, spoken by J. 't Hart.
Curves labelled '3' pertain to the rise taken from the cap pattern, whereas
curves labelled '1' are taken from the hat pattern.

The measurements have been averaged per subject and per intonation
contour, for the imitations in one session and the imitations in nine suc-
cessive sessions separately. In the example utterances, no such averaging
was, of course, possible. The results are summarized in Fig. 3, where the
vertical lines denote the standard deviation per measurement point. The
standard deviations range from 2 Hz to 16 Hz and seem to be somewhat
greater for the rises of type 3 than for rises of type 1. Also, the deviations
seem to be marginally greater in the imitations on nine different days than
in the imitations within one session.
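The measurement and averaging procedure can be sketched in Python as follows. The sketch assumes that every imitation has already been reduced to one F0 value per measurement point; the names `GRID_MS` and `average_contours` are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

# Measurement points from the text: t = -100, -80, ..., 200 ms,
# with t = 0 at the vowel onset of the second syllable.
GRID_MS = list(range(-100, 201, 20))

def average_contours(tokens):
    """Average F0 (Hz) per measurement point over a set of imitations.

    tokens: list of F0 lists, one value per point in GRID_MS.
    Returns (means, standard deviations), one entry per measurement point.
    """
    if any(len(tok) != len(GRID_MS) for tok in tokens):
        raise ValueError("each token needs one F0 value per grid point")
    columns = list(zip(*tokens))  # regroup the values per measurement point
    means = [mean(col) for col in columns]
    # Assumption: sample standard deviation; 0.0 if only one token is given.
    sds = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
    return means, sds
```

Calling `average_contours` once per subject, per contour and per condition (single session or nine sessions) would yield the curves and vertical bars of the kind shown in Fig. 3.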
It is perfectly clear from Fig. 3 that the obvious difference in the onset
of the rises in the example utterances is absent from the imitations. The
'early-late' contrast in the examples seems to have been replaced by a 'less steep,
smaller excursion (type 1) - more steep, greater excursion (type 3)' con-
trast in the imitations. The fact that rises of type 3 consistently show
greater excursions was not a total surprise, as a similar observation had
been made earlier by Collier (1970), who used 5 semitones to synthesize
rises of type 1 and 1 octave for rises of type 3. Although the way in which
the rises differ from each other was not as we had expected, there still is
so obvious and consistent a difference that both types of rises could have
been discriminated by a simple algorithm. This conclusion supports the
idea that a transcription based upon physical parameters is feasible. We
can also conclude that the imitation technique used here gives results that
show far less variation than repeated, free renderings, as in Atkinson
(1976). However small the variations within and between speakers may be,
the imitation technique nevertheless failed to produce the sort of differences
aimed at. Several explanations can be offered for this. One
is the possibility that different speakers use different cues for signalling
the distinction between various pitch movements. Another explanation
might be that it is very difficult (at least for naive subjects) to imitate the
non-neutral cap pattern correctly. As a matter of fact it might even prove
that the intonations of the imitations could not be distinguished and iden-
tified correctly as hat or cap patterns. Therefore we decided to design a
perception test.
To that end the imitations of the two test utterances were collected on a
tape randomized per subject and per situation (single session vs. nine ses-
sions). The tapes containing the imitations from the single session were
given to six subjects with the instruction to indicate whether the pitch rise
was early or late in the second syllable. Four of these subjects processed
the tapes containing the imitations from nine different sessions as well.
Results are summarized in the upper half of the rows of Table II in terms
of percentages correct. It is seen that three subjects reach a 100% correct
score; of these, JH is one of the designers of the transcription system; the
other two (IS and LB) had some previous experience in using the system.
The remaining subjects knew the system only in an abstract way, from the
literature.

Stimuli                        Scores on     Judge
                                             JH    LB    IS    TR     W    PW
Speaker IS, 10 imitations      complete     100   100   100     0    30     0
  in one session               gated         30    80    85    15    10    15
Speaker IS, 9 imitations       complete       -   100   100     5     -     0
  on consecutive days          gated          -    94   100     0     -     0
Speaker TR, 10 imitations      complete     100   100   100    35    50    15
  in one session               gated         20    90    80    30    30    15
Speaker TR, 9 imitations       complete       -   100   100    17     -    17
  on consecutive days          gated          -    83    83    11     -    17

Table 2: Percentage correct scores on judging the type of pitch rise in the second syllable of
the Dutch sentence Wie naait moet een naald gebruiken.
The upper row of each pair ('complete') refers to scores on complete sentences, the
lower row ('gated') to scores on the gated segments consisting of the first and second
syllable only.

From these results we may only conclude that all imitations constituted
acceptable realizations of hat or cap patterns, as all three 'trained' judges
reported that they had based their decisions entirely on the contour as a
whole. The confusions made by the untrained subjects show that they can
distinguish both contours (for speaker IS somewhat more easily than for
speaker TR), but cannot label them correctly.
In order to prevent our subjects from taking recourse to the differences
in the second halves of the contours and forcing them instead to give real
'early-late' judgements, we prepared stimulus tapes that contained only
the first two syllables of all imitations. These tapes were given to the same
subjects with the same instruction: indicate whether the pitch rise is late or
early in the second syllable. The results are summarized in the lower half
of the rows of Table II, again in terms of percentages 'correct' score; the
quotes indicate that correct assignments in terms of actual timing are
hardly to be expected; the best we can expect being correct labelling in
terms of rises intended as specimens of type 1 or type 3. The results look
rather strange at first sight, especially the fact that now the 'moderately
trained' subjects LB and IS outperform the 'highly trained' subject JH,
whose correct scores were reduced to the same percentage as that of the
'untrained' subjects. This result can, however, be explained by the fact
that the 'moderately trained' judges continued judging in terms of rises of
types 1 and 3 (and have succeeded to a large extent in doing so) whereas
the 'highly trained' judge JH really used the position of the rise in the
syllable as his criterion. This explanation is supported by a detailed analysis
of the individual imitations and the corresponding judgments. As can be
seen from Fig. 3, speaker IS begins both types of rises rather early in the
syllable; in perfect accord with this, JH denotes the great majority of rises
produced by this speaker as type 1. Only those rises that indeed occur
relatively late are judged to be of type 3 by JH. With speaker TR, who
begins all rises in the middle of the syllable, no consistent explanation for
correct scores can be given. From the fact that there is no relation between
the correct scores of JH and those of the untrained subjects it must be
hypothesized that a great amount of training is necessary in order to be able
to determine the position of a pitch movement within a syllable correctly.
The better than chance performance of subjects IS and LB suggests that
the first two syllables contain much information as to the type of contour
the rise is taken from, but it is not clear what exactly this information is.
Because we did not see ways to separate the effects of speaker idiosyn-
crasies and problems in the imitation of non-neutral intonation contours,
we decided to discard both the imitation technique and the use of pitch
movements that are likely to be used in non-neutral utterances.

5. Experiment III: Transcribing Intonation on the Basis of Physical Parameters

5.1. The purpose of the experiment


Having seen that two prominence-lending pitch rises can indeed be distin-
guished by means of physical parameters, we decided to try and develop a
set of rules for deriving the transcription of the intonation directly from
recordings of F0, intensity and the oscillogram of the utterance. In doing
that we had, of course, to restrict ourselves to a subset of the pitch move-
ments defined in Table I. We decided to limit ourselves to the subset of
most frequently used movements, comprising the movements of types 1, 2,
A, B, 0 and ∅. For this we had a number of reasons, the most important of
which are related to the research strategy we had in mind. In the first
place, we wanted to collect the speech material to work with no longer by
means of imitations but by asking a number of subjects to read aloud simple
texts. Neutrally read texts are known to most likely contain only pitch
movements of types 1, 2, A, B, 0 and ∅ ('t Hart, personal communication).
In the second place, this particular restriction makes it possible to evaluate,
at least to a great extent, the transcription derived from recordings of
physical parameters without having recourse to a transcription made by an
expert transcriber. For if in our transcription a syllable is assigned one (or
both) of the movements type 1 or type A, it carries a pitch accent and must
thus be perceived as stressed by listeners. If on the other hand a syllable is
not assigned a prominence-lending pitch movement in our transcription, it
should not be judged as stressed by the listeners. The transcription of the
non-prominence-lending pitch movements can be checked against the
background of the intonation grammar ('t Hart & Collier, 1975), by al-
lowing only 'grammatical' sequences of pitch movements.

5.2. Speech material


Tape recordings have been made from five extensive weather forecasts
read by four professional speakers, from a passage from a periodical read
by two non-professional speakers and interviews with these non-profes-
sional speakers. The weather forecasts are referred to as recordings 1-5,
the passages from the periodical as recordings 6 and 7, and the interviews
as recordings 8 and 9. The weather forecasts 3 and 4 are read by the same
speaker; all speakers are adult males.

5.3. Recording and processing of physical parameters


From all utterances we made simultaneous recordings of F0, intensity and
oscillogram. For making the F0 registrations we used the pitch-period
extractor designed and described by von Rossum & Boves (1978). Speech
intensity has also been registered by means of an apparatus of our own
making, which has the advantage that it provides a very short averaging time
(10 ms). A short integration time is helpful in the segmentation of the
utterance into syllables. All recordings have been made by means of a
U.V. recorder, which provides a bandwidth of 5 kHz. Such a large bandwidth
is, of course, only necessary for a low-distortion registration of the
oscillogram.
The processing of the recordings and their eventual interpretation can
be divided into a number of sequential steps. We begin with the segmenta-
tion of the utterance into syllable-like units. In doing so, the intensity
curve is the primary source of information. At those points where the in-
tensity curve leaves doubt as to the correct placement of syllable bounda-
ries, the oscillogram of the speech signal provides additional information.
A narrow phonetic transcription of the utterance is helpful too, because
such a transcription gives a more reliable indication of the number of syl-
lables actually produced than the conventional orthography. In the great
majority of the cases the syllable units derived from the intensity curve
and oscillogram will correspond to the linguistically defined syllables, but
unstressed short linguistic syllables may be found to be totally merged
with one of their neighbours.
The syllabic segmentation of an utterance being completed, we start
processing the F0 registration. First of all, apparent errors (if present) of
the pitch period extractor are removed from the F0 registration. For this
initial smoothing, which is completely done by hand, the oscillogram
provides valuable information too. Next, the F0 curve is approximated by one
or two straight lines per syllable; this is motivated by the fact that in the
transcription system devised by 't Hart et al., a syllable can carry at most
two perceptually relevant pitch movements. The straight line approximation
is then converted into numeric form by recording the starting and end
points along both the time and the frequency axis. Finding an expression
for the steepness of F0 movements poses a problem for which we have
only found an ad hoc solution: a great many syllables begin or end with
one or more unvoiced consonants so that the F0 curves inevitably show a
number of gaps. Simple measures for the steepness of F0 movements can
only be given for movements that are not at a boundary of a voiced interval.
This situation corresponds with case a) in Fig. 4 where the problem is
illustrated for left-hand boundaries. For right-hand boundaries the problem
is completely analogous. Case b) in Fig. 4 is an example of what has
become known as a virtual pitch (or F0) movement. After a gap caused by
unvoiced consonants the F0 in a syllable starts again with a level or slightly
falling segment, but at a level well above (or below) the end point of the
previous voiced interval. In cases like these there is no definition of the
steepness of the F0 movement that makes sense, the more so because the
gap in the F0 curve may also be caused by a pause, the duration of which is
more or less free. Virtual movements will receive special treatment in all
subsequent processing. The most difficult case is that in which a movement
is partially virtual and partially real, as in case c) in Fig. 4. In situations
like these the gap in the pitch curve may have been caused by any
number of unvoiced consonants; this makes the total duration of the
movement a meaningless figure, and therewith also the definition of steepness
as F0 difference divided by duration. As an admittedly arbitrary, but
in practice useful definition of steepness in cases like these we have chosen
the total F0 difference divided by the duration of the real part of the
movement only.

Figure 4: On defining the slope of F0 movements.
A: Purely real F0 movement; slope is Δf/Δt
B: Purely virtual F0 movement; no sensible definition of slope possible
C: Partly virtual, partly real F0 movement; formal definition of slope is very
difficult. An ad hoc definition that appears to work well in practice is: slope is
Δf_total/Δt_real.

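The three cases distinguished in Fig. 4 can be summarized in a short Python sketch. The function name, the argument names and the choice of Hz per ms as unit are assumptions made for this illustration only.

```python
def f0_slope(delta_f_hz, real_ms, gap_ms=0.0):
    """Return (case, slope) for an F0 movement, following the cases of Fig. 4.

    delta_f_hz: total F0 difference between start and end of the movement.
    real_ms:    duration of the voiced (actually measured) part of the movement.
    gap_ms:     duration of the unvoiced gap or pause spanned by the movement.
    """
    if real_ms <= 0:
        # Purely virtual movement: a jump across a gap, no sensible slope.
        return ("virtual", None)
    case = "real" if gap_ms <= 0 else "partly virtual"
    # Ad hoc definition: total F0 difference over the real duration only,
    # so the (more or less free) gap duration never enters the slope.
    return (case, delta_f_hz / real_ms)
```

For a purely real movement this reduces to the ordinary F0 difference divided by duration; the gap duration is used only to label the case.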
As a last piece of information, the position of the F0 movement within
the syllable is recorded. In handling the starting and end points of an F0
movement we do not use the absolute value, expressed in ms, separating
the point from either boundary; instead, we express this distance as a
percentage of the mean duration of the syllables in the utterance. The final
step on the way from raw registration to a transcription of the intonation
is the application of a set of rules for the conversion of F0 movements into
perceptually relevant pitch movements. The derivation of these rules is, of
course, the aim of this experiment. We will now describe the development
and testing of the rules.

5.4. Considerations in developing the rules


The primary criterion for checking the adequacy of rules for deriving the
intonation transcription from physical registrations was, of course, the
transcription of the intonation made by an expert transcriber, namely J.
't Hart. A secondary criterion, mainly used in testing the results of the ap-
plication of the rules, was formed by stress scores given by a number of
subjects. The use of stress scores as a criterion is motivated by the fact
that two of the perceptually relevant pitch movements in the subset under
analysis are prominence-lending; syllables which have been assigned one
or both of these movements must accordingly be perceived as stressed. J.
't Hart has provided us with transcriptions of the intonation in
recordings 1-5 inclusive; stress scores have been obtained for all recordings.
The experiments for obtaining stress scores all had the same uniform
setup. Four subjects were first presented with the recording as a whole;
next, they heard every sentence or part of a sentence with a duration not
exceeding 3 s repeated three times, with pauses of approximately 4 s be-
tween the repetitions. Subjects were instructed to indicate which syllables
they considered stressed by crossing them out in an orthographic rendi-
tion of the text. In the instruction, some emphasis was laid on the fact that
an utterance can contain more than one stressed syllable. Owing to the
small number of subjects, statistical processing of the scores was thought
to be meaningless. We decided instead to consider a syllable as 'perceptu-
ally stressed' if that syllable had been marked as stressed by three or four
subjects.
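The decision rule for 'perceptually stressed' syllables amounts to a simple count per syllable, as in the following Python sketch. The data layout (one list of 0/1 marks per subject, aligned by syllable) and the function name are hypothetical.

```python
def perceptually_stressed(marks, threshold=3):
    """Flag the syllables marked as stressed by at least `threshold` subjects.

    marks: one list per subject with 1 (stressed) or 0 (not stressed)
           for every syllable of the text, in the same order.
    """
    return [sum(per_syllable) >= threshold for per_syllable in zip(*marks)]
```

With four subjects and the default threshold of three, a syllable marked by only two subjects is treated as unstressed.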
In order to design rules for the mapping of F0 movements onto perceptually
relevant pitch movements, the recordings were divided into a design
set and three test sets. Recordings 1-3 inclusive formed the design set;
different test sets were used to test the rules under different conditions.
The physical properties of the F0 curves of the recordings in the design
set (i.e. the width of the movements in Hz, their slopes as defined in
Fig. 4, and their position in the syllable) were compared with the
transcription given by 't Hart in a very detailed manner. We paid special attention
to the syllables that were assigned one or two prominence-lending
pitch movements in the transcription of 't Hart, checking whether these
syllables showed corresponding F0 movements and if so, collecting their
properties in tables. In this way we tried to formulate a set of explicit criteria
which an F0 movement must fulfil in order to be acceptable as a rise or
fall, and similar sets of criteria for rises and falls separately that must be
met for a movement to be prominence-lending. Movements that are not
rises or falls are, by consequence, declinations; whether a declination is
high (∅) or low (0) depends primarily on its being preceded by a rise or
not.
There are various ways to assess the quality of a set of rules that map F0
movements onto a transcription of the intonation. An obvious possibility
is to compare the transcription by rule on a per movement or per syllable
basis with the transcription by 't Hart. For two reasons we did not settle
on this possibility. The first reason is that we had in mind the comparison
of the transcription by rule with stress scores by listener subjects as an ulti-
mate criterion; the other reason is that we do not know how to interpret
possible shifts by one or two syllables of non-prominence-lending move-
ments.
Assuming that all main stresses are pitch accents, we can easily derive a
classification of all syllables as stressed or unstressed from the
transcription by 't Hart. A similar classification follows directly from the stress
scores of the listener subjects. We therefore decided to use the number of
syllables correctly assigned prominence-lending or non-prominence-lend-
ing pitch movements by applying the rules as a measure for the quality of
the rules. We can then use a very simple formula, due to Miller (1969), for
the assessment of the quality, viz.:

$$Q = \frac{2J}{C + R}$$

where Q (0 ≤ Q ≤ 1) is the quality measure, R the number of syllables
assigned a prominence-lending pitch movement by our rules, C the number
of syllables stressed according to the criterion (a prominence-lending pitch
movement in the transcription of 't Hart or stressed according to the
majority of the listener subjects), and J the number of syllables jointly
classified as stressed by rules and criterion.
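As a worked illustration, the quality measure can be computed with a few lines of Python. The function name is hypothetical; the counts in the usage note are those reported in Section 5.6 for the comparison of the rules with 't Hart (J = 101, C = 122, R = 106).

```python
def miller_q(joint, criterion, rules):
    """Quality measure Q = 2J / (C + R).

    joint:     syllables classified as stressed by both rules and criterion (J).
    criterion: syllables stressed according to the criterion (C).
    rules:     syllables assigned a prominence-lending movement by the rules (R).
    """
    if criterion + rules == 0:
        raise ValueError("Q is undefined when neither source marks any syllable")
    return 2.0 * joint / (criterion + rules)
```

With these counts, `miller_q(101, 122, 106)` gives 202/228, i.e. about .89, the value listed in Table 3 for that comparison. Q equals 1 only if rules and criterion mark exactly the same syllables, and 0 if they share none.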

5.5. Description of the rules


Assuming that the speech material we deal with contains only movements
of types 0 and ∅ (declinations), types 1 and 2 (rises) and types A and B
(falls), a set of rules for mapping F0 movements onto perceptually relevant
pitch movements must accomplish three things:
a) it must distinguish between declinations and rises/falls;
b) it must classify rises as either type 1 or type 2;
c) it must classify falls as either type A or type B.
For distinguishing declinations from rises or falls we have mainly used
the excursion width of an F0 movement and its slope. F0 movements with
an excursion less than 20 % of the starting value in Hz or with a slope of
less than 15% per 100 ms have been regarded as declinations. Purely
virtual movements (discontinuities in the F0 curve) are taken to have a slope
of 100% per 100 ms, i. e. purely virtual movements are classified as rises,
falls or declinations on the basis of their excursion width alone. A
declination is low (type 0) unless it is preceded by a rise, in which case it is high
(type ∅). The intonation grammar ('t Hart & Collier, 1975) excludes the
possibility of a fall occurring without preceding rise or high declination; it
also excludes the possibility of a rise of type 1 unless preceded by a fall or
a low declination (a rise of type 2 preceded by a high declination is, how-
ever, quite possible).
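The first step, separating declinations from rises and falls, might be sketched in Python as follows. The thresholds are those given in the text; expressing the slope as a percentage of the starting value per 100 ms of real duration is an interpretation, and the function and argument names are hypothetical.

```python
VIRTUAL_SLOPE = 100.0  # % per 100 ms, the value assigned to purely virtual movements

def basic_class(f0_start_hz, f0_end_hz, real_ms):
    """Classify an F0 movement as 'rise', 'fall' or 'declination'.

    real_ms is the duration of the voiced part of the movement;
    real_ms <= 0 marks a purely virtual movement (a jump across a gap).
    """
    # Excursion as a percentage of the starting value in Hz.
    excursion = abs(f0_end_hz - f0_start_hz) / f0_start_hz * 100.0
    if real_ms <= 0:
        slope = VIRTUAL_SLOPE  # so only the excursion criterion can apply
    else:
        slope = excursion / (real_ms / 100.0)  # % per 100 ms (assumed unit)
    if excursion < 20.0 or slope < 15.0:
        return "declination"
    return "rise" if f0_end_hz > f0_start_hz else "fall"
```

A movement from 100 Hz to 125 Hz spread over 250 ms, for example, has a sufficient excursion (25%) but too shallow a slope (10% per 100 ms) and therefore counts as a declination.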
The most important criterion for the classification of pitch rises as
either type 1 or type 2 and the classification of falls as type A or type Β is
the position of the onset of the movement within the syllable. It has
proved to be advantageous to describe all timing information relative to
the mean duration of the syllables in a complete utterance. The mean syl-
lable duration is easily computed by dividing the number of syllables into
the total duration of the utterance (our material did not contain utterances
having conspicuous pauses within them).
In order for a pitch rise to be classified as being of type 1, then, it must
satisfy two criteria: Firstly, the preceding syllable must contain either a fall
or a low declination and, secondly, the distance between the onset of the
pitch rise and the beginning of the voiced part of the syllable may not ex-
ceed 30 % of the mean syllable duration. This means that all purely virtual
rises are considered as being of type 1. If either condition is not fulfilled,
the rise is classified as type 2.
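The two criteria for rises can be written down as a small Python function. The context labels ('fall', 'low declination' and so on) and all names are hypothetical; a purely virtual rise is represented here by an onset distance of 0 ms, which is why it always passes the timing criterion.

```python
def mean_syllable_duration(total_ms, n_syllables):
    """Total utterance duration divided by the number of syllables."""
    return total_ms / n_syllables

def classify_rise(preceding, onset_ms, mean_syll_ms):
    """Return '1' (prominence-lending) or '2' for a pitch rise.

    preceding:    movement in the preceding syllable, e.g. 'fall',
                  'low declination', 'high declination' or 'rise'.
    onset_ms:     distance between the beginning of the voiced part of the
                  syllable and the onset of the rise (0 for a virtual rise).
    mean_syll_ms: mean syllable duration of the utterance.
    """
    context_ok = preceding in ("fall", "low declination")
    early = onset_ms <= 0.30 * mean_syll_ms
    return "1" if context_ok and early else "2"
```

With a mean syllable duration of 200 ms, a rise starting 40 ms into the voiced part after a low declination is classified as type 1, whereas the same rise starting at 80 ms becomes type 2.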
Pitch falls must satisfy three conditions before they are classified as be-
ing of the prominence-lending type A. Firstly, their slope as defined in
Fig. 4 may not exceed 55% per 100 ms; note that this implies that virtual
falls can never be of type A. Secondly, the voiced part of the syllable con-
taining the fall must be at least as long as 80 % of the mean syllable dura-
tion; this implies that short syllables are virtually excluded as candidates
for carrying a fall of type A. Thirdly, the distance between the beginning
of the voiced part of the syllable and the offset of the fall must be greater
than 70 % of the mean syllable duration; this means that the fall must be
late in the syllable. Note that it is only possible for a pitch movement to be
a fall if the preceding syllable contains a rise or a high declination, but this
criterion applies equally to both types of falls. If a fall is not of type A it
must consequently be of type B.
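The three conditions for falls translate into the following Python sketch. It assumes that the movement has already been accepted as a fall and that its slope is expressed in % per 100 ms; the function and argument names are hypothetical.

```python
def classify_fall(slope_pct_per_100ms, voiced_ms, offset_ms, mean_syll_ms):
    """Return 'A' (prominence-lending) or 'B' for a pitch fall.

    slope_pct_per_100ms: steepness of the fall (100 for a purely virtual fall).
    voiced_ms:           duration of the voiced part of the syllable.
    offset_ms:           distance from the beginning of the voiced part of the
                         syllable to the offset of the fall.
    mean_syll_ms:        mean syllable duration of the utterance.
    """
    not_too_steep = slope_pct_per_100ms <= 55.0        # excludes virtual falls
    long_enough = voiced_ms >= 0.80 * mean_syll_ms     # excludes short syllables
    late = offset_ms > 0.70 * mean_syll_ms             # fall late in the syllable
    return "A" if (not_too_steep and long_enough and late) else "B"
```

Since purely virtual movements are given a slope of 100% per 100 ms, they fail the first condition and are always classified as type B.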
Note that the rules described so far preclude intonation contours
transcribed as '01∅1' or '011' and '0BA', '0AB' or '0AA'. The intonation grammar
allows for contours that begin on a high declination (e. g. '∅∅A0') but
utterances having an intonation like this do not sound neutral most of the
time. Therefore we have excluded this possibility in our system of rules.
We must, however, account for utterances with a rise of type 1 on their
first syllable, realized by means of a virtual F0 movement, in fact by means
of an F0 starting at a high level. This rather frequent situation necessitated
the introduction of an extra rule. The occurrence of an intonation of the
form '∅∅A0' will also invoke the application of that rule, thus incorrectly
assigning a rise of type 1 to the first syllable.
The classification of pitch movements proceeds from left to right
through the utterance. Having reached the end of the utterance, we return
to the first syllable in order to see whether the extra 'virtual rise of type 1'
rule applies. This extra rule states that the first syllable of an utterance
contains a virtual rise of type 1 if it reaches a pitch value at some point
which exceeds a value of 1.8 times the starting value of that rise of type 1
later on in the utterance which has the lowest starting point.
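The extra rule for utterance-initial syllables can be sketched in Python as below. The function and argument names are hypothetical, and returning False when the utterance contains no later rise of type 1 is an assumption, since the text does not cover that case.

```python
def initial_virtual_rise(first_syllable_f0, type1_rise_starts, factor=1.8):
    """Decide whether the first syllable carries a virtual rise of type 1.

    first_syllable_f0: F0 values (Hz) measured in the first syllable.
    type1_rise_starts: starting F0 values (Hz) of all rises of type 1 found
                       later in the utterance during the left-to-right pass.
    """
    if not first_syllable_f0 or not type1_rise_starts:
        return False  # assumption: the rule cannot apply without a reference
    # Reference: the later rise of type 1 with the lowest starting point.
    reference = min(type1_rise_starts)
    return max(first_syllable_f0) > factor * reference
```

If the later rises of type 1 start at 100 Hz and 120 Hz, a first syllable that reaches 200 Hz exceeds the limit of 180 Hz and receives a virtual rise of type 1, whereas one that peaks at 170 Hz does not.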

5.6. Results
The first criterion for the quality of the rules is, of course, the agreement
between the output of the rules and the transcription by 't Hart. The second
criterion for testing the quality of the rules derives from the fact that
assigning pitch movements to syllables implies a classification into two
categories, viz. stressed and unstressed syllables. This classification can be
compared with stressed-syllable scores by a number of subjects.
We have tested the quality of the rules in three different situations. The
first one was identical with the situation in the design set, i.e. weather
forecasts read by professional speakers. The second test situation con-
sisted of the reading of a magazine article by non-professional speakers
whereas the third situation, a spontaneous interview, differed consider-
ably from the design set. By comparing results obtained with different
speakers we can see whether or not the results are speaker-dependent.
As stated above, the comparison of the output of the rules and the tran-
scription b y ' t Hart will be confined to prominence-lending movements.
In the first situation, the weather forecasts, both design set and test set
are treated as homogeneous entities, despite the fact that the design set
consists of renderings by three different speakers and the test set of two.
The design set contains 432 syllables, of which 122 were assigned a
prominence-lending pitch movement by 't Hart, 106 received such a movement
from our rules (of which 101 were also considered stressed by 't Hart) and
89 syllables were stressed according to our panel of four subjects. From
the 249 syllables in the test set 73 were judged stressed by 't Hart, 66
received a prominence-lending pitch movement from our rules, and 68 were
considered stressed by our subjects. Using Miller's formula, we can com-
pute quality measures for all possible combinations of stress scores. The
results are shown in Table 3. It is surprising that the agreement is always
better for the test set than for the design set.
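The counts reported for the 432-syllable set are sufficient to work out how many of 't Hart's prominence-lending movements the rules missed and how many they added. The sketch below does only that arithmetic. It does not implement Miller's formula, which is not reproduced in this section; recall and precision are substituted here as two familiar ratios, and the function name `detection_counts` is hypothetical.

```python
def detection_counts(reference: int, detected: int, both: int) -> dict:
    """Break two stress assignments down into hits, misses and false alarms.

    reference: syllables marked by the reference transcriber ('t Hart)
    detected:  syllables marked by the rules
    both:      syllables marked by both
    """
    return {
        "hits": both,
        "misses": reference - both,        # marked by 't Hart only
        "false_alarms": detected - both,   # marked by the rules only
        "recall": both / reference,
        "precision": both / detected,
    }


counts = detection_counts(reference=122, detected=106, both=101)
print(counts["misses"], counts["false_alarms"])                    # 21 5
print(round(counts["recall"], 3), round(counts["precision"], 3))   # 0.828 0.953
```

The 21 misses are the syllables that are stressed according to 't Hart but not according to the rules.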

                                  Design set               Test set
                                  (3 weather forecasts)    (2 weather forecasts)
Rules compared with 't Hart            .89                      .94
Rules compared with subjects           .80                      .85
Subjects compared with 't Hart         .83                      .87

Table 3: Quality scores derived using Miller's formula for the agreement between several
pairs of stress-assignments for weather forecasts read by professional speakers.

5.7. Discussion
It seems worthwhile to discuss the results of the first test in some detail. A
comparison of the transcription by 't Hart and the output of the rules has
shown that the rules are somewhat more successful in detecting
prominence-lending rises of type 1 than in finding prominence-lending falls of
type A. Of the 21 syllables in the design set that are stressed according
to 't Hart, but not according to the rules, 10 involve a missed A and 11
a missed 1. These figures should be compared with a total of 99 rises of
type 1, 4 falls of type A (of which 3 are wrongly assigned to syllables
where 't Hart has transcribed a non-prominence-lending fall of type B)
and 3 instances of the combination 1&A in the output of the rules. In the
transcription by 't Hart, we have 93 rises of type 1, 7 falls of type A, 11
combinations 1&A, 3 instances of the combination A&2 and 6 instances of the
unexpected combination 5&A; the remaining two prominence-lending movements
were isolated rises of type 5 that could also be considered as somewhat
high-starting rises of type 1 (according to 't Hart's comments).
Because falls of type A typically occur at the end of an utterance or
clause (which gave them the name "final fall"), missing them does not
strongly influence the overall transcription. Although we have no formal
means to compare the details of 't Hart's transcription and the output of
our rules, the correspondence seems to be quite good, except for
the falls of type A. In the test set our rules find 55 instances of a rise of
type 1, 3 falls of type A and 3 combinations 1&A. The transcription by
't Hart also shows 55 rises of type 1, 4 falls of type A, 10 combinations 1&A,
1 instance of the combination A&2 and 3 instances of the combination 5&A. For
reasons that are not quite clear, our rules seem to be more successful in
finding falls of type A in the test set. This explains the slightly better
correspondence with the transcription of 't Hart.
As stated before, our four listener subjects clearly assigned fewer stresses
than 't Hart, and also fewer than our rules. As the rules were designed to
mimic 't Hart's transcription as closely as possible, the latter result is not
surprising. We have not been able to find a simple and consistent
explanation for the details of the behaviour of our subjects. They are somewhat
reluctant to assign stress scores to adjacent syllables, an aversion which is
clearly not shown by 't Hart and - consequently - also not by our rules,
but this refusal to accumulate great numbers of stress scores in small
stretches of speech cannot explain the whole discrepancy. The behaviour
of our subjects is, by the way, in good correspondence with results of
Bouwhuis (1973), who found that subjects are not likely to consider more
than approximately 25% of the syllables in a neutrally read text as
stressed. This suggests that a trained phonetician must be expected to hear
more stresses than a naive listener. Because we are dealing with a system
which treats stress as a binary feature, we cannot say whether or not this is
due to a greater chance of weak stresses being heard by a trained subject.
It must be kept in mind that 't Hart did not give stress scores like our
small panel of subjects. Instead, he gave a transcription of the pitch con-
tours of the utterances, from which we have derived stress assignments.
Given the extraordinary capability of this single subject to correctly de-
scribe the pitch movements we cannot exclude the possibility that he has
transcribed a number of pitch movements as prominence-lending because
their physical properties were in accordance with those of 'standard prom-
inence-lending movements' whilst other parameters usually contributing
to the perception of prominence had values typical of non-prominent
syllables. Such a self-contradictory set of parameter values could have
restrained our panel from considering a syllable as stressed, but not 't Hart.
It is interesting to note that there are virtually no syllables called
stressed by our panel but not by 't Hart. This means that the professional
speakers under study do indeed use pitch as the primary cue to signal
stress; virtually all accents are pitch accents, or, at least, also pitch accents.
This statement cannot be refuted by the hypothesis that 't Hart hears
stresses first and then assigns a prominence-lending pitch movement to
the stressed syllables, because of the very good correspondence between
his transcription and the output of our physically based rules.

5.8. Testing the rules in less favourable situations


The results obtained thus far made us accept the comparison of the output
of our rules with stress scores from a small panel of subjects as a
meaningful criterion for the quality of the intonation transcription derived by
applying the rules to registrations of F0, intensity and oscillogram. We
decided to use that criterion to test whether the rules would also apply to
texts read by non-professional speakers and to spontaneous speech. The
results are summarized in Table 4.

                                     Reading a magazine article    Spontaneous speech
                                     text 6 (LB)   text 7 (TR)     text 8 (LB)   text 9 (TR)
Total number of syllables               240           240             174           216
Number of syllables stressed
according to rules                       47            54              17            35
Number of syllables stressed
according to subjects                    38            49              31            36
Quality of the rules computed by
means of Miller's formula, using
subjects' scores as criterion           .81           .93             .66           .93

Table 4: Summary of the results for the tests with non-professional speakers

For the readings of a magazine article the figures show the pattern
which is by now familiar: whilst the overall correspondence between the
output of the rules and the stress scores of the subjects is reasonable, the
rules are more liberal in assigning stress judgments than the subjects. For
speaker LB the quality of the rules can be said to be fair, for speaker TR it
is excellent.
If we turn to the spontaneous speech, the picture changes dramatically.
For both speakers the subjects consider more syllables stressed than the
rules, for speaker LB, indeed, almost twice as many. According to the
scores of the subjects both speakers use slightly less than 20% stressed
syllables, not an unusual figure for spontaneous speech.
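The proportions quoted here follow directly from the counts in Table 4. The short sketch below recomputes them; the dictionary layout and the function name `stress_shares` are merely illustrative.

```python
# (total syllables, stressed according to rules, stressed according to subjects)
table4 = {
    "text 6 (LB, read)":        (240, 47, 38),
    "text 7 (TR, read)":        (240, 54, 49),
    "text 8 (LB, spontaneous)": (174, 17, 31),
    "text 9 (TR, spontaneous)": (216, 35, 36),
}


def stress_shares(total: int, by_rules: int, by_subjects: int) -> tuple:
    """Return the share stressed by the subjects and the subjects/rules ratio."""
    return by_subjects / total, by_subjects / by_rules


for text, counts in table4.items():
    share, ratio = stress_shares(*counts)
    print(f"{text}: {share:.1%} stressed by subjects, "
          f"subjects/rules ratio {ratio:.2f}")
```

For the spontaneous speech the subjects' shares come out at roughly 17.8% (LB) and 16.7% (TR), and for LB the subjects mark about 1.8 times as many syllables as the rules.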
The F0 curves in the spontaneous speech of speaker LB are virtually
flat, confirming that his speech, although quite normal after all, is almost
monotonous. Nevertheless it appears to contain clearly perceptible
accents, which must have been signalled by otherwise concomitant parameters
such as duration and loudness. The role of those parameters in cueing stress
has been confirmed in an informal experiment. Three utterances taken
from the spontaneous speech of LB were processed by means of an LPC
vocoder modified to become an Intonator. From all utterances a number
of different versions were produced, among which an original (standard
LPC vocoder action), a fully monotonized version and two or more
versions having prominence-lending pitch movements on syllables called
stressed by the subjects but not by our rules. Neither treatment has a
striking influence on the perception of stress. This is in accordance with the
results of a more elaborate experiment described in Rietveld and Boves
(1979).

6. Conclusion

The ultimate aim of our research was to prove that it is possible to develop
a set of rules for converting registrations of F0, intensity and oscillogram
of normal Dutch speech into a transcription of the intonation according
to the system due to 't Hart et al. Characteristic of that system is that it
describes the pitch of utterances as moving between two levels, while
transitions from one level to the other will give rise to the perception of
stress in some well-defined conditions and will not do so in all other con-
ditions. Considering only two levels, it is clearly the movements which are
important, not the levels.
We have succeeded in developing those rules. Using the twelve percept-
ually relevant pitch movements (or rather a subset thereof) as a back-
ground it is possible to stylize and describe the raw F0 tracings in terms of
one or two movements per syllable.
The output of the rules appeared to agree quite well with the
transcriptions made by an expert transcriber. Also it appeared to explain the stress
scores given by a small panel of subjects, at least as long as neutrally read
texts were considered. This must be taken to prove the conjecture that un-
der normal conditions specific pitch movements are the most important
factor in signalling prominence in Dutch.
The rules proved to be able to explain the stress scores of our subjects
in the spontaneous speech of one speaker too. For the spontaneous speech
of a second speaker they failed almost completely. This speaker, who is
judged as normal although rather exceptional, seems to shift to another
strategy for signalling prominence if he changes his style of speech from
neutral reading to conversation. The very fact that he is considered as ex-
ceptional and that the rules perform fairly well even for him when read-
ing, suggests that at least for more or less formal Dutch it is possible to
derive a meaningful description of intonation by means of applying a
simple set of rules to registrations of F0, intensity and oscillogram.

Acknowledgement

This research was supported by the Dutch Organisation for the Advance-
ment of Pure Science (Z.W.O.).

We are specially indebted to J. 't Hart, who put much time and effort
not only into making numerous transcriptions of not very exciting texts
but also into fruitful and above all encouraging discussions.

References

Atkinson, J. E. (1976). Inter and intra speaker variability in fundamental voice frequency.
J. Acoust. Soc. Am. 60: 440-445.
Bouwhuis, D. (1973). Het klemtoonoordeel, persoonlijk of eenstemmig? (Stress-judgements,
personal or unanimous?). IPO Rapport 253 (in Dutch). Eindhoven.
Cohen, A. & 't Hart, J. (1967). On the anatomy of intonation. Lingua 19: 177-192.
Collier, R. (1970). The optimum position of prominence lending pitch rises. IPO Annual
Progress Report 5: 82-85.
Collier, R. (1975). Physiological correlates of intonation patterns. J. Acoust. Soc. Am. 58:
249-255.
Crystal, D. (1969). Prosodic systems and intonation in English. Cambridge: Cambridge
University Press.
't Hart, J. & Cohen, A. (1973). Intonation by rule - a perceptual quest. Journal of Phonetics 1:
309-327.
't Hart, J. & Collier, R. (1975). Integrating different levels of intonation analysis. Journal of
Phonetics 3: 235-255.
Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika 32: 241-254.
Katwijk, A. F. V. van (1974). Accentuation in Dutch. Assen: van Gorcum.
Lieberman, P. (1965). On the acoustic basis of the perception of intonation by linguists.
Word 21: 40-51.
Miller, G. (1969). A psychological method to investigate verbal concepts. J. Math.
Psychology 6: 169-191.
Rietveld, T. & Boves, L. (1979). Automatic detection of prominence in the Dutch language.
Proceedings Inst. of Phonetics, Nijmegen 3: 72-78.
Romportl, M. (1974). Zur Synthese der Intonation. In: Romportl, M. & Janota, P. (Eds.),
Acta Universitatis Carolinae, Philologica, 2/1974 (= Phonetica Pragensia IV). Prague.
15-29.
Rossum, N. van & Boves, L. (1978). An analog pitch-period extractor. Proceedings Inst. of
Phonetics, Nijmegen 2: 1-17.
Vogten, L. L. M. & Willems, L. F. (1977). The formator, a speech analysis-synthesis system
based on formant extraction from linear prediction coefficients. IPO Annual Progress
Report 2: 47-54.
Willems, L. F. (1966). The intonator. IPO Annual Progress Report 1: 123-125.
DAVID BRAZIL

The Intonation of Sentences Read Aloud

It is a matter of common observation that no two readers, given the same
printed text to read aloud, can usually be expected to make identical
tonation choices. Indeed, received notions tend to emphasise the speaker's
interpretative role in making - whether voluntarily or not - intonation de-
cisions that stamp the resulting performance as to a significant extent his
own creation. While recognising the freedom the speaker undoubtedly
has, it seems worthwhile nevertheless to seek some definition of the con-
straints within which this freedom is exercised. A reader cannot do just
anything if he is to produce an intelligible spoken realisation of the text:
we all recognise an 'inappropriate' intonation which makes nonsense of
what is being read; indeed we often claim that it shows in some way that
the reader has not grasped its import. The investigation reported here 1
was undertaken as a first step towards identifying the constraints a text
imposes on a reader by examining differences and similarities between vo-
cal renderings of the same printed material.
Interpretations - as well as the necessary transcription conventions - are
along the lines proposed in Brazil (1975 & 1978) and in Brazil, Coulthard
and Johns (1980). That is to say, the significance of intonation is related
to the function of the utterance as an existentially appropriate contribu-
tion to an interactive discourse. It is assumed that a speaker's choice, in
each of a set of options associated with the tone unit, reflects his appre-
hension of the unique discourse conditions then obtaining. In this paper I
shall confine my attention to choices of tone.
At first sight it may seem that mapping such a description into reading
performances will present difficulties. In the model of the communicative
process it subsumes, speakers are thought of as making successive, real-
time, decisions on the basis of their there-and-then assessment of the state
of convergence existing between them and a hearer. The act of reading
aloud does not obviously replicate all the situational factors that might af-
fect those decisions. Nor can the significance of the time dimension be
conceived of in the same terms: the processing of a written text in time
differs in important ways from the real-time generation of spontaneous and
hearer-sensitive speech. The fact that the model does enable us to make
predictions about regularities among readers is, however, one of the more
compelling reasons for thinking that reading is properly viewed as an
interactive process. One feature of the model - and one which helps to
focus this particular issue - is the concept of orientation. It enables us to
predict certain of the intonational features that characterise a purely
mechanical act of reading out, and so to distinguish such an act from the
presentation of an otherwise similar utterance when it contributes
relevantly to an interactive discourse.

1 The research on which this paper is based was carried out as part of a programme
sponsored by the Social Sciences Research Council during the period 1975-78. The paper is a
revised version of one of seven that were appended to the Final Report on that programme.

A question which had to be faced at the beginning of the investigation
was what kind of material would produce the most revealing results, bear-
ing in mind the considerable openness of the whole area. The context of
interaction to which speakers address themselves is partly a construct of
the particular discourse as it has proceeded up to the moment of utterance;
to the extent that it is, the analyst can expect to find the motivation for
particular intonation choices in the context. One might expect, therefore,
that the reader's choices late in a passage of continuous prose could be
manipulated by making appropriate alterations earlier in the passage. But
the context of interaction incorporates also features of the interpenetrat-
ing biographies of the participants, to say nothing of culture-based under-
standings of wider currency. It is hard to predict or control what evocative
force an extended text will have for individual readers. In general, elabor-
ations of a text which are designed to 'force' a particular intonation choice
at a particular point seem to increase the scope for differential contextuali-
sation of this more nebulous kind, and therefore to reduce the chances of
there being any significant consensus among readers. It was for this rea-
son that I returned to earlier work on the reading aloud of uncontextual-
ised sentences.
Brazil (1972) reports the results of experiments which were designed to
investigate the degree of conformity to be expected when native speakers
were asked to read aloud a sentence that was presented to them as a cita-
tion form. The results of small-scale testing suggested, among other
things, that particular changes - whether lexical or grammatical - in the
sentences presented for reading aloud had fairly predictable effects on the
readings elicited. So, for instance, while the presence of an indefinite arti-
cle in the subject nominal group of a clause tended to elicit a falling tone,
substitution of the definite article inclined readers towards the use of the
fall-rise tone:
// p a policeman // (had taken him away)
// r the policeman // (had taken him away)2

2 The original transcriptions were made in accordance with the conventions used by Halli-
day (1967), but for the sake of consistency I have used my own conventions here.

Similarly, certain adverbials elicited a falling tone while others had a fall-
rise:
// p in any case // (you'd better go)
// r in that case // (you'd better go)
In interpreting these results, it was assumed that any attempt to character-
ise the significance of such co-occurrences of lexico-grammatical choice
and intonation choice would be set in a theoretical framework which han-
dled the semantics of the sentence as a complete and separate event. Sub-
sequent work on discourse intonation has made it possible to reinterpret
these earlier findings in a different way. It does not seem proper to assume
that, because sentences are presented as isolated entities in test material,
they are necessarily processed as such by readers. Indeed, it is an important
implication of the approach that only by adopting an oblique stance can a
reader avoid making decisions about intonation choices that presuppose
some view of a putative context of interaction for the item he is reading.
The application of this principle illuminates the readings of the sentences
quoted above. In the case of
// r the policeman // (had taken him away)
the presence of the definite article in the subject could well be expected to
encourage a reader to behave as if its referent were somehow present to
the consciousness of a hearer and so available for reference. There would
be no corresponding encouragement to regard A policeman . . . in this
light, a fact which one might connect with the tendency of readers to not
assign referring tones to subjects which have indefinite articles. Alongside
these two we may place sentences like 'Policemen were taking him away',
where the fact that readers show no clear preference for either referring
or proclaiming tone can be connected with the absence of anything to sug-
gest that the referent is, or is not, present in the common ground of the
context of interaction.
We may ask what significance attaches to the fact that, in some but by no
means all cases, it is possible to predict certain intonation choices a reader
will associate with an item presented to him for reading aloud. A working
hypothesis would be that whenever there is general agreement among
readers about such choices (as, for instance, about the assignment of refer-
ring tone to a certain constituent) it is because the presented item is
strongly evocative of just those contextual features on which the choice
depends. The possibility then suggests itself that pairs of sentences might
be constructed, members of each pair being contrasted with respect to a
single lexical or grammatical feature in such a way that:
(i) they will elicit consistently contrasted readings with respect to one of
the phonological oppositions a description postulates; and
(ii) it will be possible to say with some measure of confidence just what
are the imputed discourse conditions that motivate the different re-
sponses.

Method

Two lists of sentences were prepared, those in the first list differing from
those in the second with respect to some feature which it was expected
would result in different choices of tone. Informants were told that the ex-
perimenter was interested in discovering how they would 'normally' read
aloud certain sentences and what degree of conformity one could expect
among readers. They were asked to read aloud each sentence from the
first list, having first read the whole list silently to themselves. Other read-
ing material was then provided for similar treatment - a prose passage and
a piece of dialogue required for a slightly different experiment - in order
to reduce the danger of remembered patterns influencing informants' per-
formances when they were asked to read sentences from the second list.
They were warned that they would probably be aware of minor differ-
ences between otherwise similar sentences in the two lists, a precaution
which was found to reduce the distorting effects of contrastive intonation
in second members of pairs. The readings of both lists were recorded and
subsequently transcribed.
To be useful, pairs of sentences had to satisfy two conditions.
(i) There had to be a difference in the intonation treatment of the two
members.
(ii) At least one member of the pair had to elicit a high measure of agree-
ment as to the intonation treatment of the differentiating feature.
So, for instance, having established that readers regularly select r tone
for a certain constituent of one member of a pair, one might propose a
discourse explanation of the fact that in the other member p tone is
selected regularly for the same constituent, or of the fact that in the second
member p tone or r tone appears to be selected indiscriminately.
The testing procedure was planned with a view to assembling a battery
of pairs which satisfied the above demands. Anticipating which sentences
would elicit agreed readings of a particular kind proved to be a difficult
business, and a large number of items that were tried out had to be aban-
doned. After I had obtained five readings of the first pair of lists I ex-
amined the results, so that items which were evidently not going to be use-
ful could be eliminated and replaced by others in new lists. The process
was continued until patterns of similarities and differences had been main-
tained with reasonable consistency through twenty readings of any parti-
cular pair.
It was evident that consistency was dependent on at least two variables:
the ease with which relevant features of a hypothetical discourse setting
could be 'recovered' from the printed item; and the reader's skill and ex-
perience in reading aloud. The filtering procedure I used was taken to
have resulted in the selection of items in which relevant contextual fea-
tures are relatively accessible to a reader. With regard to the second
variable, informants were all people who could be expected to have a
higher-than-average level of verbal facility - language teachers and students of
English at University. By using a selected population I sought to reduce
the effect of differential skill to a minimum. There is clearly scope, once a
set of 'norms' has been established, for investigating differences among
readers, including non-native readers, but the immediate focus of interest
was on the construction to be placed upon differential readings of pairs of
sentences when readers were agreed in producing them. What is taken as
'agreement' is made clear in relation to each pair of sentences as it is dis-
cussed. Different sets of informants were used for each revised list of
items, so that the analysed results of readings of 61 sentences reflected the
performances of a total of 50 different readers.

Note on the Use of Transcription Conventions


Transcription conventions are intended primarily for use in connection
with the individual utterance. It is an extension of their use to represent
what is common to a number of utterances. The peculiar conditions of the
present investigation make it necessary often to indicate that a set of utter-
ances are similar in respect to one feature (the choice of tone in a specified
constituent) while possibly differing in other respects. The conventions
used make it possible to represent the feature which, at any point in the
discussion, is the object of interest, while implying nothing about concom-
itant regularities or irregularities. Thus when we represent the tendency of
readers to agree upon the reading of part of a sentence as
// r what you want //
we mean no more than that a tonic segment somewhere in the tone unit
has referring tone. Actual readings brought together by this generalisation
might include:
// r WHAT you want //,
// r what you want //, and
// r what you want //
Although the distinctions that these versions represent have to be taken
into account in a full description, the generalised representation is to be
taken as indicating that they are not pertinent to the point presently under
consideration.
Two special instances of generalisation can be singled out for mention.
I sometimes wish to make statements about the incidence of referring
tones without specifying whether they are realised as r tone ('fall-rise') or
as r+ tone ('rise'). In such cases,
// R what you want //
is to be taken as conflating readings which might be distinguished as
// r what you want // and
// r+ what you want //

Similarly, the symbol P is used to include occurrences of p tone and p+
tone, though the experiment did, in fact, elicit too few occurrences of p+
tone for this to be a matter of any substance.
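The two conflations just described amount to a simple mapping from specific tone symbols to generalised ones. The following sketch shows that mapping; representing a reading as a list of (tone, text) pairs, and the names `CONFLATE`, `generalise` and `format_reading`, are assumptions made only for this illustration.

```python
# Specific tone symbols and the generalised symbol that conflates them.
CONFLATE = {"r": "R", "r+": "R", "p": "P", "p+": "P"}


def generalise(reading):
    """Replace each specific tone symbol by its generalised counterpart.

    Symbols not listed in CONFLATE (for example 'o' for level tone) are kept.
    """
    return [(CONFLATE.get(tone, tone), text) for tone, text in reading]


def format_reading(reading):
    """Render a reading with the tone unit boundary symbol //."""
    return "// " + " // ".join(f"{tone} {text}" for tone, text in reading) + " //"


fall_rise = [("r", "what you want"), ("p", "is a new car")]
rise = [("r+", "what you want"), ("p", "is a new car")]

print(format_reading(generalise(fall_rise)))
print(generalise(fall_rise) == generalise(rise))  # True
```

Both readings reduce to the same generalised form, which is what the symbol R is meant to express.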
The other comment concerns the use of the boundary symbol. Readings
which are otherwise similar sometimes differ with respect to the exact lo-
cation of a tone unit boundary: differences arise because of the distribu-
tion of non-prominent words, so that they occur either in the enclitic seg-
ment of one tone unit or in the proclitic segment of the unit following.
Arbitrary regularisation of such variant forms can avoid proliferation of
cited forms without in any way affecting the issues I wish to attend to. So,
// what you want // is a new car //
includes cases of
// what you want is // a new car // and
// what you want is a // new car //,
where neither is nor a is prominent.

Orientation and Tone Selection

The description postulates a feature orientation which affects, among
other things, the speaker's choice of tone. An oblique set towards the test
item, that is to say, the regarding of it as an uncontextualised sample of
language, results in the reader's interpreting the instruction to read it
aloud as an invitation simply to tell the experimenter what the printed item
says. The operation of the P/R system will then, I predict, result in the
choice of proclaiming tones throughout, or (in the case of lengthy items)
possibly of 'level' (or o) tones. It is central to the hypothesis that R tones
will occur only when a speaker/reader is making assumptions about the
state of convergence with a (possibly imaginary) hearer. Thus, while pro-
claiming tones have equivocal status, making it impossible for an observer
to say whether a reader is treating the sentence he is reading as if it were
part of a communicative event or not, we can set up a very firm hypothesis
that whenever referring tones occur, assumptions are being made about the
function of the utterance in a possible context of interaction. We can then
seek to establish what those assumptions are and how they are derived
from the printed item.
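The asymmetry in this hypothesis can be stated as a small decision procedure: a referring tone is positive evidence of an assumed context of interaction, whereas its absence decides nothing. The sketch below is an illustration only; the function name `interpret_reading` and the two labels it returns are hypothetical, not terms of the description.

```python
REFERRING = {"r", "r+", "R"}


def interpret_reading(tones):
    """Classify a reading from the tones chosen in its tone units.

    Any referring tone implies assumptions about a context of interaction.
    A reading with only proclaiming (or level) tones is equivocal: it is
    compatible with an oblique orientation but does not prove one.
    """
    if any(tone in REFERRING for tone in tones):
        return "context of interaction assumed"
    return "equivocal"


print(interpret_reading(["P"]))        # equivocal
print(interpret_reading(["P", "P"]))   # equivocal
print(interpret_reading(["R", "P"]))   # context of interaction assumed
```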
An example will serve to show how this method works in practice. Sen-
tence (1) elicited at least one reading like each of the following:
// P what you want is a new car //
// P what you want // P is a new car //
// R what you want // P is a new car //
Our procedure allows us to regard only the last of these three versions as
positively meshed with some rudimentary context of interaction. The
others may result from orientation towards the language sample. We can,
however, say that when R tone occurs in readings of this particular
sentence it is always associated with 'what you want'. An important
consideration in constructing examples to investigate tone choice was to find
sentences in which most readers did select R tone at some point. In this
particular case, sixteen out of the twenty readers satisfied the condition.
The analysis which follows begins with an examination of the elicited
data.

Analysis

I The distribution of referring tone


It helps to clarify the presentation if we distinguish four kinds of relation-
ship between changes in the read sentence on the one hand and readers'
selections of R tone and P tone on the other. Sets of pairs which exemplify
each type of relationship are presented under 1 to 4 following.

1. When readers associate a referring tone with one of two tone units in a sen-
tence, a change in the order of the constituents results in a corresponding
change in the distribution of tones.

Example:
(1) // R what you want // P is a new car //
(2) // P a new car // R is what you want //
Interpretation: When either of these items is presented in isolation the
thematisation arrangements exert a powerful influence on the reader to
treat 'what you want' as an assumed focus of interest and 'a new car' as
matter introduced as new into a hypothetical context of interaction. Sen-
tence (2) can be interestingly compared with the very different contextual
implications of
(3) // R but a new car // P is just what you want //
where the situation reflected in the tone choice is one in which the possi-
bility of someone having 'a new car' must have been introduced into the
conversation in order for it to be rejected.
Measure of agreement:
Sentence (1) : 18 informants assigned R tone to the first tone unit
Sentence (2) : 17 informants assigned R tone to the second tone unit
The remaining five readings had P tone in both tone units and would thus
be explicable in terms of our postulated oblique orientation, the reading-
out process being carried out without reference to any reader-hypothesis
as to the sentence's interactive implications. There are no counter-exam-
ples, among the 40 readings examined, to our prediction that when R tone
is selected it will occur in a certain constituent.
Other pairs of sentences in which the ordering of constituents seems to
be conclusive in evoking a hypothetical discourse setting for each member
are:
(4) / / R what i can't understand / / P is how he manages it / /
(5) / / Ρ how he manages it / / R is what i can't understand / /
(6) / / R the one who found it / / Ρ was Peter / /
(7) / / Ρ Peter / / R was the one who found it / /
(8) / / R apparently / / Ρ she couldn't manage it / /
(9) / / Ρ she couldn't manage it / / R apparently / /
(10) / / R if that's the best you can do / / Ρ i don't want it / /
(11) / / Ρ i don't want it / / R if that's the best you can do / /
Interpretation: The R tone in (4), (5), (6) and (7) is associated with that
which, in the discourse setting, it is assumed needs specifying: for instance
'What I assume you want to know is what I can't understand . . .'. The Ρ
tone unit specifies it. In (10) and (11), 'If that's the best you can do . . .'
would fairly predictably make reference back to some kind of offer or
proposal, implicit or explicit. The use of 'apparently' in (8) and (9),
whether it implies any real qualification of the assertion or not, is nor-
mally an evocation of mutual understanding already arrived at.
Measure of agreement: Table 1 shows the incidence of R tone in the 5 pairs
of sentences examined so far.

Table 1: Incidence of R Tone

Sentences    First sentence           Second sentence
             First t/u   Second t/u   First t/u   Second t/u
1 & 2           18          —            —           17
4 & 5           20          —            2           16
6 & 7           19          —            —           15
8 & 9           17          —            —           20
10 & 11         15          —            —           17

Comment: The high measure of agreement among readers that Table 1
shows provokes three observations.
(i) Under certain circumstances there is a close relationship between the
significance of grammatical thematisation and the distinction that the in-
tonation system R / P realises.
(ii) At first sight it seems possible to interpret the results as suggesting a
direct relationship between grammar and intonation. It is easy to show,
however, that an explicit discourse setting could result in a quite different
tone distribution, as for instance in:
"I can't think why they gave the reward to Peter!"
/ / R Peter / / Ρ was the one who found it / /
Grammatical considerations are taken to be conclusive for most readers in
these ten cases because they provide particularly strong clues as to a probable
discourse setting.
(iii) Even when, in cases like these, the conditions clearly favour a com-
mon hypothesis about what parts of the sentence make retrospective refer-
ence and what parts represent new assertion, there is not 100 % conformity
among informants. It is perhaps a useful rough guide to the degree of con-
formity we may expect when conditions are very favourable that of the
200 readings 174 (87%) had R tone in the 'expected' constituent. Most of
the others had no R tone. Indeed there are only two instances of R tone
occurring in the 'wrong' constituent.
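
The figures quoted in observation (iii) can be checked against Table 1. The following is a minimal sketch in Python and is not part of the original study: the variable names are illustrative, it assumes 20 informants per sentence (as the '40 readings' per pair implies), and it reads each dash in the table as zero.

```python
# Table 1, transcribed. Each row: (pair, R tone in the expected tone unit of
# the first sentence, R tone in the 'wrong' tone unit of the second sentence,
# R tone in the expected tone unit of the second sentence).
INFORMANTS = 20  # assumed number of readings per sentence

TABLE_1 = [
    ("1 & 2", 18, 0, 17),
    ("4 & 5", 20, 2, 16),
    ("6 & 7", 19, 0, 15),
    ("8 & 9", 17, 0, 20),
    ("10 & 11", 15, 0, 17),
]

total_readings = INFORMANTS * 2 * len(TABLE_1)
expected = sum(first + second for _, first, _, second in TABLE_1)
wrong = sum(w for _, _, w, _ in TABLE_1)
share = round(100 * expected / total_readings)

print(total_readings, expected, wrong, share)
```

Under these assumptions the tally gives 174 'expected' R tones in 200 readings, with two R tones in the 'wrong' constituent, which agrees with the figures in the text.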

2. When readers associate a referring tone with one of two tone units in a sen-
tence, a change other than a change in the order of constituents results in the R
tone occurring in a different constituent.

Example:
(12) // R if it rains // Ρ we shall expect you //
(13) // Ρ even if it rains // R we shall expect you //
Interpretation: Sentence (12) is treated rather like sentence (10) above. The
if-clause is assumed to represent a possibility already foreseen in a hypo-
thetical context of interaction and so available for reference. By contrast,
the 'Even if. . .' of (13) would usually introduce a possibility as though it
had hitherto not been considered. There would seem to be a strong possi-
bility that such a new consideration would be introduced in relation to
something that was already conversationally in play, a prediction we can
associate with the tendency for readers to associate R tone with 'we shall
expect you' in (13).
Measure of agreement: Sentence (12): 17 informants assigned R tone to the
first tone unit
Sentence (13): 10 informants assigned R tone to the second tone unit
There were no counter-examples, that is to say no readings having R tone
in the 'wrong' constituent. Two had an additional R tone in 'even'. Ex-
cluding these, and also those in which there is no occurrence of R tone
(and therefore, we would argue, no evidence of the matching of the sen-
tence with a hypothetical discourse setting) the presence or absence of
'even . . .' influences all readers in the same way.
Other examples that can be compared with the pair (12) and (13) are:
(14) // R she was trying to convince me // Ρ she was only twenty //
(15) // Ρ it was hard to believe // R she was only twenty //
(16) // R to tell you the truth // Ρ i don't like it //
(17) / / P i can assure you // R i don't like it //
(18) // R where it came from // Ρ is a mystery //
(19) / / P where it came from / / R is the mystery / /
(20) / / R the three strangers / / P began to get worried / /
(21) / / P even the three strangers / / R began to get worried / /
(22) / / R luckily / / P the weather improved / /
     / / R presumably / / P the weather improved / /
(23) / / P admittedly / / R the weather improved / /
Interpretation: Sentence (14) seems to be treated as though making refer-
ence to some earlier question such as 'What was she doing?', 'What was
she telling you?' or 'What was she talking about?'. A context which would
motivate the alternative version, as for instance
"She wants everyone to believe she hasn't had her twenty-first birthday
yet!"
/ / Ρ yes / / Ρ she was trying to convince me / / R she was only twenty / /
requires some ingenuity to construct and is less likely to come to the
reader's mind. In (15), the assertion 'It was hard to believe . . .' would
usually occur in a situation where the referent of 'It' (subsequently repre-
sented by the clause 'she was only twenty') was a statement that had al-
ready been said to be, or assumed to be, true.
Sentence (16) would be expected to have R tone in the first tone unit
because truth-telling is normally taken as a pre-established condition of
discourse. 'I can assure you . . .' in (17) has P tone because it is an assertion
that what follows is true, however much appearances may be to the con-
trary. Both expressions tend to be largely empty of content: they serve
principally to insinuate social solidarity or separateness, and it is this kind
of stereotyped usage that the readings seem to reflect.
The two sentences (18) and (19) differ in the choice of article in the last
nominal group. The reading (18) seems to be determined by thematisation
in a manner similar to that noted for (1) above. In (19) the presence of the
definite article suggests that 'mystery' is already conversationally in play,
and this overrides the effect of thematisation in determining which consti-
tuent is to be recognised as making retrospective reference.
In (20) the definite article in the grammatical subject encourages the
reader to treat it as referring to something present in the context of inter-
action. 'Even . . .' in (21) has an effect similar to that noted for (13): the
"three strangers" are treated as though newly introduced into the conver-
sation in spite of the definite article. Furthermore, 'Even the three stran-
gers . . .' implies previous mention of others who were worried, and so en-
courages treatment of '. . . began to get worried' as if having retrospective
reference.
In (22) both 'luckily' and 'presumably' would usually represent judge-
ments introduced as if shared by speaker and hearer. By contrast 'admit-
tedly' introduces a concession made by the speaker alone; that which is
conceded, namely that the weather improved, would often have been an-
ticipated in the conversation in some way.
Measure of agreement: Table 2 shows the incidence of R tone in the six pairs
of sentences considered above.

Table 2: Incidence of R Tone

Sentences    First sentence           Second sentence
             First t/u   Second t/u   First t/u   Second t/u
12 & 13         17          —            —           10
14 & 15         15          —            1           16
16 & 17         19          —            —            8
18 & 19         18          —            3            8
20 & 21         17          —            —            5
22 & 23         20          —            1            8

Comment: The recording of three counter-examples for (19) can probably
be related to the fact that a number of informants failed to read this sen-
tence fluently at the first attempt. Hesitations, misreadings and self-cor-
rection are likely to be due to the opposing pressures of thematisation and
the late occurrence of a definite article.
The consistently lower figures in the last column of Table 2 suggest that
readers find it more difficult to contextualise an item in which the ex-
pected R tone is in the second of two tone units, and so tend to make a
noncommittal choice of Ρ tone in both tone units. It is evident from the
variation in the figures, moreover, that the relative probability of such
noninitial R tones occurring in different sentences could only be investi-
gated by a large-scale statistical study.

3. When some readers associate R tone with the first, but none with the sec-
ond, of two tone units, a change in the sentence affects the likelihood of the R
tone occurring.

Example:
(24) // R the parcel of books // Ρ was lying on the table //
(25) // R a parcel of books // Ρ was lying on the table //
Interpretation: The presence of the definite article in the subject gives
greater encouragement to the reader to regard 'the parcel of books' as being
conversationally present.
Measure of agreement:
(24) 17 informants assigned R tone to the first tone unit
(25) 2 informants assigned R tone to the first tone unit
In none of the 40 readings was R tone associated with the second tone
unit.
A similar pattern of readings was elicited by the pair
(26) / / R surprisingly / / Ρ the match was cancelled / /
(27) / / R obviously / / Ρ the match was cancelled / /
Interpretation: The treatment of 'surprisingly' reflects a common
conversational assumption that surprise is a shared reaction: the hearer's
agreement that the event in question is surprising is customarily pre-empted.
'Obviously' used in a similar way can serve either to insinuate solidarity
('We are agreed it could not be otherwise') or to emphasise the divisive
aspect of the relationship ('I should have expected you to realise . . .').
It thus selects either R tone or P tone.
Measure of agreement:
(26) 19 informants assigned R tone to the first tone unit
(27) 6 informants assigned R tone to the first tone unit
In none of the 40 readings was R tone associated with the second tone
unit.

4. In cases where there is more variation among readers, regularity can be ob-
served in the way changes in the sentence affect the tone assigned to a particu-
lar word.

Example:
(28) / / R i HOPE he'll like it / /
/ / P i hope / / R he'll like it / /
(29) / / Ρ i woNDer whether he'll like it / /
/ / R i woNDer whether / / Ρ he'll like it / /
/ / R i wonder / / R whether / / Ρ he'll like it / /
In spite of the variation among readings of these two sentences, they can
be compared on the basis of the tone choice associated with 'like':
(28) / / R (i hope he'll) like it / /
(29) / / Ρ (i wonder whether he'll) like it / /
Interpretation: Sentences like (28) seem likely to present the question of
whether he'll like it as a matter already established as of common concern
among the participants in a conversation. It is assumed that the hearers
share the speaker's interest in whether he does or not. Whether he as-
sumes also that his hoping so is also shared determines whether 'hope' is
included in the same tonic segment, and since this depends on more spe-
cific knowledge of the situation it is to be expected that the intonation
treatment of 'hope' will be less predictable. Sentence (29) differs in that it
seems likely to occur either as thinking aloud, or as a yes/no elicitation
equivalent to 'Do you think he will like it?'. In either case 'like' would be
located outside the area of convergence between speaker and hearer, and
so would have Ρ tone.
Measure of Agreement:
(28) 17 informants assign R tone to 'like'
(29) No informants assign R tone to 'like'.
Other pairs of sentences that can be compared on a similar basis are
(30) / / R (we can't afford) both of them / /
(31) / / Ρ (we can't afford) either of them / /
(32) / / R (i) could (have been mistaken) / /
(33) / / Ρ (i) couldn't (have been mistaken) / /
(34) / / R (he could) possibly (be right) / /
(35) / / P (he couldn't) possibly (be right) / /
(36) / / R (i shouldn't try to go in) august / /
(37) / / P (i should try to go in) august / /
Interpretation: In (30) 'both' is likely to be a substitute for two things pre-
viously mentioned; but 'either' in (31) serves usually to correct a misappre-
hension that one or other can be afforded and is thus presented with Ρ
tone as a potential alteration of shared assumptions.
'Could' in sentences like (32) typically concedes a possibility already in-
troduced or implied by another speaker. In uttering it one would normally
be acknowledging the assumed possibility as a basis for proceeding.
'Couldn't', on the other hand, denies an implied possibility and so has Ρ
tone. Similar considerations apply to sentences (34) and (35), where the
relevant tone choice occurs in the (situationally redundant) word 'possi-
bly'.
(36) would normally make retrospective reference to the hearer's ex-
pressed intention of going in August, while (37) encourages the reader to
treat 'August' as if it were a (new) suggestion.
Measure of agreement: Table 3 shows the choice of tone in the word speci-
fied in each of the sentences (28) to (37).

Table 3: Tone Choice

Word              First sentence    Second sentence
                  R       P         R       P
(28) LIKE         17      3
(29) LIKE                           —       20
(30) BOTH         16      3
(31) EITHER                         —       20
(32) COULD        14      —
(33) COULDN'T                       —       18
(34) POSSIBLY     18      2
(35) POSSIBLY                       —       20
(36) AUGUST       18      1
(37) AUGUST                         —       20
Comment: The ten readings not accounted for in this table are those in
which readers did not assign a tonic syllable to the word in question. Of
the 83 occurrences of R tone, none occurred where Ρ tone would have
been 'expected'. On the other hand there are nine substitutions of Ρ tone
for the predicted R tone, a fact which is in line with our hypothesis that
failure to contextualise a sentence may result in orientation change and
consequent assignment of Ρ tone to any constituent.
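
The counts in this comment follow directly from Table 3. As a rough check, here is a short Python sketch; it is an illustration rather than anything used in the study, the names are hypothetical, and it assumes 20 readings per sentence and treats a dash or empty cell as zero.

```python
# Table 3, transcribed: sentence number -> (word, readings with R, readings with P).
# A dash or an empty cell in the printed table is read here as 0 (an assumption).
TABLE_3 = {
    28: ("like", 17, 3),
    29: ("like", 0, 20),
    30: ("both", 16, 3),
    31: ("either", 0, 20),
    32: ("could", 14, 0),
    33: ("couldn't", 0, 18),
    34: ("possibly", 18, 2),
    35: ("possibly", 0, 20),
    36: ("august", 18, 1),
    37: ("august", 0, 20),
}
R_PREDICTED = {28, 30, 32, 34, 36}  # sentences in which R tone was predicted
INFORMANTS = 20  # assumed number of readings per sentence

r_total = sum(r for _, r, _ in TABLE_3.values())
r_where_p_expected = sum(
    r for n, (_, r, _) in TABLE_3.items() if n not in R_PREDICTED
)
p_where_r_expected = sum(
    p for n, (_, _, p) in TABLE_3.items() if n in R_PREDICTED
)
tabulated = sum(r + p for _, r, p in TABLE_3.values())
untabulated = INFORMANTS * len(TABLE_3) - tabulated

print(r_total, r_where_p_expected, p_where_r_expected, untabulated)
```

On this reading of the table, the sketch reproduces the 83 occurrences of R tone, the nine substitutions of P tone for predicted R tone, and the ten readings with no tonic syllable on the word in question.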

Distribution of referring tone: summary


Of the 469 occurrences of R tone we have considered in this section, all
but 7 occur in a constituent that can be predicted.
In every case, it seems possible to relate the occurrences of R tone to
some sense of existential common ground (though, of course, the inter-
pretative procedure involved in making the relationship can never be en-
tirely proof against dispute).
Occasional substitutions of Ρ tones for expected R tones are consistent
with our hypothesis that oblique orientation (towards the language speci-
men rather than towards a putative hearer) can result in any constituent
being proclaimed. Interestingly, substitutions of R tone for Ρ tone are rare.
In the case of two sentence pairs (24 & 25) and (26 & 27) we have
found ourselves considering the relative likelihood of R tone occurring in
each of the members, a procedure that becomes more and more necessary
as the analysis proceeds.
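
The totals given at the head of this summary can be reconstructed from the four sets of results. The Python sketch below is a hedged illustration only: the subtotals are sums taken from Tables 1 to 3 and from the figures quoted for pairs (24 & 25) and (26 & 27), and it assumes that the two additional R tones reported on 'even' in sentence (13) were not counted, since that is the reading under which the totals match.

```python
# Per-subsection tallies of R tone: (label, occurrences in the predicted
# constituent, occurrences in any other constituent). The subtotals are
# derived from the published tables, not stated in the text itself.
SUBSECTIONS = [
    ("1 (Table 1)", 174, 2),
    ("2 (Table 2)", 161, 5),
    ("3 (pairs 24 & 25, 26 & 27)", 17 + 2 + 19 + 6, 0),
    ("4 (Table 3)", 83, 0),
]

predicted = sum(p for _, p, _ in SUBSECTIONS)
unpredicted = sum(u for _, _, u in SUBSECTIONS)
all_r_tones = predicted + unpredicted

print(all_r_tones, unpredicted)
```

With these assumptions the tally comes to 469 occurrences of R tone, of which 7 fall outside the predicted constituent.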

II The distribution of tones r and r+


Our description postulates that r tone and r + tone are in complementary
distribution in that they are both associated with tone units whose matter
is presented by a speaker as common ground. They are held to be differ-
entiated with respect to their role implications, choice of r+ tone being
the prerogative of a speaker who, at the moment of speaking, is or aspires
to be the dominant party. Both of these claims can be examined in the light
of elicited readings.
In the last section, regularities were noted in the way the undifferen-
tiated category R tone was distributed by readers. The first claim referred
to above can be restated as a prediction that those discourse conditions
which lead to a choice of R tone in preference to Ρ tone will be equally well
satisfied whether the former is realised as r tone or r + tone. In other
words, where we have discovered a principled distribution of R tones we
may provisionally expect that either realisation will occur unless there is
some additional element in the test sentence which persuades the readers
towards choice of either the dominant or the non-dominant form. A pre-
ference for one or the other will be attributable to implications that are ac-
cessible to the reader as to his role relationship with a hypothetical hearer.
Table 4: Tone choices in first and second tone units. Expected tones are boxed; sequence R/P was
expected in the upper group, P/R in the lower.

[Table body only partly recoverable. Column headings: Sentence, First t/u,
Second t/u; of the sub-headings only 'r+' survives. Upper group, sentences
1, 3, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 27, 32, 34, 36, 38, 42,
43, 48, 50, 54, 55, 56, 57, 58, 59, 60 and 61: figures lost. Four rows whose
sentence numbers are lost follow: 17 5 12; 17 13 3; 15 10 5; 20 3 17.]

Remaining rows, with figures as they survive:

Sentence    Figures
11          17    11    6
13          10    4     6
15          16    13    3
17          8     [illegible]    0
19          6     2
21          3     2
23          8
33          6     5     1
39          13    5     8
40          17    14    3
41          10    2     8
51          3     2     1

Table 4 was prepared to make possible a preliminary comparison of
these expectations with how our informants actually performed. It analyses
the distribution of r, r+ and p tones in all those individual readings
which divided the sentence into two tone units and assigned p tone to only one
of them. Readings excluded by this criterion are:
(i) Those which had only p tones and o tones, which might be attributable
to the reader's adopting an oblique stance. They would, in any case, shed
no light on the present question.
(ii) Those having fewer or more than two tone units. It follows from our
general contention that the more tone units a reader distinguishes in a sen-
tence, the more complex are the discourse implications of his reading. By ex-
cluding the small number of readings that do not concur in having two tone
units we are able to attend to the performances of informants whose re-
sponses (and therefore whose hypotheses about discourse setting) are suffi-
ciently similar to make the examination of differential choices illuminating.
We have also avoided unprofitable complication of the table by leaving
out all sentences that elicited fewer than five readings that satisfy these
two requirements. There would be little point in showing the distribution
of three variables among four cases.
The general pattern that emerges can be summarised as follows: with
the exception of the occasional odd reading (about 2.5% of all choices),
which it seems reasonable to suppose result from some performance lapse
on the part of the informant, r tone and r + tone commute with each other
but not with p tone. This is in line with our prediction that the discourse
conditions which lead one reader to select, say, r tone for a particular
constituent will be equally well satisfied, in so far as they lead him not to
select p tone, by another reader's choice of r+.
There is much variation among sentences in the way readers tend to
choose between r tone and r+ tone. There are a number of cases where
our group of informants has selected r tone exclusively. Perhaps signifi-
cantly, there are no sentences which lead to a similarly marked preference
for r + tone, a fact which is the more interesting in view of our deliberate
attempts to elicit r + tones for contrastive purposes (see below). In fact
there is no instance of r+ tone being associated with a particular sentence
by some informants and r tone being associated with the same sentence by
none. These observations are consistent with our working hypothesis that
a reader who sees cause to represent a particular constituent as having re-
ferring status will select r tone unless he also perceives role-implications in
the sentence of a kind that lead him to assume dominance. That the latter
implications are far less predictably set up by the prepared sentences
we have used is shown by the great variation in the proportion of r tone to
r+ tone choices.
To try to show what motivates choice of r+ tone specifically, pairs of
sentences were constructed in which it was expected that opposing tenden-
cies (towards r and r+) could be associated with grammatical and/or lexi-
cal alternatives. It was found that (cf. also second part of Table 4):

5. Certain changes in the test sentence resulted in a marked increase in the
number of readers who realised R tone as r+ tone.

Example:
(38) / / R when we'd passed the traffic lights / / (we took the next street
on the left)
(39) / / R when you've passed the traffic lights / / (take the next street
on the left)
Interpretation: Sentence (39) would be most likely to occur when the
speaker was giving directions, an activity in which it is customary to as-
sume dominant role. There is a sense in which the enquirer accepts the 'in-
ferior' conversational role by the simple act of enquiring, and the other
party responds by adopting the role of one who, for the time being, ex-
pects to be listened to. Sentence (38) could be a response, from a non-
dominant position, to a request for an explanation of how the speaker
reached a certain destination. It could, for instance, be part of an apology
for having lost one's way and arrived late for an appointment.
Table 5: Realisation of R tone

Sentence    r     r+
38          15    4
39          8     11

Comment: This pair of sentences illustrates a problem that seems to be
inseparable from an attempt to use the sentence-citation technique to
investigate this particular tone contrast. We have already noted that r tone is
always likely to be substituted for the 'expected' r+ tone. Eight readers
actually preferred it in the case of (39). Moreover, our prediction that (38)
will suggest non-dominant behaviour to the reader is only partly reliable.
An alternative would be to treat it as part of an anecdote, the speaker
assuming dominant role as story-teller. In this, and in other pairs of
sentences examined in this section, we can make no stronger claim than that
changes in the presented item result in a marked reversal of the order of
preference, and the change is in the direction our hypothesis would predict.
It is not really surprising that the situational factors which determine
role relationships are less easily invoked than those which determine
selection in the r/p system. As a further illustration of the kind of
indeterminacy we must expect, we can consider the readings of a sentence
included to
test our expectation that story-telling would be an activity in which a
reader would most readily be persuaded to assume dominant role. We pre-
dicted r+ tone in the first tone unit of
(64) / / R a great while ago / / (there lived a giant)
Actual recorded readings are shown in Table 6.

Table 6

Sentence    r     r+
64          6     14

Other pairs of sentences that can be compared on a similar basis to (38)
and (39) are:
(40) (i wouldn't dream of doing it) / / R actually / /
(41) (you mustn't try to do it) / / R actually / /
(42) (i don't know whether) R he'll (come or whether she will)
(43) (i don't know whether she) R can (or whether she can't)
(1) / / R what you want / / (is a new car)
(3) / / R but a new car / / (is just what you want)
(8) / / R apparently / / (she couldn't manage it)
(9) (she couldn't manage it) / / R apparently / /
Interpretation: The imperative implication of the modal verb in sentence
(41) leads many readers to select the r+ alternative for 'actually'. The
speaker assumes the dominant role of one who determines what the other
shall do, an implication that is absent from (40).
Adding the logically redundant second part of '. . . whether she can or
whether she can't' would seem generally, because of its suggestion of
something like truculence or irritation, to be the prerogative of the 'supe-
rior' party to a conversation, or one who can properly assert superiority.
By contrast, 'he' and 'she' in sentence (42) are non-transparent alternatives
and require no special role in the interaction to make their use acceptable.
The inclusion of 'but' and 'just' in (3) has already been shown to result
in a different distribution of R and P tones from (2) 'A new car is what you
want'. We may now note that, by suggesting an act of contradiction or
dispute, they incline the reader towards choice of the dominant option in an
initial R tone unit.
The differential treatment of initial and final 'apparently' is slightly less
easy to explain. If 'apparently' is regarded as being largely social in its im-
plications, as insinuating a generalised sense of solidarity rather than qual-
ifying the associated assertion, then it seems appropriate that an assertion
followed by the consolidatory gesture should be placed by a reader in a
context of assumed speaker superiority. When the gesture precedes the as-
sertion, it seems more tentative and accommodating. To put it another
way, a relationship which leads one to invoke social common ground be-
fore a statement might be judged more deferential than one which permits
adding on the social gesture as an afterthought. The treatment of sen-
tences (8) and (9) can usefully be compared with readings of
(53) Tom's got one, hasn't he.
Like other assertion + tag sentences, this includes among its possible reali-
sations:
(a) / / R tom's got one / / Ρ hasn't he / /
(b) / / Ρ tom's got one / / R hasn't he / /
In (a) the speaker introduces the assertion as if it were common ground
and then asks to have it confirmed as one who does not know. In (b) the
assertion is made as if from a vantage point outside the area of assumed
convergence and the hearer is then asked to agree that it is now common
ground. Readers who followed the pattern of (b) reflected the fact that
the second sequence of events is usually characterised by social dominance
by realising R tone as r+. The roughly equal number who followed the
pattern of (a) selected r.

Table 7: Readings of sentence 53

            Realisation of R tone
Sequence    r     r+
R/P         6     —
P/R         —     7
Others      7

Measure of agreement: Table 8 compares the realisation of R tone in all the
pairs of sentences discussed in this section.

Table 8: Realisations of R tone

Sentence r r+
38 15 4
39 8 11
40 16 4
41 3 14
42 17 1
43 10 9
1 11 7
3 4 11
8 17 3
9 3 17

Comment: With the exception of the sentence pair 42/43, the alteration in
the presented item results roughly in a reversal of the proportion of read-
ers selecting tones r and r + . The figures for sentence (43) indicate an up-
setting of the very strong preference for r tone in (42).
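
The reversal described in this comment can be read off Table 8 mechanically. The following Python sketch is offered only as an illustration of that reading; the function and variable names are hypothetical, and 'preferred' here simply means the realisation chosen by more readers.

```python
# Table 8, transcribed: sentence number -> (readings with r, readings with r+).
TABLE_8 = {
    38: (15, 4), 39: (8, 11),
    40: (16, 4), 41: (3, 14),
    42: (17, 1), 43: (10, 9),
    1: (11, 7), 3: (4, 11),
    8: (17, 3), 9: (3, 17),
}
PAIRS = [(38, 39), (40, 41), (42, 43), (1, 3), (8, 9)]


def preferred(sentence):
    """Return the realisation of R tone chosen by more readers."""
    r, r_plus = TABLE_8[sentence]
    return "r" if r > r_plus else "r+"


# A pair counts as reversed when its two members prefer different realisations.
reversed_pairs = [
    pair for pair in PAIRS if preferred(pair[0]) != preferred(pair[1])
]
unreversed_pairs = [pair for pair in PAIRS if pair not in reversed_pairs]

print(reversed_pairs, unreversed_pairs)
```

On this simple majority criterion four of the five pairs show a reversal; in the pair 42/43 the preference for r tone is weakened (from 17 against 1 to 10 against 9) but not overturned.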

Distribution of tones r and r+ : summary


Examination of the way tones r, r+ and p are distributed in all those
sentences that our informants read as two tone units supports the claim that
they recognise a primary tone choice R/P and that they select tones r and
r+ as alternative realisations of the R term. It also supports our
hypothesis that r tone has less restricted currency than r+ tone: that the former is
an option available to all speakers while use of the latter presupposes a
privileged role in a non-symmetrical interaction. This relationship be-
tween the two makes it difficult to invent pairs of sentences which will
show positively the connection between tone choice and the social attri-
bute of dominance, since readers who recognise an implication of domi-
nance in a sentence presented for reading may still, with propriety, select
the non-dominant option. Moreover, the factors which would determine
the relevant aspect of role relationship in the case of a real utterance must
often be present in the wider (non-textual) context. In many non-symmetrical
speech situations, for instance those involving teacher and pupils,
doctor and patient, chairman and committee or story-teller and audience,
the occupant of the dominant role is more-or-less fixed for the duration
of the speech event. It is not surprising, therefore, that we have difficulty
in eliciting comparable dominant and non-dominant readings by manipu-
lating the lexis and grammar of sentences. Nevertheless when we have suc-
ceeded in inventing sentence-pairs in which the majority of readers display
a preference for different tones, the change in preferred tone is always in
the direction we should predict.

References

Brazil, D. (1972). An investigation of the relationship between intonation and grammar in the
reading aloud of citation items. Unpublished M.A. Thesis, University of Birmingham.
Brazil, D. (1975). Discourse intonation. Discourse analysis monographs 1. University of
Birmingham.
Brazil, D. (1978). Discourse intonation II. Discourse analysis monographs 2. University of
Birmingham.
Brazil, D., M. Coulthard, and C. Johns (1980). Discourse intonation and language teaching.
London: Longman.
Halliday, M.A.K. (1967). Intonation and grammar in British English. The Hague: Mouton.
ALAN CRUTTENDEN

The Relevance of Intonational Misfits

Any theory of context has to make a basic division between linguistic con-
text and situational context. For example, Halliday (1967:23) has a basic
division between two types of 'given' elements: an element 'has either been
mentioned before or is present in the situation'. An element that has been
mentioned before is typically referred to by anaphoric reference involving
the use of a pronoun or the definite article or by what Ladd (1979:105)
calls 'default accent', i. e. moving the accent forward away from its most
common position on the last lexical item, e. g.
(1) JOHN didn't complain about the result.
But such formal features (pronouns, definite article, default accent) do
not only refer to elements that have been mentioned before verbatim.
They can also be used in cases of 'associative anaphora' (Hawkins, 1978;
Cruse, 1980), e.g.
(2) (Bill didn't achieve his aim) but JOHN succeeded.
The other type of context, being 'present in the situation', usually refers to
the immediate situation, e.g.
(3) [Taking a snake out of a bag.]
That's a POISONOUS snake.
But anaphoric features may possibly refer to a wider situation, e. g.
(4) (Did anything happen while I was out?)
THE DOCTOR called.
In this example the use of the definite article constitutes one anaphoric
usage and the occurrence of a non-final accent on 'doctor' is a type of
usage which has also often been explained as a type of anaphora. 'Call' is
what doctors typically do and is therefore said to be 'culturally given' or
predictable (Bolinger, 1972). There are however other explanations of this
type of example, which explain it either in terms of nouns being innately
more accentable than verbs (Bing, 1979; Ladd, 1979), or in terms of a spe-
cial pattern for 'event' sentences (Allerton and Cruttenden, 1979; also cf.
Fuchs, 1980). Any influence of wider situational expectations on accentua-
tion (and intonation generally) is uncertain.
In addition to textual/associative anaphora and immediate/?wider
situation, there are two additional types of context which are specially
relevant to intonation. Firstly, any immediately preceding intonation is
relevant. For example, in English the low falling tune is one of the most
undemonstrative, unexcited tunes and used as a response to a very
excited tune (for example, a high fall) will be at the least 'irritating':
(5) (I got a distinction.)
ˎDid you?
Secondly, the lexical items and grammatical structure with which the in-
tonation cooccurs will be relevant. A low rising tune on a sentence non-fi-
nal intonation-group will be less attitudinally marked than on a final in-
tonation-group; or a high falling tune on a 'reversed polarity' tag will
sometimes indicate disappointment, sometimes pleasure, depending on the
lexical items in the sentence, e. g.
(6) They haven't re'membered / "have they?
(7) They're not at "all worn out / "are they?
Thus the types of context which may interact with intonation are as fol-
lows:

Preceding Co-occurring
(a) intonation (c) lexis/grammar
(b) lexis (d) immediate situation
?wider situation

Different types of context seem to be differentially important to different
aspects of intonation. The division into intonation-groups seems to
interact particularly with grammatical structure: a separate intonation-group is
often given to the subject of a clause when it is post-modified, and to sen-
tence adverbials, e.g.
(8) The first man on the moon / was Neil Armstrong.
(9) Unfortunately / he didn't know how to do it.
The placement of the main accent on the other hand (often called the
'nucleus' or the 'tonic') seems to depend particularly on preceding lexis
and on the immediate situation, as illustrated in examples (1) to (4) above.
Finally the meaning of any one tune on any one occasion seems to depend
particularly on the immediately preceding intonation as in (5) above, and
on the co-occurring grammar and lexis, as in (6) and (7) above. The re-
mainder of the article considers in particular the interaction between tunes
and their context of occurrence, especially co-occurring grammar and le-
xis.
Much of the attitudinal description of English intonation has been criti-
cised on the grounds that it underestimates the contribution of contextual
factors to intonational meaning. So particular tunes can be shown to be
associated with widely different attitudes according to the context in
which they occur. One such example is shown in (6) and (7) above. An-
other example can be drawn from O'Connor and Arnold (1961: 45 ff.)
where two of the attitudes associated with the rise-fall nuclear tone in
English are said to be 'impressed' and 'challenging'. These two attitudes
seem very different, but in fact the two attitudes often correlate with
whether the speaker is agreeing or disagreeing with the preceding speaker,
e.g.
(10) (He's got two wives.)
I "know.
(11) (I don't like the man.)
You've never even "spoken to him.
Any one tune can occur in a wide variety of contexts and hence result in a
wide variety of conveyed attitudes. The basic meaning associated with a
particular tune must therefore be of a much more abstract kind than any
of these individual conveyed attitudes. A description of intonation should
at least try to extract the meaning common to all or most occurrences of a
particular tune, however abstract that meaning may be.
In associating different tunes with different lexical items, it is generally
true that any tune may be associated with almost any co-occurring gram-
matical structure and lexical items. In order to factorize out an abstract in-
tonational meaning for one tune, it is therefore necessary to examine a
large number of varied examples of the use of that tune. However a more
economical procedure may be possible precisely because the possibilities
of combination between intonation and grammar/lexis are only almost
free. In other words, the few occasions where particular tunes are impossible
may be especially revealing. They may indicate something about the
meaning of a tune by indicating what is completely disharmonious with it.
Of course the charge might be made that there are as many negative
aspects to the meaning of a tune as there are positive aspects but prima fa-
cie this seems unlikely when the cases where particular tunes are impossi-
ble are so much more limited than those where they are possible. At any
rate, it seems probable that a systematic examination of such intonational
misfits might reveal something about the meaning of some tunes (and pos-
sibly also something about the characteristics of particular grammatical
structures and/or lexical items or sets of lexical items). As an initial con-
tribution to the analysis of such intonational misfits, I will examine four
such areas in English: (a) tag sentences; (b) adverbials; (c) negatives; (d)
superlatives.

(A) Tag Sentences

The more common structure associated with tag questions in English is
that of the so-called reversed polarity type, i. e. where the tag is negative if
the preceding clause is positive, and vice versa. Moreover it is well enough
known that there are two typical intonations associated with such tags,
namely those involving a separate falling or low rising tune (by 'separate' I
mean a separate tune additional to that on the preceding clause), e. g.
(12) He's "gone / "hasn't he?
(13) He's "gone / .hasn't he?
(14) He hasn't "gone / "has he?
(15) He hasn't "gone / .has he?
However in some cases one or other of the tunes will seem very strange. I
have many times had to correct foreign learners saying:
(16) It's a nice "day / .isn't it?
Now as a general form of social chit-chat such an intonation is impossible
(although it is not impossible to invent a very unusual situation where it
might be possible). Why should the low rise be odd here? Tags of the re-
versed polarity kind differ from tags of the constant polarity kind (e.g.
You did, did you?) in asking for at least a response, but response expecta-
tion is considerably more restricted with a falling tune than with a rising
tune. The response to a falling tune is restricted by the high expectation of
agreement, whereas the rising tune has only a slight bias towards expecta-
tion of agreement. Quite clearly people usually expect agreement in mat-
ters of weather, or rather, more precisely, they do not allow much room
for disagreement.
As mentioned already, reversed polarity tags expect a response of agree-
ment. They suggest that the speaker and the listener collude to agree on
the statement preceding the tag (I am talking only of tags added on to pre-
ceding statements by the same speaker: tags in response utterances serve a
wider array of functions). If the speaker is suggesting that the listener
agree with him, the implication is that the listener knows something about
the statement with which he is invited to agree; 'knowing something' may
involve a fact or an opinion (e.g. He's a Quaker, isn't he? or I'm a fool,
aren't I?). We would not therefore expect to find tags appended to state-
ments about which the listener could not possibly have any knowledge or
opinion. But curiously enough tags in such situations now occur in some
dialects of English (e.g. Cockney). I have recently heard the following ex-
amples:
(17) (I wanted to have a nice lie-in this morning. And what happened?).
The postman came banging on my "door / at half-past "seven /
"didn't he?
(18) [One fisherman to another] (What's the matter?) [Reply] The
hook's caught on the "bottom / "isn't it?
(19) (Why didn't you come to see me?). The landlord wouldn't let me
"in / "would he?
(20) [Mother to child] (Why haven't you finished your "homework?)
[Reply] I can't "do it / "can I?
All these examples are in the nature of rhetorical questions where no reply
is expected. Notice that all the tags in the above examples have falling
tunes, the form which typically demands agreement more strongly from the
previous speaker. Hearing such tags, speakers of different dialects will in-
terpret them with different degrees of 'jocularity'. To speakers of R. P.,
they will appear to break the rule concerning tags which demands that the
listener be in a position to at least have an opinion about the preceding
statement; they will therefore appear completely jocular. But to speakers
of Cockney, for example, there is nothing strange about them at all. Tags
of this sort simply add an attitude of complaint to the preceding state-
ment. Thus intonational misfits in one dialect may be perfectly acceptable
in another. Speakers of the dialect where they are unusual will interpret
such intonational misfits in a rather special way, i. e. as jocular, or ironic,
or sarcastic.
What I am now saying then is that combinations of intonation and con-
text which are generally considered misfits may sometimes be heard and
that any such misfit may be interpreted as indicating that an utterance is
not to be taken literally. Consider now constant polarity tags (e.g. He's
gone, has he?). Such tags do not invite agreement from the listener; rather
they themselves indicate that the speaker is reflecting on (and not dis-
agreeing with) something he has just been told or learnt. They are echo
tags, sometimes with a threatening overtone (which may depend on a nar-
rowed pitch range and also on the co-occurring context), e.g.
(22) [Son to father] (I'm afraid I've had an accident with the car.)
[Reply] Oh you "have / .have you?
But whether threatening or not, the majority of tags imply agreement with
what the speaker has just learnt, e. g.
(23) [Mother] (Alison's eloped with John.)
[Father] Oh she "has / .has she?
Here the father may echo the mother's statement as resigned acceptance
of an inevitable event or as a threat, but is generally implying that he does
not disagree with the mother. But imagine a situation where Alison comes
through the door as the father makes his statement. There is obviously
then a misfit between the situation and the generally agreeing nature of
such constant polarity tag sentences with the result that any listeners, in-
cluding the mother, will almost certainly interpret father's sentence as iro-
nic.
The tune associated with constant polarity tags is almost always low ris-
ing. It cannot for example be high rising, e. g.
(24) He's gone / 'has he?
Why can the high rise not occur? The answer seems to be that a high rise
will always have a strong note of query or surprise in it, whereas constant
polarity tags generally imply acceptance or agreement. There is therefore
a misfit. A high rise on a tag may however occur in such utterances pro-
vided there is a long pause between preceding clause and tag. The tag then
becomes something in the nature of an afterthought and has claims to be
considered as a separate sentence. High rise is not a misfit on a tag which
is used as a separate response utterance.
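
The pairings of tag polarity and tune discussed in this section can be summarised as a small lookup. The sketch below is only an illustration: the function names, the tune labels and the table are assumptions of the sketch, not part of the analysis itself, and it deliberately ignores the situational factors (as in (16)) that can make an otherwise usual tune odd.

```python
# Illustrative sketch only. The names and the table are assumptions
# distilled from the discussion of examples (12)-(24).

def tag_type(clause_negative: bool, tag_negative: bool) -> str:
    """Classify a tag by its polarity relative to the preceding clause."""
    return "reversed" if clause_negative != tag_negative else "constant"

# Tunes described in the text as typical for each tag type.
USUAL_TUNES = {
    "reversed": {"fall", "low rise"},   # examples (12)-(15)
    "constant": {"low rise"},           # examples (22)-(23)
}

def is_unusual(clause_negative: bool, tag_negative: bool, tune: str) -> bool:
    """True if the tune is not one the text lists as typical for this tag type."""
    return tune not in USUAL_TUNES[tag_type(clause_negative, tag_negative)]

# "He's gone, has he?" with a high rise, as in (24)
print(tag_type(False, False), is_unusual(False, False, "high rise"))
```

A combination flagged here is only a candidate misfit: as the text notes, a high rise becomes possible on a constant polarity tag once a long pause turns the tag into a separate utterance.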
(B) Adverbials

Adverbials which modify the whole sentence in English commonly occur
in sentence-initial position (for the purposes of this article I include time
and place adverbials). In this position they typically take a falling-rising
intonation (less commonly a low rise or a mid level) if they are important
enough to be given a separate intonation-group of their own, e.g.
(25) Un"fortunately / they didn't succeed.
(26) "Wisely / he didn't let them "know too much.
(27) On "Fridays / I go to Jennifer's.
(28) In "Manchester / we do it "this way.
When such adverbials occur in final position, they may take a low rising
tune (alternatively they may take a low level, non-prominent, intonation,
if following a preceding fall), e. g.
(29) They didn't succeed / unfortunately.
(30) He didn't let them "know too much / .wisely.
(31) I go to "Jennifer's / on .Fridays.
(32) We do it "this way / in .Manchester.
But an interesting fact about such adverbials is that there is a small subset
which always demand a falling intonation in either initial or final position.
Very often such falling adverbials are semantically very close to other ad-
verbials which take the more usual rise. This is illustrated in the following
pairs (asterisks indicate those examples which are much less likely):
(33a) Accidentally / he smashed my favourite "vase.
(33b) *Accidentally / he smashed my favourite "vase.
(34a) *Deliberately / he smashed my favourite "vase.
(34b) Deliberately / he smashed my favourite "vase.
(35a) He goes by "train / .usually.
(35b) *He goes by "train / "usually.
(36a) He goes by "train / "always.
(36b) *He goes by "train / .always.
(37a) I've called him a "fool / "regularly.
(37b) *I've called him a "fool / .regularly.
(38a) He "did it / .probably.
(38b) *He "did it / "probably.
(39a) *He "did it / .definitely.
(39b) He "did it / "definitely.
Thus accidentally, usually and probably commonly take a rise while delib-
erately, always, regularly and definitely commonly take a fall. Or putting it
in the terms of this article a falling intonation is a misfit for the first group
and a rising intonation a misfit for the second group. What does this tell
us about the nature of the two groups or about falls and rises? The rising
group appear to be semantically limiting whereas the falling group appear
to be semantically reinforcing. This seems to be something basic to the
meanings of rises and falls in English (at least in R. P. and in many other
dialects). And quite apart from the intonational relevance of this division
among sentence adverbials, where in any syntactic or semantic description
would we find any classification scheme which would separate regularly
and usually? In this case intonational misfits have been informative both
about intonation and about the semantics of adverbs.
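
The two adverbial groups can be treated as an operational test. The following is a minimal sketch, assuming only the seven adverbials named above and two coarse tune labels ("rise" and "fall"); the set names and function names are hypothetical and chosen for illustration.

```python
# Illustrative sketch: membership is taken from examples (33)-(39) only.
LIMITING = {"accidentally", "usually", "probably"}                    # commonly take a rise
REINFORCING = {"deliberately", "always", "regularly", "definitely"}   # commonly take a fall

def expected_tune(adverbial: str):
    """Return 'rise', 'fall', or None when the adverbial is not in either set."""
    word = adverbial.lower()
    if word in LIMITING:
        return "rise"
    if word in REINFORCING:
        return "fall"
    return None

def is_misfit(adverbial: str, tune: str) -> bool:
    """True when a listed adverbial is paired with the tune of the other group."""
    expected = expected_tune(adverbial)
    return expected is not None and tune != expected

for adverbial, tune in [("usually", "fall"), ("regularly", "fall"), ("probably", "rise")]:
    print(adverbial, tune, is_misfit(adverbial, tune))
```

Adverbials outside the two sets return None rather than a guess, since the text makes no claim about them.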
It is not only sentence adverbials which show this division into a rising
type and a falling type. Adverbials which function as adjectival modifiers
of degree show a similar division, e. g.
(40) He's "partially wrong.
(41) He's 'completely wrong.
Indeed this limiting/reinforcing division is apparent more generally in
English, e.g.
(42) I "thought he was married.
(43) I "know he was married.

(C) Negatives

The complex interaction between intonation and negatives has been dis-
cussed by many writers (e.g. Palmer, 1922; Schubiger, 1935, 1957; Lee,
1956; Halliday, 1967; Liberman and Sag, 1974; Ladd, 1977; Bing, 1980).
The most well-known examples of all are those from Palmer (1922:1):
(44a) He doesn't lend his books to anybody.
(44b) He doesn't lend his books to "anybody.
Palmer's gloss on the meanings of the two examples is as follows: (44a)
means 'he lends his books to nobody' and (44b) means 'he is rather parti-
cular as to the persons he lends his books to; he doesn't lend them to
everybody'. One way of simplifying these glosses is to suggest that two
different meanings of anybody are involved: (a) 'absolutely everybody'
(and this meaning takes the falling tune); and (b) 'any second-rate person'
(and this meaning has the falling-rising tune). Conversely a falling-rising
tune is incompatible or disharmonious with the meaning 'absolutely every-
body' while the falling tune is incompatible with the meaning 'any sec-
ond-rate person'. We are not here involved with an absolute misfit but
rather a limited misfit in which the misfits involve a particular tune and
one of two meanings associated with a lexical item. Ladd (1977:24) sug-
gests the contrast of meanings between fall and fall-rise (or in his terms
'A-rise') as 'plain focus' versus 'focus within a given set'; the latter meaning
is obviously incompatible with the meaning 'absolutely everybody'.
This incompatibility is made even clearer if we actually use a more precise
synonym in each case, and thus produce absolute misfits:
(45a) *He doesn't lend his books to 'any old person.
(45b) He doesn't lend his books to "any old person.
(46a) He lends his books to 'nobody.
(46b) *He lends his books to "nobody.
However Ladd's suggested meaning for fall-rise of 'focus within a given
set' does not provide a very relevant gloss for other sentences involving
fall-rise and negatives, e. g.
(47a) I wouldn't do it if you hit me on the head.
(47b) I wouldn't do it if you hit me on the "head.
(47a) indicates that hitting on the head would not be a sufficient induce-
ment to undertake the task while (47b) indicates that hitting on the head
would prevent the speaker from undertaking the task. In (47a) I would do
it under no conditions, not even head-hitting; whereas in (47b), if you did
not hit me on the head, I might be able to do it. In (47b) the fall-rise indi-
cates something left unsaid; the speaker is stating the basic fact with
'reservations'. This term 'reservations' is I think a better general character-
ization of the meaning of fall-rise than 'focus within a given set'. Clearly
absolute meanings of negative sentences will not harmonize with the
'reservations' meaning of fall-rise.
Before leaving this particular topic it is worth noting that a recent ex-
periment (Iannucci and Dodd, 1980) studied children's acquisition of in-
tonation in this area with particular use of the examples:
(48a) "All the rabbits aren't in the cages.
(48b) \All the rabbits aren't in the cages.
If we overlook the fact that (48b) is a somewhat strange combination (per-
haps it sounded better when presented along with one of the illustrations
on which the experiment was based), the experiment showed clearly that
children take much longer to develop understanding of (48a) than (48b).
The interaction between fall-rise and negatives, and probably the use of
fall-rise generally, causes considerable trouble to native learners of Eng-
lish and almost certainly to learners of English as a second language as
well (see, for example, Scuffil, 1980).

(D) Superlatives

Relationships between superlatives and intonation have to my knowledge
not been touched on anywhere in the intonational literature. The discus-
sion which follows is developed from an original idea of a colleague,
D. A. Cruse. Among English adjectives the potential use of an extra high
fall (here symbolized by ") is diagnostic of an innate superlative. Consider
the following sequence:
(49) It's not just "big. It's 'very big. In fact, it's "huge!
The intonation here associated with 'huge' is unlikely with any adjective
which is not an innate superlative, e. g.
(50) He's not just "average size. Or even above average size. *He's
big.
As further exemplification, compare the following sets:
(51) It's "huge. It's "massive. It's "marvellous. It's "tiny.
(52) *It's "tall. *It's "pleasant. *It's "small. *It's "thin.
The adjectives in (52) can only occur with the extra high fall where there
is an overt comparison with their opposite, e. g.
(53) (It's only short people who are allowed in, people like Peter, Mar-
garet and Jane.) But Jane's "tall!
They certainly cannot occur as the finale of a 'culminating' set of expres-
sions as in (49) and (50). There seems to be some sort of metaphorical link
between the height of the fall and the noteworthiness of the feature described
by the adjective. Conversely there is a misfit between an extra high
fall and an adjective which is not an innate superlative.
So what is the relevance of intonational misfits, that is, intonations
which are impossible, or at least highly unlikely, in particular contexts,
chiefly linguistic contexts involving co-occurring lexis and/or grammar?
Such misfits may be limited, where a particular intonation eradicates the
choice between two possible meanings of lexis/grammar, as in the case of
certain types of negative sentence; or they may be absolute, as in the pro-
hibition on the use of rising intonations with some sentence adverbials.
Such intonational misfits may, firstly, act as operational tests to produce
delicate distinctions within grammatical classes, as, for example, in the
subdivision of frequency adverbials into the falling set which includes re-
gularly and always and the rising set which includes usually; as, too, in the
identification of adjectives which are innately superlative. Secondly, into-
nations which are actually heard in contexts where they would usually be
thought to be impossible (i. e. where they would normally be regarded as
misfits) will be interpreted as jocular or ironic, as in the use of reversed
polarity falling tags, which generally assume that the listener has certain
knowledge, and which may be interpreted as jocular in cases where the lis-
tener clearly does not have such knowledge. Lastly, intonational misfits
will be particularly revealing about the basic abstract meanings of tunes:
for example, falls are shown to be essentially reinforcing while rises are
limiting. The fact that the meaning of a particular tune is heavily dependent
on co-occurring context need not be quite as big a disadvantage as has of-
ten been thought if we focus on the small number of cases of contextual
incompatibility.

References

Allerton, D. J. and Cruttenden, A. (1979). Three reasons for accenting a definite subject.
Journal of Linguistics 15: 49-53.
Bing, J. M. (1979). Up the noun phrase: another stress rule. In Engdahl, E., and Stein, M.,
Papers presented to Emmon Bach by his students. Amherst: University of Massachusetts.
Bing, J. M. (1980). Intonation and the interpretation of negatives. Cahiers linguistiques d'Ot-
tawa, NELS 10: 13-23.
Bolinger, D. (1972). Accent is predictable (if you're a mindreader). Language 48: 633-644.
Cruse, D.A.C. (1980). Review of Hawkins, J. A. (1978), Definiteness and indefiniteness: a
study in reference and grammaticality prediction. Journal of Linguistics 16: 308-316.
Fuchs, A. (1980). Accented subjects in 'all-new' utterances. In Brettschneider, G., and Leh-
mann, C., Wege zur Universalienforschung: sprachwissenschaftliche Beiträge zum 60. Geburts-
tag von Hansjakob Seiler. Tübingen: Narr.
Halliday, M.A.K. (1967). Intonation and Grammar in British English. The Hague: Mouton.
Hawkins, J. A. (1978). Definiteness and Indefiniteness: a Study in Reference and Grammaticality
Prediction. London: Croom Helm.
Iannucci, D. and Dodd, D. (1980). The development of some aspects of quantifier negation
in children. Papers and Reports on Child Language Development, 19: 88-94. (Stanford Uni-
versity: Department of Linguistics).
Ladd, D. R., Jr. (1977). The Function of A-rise Accent in English. Bloomington: Indiana
Linguistics Club.
Ladd, D. R., Jr. (1979). Light and shadow: a study of the syntax and semantics of sentence
accent in English. In Waugh, L. R. and Coetsem, F. van (eds.), Contributions to Grammati-
cal Studies: Semantics and Syntax. Leiden: Brill.
Lee, W. R. (1956). English Intonation: a New Approach. Lingua 5: 345-371.
Liberman, M. and Sag, I. (1974). Prosodic Form and Discourse Function. Tenth Regional
Meeting of the Chicago Linguistics Society.
O'Connor, J. D. and Arnold, G. W. (1961). Intonation of Colloquial English. London: Long-
mans.
Palmer, H. E. (1922). English Intonation with Systematic Exercises. Cambridge: Heffer.
Schubiger, M. (1935). The Role of Intonation in Spoken English. Cambridge: Heffer.
Schubiger, M. (1958). English Intonation: its Form and Function. Tübingen: Max Niemeyer.
Scuffil, M. (1980). The Interpretation of English Intonation Patterns by Native Speakers and
German-speaking Learners. Unpublished Ph. D. thesis, University of Cambridge.
ANNE CUTLER

Stress and Accent in Language Production and Understanding¹

Introduction

The central thesis of this paper is that the psycholinguistic evidence on the
perception of prosodic structure in language understanding, and on the
determination of prosodic structure in language production, converge to
show two aspects of the same phenomenon. That is to say, the perceptual
and production evidence together enable us to construct a picture of our
mental representation of the role of prosody in language, and the way this
knowledge is expressed in language use.
Rather than attempt to cover all aspects of prosodic structure, and all
the relevant evidence on how each type of prosodic variation is produced
and perceived, this chapter will concentrate on two phenomena only:
stress and accent. Stress and accent each concern the relative prominence
of one syllable in comparison with others; but as defined here, stress is a
property of words, accent of sentences (or utterances).
The two are not independent in the utterance itself. For instance, accent
is usually realised on a syllable which is marked for stress (although some
exceptions to this generalisation will be discussed below). However, at a
more abstract level of linguistic processing the two phenomena are quite
distinct and are driven by totally independent processes. (For a compre-
hensive discussion of the independence yet interaction of accent and
stress, though one in a theoretical framework different from that devel-
oped in the present paper, see Jassem & Gibbon, 1980.) The evidence to
be cited below, and hence the theoretical conclusions to be drawn, refer
exclusively to English; and the mental representations of prosodic struc-
ture which are inferred are therefore presumably those of English speak-
ers only and are by no means necessarily shared by speakers of other lan-
guages in which word stress and sentence accent are differently expressed
or not expressed at all.

1 This research was supported by a grant from the Science and Engineering Research Coun-
cil, U. K., to the University of Sussex.

Stress

Word stress patterns are an integral part of the phonological representa-
tion of words in the mental lexicon; they are not generated by rule. Al-
though it can be shown that speakers possess general knowledge of the
stress patterns of their language - they can assign appropriate stress patterns
to new derivations (splendidify) or nonsense words (porpitude), for instance,
by analogy to other words they know - nevertheless there is adequate
evidence that in the process of language production, stress patterns
are not assigned to each word by application of general rules, but are re-
trieved along with the rest of the pronunciation of the word when the
word is looked up in the mental lexicon.
This evidence is provided, for example, by the guesses which speakers
offer when they can't remember a word but have it on the tip of their
tongue (see e.g. Brown & McNeill, 1966). A speaker searching for the
name Ghirardelli, for instance, produced the guesses Garibaldi, Gabrielli
and Granatelli (Browman, 1978) — all of them words with some sounds in
common with the target word, chiefly initial and terminal sounds; but cru-
cially, all of them words with the same number of syllables and stress pat-
tern as Ghirardelli. Brown and McNeill proposed that the lexical entries of
infrequently used words could become faint with disuse, so that only parts
could be clearly read - perhaps the beginning and the end, but often the
number of syllables and the location of primary stress. This argument pre-
supposes that stress patterns are listed in the lexicon.
Evidence from slips of the tongue in spontaneous speech provides a
similar picture. For instance, it has been argued by Fay & Cutler (1977)
that semantically unrelated word substitution errors (e.g. confession for
convention) arise when instead of the intended word a near neighbour of it
in the mental lexicon is mistakenly selected. Such errors show an obvious
similarity of sound to the word which the speaker intended to say, on the
basis of which Fay and Cutler argued that the lexicon used in language
production is arranged by sound properties (that is, since such a lexicon is
obviously adapted for use in comprehension, the comprehension and pro-
duction lexicons are the same - there is only one mental lexicon). The
sound similarities in this kind of slip of the tongue include stress pattern:
almost without exception the error and the intended word have the same
number of syllables, with the same syllable carrying the primary stress.
Again, the lexical explanation of these errors as offered by Fay and Cutler
presupposes lexical representation of word stress.
A second variety of speech error which provides evidence pointing in
the same direction is the misapplication of stress itself - e. g. saying superflúous
for supérfluous. Such errors are not uncommon; and they show a
consistent pattern: the stress falls wrongly on a syllable which bears stress
in a related word (e.g. superfluity). Cutler (1980) argued that words de-
rived from the same root morpheme are stored together in the mental lexi-
con, and that stress errors arise when the root and the correct ending are
chosen, but the primary stress is applied to a syllable marked for stress not
in the intended derivative, but in another member of the lexical cluster.
This account also relies upon the lexical representation of stress informa-
tion. A variety of speech production evidence, therefore, converges to in-
dicate that lexical stress pattern forms part of the representation of words
in the mental lexicon.
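
The idea that stress pattern is stored with each lexical entry can be made concrete with a toy lookup. The sketch below is a hedged illustration, not a model proposed in the chapter: the entries are annotated by hand with syllable count and the (0-based) position of primary stress, and the data structure and function names are assumptions. It shows how entries sharing both properties fall together, as in the Ghirardelli guesses and the confession/convention substitution.

```python
from collections import defaultdict

# Hand-annotated toy entries (an assumption of this sketch):
# (word, number of syllables, 0-based index of the primary-stressed syllable)
TOY_LEXICON = [
    ("Ghirardelli", 4, 2),
    ("Garibaldi", 4, 2),
    ("Gabrielli", 4, 2),
    ("Granatelli", 4, 2),
    ("convention", 3, 1),
    ("confession", 3, 1),
    ("superfluous", 4, 1),
    ("superfluity", 5, 2),
]

def build_index(entries):
    """Group words by (syllable count, primary stress position)."""
    index = defaultdict(list)
    for word, syllables, stress in entries:
        index[(syllables, stress)].append(word)
    return index

def prosodic_neighbours(word, entries):
    """Return the other entries that share the word's syllable count and stress position."""
    patterns = {w: (n, s) for w, n, s in entries}
    key = patterns[word]
    return [w for w in build_index(entries)[key] if w != word]

print(prosodic_neighbours("Ghirardelli", TOY_LEXICON))
print(prosodic_neighbours("convention", TOY_LEXICON))
```

In this toy lexicon superfluous and superfluity land in different groups, which is consistent with the account of stress errors above: the misplaced stress comes from a morphological relative, not from a prosodic neighbour.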
Accordingly one would expect that identification of lexical stress pat-
tern would play a large part in word recognition during language compre-
hension. And indeed it does. For instance, when words are misheard, it is
the stress pattern and usually the nature of the stressed syllable which determines
what listeners think they hear (see e.g. Garnes & Bond, 1975).
Typical hearing errors include her peas oyster for herpes zoster, testicle for
testable, your purse for reverse, are you using paint remover? for are you
gonna paint your ruler? (all examples from Browman, 1978). In each case at
least the steady state portions of the most highly stressed syllables have
been correctly perceived - not surprisingly, since stressed syllables are
acoustically clearer than unstressed syllables. But importantly, the stressed
syllable information has been used to reconstruct a message, in which
number of syllables and stress pattern are the same as in the original, but
precious little else is preserved. Only semantic incongruity (of the per-
ceived message with the context, or of the interlocutor's response based
on an incorrect perception) alerts participants in a conversation to the fact
that a slip of the ear has occurred. One may assume that reconstruction of
an utterance on the basis of partial information in this manner is not con-
fined to those cases where it produces an incorrect result; if it did not of-
ten produce acceptable results it would presumably be abandoned as a
speech perception strategy. That is to say, it would seem that drawing
heavily on information about stress pattern and the nature of the stressed
syllable is a reasonably usual and efficient way of perceiving speech.
What happens, then, when stress pattern information is incorrect? Not
surprisingly, this often precipitates an error of interpretation. Puns, re-
ports Lagerquist (1980), fail to work when they involve a stress shift. Cut-
ler (1980) reports that a hearer who heard the word perfectionist stressed
on the first syllable, with the second syllable reduced, parsed it as a perfect
shnist, and only became aware of the error when no meaning could be
given to shnist. Bansal (1966) presented listeners with English spoken by
Indian speakers, who often applied word stress in an unorthodox manner,
and found that the listeners tended to interpret what they heard to con-
form with the stress pattern, often in conflict with the segmental informa-
tion. For example, when words with initial stress were uttered with sec-
ond-syllable stress, atmosphere was heard as must fear, yesterday as or study,
character as director, and written as retain. Similarly, when two-syllable
words with stress on the second syllable were uttered with initial stress,
hearers perceived prefer as fearful, correct as carried, and about as come out.
Robinson (1977) conducted an experimental investigation of the im-
portance of stress pattern in word recognition. Subjects heard lists of
two-syllable nonsense sequences with either initial or final stress. In a
false recognition test they were then presented with two-syllable items
which were made up of the same syllables they had already heard al-
though never in the same combinations which they had heard. Subjects
tended to accept these items (erroneously) if the stress levels of the syl-
lables were the same as they had been in the original presentation. Sim-
ilarly, interference effects in free recall of both nonsense items and short
phrases were found as a result of stress pattern similarity; in other words,
stress pattern identity can precipitate false recognition, often in defiance
of segmental evidence.
As one would suspect, it is not only the case that false stress informa-
tion leads to difficulty of word recognition; prior knowledge of stress pat-
tern can facilitate correct word recognition. Engdahl (1978) found that
when listeners were given a sentence complete but for the last word (e. g.
'Laura tried all the keys but the door wouldn't ___'), and were asked to
supply an appropriate continuation as quickly as possible, they could do
the task well on the basis of contextual cues alone, but responded signifi-
cantly faster when the stress pattern of the required word was presented
(as a pattern of tones) as well.
Even in the absence of context, stress pattern information can facilitate
lexical access. Subjects in a simple visual lexical decision experiment run
by the present author at Sussex University were presented with words and
non-words in one of two conditions: mixed randomly, or blocked uni-
formly so that a block of 40 items all had the same number of syllables
and stress pattern. Responses in the blocked condition were faster than in
the mixed condition, insignificantly so for the words but significantly for
the non-words. Interestingly, Hirst and Pynte (1978) proposed lexical
stress as one of several arbitrary features which could be incorporated in a
language and which served the function of providing 'extra' word identifi-
cation cues. One of the other such features which Hirst and Pynte pro-
posed was noun gender, and in an analogous experiment to the stress pat-
tern one just described, they presented subjects with French words and
non-words mixed or blocked as to gender; prior knowledge of gender, as
of stress pattern, facilitated performance to a significant extent in the
non-word condition, though not in the word condition. Hirst and Pynte
concluded that prior information about an arbitrary lexical feature such as
gender or stress allows a subcategorisation of the lexicon, so that only the
relevant entries need be consulted; in the case of non-words, the number
of items to be fruitlessly searched is thus significantly reduced by the ap-
propriate information. An alternative explanation, involving not partition-
ing of the entire lexical space but access from each entry of simply the
stress pattern information alone, is also possible - such a procedure would
also greatly increase the speed with which nonwords could be checked
against potential entries. But whichever is the case, the implication is that
stress is not merely information which becomes available on access of a
word's lexical entry, but is of use to guide lexical access, i. e. enable only
those entries with appropriate stress patterns to be fully accessed. This ex-
planation again depends upon the assumption that stress is a feature of the
lexical entries for words.
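
The subcategorisation account can be made concrete with a small sketch. The one below is only an illustration under stated assumptions: the toy lexicon, the "S"/"w" notation for stressed and unstressed syllables, and the function names are all hypothetical and are not taken from Hirst and Pynte or from the experiment described above. It shows why prior knowledge of stress pattern should help most with non-words: a non-word can only be rejected after every candidate entry has been checked, so shrinking the candidate set shortens exactly that exhaustive search. The alternative account mentioned above (access of stress information alone from each entry) is not modelled.

```python
from collections import defaultdict

# Toy lexicon: word -> stress pattern ("S" = stressed, "w" = unstressed syllable).
# Entries and notation are illustrative assumptions, not experimental materials.
LEXICON = {
    "paper": "Sw",
    "tigress": "Sw",
    "written": "Sw",
    "prepare": "wS",
    "digress": "wS",
    "correct": "wS",
    "atmosphere": "Sww",
    "yesterday": "Sww",
}


def partition_by_stress(lexicon):
    """Group the entries by stress pattern (the 'subcategorised' lexicon)."""
    partitions = defaultdict(set)
    for word, pattern in lexicon.items():
        partitions[pattern].add(word)
    return partitions


def candidate_entries(lexicon, known_pattern=None):
    """Entries that have to be considered, given optional prior stress information."""
    if known_pattern is None:
        return set(lexicon)  # mixed condition: the whole lexicon is relevant
    return partition_by_stress(lexicon).get(known_pattern, set())


def lexical_decision(item, lexicon, known_pattern=None):
    """Return (is_word, size of the candidate set).

    For a non-word the candidate set must be searched exhaustively, so its
    size stands in for the cost of a 'no' response.
    """
    candidates = candidate_entries(lexicon, known_pattern)
    return item in candidates, len(candidates)


# "dagress" is an invented non-word with second-syllable stress.
print(lexical_decision("dagress", LEXICON))        # mixed condition
print(lexical_decision("dagress", LEXICON, "wS"))  # blocked condition
```

In the mixed condition all eight toy entries are candidates, whereas in the blocked condition only the three entries with the pattern "wS" remain; the decision itself is unchanged.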
It might be argued, of course, that the phenomenon of vowel reduction
in unstressed syllables makes the interpretation of much of the evidence
cited above somewhat doubtful. Since changes of stress pattern usually en-
tail changes of vowel quality, the apparent perceptual effects of lexical
stress may in fact not be exercised directly, but only indirectly via their
segmental side-effects. However, a recent experiment by Cutler and Clif-
ton (1983) demonstrates that when vowel quality is controlled, lexical
stress information itself can indeed be shown to play a part in the word re-
cognition process. The experiment was motivated by a prior study by Ga-
nong (1980), who found that the typical stop consonant identification
function for synthetic stimuli varying in voice onset time (VOT) could be
affected by lexical factors. If, for example, the same [t]-[d] continuum is
prefaced to the syllables [ik] and [ip], in one case the [t] version forms a
word (teak) while the [d] version does not, while in the other case the [d]
version is a word (deep) whereas the [t] version is not. Using many such
pairs, Ganong found that subjects characteristically shifted the crossover
point of their identification function towards the short-lag end on the
VOT continuum (i. e. reported more voiceless than voiced stops) when the
voiceless stop made a word while the voiced stop did not, and shifted it
towards the long-lag end (i. e. reported more voiced than voiceless stops)
when the voiced stop made a word but the voiceless stop made a non-
word. Cutler and Clifton prepared an analogous set of stimuli based on
the words tigress and digress, which contain identical segments from the
second segment on, but differ in stress pattern. Using synthesised versions
of these words in which the initial consonants varied in VOT, Cutler and
Clifton found a similar effect of lexical status to that found by Ganong:
when stress fell on the first syllable, subjects shifted their identification
function crossover so that they reported more [t] than [d], whereas they
reported more [d] than [t] when stress fell on the second syllable. Cutler
and Clifton interpreted this result as indicating that lexical stress alone —
independent of its effects upon vowel quality - can be of primary impor-
tance in word recognition.
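
The direction of the crossover shift is easy to lose track of, so a minimal numerical sketch may help. It assumes a logistic identification function over VOT; the neutral boundary (30 ms), the size of the lexical shift (5 ms) and the slope (4 ms) are invented for illustration and are not values reported by Ganong or by Cutler and Clifton. The only point it makes is qualitative: moving the crossover towards the short-lag end raises the proportion of voiceless reports for the same ambiguous stimulus, and moving it towards the long-lag end lowers it.

```python
import math


def p_voiceless(vot_ms, boundary_ms, slope_ms=4.0):
    """Probability of reporting the voiceless stop ([t]) at a given VOT.

    A logistic identification function; boundary_ms is the crossover point
    at which [t] and [d] are reported equally often.
    """
    return 1.0 / (1.0 + math.exp(-(vot_ms - boundary_ms) / slope_ms))


def crossover(word_end, neutral_ms=30.0, shift_ms=5.0):
    """Crossover point given which end of the continuum forms a word.

    word_end is "voiceless", "voiced" or None. All numbers are assumptions.
    """
    if word_end == "voiceless":
        return neutral_ms - shift_ms  # towards the short-lag end: more [t] reports
    if word_end == "voiced":
        return neutral_ms + shift_ms  # towards the long-lag end: more [d] reports
    return neutral_ms


ambiguous_vot = 30.0  # a stimulus at the neutral crossover

# Initial stress: only tigress is a word, so the voiceless end is favoured.
initial_stress = p_voiceless(ambiguous_vot, crossover("voiceless"))
# Second-syllable stress: only digress is a word, so the voiced end is favoured.
final_stress = p_voiceless(ambiguous_vot, crossover("voiced"))

print(f"P([t]) with initial stress: {initial_stress:.2f}")
print(f"P([t]) with final stress:   {final_stress:.2f}")
```

With these assumed values the same stimulus is reported as [t] more often than not under initial stress and less often than not under second-syllable stress, which is the pattern described above.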
Thus the production and comprehension evidence are in full agreement:
lexical stress is an important part of word pronunciations as listed in the
mental lexicon. Moreover, it is, both directly and indirectly via its segmental
repercussions, of great importance to the language processor in the operation
of word recognition.

Accent

In any utterance at least one word is accorded a higher level of emphasis
than the others. The accent patterns of sentences are not simply a function
of the lexical stress patterns of the words which comprise them - or even
of the major lexical items which comprise them, ignoring 'function words'
— but depend upon additional semantic and pragmatic factors. The process
of accent placement in the production of an utterance was described by
Cutler and Isard (1980) as operating on a string of words with stressed
syllables marked (and if compound nouns occur in the string, with the
most prominent syllable in the compound marked); the semantic and prag-
matic effects expressed in the placement of accent are (1) the assignment
of focus or emphasis; (2) the expression of contrast; and (3) the deaccen-
tuation of words which would otherwise be accented twice. In the normal
case the effect of these operations will be to allot accent to a word or
words (and, again in the normal case, these words will be major lexical
items - open class words such as nouns, verbs or adjectives, rather than
words from the closed 'function' classes); the accent will then actually be
realised on whatever syllable in the word is marked for lexical stress. In
exceptional cases, however, the semantic and pragmatic accent-assignment
operations can override lexical stress (and compound stress) placement
and result in accent falling on a syllable which is not normally marked for
stress, or on a closed-class word such as a preposition, copula, conjunc-
tion or article.
An example sentence will illustrate the multiplicity of possibilities with
respect to discourse context and its influence upon accent pattern. The
context - including both preceding utterances of the present speaker and
of other speakers, and the speaker's intentions - determines the placement
of accent. In (1), for example, there are exactly three major lexical items: a
noun, a verb and a nominal compound.

(1) This paper was prepared on the word processor.

There are therefore three primary candidates for the most prominent syl-
lable in the sentence: the first syllable of paper; the second syllable of pre-
pared, and the word word which is the stress-marked part of the com-
pound word processor. Any of these three words can bear accent if they are
the focus of the speaker's message, i. e. if the speaker wishes to express or
imply contrast with, or lay emphasis upon, that particular part of his utter-
ance. Accent on paper, for instance, could imply contrast as a reply to (2),
or express emphasis as a reply to (3):

(2) You've never prepared anything on the word processor.

(3) What did you say was prepared on the word processor?

The reader will have no difficulty constructing appropriate contexts for
emphatic or contrastive accent on the other lexical items. But the same
mechanisms can also result in accent falling on the non-lexical words of the
sentence; for instance, accent could be applied to the copula was if the
speaker wished to contrast his utterance with another's statement to oppo-
site effect; accent on the could result from the speaker's intention to set
off the particular word processor on which this manuscript was prepared
from other, more ordinary, word processors, and so on. Contrast can even
accentuate a syllable not marked for lexical stress - for instance, one
might accent the first syllable of prepared to correct a hearer's impression
that the word had been, say, compared. Deaccenting could similarly have a
variety of effects. Occurrence of word processor in a preceding utterance
could result in accent falling on the verb even though the verb itself was
not focussed; or if all the lexical items had already occurred, accent could
even fall on the preposition:

(4) What do you mean, the word processor speeds things up? This paper
took five days to prepare, and this paper was prepared on the word
processor.

See Ladd (1980) and Cutler and Isard (1980) for further examples of deac-
centing.
If none of the lexical items is individually marked for focus, then accent
must be assigned by default. The default case, in which no word is singled
out for particular emphasis, and contrast and deaccenting are not in-
volved, is equivalent to focus on the clause as a whole (see Ladd, 1980, for
a good discussion of this), in which case the accent falls on the rightmost
lexical item - in our example, on word, in its capacity as the stressed syl-
lable of word processor. If word processor has been marked for deaccenting,
the default accent shifts back one and falls on the verb. That is to say, the
default location for accent is the rightmost lexical item not deaccented. (It
is even possible to contrive a case in which only the word word had previ-
ously occurred in the context, not the whole compound word processor; in
this case only word by itself would be marked for deaccenting, and proces-
sor would still be available to the right of it to receive accent:

(5) This paper about stress on words was prepared on the word proces-
sor.

More natural examples may be found in Cutler and Isard, 1980.)
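
The default rule just stated (accent falls on the rightmost lexical item not deaccented, and within a compound on the stress-marked part unless that part alone is deaccented) can be sketched as a short procedure. This is a minimal illustration, not the procedure of Cutler and Isard (1980): the class and function names are hypothetical, explicit focus and contrast are left out, and the case in which every lexical item is deaccented, so that accent falls on a closed-class word as in (4), is simply reported as None.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalItem:
    # Parts in order of preference for carrying accent: the stress-marked
    # part comes first (for "word processor" that is "word").
    parts: list
    # Parts that have already occurred in the context and are deaccented.
    deaccented: set = field(default_factory=set)


def default_accent(items):
    """Return the part that receives default accent, or None.

    `items` holds the major lexical items of the sentence in left-to-right
    order. The search runs from the right and skips deaccented parts.
    """
    for item in reversed(items):
        for part in item.parts:
            if part not in item.deaccented:
                return part
    return None  # everything deaccented: not modelled here


def sentence(deaccented_in_compound=()):
    """Lexical items of example (1), with optional deaccenting in the compound."""
    return [
        LexicalItem(["paper"]),
        LexicalItem(["prepared"]),
        LexicalItem(["word", "processor"], set(deaccented_in_compound)),
    ]


print(default_accent(sentence()))                         # no deaccenting
print(default_accent(sentence({"word", "processor"})))    # compound deaccented
print(default_accent(sentence({"word"})))                 # only "word" deaccented
```

The three calls correspond to the three cases in the text: accent on word, accent shifted back to the verb, and accent on processor as in (5).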


This account of accent assignment, it will be noted, gives little role to
the syntactic structure of the sentence, in contrast to many linguistic mod-
els. In the sentence production process, it is claimed, accent is assigned
primarily on the basis of semantic and pragmatic factors. Evidence from
speech errors supports this interpretation. For instance, when speakers
correct their accent placement, it is to rectify a false impression given by
the first attempt:

(6) Now if it only occurred - if it only occurred in free recall...

(7) Downes has to do something tricky to generate reflexives in his base -
to generate reflexives in his base structure.

Similarly, the way a hearer recognises an inappropriate accent placement
when one occurs is by semantic/pragmatic incongruity rather than syntactic:

(8) She had a lot of cups but the one she gave me leaked.

(9) The only other place I've seen that kind of thing is in Rogers Park -
there's nothing like it right around where we live - where we live.

In (8), for instance, the accent should indeed have been placed on a pro-
noun, but the error consisted in assigning it to the wrong pronoun. Again,
in (9) the assignment of accent to live in 'There's nothing like it right
around where we live' is syntactically perfectly acceptable, but was seman-
tically anomalous in the context.
A corollary of this is that erroneous accent placement is only ever de-
tected by the hearer when it results in semantic anomaly. If it doesn't, the
hearer will simply assume that focus was placed where the speaker in-
tended to place it, with the accented item representing what the speaker
considers to be most important. Thus it is up to the speaker to correct ac-
cent placement if it is important to him to avoid giving a wrong impres-
sion; and correction of accent placement is in fact quite common in every-
day speech.
Again the evidence from sentence understanding complements the pro-
duction evidence: hearers pay great attention to where sentence accent
falls when they are understanding a sentence. This is clear from studies in
which the processing of sentence accent has been explicitly investigated.
For instance, experiments using the phoneme-monitoring task show that
listeners respond faster to accented than to unaccented words. In a
phoneme-monitoring experiment people are asked to listen to a sentence and
to press a response key as fast as possible as soon as they hear a particular
word-initial sound. Responses to this sound (the 'target') are faster if it
begins an accented syllable (Shields, McHugh & Martin, 1974), irrespec-
tive of the form class of the word bearing accent (Cutler & Foss, 1977).
Moreover, this response time advantage for accented syllables is not due
solely to the (indubitable) fact that accented syllables are acoustically
more distinct - and hence presumably easier to make sense of - than unac-
cented syllables. This is demonstrated by a further experiment (Cutler,
1976) in which the acoustic cues to accent on the target-bearing word it-
self were removed, leaving only the cues provided by the prosody of the
sentence context. To accomplish this, sentences like (10) were recorded
twice, once with accent on the target-bearing word (in this case code), and
once with accent falling elsewhere.

(10) The top experts were all unable to break the code the spy had used.

The target-bearing word itself was then spliced out of each recording and
replaced in all versions by identical copies of the same word excised from
a third, 'neutral', rendition of the sentence. The result of this manipula-
tion was two versions of each sentence, each version with acoustically
identical target-bearing words, but with differing prosodic contexts; in
one case the prosodic contour surrounding the target-bearing word was
consistent with accent falling on the target-bearing word, in the other it
was consistent with the target-bearing word being unaccented. Under
these conditions the 'accented' targets still elicited faster responses than
the 'unaccented', and since the only differences between the two versions
of each sentence lay in the overall prosodic contour, it was concluded that
the listeners must have used cues in the prosody to direct their attention to
the sentence accent.
Subsequent experiments investigated the nature of these cues; the pre-
dicted accent effect was found to be unaffected by cues to stress contained
in the duration of closure of the target stop consonant (Cutler & Darwin,
1981), indicating that the effect is not localised in the tenth of a second or
so immediately preceding onset of the target word. Cutler and Darwin
also found that removing pitch cues - i. e. monotonising the sentences -
did not do away with the accent effect; in monotonised spliced sentences
like (10) the 'accented' targets were still responded to faster than the 'un-
accented' targets. In other words, fundamental frequency variation is not
a necessary component of the prosodic information pattern on which sub-
jects can base their search for sentence accent. Other cues have proven
sufficient; in the present experiment the available cues included both dura-
tional and amplitude information. As Cutler and Darwin pointed out,
however, this does not imply that either durational or amplitude cues must
be necessarily present for the accent effect to be found. There is ample
evidence from studies of prosodic cues to the perception of lexical stress
and to the perception of syntactic structure that the human speech proces-
sor can derive a given piece of information from any of a wide range of
alternative cues. For instance, the location of a syntactic boundary in a
syntactically ambiguous sentence can be identified on the basis of fundamental
frequency variation (Lea, 1973; Collier & 't Hart, 1975) or durational
variation (Lehiste, Olive & Streeter, 1976; Scott, 1980). Likewise
both fundamental frequency (e.g. Cheung, Holden & Minifie, 1977) and
duration (e.g. Nakatani & Aston, 1978) serve as effective cues for the per-
ception of lexical stress.
Thus the perception of sentence accent most probably has in common
with the perception of lexical stress and the prosodically guided location
of syntactic boundaries the fact that they can all be achieved by the use of
quite a variety of cues. The question which must be asked at this point is
why prosodic cues to sentence accent are so efficiently employed. That is
to say, we have already seen that lexical stress is an intrinsic part of the
lexical representation of words, and that stress information therefore
serves to aid word identification. Word identification is obviously the
most crucially important part of language understanding. Similarly, syntax
is a vital part of understanding a sentence, and it is not surprising that any
prosodic cues to syntactic structure should be exploited without hesita-
tion. Is sentence accent, however, in the same league as lexical identity and
syntax when it comes to intrinsic importance within the sentence, and
hence usefulness to the sentence comprehension process?
Consider the written representation of language. Each word has an or-
thographic identity and word boundaries are marked with spaces; word
identification is comparatively easy. Syntax is explicitly represented where
necessary by marks of one kind or another. Sentence accent, however, is
usually not represented at all - in rare cases, usually in more casual writ-
ing, words may be underlined or capitalised for emphasis, but there is no
explicit orthographic convention which signals the location of primary ac-
cent in the way that, for instance, a comma signals the location of a syn-
tactic boundary.
Sentence accent is a property of the spoken sentence only. In the utter-
ance it communicates information structure, as we have seen - focus or
contrast. There are other devices for expressing focus, it is true. For exam-
ple, certain syntactic constructions - e. g. clefting, pseudo-clefting, topical-
isation - serve to bring elements of the sentence into the foreground.
Their use is, however, comparatively rare. The information structure of a
sentence is also affected by the structure of the discourse in which the sen-
tence is uttered. However, the important thing to note is that other de-
vices are subordinate to prosodic focussing; accent overrides both dis-
course cues to what might be most important in the sentence, and syntac-
tic cues. To take a simple example, in the cleft sentence (11) the focus of
the sentence, and hence the location of sentence accent, would in the de-
fault case be assumed to be Sandy.

(11) It was Sandy who started it.

But there is nothing unusual in using a cleft construction in which the
clefted item is old information and the accent falls elsewhere:

(12) A: Don't tell me Sandy got into that fight.
     B: It was Sandy who started it!

This is not a major point; it is only to emphasise that syntactic cues to fo-
cus cannot override accent - accent always determines information struc-
ture. Thus the new information in (12) is about starting the fight.
Given that accent is the primary cue to what the speaker considers the
most important part of his utterance, then, it is clear that using cues in the
prosodic contour to direct one's attention to the accented words as speed-
ily as possible would be an effective way of getting quickly to the most im-
portant information. Thus it is no surprise to find that one can mimic the
accent effect in phoneme-monitoring by manipulating information struc-
ture alone without changing the accent. Cutler and Fodor (1979) demon-
strated this by having subjects listen to sentences which were preceded by
questions focussing on one or other part of the sentence. Thus sentence
(13), for instance, could be preceded either by question (14) or (15):

(13) The author of the bestseller refused to go to the congressman's party.

(14) Which author refused to go to the party?

(15) Which party did the author refuse to go to?

Half the subjects in the experiment heard (13) preceded by (14), and half
heard it preceded by (15). The sentences were recorded only once, and no
strong accent was placed on any item. That is, the speaker who recorded
(13), for instance, endeavoured to assign approximately equal accent to
each of the six content words in the sentence. It is nevertheless the case, as
we have seen, that in English in the default case accent is stronger towards
the end of the sentence, so that it is possible that accent was perceived to
fall more strongly at the end of the sentence than elsewhere. However, the
important fact is that subjects who heard the first question and subjects
who heard the second question each heard acoustically identical versions
of the sentence which followed. Within each group half the subjects were
listening for a target in the early part of the sentence, focussed on by the
first question, while half listened for a target in the later part of the sen-
tence, focussed on by the other question. In (13), for instance, the targets
were /b/ - on bestseller - and /k/ - on congressman. Responses to the early
target were faster if subjects had heard the question which focussed on the
early part of the sentence, but responses to the late target were faster if
subjects had heard the question which focussed on the end of the sen-
tence; that is, focussed targets were responded to faster.
An overall effect of perceived accent is presumably responsible for the
further finding that responses to the later targets were generally faster
than responses to the early targets. However, the interesting result is that
focussing on a particular part of the sentence leads to faster responses to
targets at that point, just as accenting a particular part of the sentence produces
faster responses to targets on the accented words. Focus behaves analogously
to accent with respect to its effect on phoneme-monitoring
reaction time. This allows us to conjecture that the accent effect may in
fact be a focus effect; in making use of the prosody to direct their atten-
tion towards accented words, listeners are actually doing this because the
accented words are focussed words. Listeners appear to exploit whatever
cues are available - discourse cues (as in the focus experiment) where they
exist; prosodic cues (as in the accent experiments) where these are there to
be used. In other words, sentence accent perception directly decodes the
information which was encoded in the production of accent; accent re-
presents focus, and perception of accent is perception of focus.
Supporting evidence can be found in the way the ability to use prosodic
cues to accent is developed. Cutler and Swinney (1980) reported two ex-
periments investigating the perception of sentence accent and sentence fo-
cus in young children. These experiments were similar in design to the
perception experiments described above, except that the children moni-
tored for whole words rather than individual phonemes. In the first of
these experiments it was found that young children (aged between four
and six) did not respond significantly faster to accented than to unac-
cented words, although older children showed an accent effect exactly
analogous to the effect in adults. The second experiment tested only four-
to six-year-olds and found that among these, only the older children
showed a focus effect. Although the development of the accent and focus
effects has yet to be monitored in the same children, the results so far
seem to imply that the development of the accent effect is dependent on
the development of the focus effect. That is, until children have learnt that
it is a good idea to listen within sentences for the new, focussed, most im-
portant information, they are not going to be able to develop the relatively
sophisticated techniques which adults use to zero in as quickly as possible
on sentence focus — e. g. tracking the prosodic contour for the cues it gives
to where accent, and hence focus, is going to fall.
In the children, however, we do find a dissociation between the percept-
ual and productive function of accent. Interestingly, it is a dissociation in
the reverse direction to that usually found in children's (or any other) lan-
guage performance, in that for once productive abilities seem to be ahead
of perceptual. That is to say, the very children who cannot respond faster
to accented words, i. e. do not show the adult effect of directing special at-
tention to accented words and anticipating where they will come, neverthe-
less correctly assign sentence accent to new information and produce pros-
odic contours which embody all the information adults need to predict
where accent will fall.

Conclusion

The picture that emerges, then, is that the complementary role of prosodic
perception and prosodic production develops comparatively slowly. Once
adult language competence has been attained, however, the role of pros-
ody in production and in perception is reciprocal: two sides of one coin.
Word stress patterns are part of the lexical identity of words, not arbitrar-
ily assigned by rule; thus in language production lexical stress patterns are
part of the information about each word which is stored in the mental lex-
icon and retrieved when the word is looked up as a sentence is spoken.
Similarly, identification of stress pattern is part of word identification and
is used in the process of looking up a word in the mental lexicon during
the understanding of a sentence. Sentence accent, on the other hand, ex-
presses the information structure of a sentence; when a sentence is pro-
duced the speaker assigns accent according to what he considers to be the
more and less important parts of what he is saying. A listener hearing a
sentence finds it important to identify the location of accent and uses all
available cues to assist him in his search; and the reason why accent is so
keenly sought appears to be precisely because it expresses focus - thus the
perception of accent is as intimately connected with the information struc-
ture of the sentence as is accent production.

References

Bansal, R. K. (1966). The Intelligibility of Indian English. Ph. D. Thesis, London University.
Browman, C. P. (1978). Tip of the Tongue and Slip of the Ear: Implications for Language Pro-
cessing. UCLA Working Papers in Phonetics 42.
Brown, R. & D. McNeill (1966). The "tip of the tongue" phenomenon. Journal of Verbal
Learning & Verbal Behavior 5: 325-337.
Cheung, J. Y., A. D. C. Holden & F. D. Minifie (1977). Computer recognition of linguistic
stress patterns in connected speech. IEEE Transactions on Acoustics, Speech & Signal Processing
25: 252-256.
Collier, R. & J. 't Hart (1975). The role of intonation in speech perception. In A. Cohen & S.
G. Nooteboom (eds.), Structure and Process in Speech Perception. Heidelberg: Springer,
1975. 107-121.
Cutler, A. (1976). Phoneme-monitoring reaction time as a function of preceding intonation
contour. Perception & Psychophysics 20: 55-60.
Cutler, A. (1980). Errors of stress and intonation. In V. A. Fromkin (ed.) Errors in Linguistic
Performance: Slips of the Tongue, Ear, Pen and Hand. New York: Academic Press, 1980.
67-80.
Cutler, A. & C. E. Clifton (1983). Lexical stress effects on phonetic categorization in auditory
word perception. Paper presented to the Tenth International Congress of Phonetic
Sciences, Utrecht.
Cutler, A. & C. J. Darwin (1981). Phoneme-monitoring reaction time and preceding pro-
sody: Effects of stop closure duration and of fundamental frequency. Perception & Psycho-
physics 29: 217-224.
Cutler, A. & J. A. Fodor (1979). Semantic focus and sentence comprehension. Cognition 7:
49-59.
Cutler, A. & D. J. Foss (1977). On the role of sentence stress in sentence processing. Lan-
guage & Speech 20: 1-10.
Cutler, A. & S. D. Isard (1980). The production of prosody. In B. Butterworth (ed.) Lan-
guage Production. London: Academic Press. 245-269.
Cutler, A. & D. A. Swinney (1980). Development of the comprehension of semantic focus in
young children. Paper presented to the Fifth Boston University Conference on Language
Development, Boston.
Engdahl, E. (1978). Word stress as an organizing principle for the lexicon. Papers from the
Fourteenth Regional Meeting, Chicago Linguistic Society.
Fay, D. A. & A. Cutler (1977). Malapropisms and the structure of the mental lexicon. Lin-
guistic Inquiry 8: 505-520.
Ganong, W. F. (1980). Phonetic categorization in auditory word perception. Journal of Experimental
Psychology: Human Perception & Performance 6: 110-125.
Garnes, S. & Z. S. Bond (1975). Slips of the ear: errors in perception of casual speech. Papers
from the Eleventh Regional Meeting, Chicago Linguistic Society. 214-225.
Hirst, D. J. & J. Pynte (1978). Gender, stress and tone: arbitrary features in the organisation
of the lexicon. Sigma 3: 73-87.
Jassem, W. & D. Gibbon (1980). Re-defining English accent and stress. Journal of the Interna-
tional Phonetic Association 10: 2-16.
Ladd, D. R. (1980). The Structure of Intonational Meaning. Bloomington: Indiana University
Press.
Lagerquist, L. M. (1980). Linguistic evidence from paronomasia. Papers from the Sixteenth
Regional Meeting, Chicago Linguistic Society. 185-191.
Lea, W. A. (1973). Prosodic Aids to Speech Recognition II: Syntactic Segmentation and Stressed
Syllable Location. Sperry Univac Technical Report No. PX 10232.
Lehiste, I., J. Olive, & L. A. Streeter (1976). The role of duration in disambiguating syntacti-
cally ambiguous sentences. Journal of the Acoustical Society of America 60: 1199-1202.
Nakatani, L. H. & C. H. Aston (1978). Perceiving stress patterns of words in sentences. Journal
of the Acoustical Society of America 63: S5.
Robinson, G. M. (1977). Rhythmic organization in speech processing. Journal of Experimental
Psychology: Human Perception & Performance 3: 83-91.
Scott, D. R. (1980). Perception of Phrase Boundaries. Ph. D. Thesis, University of Sussex.
Shields, J. L., A. McHugh & J. G. Martin (1974). Reaction time to phoneme targets as a
function of rhythmic cues in continuous speech. Journal of Experimental Psychology 102:
250-255.
GRZEGORZ DOGIL

Grammatical Prerequisites to the Analysis of Speech Style:
Fast/Casual Speech

1. The problem

The question of where and why casual speech phenomena are found (i. e.
their extralinguistic conditioning) is independent of the question how they
are to be described within grammar. This second question is the primary
interest of my investigation. Such a narrowly defined domain might not
satisfy many of the researchers into linguistic variation in general and
fast/casual speech in particular. I hope, however, to satisfy grammarians,
and particularly phonologists, who have been most concerned with these
phenomena. When confronted with speech other than 'normal' or An-
dante, i.e.

moderately slow, careful, but natural; typical of, for example, delivering a lecture or teaching
a class in a large hall without electronic amplification (Harris 1969: 7)

the practice of phonologists has depended, as a rule, only to a limited extent
on the general phonological theories assumed by them. The phonologists,
in describing so-called fast (casual, allegro, presto) speech phenomena,
were guided not only by the rules of general theory they follow but
also by other incidental factors (like tempo, concentration, familiarity,
and other extragrammatical contexts) which are alien to the grammatical
theory proper. All these extra-grammatical factors play a very important
role in fast speech phenomena and it is tempting for a linguist describing
them to include them somehow in his phonological theory. I will argue,
however, that this should not be done, as it immediately leads to a loss of
clarity and precision in the theory itself. It is clear that a theory which is
not sufficiently clear and precise will not work in practice, and the linguist
who employs such a theory will be inclined to adjust it to those features
which are peculiar to the phenomena he is examining. As a result the role
of phonological theory amounts to that of a set of vague instructions only
in a slight degree restricting the choice of formulations. With this view of
phonological theory in mind, and consequently abstracting away from the
'incidental' questions of fast speech, I will try to confront a number of
existing phonological theories (the standard phonemic theories in § 3,
Natural Generative Phonology in § 4, Natural Phonology in § 5, Standard
Generative Phonology in § 6, autosegmental and metrical phonology in
§ 7) with the more interesting data on fast/casual speech which has been
collected in recent years (§ 2). I hope that this will provide a partial evalua-
tion of these theories and also answer the initial question whether any
phonological theory is able to describe fast speech phenomena within its
own terms, not including incidental factors which are peculiar to these
phenomena but alien to the theory.

2. The data

I have decided to base my argumentation on the data taken from papers
on fast speech written by representatives of the phonological theories that
I am going to discuss. As the major interest here is to show to what extent
a given theory can rationalise fast speech phenomena, I think one has to
be fair and confront the particular theories with their own examples,
rather than base one's criticism on new data, with which the analysts were
not familiar while constructing their arguments. One of the basic
prerequisites for a phonological change to take place is that the items
undergoing the change be adjacent in a string. This phenomenon, called
string adjacency or close phonological connection, has been observed by
many analysts of fast/casual speech. We note some of the examples
published recently.

2.1 String adjacency


Stockwell, Bowen, Silva-Fuenzalida, 1956:

             es viudo     bien, gracias    (Harris, 1969)
Largo:       [esbyuðo]                     [mismo]
Andante:     [ezbyuðo]    [byengrasyas]    [mis z mo]
Allegretto:  [ezβyuðo]    [byeŋgrasyas]    [mifimo]
Presto:      -                             [mI:mo]

Bolozky, 1977 (Modern Hebrew):

Slow: [yidfok] 'he will knock'    [mankal] 'general director'
Fast: [yitfok]                    [maŋkal]

Herok & Tonelli, 1979 (Italian, German):

Slow: [kon permesso]  [an pe:tar]  [ba:n baoən] 'Bahn bauen'
Fast: [kom permesso]  [am pe:tar]  [ba:m baoən]

These clear cases of assimilation, which abound in fast speech, are often fed
by various deletion processes, of which the literature on speech style
differentiation is also full. Let us consider some examples of deletion and
of deletion feeding assimilation:

Gimson, 1980:

I think they are a nice young couple, don't you?
Slow: [ai θiŋk ðei ar ə nais jʌŋ kʌpl dəunt ju:]
Fast: [a θiŋk ðeər naiʃ jʌŋ kʌpl dəuntʃu:]

I can try and book some seats around the corner
Slow: [ai kən trai ən buk səm si:ts raund ðə kɔ:nə]
Fast: [a kn̩ tra: m buk sm̩ si:ts raun ðə kɔ:nə]

Zwicky, 1972 (English, Welsh):

             I gave him a hat       yr oeddwn i yn canu 'I was singing'
Lento:       [ai geiv him ə hæt]    [rojðun i: ən kani]
Andante:     [ai geiv im ə hæt̚]     [rojðun i:n kani]
Allegretto:  [ai geiv m̩ ə hæʔ]      [rojni:n kani]
Presto:      [ay gej m̩ ə hæʔ]       [roni:n kani]

Herok & Tonelli, 1979 (Italian):

Slow: [abbiamo gridato]  [possono pagare]  [mandʒano pane]
Fast: [abbiaŋ gridato]   [possom pagare]   [mandʒam pane]
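The feeding relation in the Italian data can be made concrete with a minimal sketch. It assumes a deliberately simplified segment inventory and two hypothetical helper functions (`delete_final_vowel`, `assimilate_nasal`) that are not part of any of the analyses cited; it only illustrates that assimilation has nothing to apply to until deletion has made the nasal and the stop string-adjacent.

```python
# Hypothetical, simplified place-of-articulation table (an assumption for
# illustration, not taken from the sources cited in the text).
PLACE = {"p": "labial", "b": "labial",
         "t": "coronal", "d": "coronal",
         "k": "velar", "g": "velar"}
NASAL_AT = {"labial": "m", "coronal": "n", "velar": "ŋ"}
NASALS = set(NASAL_AT.values())
VOWELS = set("aeiou")


def delete_final_vowel(words):
    """Drop a word-final vowel after a nasal when another word follows."""
    out = []
    for i, w in enumerate(words):
        if (i < len(words) - 1 and len(w) > 2
                and w[-1] in VOWELS and w[-2] in NASALS):
            w = w[:-1]
        out.append(w)
    return out


def assimilate_nasal(words):
    """Give a word-final nasal the place of the next word's initial stop."""
    out = list(words)
    for i in range(len(out) - 1):
        last, nxt = out[i][-1], out[i + 1][0]
        if last in NASALS and nxt in PLACE:
            out[i] = out[i][:-1] + NASAL_AT[PLACE[nxt]]
    return out


def fast(words):
    """Deletion applies first and thereby feeds assimilation."""
    return assimilate_nasal(delete_final_vowel(words))


print(fast(["possono", "pagare"]))
print(assimilate_nasal(["possono", "pagare"]))  # no deletion, nothing to assimilate
```

Applied on its own, `assimilate_nasal` leaves the slow form untouched, because the final vowel still separates the nasal from the following stop.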

Another interesting deletion process in fast speech is deletion of the first
vowel in a word. The interesting aspect of this process is that its
across-the-board application may lead to surface unacceptability, i.e.
violation of constraints on word-initial clusters.
Consider the relevant data:

Bolozky, 1977 (Modern Hebrew): 1

Slow: [neʃifa] 'a blowing'    [metunim] 'moderate'
Fast: [nʃifa]                 [mtunim]

1 A regular obligatory process in MH is the loss of a vowel in the first syllable of non-verbal
forms when a stressed suffix is appended (cf. [kaʃur] [kʃurim] 'tied', Bolozky, 1977:
223).

(English)

Slow: [pətejtou]  [pidʒæməs]  [mirækjuləs]  [məkænik] 2
Fast: [ptejtou]   [pdʒæməs]   [mrækjuləs]   [mkænik]

Zwicky, 1972 (Welsh): 3

       calon 'heart'   calonau 'hearts'   tymora 'seasons'
Slow:  [kalon]         [kalonä]           [tamora]
Fast:  [kalon]         [klona]            [tmorä]

Such clusters in Welsh are tolerated only up to a point; consider the following:

       ceffylau 'horses'   hogenod 'girls'
Slow:  [kefilä]            [hagenod]
Fast:  [fila]              [genod]

English, like Welsh, sometimes deletes the whole syllable, cf. [Nkænik]
(mechanic), or does not delete a vowel at all, e.g. 'deflation', 'revised', etc.

2 I think Bolozky's transcription [mkænik] is incorrect. In my opinion we are dealing with
the loss of the whole syllable, and only a relic of it (autosegmentalised nasal quality, see
§ 7) is present in the form of prenasalization of [k] ([Nkænik]).
3 I am quoting here the transcriptions as they appear in Zwicky (1972), although they seem
to be wrong. In all the words quoted, the accent should appear on the penultimate rather than
the final syllable. Zwicky's mistake may be attributed to a misinterpretation of the
transcription systems in classic Welsh grammars, where the accent mark follows rather than
precedes the accented syllable.

The problem of 'is'/'has' reduction, as well as reduction of other function
words, is an area in itself. Data here is extremely rich and interesting
and I will try to present the most pertinent parts of it, as I think it provides
a very interesting battleground for conflicting phonological theories.
The often pronounced rule of a text-book grammar of English, or any
other stress-timed language for that matter, says: 'form/function words
which are not under stress can be reduced, and/or cliticised, in fast/casual
speech (they are commonly called weak-forms)'.
Examples are quoted in abundance (Zwicky, 1970; Selkirk, 1972; Selkirk,
1980; Radford, 1981; Postal & Pullum, 1978). The rule, however,
which is true of such form words as determiners, complementisers and
conjunctions 4 does not apply so clearly to other types of non-lexical
category items with strong/weak pairs. This fact has already been noticed and
described by Henry Sweet (1908), who mentioned the following examples:

a) I thought of it. [əv]
b) What are you thinking of? [ʌv]
a) He's a fool. [z]
b) What a fool he is. [iz]
a) I don't know whether she'll do it or not. [l]
b) I don't know whether he will or not. [wil]

4 But see Zwicky (1975: 608): "a radio and a television set": [. . . rejdiou n̩ ə . . .] not
[rejdioun ə . . .]; 'go to an ogre' [gou tə n̩ owgr . . .] *[gou tən owgr . . .]; 'away in a manger'
[əwej n̩ ə . . .] *[əwejn ə . . .]. An explanation along the lines of Giegerich (1981) will be
offered in § 7.

Sweet's explanation for the constraint on reduction (cf. Sweet, 1980: 31,
1908: 68) was that weak forms can occur only with words they are struc-
turally connected with. That is, string adjacency is not the only require-
ment for fast speech rules to alter and reduce phonological structure: one
also requires structural adjacency. I shall provide the relevant data under a
separate heading. Since fast speech rules require statements about
structure adjacency in their structural description (SD), only a phonological
theory providing for such adjacency in its representation can be
descriptively adequate.

2.2 Structure adjacency


Whereas in Sweet's examples a strong form of a function word always
stands at the end of a sentence or before a parenthetical, (e. g. He is, if I
may be allowed to say so, mistaken, [iz]), more recently, numerous re-
searchers have pointed out that reduction is not possible before various
adverbials and complement phrases within a sentence:

I wonder where Gerard is today. *[z] (King, 1970: 9)
Mary's more adept at poker than John is at pool. *[z] (Bresnan, 1971)
Ready I am to help you. *[m] (Lakoff, 1970)
Do you know who the king is now? *[z] (Zwicky, 1970)

No matter how fast one speaks, contraction (reduction) in the above-mentioned
cases is impossible. It is also not possible in other cases mentioned
for auxiliaries in Zwicky (1970).

The fact that it was she will be a blow to the party. *[l]
You and I have gone there once too often. *[əv]
The police boxed in the crowd. (If 'in' is not structure-adjacent with
'the crowd'.)

In the last example, which we will discuss in § 7, 'in' cannot be cliticised
to the verb even when it is structure-adjacent with 'the crowd' and
undergoes reduction: *[bokstn]

A very specific kind of reduction and cliticisation has been noticed by se-
manticists and syntacticians. The case concerns 'to' reduction and cliticisa-
tion, thoroughly discussed in Lakoff, 1970; Selkirk, 1972; Lightfoot,
1976; Chomsky & Lasnik, 1977; Postal & Pullum, 1978; Andrews, 1979,
among others. Here are some relevant data concerning English verbs fol-
lowed by 'to' which can be reduced in fast speech:

want + to = wanna                    I wanna look at the chickens.
going + to = gonna                   I'm gonna buy some books.
have + to = hafta                    You hafta go.
ought + to = oughta                  You oughta be a secret agent.
used + to = usta                     The beast usta provide amusement for the masses.
got + to = gotta                     You've gotta be here.
supposed + to = supposta [sposta]    He was supposta go out.

The process, which is very general for those verbs, involves the deletion of
a coronal consonant (cf. Selkirk, 1972: 200) and can also be generalized to
other verbs (in very fast speech):

went + to → wenna        We wenna Chicago.
demand + to → demanna    I demanna see a lawyer.
intend + to → intenna    I intenna sue.

Notice, however, that the items undergoing this process have to be struc-
turally adjacent:

They want, to all intents and purposes, to destroy us.
* They wanna, all intents and purposes, to destroy us.
I am going to the meeting of the United Ornithologists of Great Britain.
* I am gonna . . .
I have to the present day avoided eating the flesh of such creatures.
* I hafta the present day . . .
I want, to be precise, a yellow, four-door De Ville convertible.
* I wanna, be precise . . .
We cannot expect that want to be satisfied.
* We cannot expect that wanna be satisfied.

Besides these structure adjacency violations, which are visible from surface
syntax, to-contraction (adjunction) is constrained in more sophisticated
ways. Consider the following paradigm first noticed by Larry Horn:

a) Teddy is the man I want to succeed.
b) Teddy is the man I wanna succeed.

Example a) is ambiguous and can mean:

I want Teddy to succeed.
I want to succeed Teddy.

However, b) has only the latter interpretation (I want to succeed Teddy).
That is, if we want to bring out the meaning "I want Teddy to succeed" we
cannot adjoin to, no matter how fast we speak. This paradigm holds for all
the verbs that allow to-adjunction (cf. Lightfoot, 1976; Chomsky & Las-
nik, 1977). The paradigm is also clearly present in the competence of na-
tive speakers of American English, and hardly learnable by foreigners.
I once devised an experiment (1977 in Poznań), in which I asked 10 native
speakers and 10 Poles with a fluent command of American English to
answer the following two questions:

What do you use to take pictures with?
What do you usta take pictures with?

The questions were recorded by a native speaker who was instructed to
speak as fast as possible but to bring out the relevant difference in
meaning (which was realised as contraction in one case, no contraction in the
other).
The questions were embedded in a large test, which had all the features
of a comprehension test. All other questions had no grammatical queries
in them. The results of the test surpassed my best expectations. All native
speakers clearly spotted the difference in meaning by answering the first
question with 'camera' (or make of cameras) and the second (contracted)
with a Prepositional Phrase 'with a camera'. None of the Poles made this
distinction. This indicates that to-adjunction, which is one of the most
typical cases of fast speech processes, is structure dependent, and that this
structure on which it depends is an integral part of the competence of a
native speaker, which has nothing to do with the performance constraints
on which fast speech descriptions seem to be based. Thus at least in this
case, and I am sure that such cases can also be found in other languages,
fast speech phenomena have to be described in terms of competence,
rather than performance variables. Let us here close our collection of data
and see whether any of the phonological theories is able to make anything
rational of it.

3. Phonemics

It may be unfair to place discussion of phonemic solutions to fast speech
phenomena just after a lengthy argument for the necessity of competence
description in the account of speech variants. However, even if we
drop the competence-performance distinction, phonemic accounts, which
persistently turn up in phonological descriptions, deserve consideration.
A common procedure in phonemic analyses of speech style was to define
these phenomena as: "free variation at this [phonemic] level, presumably
controlled by stylistic choice . . ." (Stockwell, Bowen, and
Silva-Fuenzalida, 1956: 408). Even if one is not very fussy about internal
inconsistency within the theory (free variation at the phonemic level),
anyone would agree that various realizations of items like, for instance:

       es viudo 'he is a widower'    it would have been funny
Slow:  [esbyuðo]                     [it wud hæv bin fʌni]
Fast:  [ezβyuðo]                     [irə bin fʌni]
       (Stockwell et al., 1956)      (Selkirk, 1972)

are not free variants in any sense, but rather that there is a regular
phonetic correspondence between these realizations. Establishing this
phonetic correspondence is the weakest point of any phonemic theory,
not only in descriptions of speech styles. The only possibility for a
phonemicist is to perform analyses separately for each style, which leads to
different phonemic solutions. Styles so described are then defined as
'coexistent phonemic systems' and the linguistic variation is accounted for as
'phonemic stability vs. instability'. Such an analysis obviously loses all the
intuitive insights about speech variation. Additionally, the very restrictive
view of phonology which is characteristic of pronouncements of phonemi-
cists, that phonological representation is unstructured and that nonphono-
logical information should not be used in phonemic analysis, forbids this
model of phonology to make any claim (observation) about the cases of
structural adjacency which I noted in § 2.2. I think this suffices to claim
that phonemic theory does not possess either necessary or sufficient ap-
paratus to describe speech styles, their variation and linguistic constraints
on their use. There is, however, one area in speech style research which is
fairly neatly covered by phonemics. The correspondence between the
choice of style and the choice of dialect (cf. Dressler 1972a: 15, 1975: 222)
is, in my opinion, sufficiently accounted for by the definitions of
'coexistent phonemic systems' and 'phonemic stability vs. instability' (cf. Fries &
Pike 1949). However, the problems of regular phonetic correspondence
between styles and across styles and the structure dependence of fast
speech rules remain unaccounted for. I will argue that this is a problem of
all theories that concentrate on phonological representations while paying
little attention to phonological processes (cf. Herok & Tonelli, 1979:
43 ff.). The most recent theory that considers establishing 'proper'
phonological representations is Natural Generative Phonology, which I will
discuss immediately.

4. Natural Generative Phonology (NGP)

NGP is a reaction to Generative Phonology (standard version of
Chomsky & Halle, 1968), a reaction which in my opinion is based on a
misunderstanding of the goals of Generative Phonology. NGP understands
SGP as another form of discovery procedure, superior to phonemic
analysis, which aims at establishing 'proper' or 'psychologically real'
phonological (systematic phonemic) representation. I am not going to
argue why this is a misunderstanding (an interested reader may consult
Anderson, 1974, Gussmann, 1980, or Kenstowicz & Kisseberth, 1977).
The apparent criticism provided by NGP, which argues for limitation of
abstractness of underlying phonological representation, is based on the
assumption that phonological rules express only surface generalizations, i. e.
those that can be stated by reference to phonetic environments, including
the syllable boundary and the pause boundary. This claim, called the True
Generalization Condition (TGC, cf. Hooper, 1976), actually does nothing
more than relate surface forms. 5 This is obviously not enough to account
for the fast speech phenomena mentioned in § 2.1 and § 2.2 above. Again,
nothing is said about the 'process' of phonological change which takes
place in fast speech. Let us illustrate this by looking at some explicit
attempts to account for speech variation in NGP.

5 Gussmann (1980, ch. 1) has convincingly argued that TGC makes impossible any
significant conception of phonological or morphological structure.

Rudes (1976) claims that there are only four possible variants of 'I like
Edinburgh' in one idiolect (Western New England and Mohawk Valley
dialect area) (cf. Figure 1).

I like Edinburgh

Slow-careful:       $ $ $ $$ $ $
                    äe ldek ed η blrou
Casual-moderate:    $ $$ $ $ $
                    äe ldek ed η btroa 0
Fast-allegro:       $ $ $ $ $
                    äe ldeked n^traa®
Fast-allegrissimo:  $ $ $ $
                    ä ldked mblra u

Figure 1

I do not think that one can maintain Rudes' derivation and interpretation.
Firstly, it is unbelievable that a native speaker has just four options to
choose from. Secondly, given Dressler's comment on the interdependence
between dialect and style, it is hardly likely that a speaker derives his
fast-allegrissimo forms from slow-careful forms as Rudes suggests (cf.
Dressler, 1975, Rennison, 1981).
Another attempt to account for fast speech phenomena within NGP has
been made by Bolozky (1977). We have presented some of his data in
§ 2.1. Bolozky claims (1977: 220) that fast speech is a system of rules in
itself, having also its own constraints, constraints which should be banned
from normal speech. Under closer investigation all these constraints turn
out to be universal, rather than language or style specific. I will offer an
explanation for them in § 7. Bolozky (1977: 221) mentions the normal
speech processes and constraints that do not apply in fast speech. That
some normal speech constraints are loosened in fast speech, or even new
contrasts emerge (cf. Gussmann, 1980: 127-28), is self-evident, and can be
accounted for on universal grounds (cf. § 5, § 7 following).
The elimination of a speech process from fast speech is a more
complicated issue, and can be accounted for in terms of process phonology,
which the NGP is not. The only example (quoted after Rudes 1976) where
the process

V → [α long] / ___ [α voice]

in English is banned from fast speech is not very convincing. 6 Besides that,
Bolozky concentrates on TEMPO as a variable in fast/casual speech thus
introducing an undefined concept which is alien to his theory. The depen-
dence of derivational constraints on tempo has to be shown using the ap-
paratus of one's own theory and not assumed. Calling some phenomenon
'TEMPO bound' is disguising but not explaining it.
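
The vowel-length process quoted above uses a feature variable: on the usual reading, a vowel is long before a voiced segment and short before a voiceless one. The sketch below is only an illustration of that reading, under the assumption of a toy orthographic inventory; the function name `lengthen` and the segment sets are hypothetical and do not come from Rudes or Bolozky.

```python
VOWELS = set("aeiou")
VOICED = set("bdgvzmnlr")  # hypothetical, simplified set of voiced consonants


def lengthen(segments):
    """V -> [alpha long] / ___ [alpha voice], on a list of segments."""
    out = []
    for i, seg in enumerate(segments):
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if seg in VOWELS and nxt in VOICED:
            out.append(seg + "ː")  # [+long] agrees with following [+voice]
        else:
            out.append(seg)        # [-long] in all other environments
    return out


print(lengthen(list("bid")))
print(lengthen(list("bit")))
```

Nothing in the rule itself mentions tempo, which is why declaring it 'TEMPO bound' adds a variable from outside the formalism.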
Summarizing: I am very critical of NGP as a model for describing fast
speech. First of all, it is not structured enough to account for phenomena
mentioned in § 2.2. Secondly, it obscures the distinction between standard
and dialect, and by its statements of surface relatedness is not able to show
the interdependency between dialects and casual styles. Last but not least,
through its taxonomic interest it is in principle unable to show the 'regular
phonetic correspondence' which is so characteristic of speech variation.

6 The whole problem may lie in the improper phonetic interpretation that Rudes provides.
The absence of lengthening of /i/ in front of a flap in [hint], which is a fast speech variant
of [hlttt], can be explained if flapping is understood primarily as a laxing rule
and not as a voicing rule. The same applies for glottalization in [hi' ni:a], which is not
followed by vowel shortening (cf. Kiparsky, 1979: 437; Gussmann (1980: 127-28) provides a
much better example).

Having presented two taxonomic models, let us look now at the process
models of phonology and see whether those can avoid the pitfalls of
phonemics and NGP and whether they establish a solid basis for the
analysis of speech style. I start with process phonology in its most extreme
form, i.e. Natural (Process) Phonology, invented by D. Stampe and
developed by Dressler, Drachmann, Donegan, Tonelli and others.

5. Natural (Process) Phonology (NPP)

NPP is extreme in the sense that it disregards all the distributional criteria
of items, or rather makes them derivative from natural phonological
processes. 7 The idea is that every speaker of natural language is equipped
with a class of universal phonetic substitutions (natural processes) which
constitute his creative knowledge of phonology, i.e. enable him to
construct all plausible sound patterns. In the context of a particular language,
these universal substitutions are modified accordingly, constituting
productive phonological rules as well as live phonotactic constraints.
Phonology also includes a fairly large unproductive part, i. e. rules that have to
be learnt, but those can also be explained as fossilised, morphologised, or
telescoped natural processes (cf. Dressler, 1977b). The very strong side of
the NPP is its simplicity and intuitive appeal. It is enough to postulate just
two basic types of processes, which are called fortitions (clarification
processes) and lenitions (obscuration processes) respectively. The two types
of process also fulfill two distinct functions in phonology: the obscuration
processes serve the pronounceability of phonetic strings, the clarification
processes their perceptibility. 8 Fortitions and lenitions are also kept strictly
apart in their order of application, fortitions universally preceding
lenitions. The considerable differentiation of individual phonologies in
natural languages is explained in the NPP by various constraints that
individual languages place on process application. Formally, however, those
constraints are universally valid and can be presented as universal
hierarchies of process application going from unconstrained application,
through limitation (morphological, lexical or other) to total suppression
of a process (cf. Stampe, 1969, 1972; Dressler, 1977b).

7 In Dressler's polycentristic model the idea of natural process is generalized to all other
components of grammar (cf. Dressler, 1977).
8 This correlation, which has been convincingly established in many places, may dull the
attention which has to be paid to phonetic and phonological detail. If such attention is not
paid, we may be confronted with the notional fallacy of describing a form by its function,
which was abolished in linguistic methodology half a century ago, but which unfortunately
crops up in some of the NPP analyses.

What is interesting
for our discussion is that, looking at the data in § 2, we find that all
the substitutions noted there are in fact examples of natural phonological
processes, namely the obscuration processes (lenitions). Thus we notice
nasal assimilation, syncope, laxing etc. There is not a single example of
what we could phonetically define as a fortition process. On top of that,
". . . interstylistic differences of phonetic patterns become easily derivable
if one assumes that the constraints on the application of phonological
processes vary systematically from style to style" (Herok & Tonelli, 1979: 45).
Thus, one can define casualness or stylistic variation in terms of
applicability of a universal phonetic process. This brings us very close to the
definition of speech style; none of the previous theories has ever been so
close to this goal. We notice that the applicability constraints of the
universal phonetic processes form a necessary criterion for a definition of
style; is it also a sufficient criterion? It probably is for the data in § 2.1.
Notice, however, that NPP faces difficulties in constraining its processes. 9
The NPP, it needs to be stressed again, does not distinguish between
processes and phonotactic constraints; just the opposite, it claims that all
phonotactic (morpheme structure) constraints can be derived from the
limitations on natural processes:

A language X can be said to choose among this class (of universal phonetic substitutions,
G. D.) all the processes operative in X, and adapt them in such a way that they act as lan-
guage-specific morpheme structure conditions and phonological rules respectively, produ-
cing but language-specific phonetic output forms; that is to say, the application of natural
phonological process within a particular language is controlled by language-specific con-
straints (Herok & Tonelli, 1979: 44-45).

As I understand it, there is nothing then to prevent the change of the
constraints along with the change of chosen processes and limitations of
them. The data in § 2, however, shows clear limits on application which
do look like constraints; cf. Zwicky's Welsh and English data (§ 2.1).
I will assume then, that some form of phonotactic or similar constraints
has to be added into the description of fast speech data if such a
description is to be sufficient (cf. § 7). Another area where the NPP seems to be
insufficient is connected with the data in § 2.2. I have provisionally
mentioned that any adequate account of this data must be able to make
statements about structural adjacency of the items which undergo
changes in fast speech. This poses a real problem for the NPP analysis,
as it views phonological strings as unstructured. This proviso definitely
applies to any syntactic, morphological or semantic structuring of a
phonetic string, which automatically leaves 'to-adjunction' phenomena
mentioned in § 2.2 unexplained. Donegan and Stampe (1978: 27 ff.) stress
explicitly the prosodic structuring of phonetic strings, but the only prosodic
concepts they have worked out are the syllable and the accentual measure.
This enables them to explain a lot of fast speech data, but the lack of
other, higher, prosodic categories leaves another stock of data
unexplained.

9 To put it bluntly, why isn't [papapa . . .] or something of this sort an optimal outcome of
NPP? (I am not sure about the [p] as an optimal outcome; however, the unconstrained
natural application of universal substitutions for vowels does ultimately lead to [a] (cf.
Donegan, 1979).)

Summing up: although being highly sympathetic to Dressler's state-
ment, "Das geeignetste phonologische Modell für die Allegroregeln
scheint D. Stampe's 'Natural Phonology' zu sein", I am obliged to say that
in its present form this model is not sufficient to describe the relevant
data. It seems obvious to me that the representations with which the
model works have to be enriched in structure. In view of the data in § 2.2,
the very restrictive conception of phonological representation cannot be
maintained; otherwise, very important observations have to be left
unexplained. Leaving 'to-adjunction' and similar phenomena aside, the
representatives of the NPP have themselves noticed the relevance of
grammatical and lexical considerations in speech-style related areas. For instance,
Dressler (1975: 227) points out that Allegro rules can be observed in
diachronic phonology especially as: ". . . 'unregelmäßiger Lautwandel' und
Mutilationen, die sich besonders bei Auxiliarformen, Pronomina,
Artikeln, Partikeln . . . in Dokumenten vergangener Sprachepochen finden".
A phonology which programmatically excludes grammatical and lexical
information can make only anecdotal, theory-alien observations of such
phenomena. Dressler's extension of the NPP in the form of a polycentristic
grammar is in principle able to account for such interactions between
various components; however, we are still waiting for an explicit account
of natural processes in other components, and of naturalness conflicts (for
some preliminary attempts cf. Dressler, 1977a, b; Dogil, 1980, 1981). It is
to be stressed, however, that a process phonology, an example of which is
the NPP, provides a much better account of speech variation than a
taxonomic phonology. Let us consider then another model of process phonology,
which unlike NPP has always paid very much attention to structure.
This model is Standard Generative Phonology.

6. Standard Generative Phonology

One of the basic assumptions of a generative phonological framework has
always been that the phonological structure of linguistic strings (sentences in
TG, which I assume) follows from their syntactic properties. This
particular assumption induces an analyst to encode structural adjacency
observations like those noted in § 2.2 in a phonological string, enabling
in principle the explanation of data with which other theories had such
incredible problems.
Another assumption of SGP, which has been often criticised but even
more often shown to be indispensable (cf. Kenstowicz & Kisseberth, 1977,
ch. 1; Gussmann, 1980), has been that underlying phonological forms
(systematic phonemic representations) have to cover the whole range of
morphophonological variation. This can also include the phonostylistic
variants of lexical items, i.e. our data presented in § 2.1.
Thus, methodological principles of SGP provide a sufficient background
for an adequate theory of phonological variation. Let us see how
this has been made use of. The seminal account of speech variation based
on string adjacency data (§ 2.1) within the SGP has been provided by
Harris (1969). Harris, devising his rules for various speech styles in Spanish
(he has concentrated on andante and allegretto), has noticed that the only
difference between them is the presence vs. the absence of a grammatical
boundary in the structural description of the respective rules. Consider his
rule of nasal assimilation (Harris, 1969: 14, 56, cf. Figure 2).

ANDANTE:
[+nasal] → [α cor, β ant, γ back, δ distr] / ___ [+obstr, α cor, β ant, γ back, δ distr]

ALLEGRETTO:
[+nasal] → [α cor, β ant, γ back, δ distr] / ___ (#) [+obstr, α cor, β ant, γ back, δ distr]

Figure 2: Nasal Assimilation Rule in Spanish (Harris, 1969: 14, 56).

In Allegretto the word boundary ceases to be a block to the phonological
rule. Hence a speech style can be defined theory-internally as the sensitivity
of a general phonological rule to various boundary markers. Notice
that the same mechanism can explain some of the data mentioned in § 2.2.
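
The idea that the two styles differ only in whether a word boundary blocks the rule can be sketched as follows. This is an illustration under stated assumptions, not Harris's formalism: the place table is a toy inventory, `#` stands for the word boundary, and the `style` parameter and function name are hypothetical.

```python
# Toy inventory (an assumption): only oral stops trigger assimilation here.
PLACE = {"p": "labial", "b": "labial",
         "t": "coronal", "d": "coronal",
         "k": "velar", "g": "velar"}
NASAL_AT = {"labial": "m", "coronal": "n", "velar": "ŋ"}
NASALS = set(NASAL_AT.values())


def nasal_assimilation(phrase, style):
    """Assimilate a nasal to a following stop; '#' marks a word boundary.

    In 'andante' an intervening '#' blocks the rule; in 'allegretto' the
    boundary is optional in the environment, i.e. '(#)', and is skipped.
    """
    segs = list(phrase)
    out = []
    for i, seg in enumerate(segs):
        if seg in NASALS:
            j = i + 1
            if style == "allegretto" and j < len(segs) and segs[j] == "#":
                j += 1  # the optional boundary no longer blocks the rule
            if j < len(segs) and segs[j] in PLACE:
                seg = NASAL_AT[PLACE[segs[j]]]
        out.append(seg)
    return "".join(out)


print(nasal_assimilation("byen#grasyas", "andante"))
print(nasal_assimilation("byen#grasyas", "allegretto"))
```

The same function serves both styles; only the treatment of `#` changes, which is the sense in which style is defined theory-internally.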
Selkirk (1972) explains this data by using the so-called 'trace theory' of
movement rules which has been suggested by Chomsky and his students,
and which assumes that transformational movement rules do not just
move constituents around but that they leave 'traces' in the original places
out of which these constituents have been moved. 10
Selkirk (1972: 74 ff.) proposes that word boundary markers are
introduced in initial phrase markers, and when transformations move or delete
elements they leave the boundary markers unaffected. In this way she
explains lack of copula contraction in English (cf. § 2.2), and the same
procedure can be used to explain 'to-adjunction'. 11
Consider the following pair of sentences:

a) Who do you want to succeed?

b) Who do you wanna succeed?

As we pointed out in § 2.2, if a) derives from underlying

a') You want who to succeed


adjunction is not possible no matter how quickly or carelessly one speaks.
The explanation is that 'who', which is moved by 'wh-movement', leaves its
boundaries behind, and they in turn block the application of the
'want to → wanna' rule.
Selkirk's analysis faces a lot of problems of detail, as also do other ex-
planations of this data based on other assumptions about syntactic theory
(cf. King, 1970; Bresnan, 1971; Lakoff, 1970; Postal & Pullum, 1978;
Chomsky & Lasnik 1977, 1978). The discussion of those could lead us too
far astray, and I would rather present an alternative account in § 7. Now I
would like to concentrate on one basic inconsistency which is involved in
this 'boundary sensitive' account of speech style of SGP. One of the goals
of SGP is to construct its descriptions in such a way that rules describing
more general phenomena should be formulated in simpler terms. This is a
methodological requirement of the theory and cannot be in any sense dis-
pensed with, if a theory wants to preserve its value. Notice, however, that
phonological rules for fast speech, which describe phenomena of a greater
degree of generalization than rules for slow speech, are formally more
complex, i.e. they include boundary markers. This obviously leads to a
serious inconsistency in the theory. An attempt has been made by McCaw-
ley (1968) to avoid this inconsistency by specifying a 'rank' on the rules
rather than mentioning boundaries explicitly in the rules themselves.
Thus, andante rules will have 'rank #' while allegretto rules have
'rank ##', i.e. allegretto rules would apply across # boundaries but not
across ## boundaries. I think, however, that this is a formal trick rather
than anything else, and would only reposition the discussion of the status
of grammatical boundaries from their place in phonological representations
into some other place.

10 For details cf. Radford (1981: 181-96).
11 But see Postal & Pullum (1978: 19ff.).
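
The rank formulation can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than Harris's or McCawley's own formalism: the strings are toy forms, '#' stands for a word boundary and '##' for a stronger one, the function name `assimilate` is hypothetical, and a rule of rank n is taken to look across fewer than n boundary symbols.

```python
# Toy place-of-articulation table: obstruent -> homorganic nasal (assumption).
PLACE = {"p": "m", "b": "m", "t": "n", "d": "n", "k": "ŋ", "g": "ŋ"}
NASALS = {"m", "n", "ŋ"}

ANDANTE = 1     # 'rank #'
ALLEGRETTO = 2  # 'rank ##'


def assimilate(form: str, rank: int) -> str:
    """Assimilate each nasal to the place of a following obstruent.

    The rule may look across fewer than `rank` boundary symbols ('#').
    """
    out = list(form)
    for i, seg in enumerate(out):
        if seg not in NASALS:
            continue
        j = i + 1
        boundaries = 0
        # Skip over any boundary symbols between the nasal and the next segment.
        while j < len(out) and out[j] == "#":
            boundaries += 1
            j += 1
        if j < len(out) and out[j] in PLACE and boundaries < rank:
            out[i] = PLACE[out[j]]
    return "".join(out)


print(assimilate("un#beso", ANDANTE))      # boundary blocks the rule
print(assimilate("un#beso", ALLEGRETTO))   # rule applies across '#'
print(assimilate("un##beso", ALLEGRETTO))  # '##' still blocks the rule
```

In this sketch the boundary symbol no longer appears in the rule itself, only in the comparison with `rank`, which is exactly the repositioning of the problem criticised above.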
Another area where the requirements of the theory are different from
the requirements of description is that the rules which are formally similar
in their descriptive statements should be collapsed into one rule. Thus, if
we consider a colloquial pronunciation of a word like German
'geschnitten' we can observe the deletion of two unstressed vowels, which
in consequence leads to the phonetic form [kʃnitn]. The SGP theory would
require collapsing of these two deletion processes into a single rule of
the form:

e → ∅ // [+stress]     (// = mirror image rule environment)

This formulation enforced by the theory clearly misses the descriptive fact
that constraints on preaccentual deletion are quite different from the
constraints on postaccentual deletion (cf. Dressler, 1975).
Still another area where the theoretical assumptions of SGP lead to un-
desirable results when fast speech data are considered is the account of
phonotactic constraints on representations. The classic SGP (cf. Chomsky
& Halle 1968; Stanley, 1967) accounts for these in the form of language
specific, linear 'morpheme structure constraints' (MSCs). Thus there is an MSC in
English which prohibits two obstruents from following each other at the
beginning of a word. This constraint, and also many others, are falsified
by the data presented in § 2.1 (cf. Bolozky's and Zwicky's data).
Last but not least, one of the most important goals of SGP, the attempt
to cover the total range of morphophonological variation with the same
type of phonological rules, makes it impossible to capture the observation
that it is only the most 'natural', minimal phonetic substitutions which fi-
gure in speech variation accounts. The common procedure of saying that
it is the 'late phonological rules' which are a matter of study does not
make the issue in any way clearer, as the SGP is not in a position to define
what is meant by 'late phonological rule' and how it is to be distinguished
from 'early phonological rule'. Of the many attempts which try to do that
(Koutsoudas, 1977; Wurzel, 1970; Dinnsen & Eckman, 1977; Leben,
1977; Linell, 1977, etc.), the most promising seems to be the hypothesis of
the so-called 'strict-cyclicity' of phonological rules. This hypothesis distin-
guishes between 'cyclic' and 'post-cyclic' phonological rules. It seems to
me that most of the rules which figure in speech style variation could be
subsumed under the latter group (cf. Mascaró, 1976; Rubach, 1981).
Summing up, one is obliged to say that the SGP, although providing a
sound frame for the description of fast speech, still does not provide a
model which would define the special character of stylistic speech varia-
tion. Its representations, features and rules do not make these phenomena
follow from the way they are represented, but leave them rather as no
more than arbitrary typological observations. Let us look then at the most
recent development of generative phonology, which attempted a basic re-
vision of some of the postulates of the SGP, and see how this accounts for
the phenomena noted in § 2.

7. Metrical and autosegmental phonology (MAP)

Metrical and autosegmental phonology, which can be said to have
complementary areas of application, can be subsumed under one heading
based on their attempt to revise the linear conception of phonological re-
presentation within generative grammar. Autosegmental phonology,
which has been primarily concerned with tone and vowel harmony pro-
cesses (cf. Goldsmith, 1976; Clements, 1977), argues for a multi-linear
phonological representation with harmony or tone features appearing on
separate levels called tiers. Segmental and 'prosodic' tiers are mediated by
n-ary branching trees. An autosegmental representation in a tone language
(Lomongo) might look like the following:

segmental tier:     t s w e m i     "You who lead me away"

tonological tier:   L H H

The very simple procedures of autosegmental phonology have made it
possible to capture many intricate features of tonal and harmonic systems
which were by definition excluded from the SGP descriptions. An exten-
sion of autosegmental theory which is interesting from the point of view
of the analyst of speech style variation has been attempted in Kahn (1976).
Kahn has suggested that not only tones and harmonic features but also
syllables should be represented on a separate tier. Thus a representation of
an English word 'asparagus' would look something like the following:

a s p a s r a g s s segmental tier

syllabic tier

An interesting idea introduced in Kahn (1976) and developed in
Skibniewski (1980) is that rules associating syllabic and segmental tiers will
change with respect to speech tempo. In this way, different types of asso-
ciations will provide different contexts for the rules, which in turn lead to
differences in phonetic outputs = differences in speech style. This model
is clearly superior to the SGP model in several respects: it avoids internal
inconsistency; it clearly defines a group of rules that figure in speech style
variation as rules depending on syllable structure. However, this model is
also as clearly insufficient: it does not solve the problem of the apparent
violations of the Morpheme Structure Conditions; it has also nothing to
say about the data we presented in § 2.2. On top of that I have serious re-
servations about the unconstrained application of the autosegmental for-
malism. In my opinion, autosegmental phonology is a hypothesis about
the organisation principles concerning some suprasegmentals such as tone
or vowel harmony, and it should not be used as a discovery procedure for
supra/subsegmentals. The lack of such kinds of constraint has led to nu-
merous 'autosegmental' analyses which have nothing to do with supraseg-
mental phenomena, and may lead to even more (cf. McCarthy, 1979; Har-
ris, 1980). Merely to label a set of phenomena as 'autosegmental' or
'non-linear' in a grammar overlooks the fundamental questions: why are
these phenomena, and not others, autosegmental, and why do autosegments
exist at all? Grammars which simply list an apparently heterogeneous
set of constructions as 'non-linear' (cf. Halle & Vergnaud, 1980) lack
both generality and naturalness and are psychologically implausible. 12
Another modification of the SGP, which was at its outset concerned with
the problems of stress and rhythm, and which is commonly labelled
'metrical phonology', claims that every phonological representation
has a specific suprasegmental organisation. This organisation (also called
metrical or prosodic structure) is represented in the form of branching trees
with categorial nodes like the following: syllable (σ); foot (Σ); phonologi-
cal word (ω); phonological phrase (Φ); intonational phrase (I); utterance
(U). I assume that the tree is binary-branching on the lexical level (σ, Σ, ω)
and multiply-branching on the higher levels. Additionally, for the
purposes of stress, rhythm and accent, the categories are put in functional
relations: strong/weak in the lexical part, and rise/fall (or higher/low
(Clements, 1981), or 'cricothyroid'/'sternohyoid' ( j / j . ) pulse (Gibbon,
this volume)) in the phrasal part.
This distinction in labelling not only enables us to distinguish between
'lexical' and 'phrasal' phonology (cf. Kiparsky 1981) but also shows the
relative independence of rhythmic patterns within words and of words
themselves, and their role in the placement of intonation contours (cf.
Gibbon, 1981).

12 The type of constraint I have in mind could be something like the following:
"Once a certain feature is autosegmentalised, it has to be autosegmental everywhere in a
given language, i.e. there is no possibility of autosegmentalisation of features which apply
to individual contexts or parts of representations only."
or something like that suggested (but not followed) by Clements (1977):
"Each autosegmental tier has to be affected independently by rules specific to it."
The prosodic/suprasegmental organisation is independent of syntactic
structure; however, the prosodic domains (σ, Σ, ω, Φ etc.) are defined on
the syntactic domain.
In this way the assumption of SGP that the phonological properties may
follow from syntactic properties is preserved. Selkirk (1980c) suggests the
following schematic characterisation of this theory:

{Si, Sj, Sk}     Φ S/P     {Pj, ..., Pn}

Syntax     Phonology 1, Phonology 2

Phonology 1 → Constitutes the block of rules which depend on syntactic or
morphological structure (phono-syntactic or morphophonological rules). As
these rules never figure in speech style accounts (cf. § 5, 6), we will not
be concerned with this part of the model.

Syntax → I will assume here one of the versions of the so-called
T-model proposed within the Extended Standard
Theory of Generative Grammar. 13
13 Two models in particular may be taken into consideration: that of Chomsky & Lasnik (1977):

Base Rules
    ↓
  move-α
    ↓
S-Structure
  ↙        ↘
Phonetic Form       Logical Form
Filters             Control
Deletions etc.      Construal etc.

and that of Riemsdijk & Williams (1981):

Base Rules
    ↓
 move-NP
    ↓
NP-structure → LF {Binding, Indexing, Construal} → semantic interpretation
    ↓
 move-Wh
    ↓
S-Structure
    ↓
Filters
Deletions etc.
Phonetic Interpretation

Φ S/P → This is a central part of phonology which defines the proper
mapping of syntactic structures onto prosodic structures. This is done in
the form of a procedure which for each prosodic category k defines its
syntactic domain (D_k), its principles of construction (C_k) and its
prominence principles (P_k). For example, the prosodic category phrase (Φ)
will be mapped in the following way (cf. Selkirk, 1980a: 15 ff.):

D_Φ → Every Φ ends in the head of a syntactic phrase.
C_Φ → i) An item which is the specifier of a syntactic phrase joins with
         the head of the phrase.
      ii) An item belonging to a "non-lexical" category such as Det, Prep,
         Conj, Aux joins with its sister constituent.
P_Φ → Given the nodes Φ[N_1, ..., N_n]Φ, N_n is strong (is a pulse).
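
The domain and prominence principles for Φ can be sketched procedurally. This is only a simplified illustration under stated assumptions: the syntactic input is flattened to a word list in which a hypothetical flag `is_phrase_head` marks heads of syntactic phrases, the construction principles are collapsed into "everything up to the next head joins that head", and the treatment of trailing headless material is my own addition. The test sentences are Selkirk's (1980a: 20) pair discussed later in this section.

```python
from dataclasses import dataclass


@dataclass
class Word:
    form: str
    is_phrase_head: bool  # hypothetical flag: head of a syntactic phrase


def build_phi(words):
    """Group words into Φ so that every Φ ends in a syntactic head."""
    phrases, current = [], []
    for word in words:
        current.append(word)
        if word.is_phrase_head:
            phrases.append(current)
            current = []
    if current:
        # Assumption: trailing headless material attaches to the last Φ.
        if phrases:
            phrases[-1].extend(current)
        else:
            phrases.append(current)
    return phrases


def prominence(phrase):
    """Label the last node of a Φ strong ('s') and all others weak ('w')."""
    last = len(phrase) - 1
    return [(w.form, "s" if i == last else "w") for i, w in enumerate(phrase)]


sluggers = [Word("the", False), Word("sluggers", True), Word("boxed", True),
            Word("in", False), Word("the", False), Word("crowd", True)]
for phi in build_phi(sluggers):
    print(prominence(phi))
```

Marking 'in' as a head (the particle reading of 'the cops boxed in the crowd') makes it a Φ of its own, which is the contrast the later discussion of reduction relies on.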

Such a mapping, which has to be provided for all prosodic categories
characteristic of a given language, creates a proper representation in
Phonology 2. This representation has the form of a prosodic tree (cf.
Figure 3).

In what follows I will argue that such a type of representation is
necessary and sufficient to properly describe all the data mentioned in § 2.
Additionally it will be claimed that the domains of speech style processes
which we disjunctively defined as string adjacency and structural adja-
cency can be jointly accounted for under the assumption of prosodic adja-
cency.

The cases I have noted in § 2 all concern natural phonological processes
of assimilation or reduction (followed sometimes by cliticisations). It has
been noticed that those assimilations and reductions become more numer-
ous with an increase of speech tempo or casualness. Sometimes, however,
there are blocks on those processes. Let us consider assimilations first.
Given our representation all segments can be identified as belonging to
a certain prosodic category. Phonological rules are also made sensitive to
these prosodic categories. 14 Let us take, for instance, a rule of nasal
assimilation. In slow speech in German, this rule can apply only within the
scope of a foot (Σ) (cf. Herok & Tonelli examples in § 2.1). In faster
speech, however, the scope of this rule becomes greater; in German it will
be a word (ω), in Italian a phrase (Φ). The same is true of all other
assimilation processes noted in § 2.

14 I assume that only natural phonological processes are sensitive to prosodic structure.
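
The idea that tempo widens the prosodic scope of a rule can be sketched as a comparison on the prosodic hierarchy. The names `Domain`, `NASAL_ASSIMILATION_SCOPE` and `may_apply` are hypothetical, and the table records only the three scopes stated above; it is a sketch of the claim, not an analysis of further data.

```python
from enum import IntEnum


class Domain(IntEnum):
    """Prosodic categories ordered from smallest to largest."""
    SYLLABLE = 1             # σ
    FOOT = 2                 # Σ
    WORD = 3                 # ω
    PHRASE = 4               # Φ
    INTONATIONAL_PHRASE = 5  # I
    UTTERANCE = 6            # U


# Scope of nasal assimilation by (language, tempo), as stated in the text.
NASAL_ASSIMILATION_SCOPE = {
    ("German", "slow"): Domain.FOOT,
    ("German", "fast"): Domain.WORD,
    ("Italian", "fast"): Domain.PHRASE,
}


def may_apply(scope: Domain, smallest_shared: Domain) -> bool:
    """A rule applies to two adjacent segments if the smallest prosodic
    constituent containing both is no larger than the rule's scope."""
    return smallest_shared <= scope


slow = NASAL_ASSIMILATION_SCOPE[("German", "slow")]
fast = NASAL_ASSIMILATION_SCOPE[("German", "fast")]
# Two segments in different feet of the same word:
print(may_apply(slow, Domain.WORD), may_apply(fast, Domain.WORD))
```

On this view a speech style is a setting of the scope value, while the rule itself stays the same.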
Prosodic structure, understood as a specific mapping of syntactic
structure, also explains why assimilation does not take place in certain
contexts. Let us consider the notorious 'to-adjunction' case in some detail.
The phonological process concerns a coronal consonant in a specific
context: (deletion after a nasal, deletion after another coronal, voicing as-
similation as in 'hafta', flapping, etc.). All of these natural processes,
which in English are foot bound (their scope is a foot), can in fast speech
apply within the domain of a word (ω), or a phrase (Φ). They are barred,
however, from application on the higher prosodic domains. That is why
the application of assimilation in such cases where verb and 'to' belong to
two different prosodic phrases, or even intonational phrases, is not possi-
ble. Consider the following examples from § 2.2:

((They want)_I (to all intents and purposes)_I (to destroy us)_I)_U
* They wanna ...

(((I am going)_Φ (to the meeting)_Φ)_I)_U
* I am gonna ...

Sometimes, however, even phrase internal 'V + to' structures do not con-
tract; consider the following:

a) (Who do you wanna succeed)_Φ
b) * (Who do you wanna succeed you)_Φ

This is captured within metrical theory by the assumption that the
prosodic structure is a mapping of the syntactic structure. The syntactic
structure for b) will contain an empty category 15 between a verb and 'to';
it will look something like:

[Who [you want [[e] to succeed you]]]

This empty category is visible to the assimilation rule, i. e. the rule cannot
apply within this context. There are certain problems with this analysis,
which have been noted very often in recent literature on syntax (cf.
Chomsky & Lasnik, 1977: 78; Postal & Pullum, 1978; Andrews, 1978;
Jaeggli, 1980; Chomsky, 1981), because [e] (also called wh-trace), is not
the only empty category figuring in the syntactic structures which are
mapped onto prosodic representation. The problem is that none of the
other empty categories, PRO and [e]_NP, is visible in the same sense as [e].
Thus, neither PRO nor [e]_NP blocks contractions in:

Who do they wanna visit?
Who do they want [PRO to visit [e]]

Some of the guys usta audit my course.
[Some of the guys] used [[e]_NP to audit my course]

15 For a syntactic motivation for this and other empty categories see Chomsky (1981).

Leaving aside syntactic questions such as the cyclic or non-cyclic
character of movement rules (in the first case we end up with even more
empty categories (traces) at S-Structure), I suggest that the explanation
of the visibility of [e] and the invisibility of other empty categories
has to do more with prosodic
processing than with syntactic processing. Jaeggli (1980), Chomsky (1981:
180-82), Riemsdijk & Williams (1981: 175-177) and Postal & Pullum
(1982), all argue that it is the syntactic processing of empty categories
which decides their visibility in prosodic rules. In my opinion the
situation is much simpler and has to do with the ability of a competent
speaker to recognize the potentials for prosodic salience within a sentence.
Notice that only those empty syntactic categories which can be filled with
lexical material are visible to prosodic structure. Thus I suggest a con-
straint on mapping which may read something like the following:

Only those empty categories which can contain
lexical material play a role in prosodic structure.
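
The effect of this constraint on contraction can be sketched as a simple filter. The labels and the table below merely encode the classification argued for in the text (wh-trace visible, PRO and NP-trace invisible); the names `CAN_HOLD_LEXICAL_MATERIAL` and `contraction_blocked` are hypothetical, and nothing here decides the underlying syntactic analysis.

```python
# Classification of empty categories as assumed in the text above.
CAN_HOLD_LEXICAL_MATERIAL = {
    "wh-trace": True,   # [e]: position that lexical material can fill
    "PRO": False,
    "NP-trace": False,  # [e]_NP
}


def contraction_blocked(between_verb_and_to):
    """Return True if a prosodically visible empty category intervenes
    between the verb and 'to'."""
    return any(CAN_HOLD_LEXICAL_MATERIAL[ec] for ec in between_verb_and_to)


# Who do you want [e] to succeed you  (no 'wanna')
print(contraction_blocked(["wh-trace"]))
# Who do they want [PRO to visit [e]]  ('wanna' possible)
print(contraction_blocked(["PRO"]))
# [Some of the guys] used [[e]_NP to audit my course]  ('usta' possible)
print(contraction_blocked(["NP-trace"]))
```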

Notice that the constraint does not specify whether these are the catego-
ries of NP-structure, S-Structure, Logical Form, or any other relevant re-
presentation, thus providing for prosody to be an independent module in
Generative Grammar, "filtering the sequences generated by the syntactic
(or other, G. D.) component" (cf. Ronat, this volume). 16 This constraint
also provides a neat explanation for a large portion of reduction data men-
tioned in § 2. Consider the impossibility of reduction even in the fastest
speech in cases like:

I wonder how much wine there _is_ in the bottle.
Ready I _am_ to help you.
What are you thinking _of_?

16 Such a broad formulation of the constraint on prosodic processing opens a possibility of
accounting for apparent lexical exceptions to 'to-adjunction' (cf. Postal & Pullum, 1978;
Andrews, 1978). We can claim, for instance, that lack of assimilation in cases like 'thought
to → thoughta' is due to the thematic structure of these verbs.
Under metrical theory as assumed here (cf. Selkirk 1980a: 19ff.),
non-lexical items may be reduced in English 17 if they occupy a weak position
in a phrase. Selkirk describes it by means of the Monosyllable Rule:

ω_w
 |
 Σ   →   σ_w, if ω dominates a non-lexical item.
 |
 σ

The Monosyllable Rule is phrase (Φ) bound and it applies only when a
non-lexical ω is weak with respect to a strong ω contained within the same
phrase. Consider the following pair of examples:

a) (the cops)_Φ (boxed)_Φ (in)_Φ (the crowd)_Φ
b) (the sluggers)_Φ (boxed)_Φ (in the crowd)_Φ     (Selkirk, 1980a: 20)

In a), the preposition 'in' does not reduce because it is the only ω within
a phrase; in b) it reduces as it is ω_w relative to ω_s within the same
phrase. This difference is explained by the difference brought about by the
mapping of the S-structures.
The S-structure for a) is approximately:

The cops boxed [[in]_P]_PP [the crowd]_NP

whereas for b) it is:

The sluggers boxed [[in]_P [the crowd]_NP]_PP
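
The phrase-bound character of the Monosyllable Rule can be illustrated with a short sketch over the two phrasings above. The function name `reducible` and the data layout are hypothetical; the sketch assumes that the last word of each Φ is its strong node, so that a non-lexical word counts as weak only when another word follows it inside the same Φ.

```python
def reducible(phrases):
    """Return the function words that the Monosyllable Rule may reduce.

    `phrases` is a list of Φ; each Φ is a list of (form, is_lexical) pairs.
    """
    reduced = []
    for phrase in phrases:
        for position, (form, is_lexical) in enumerate(phrase):
            # Weak = not the final (strong) word of its phrase.
            is_weak = position < len(phrase) - 1
            if not is_lexical and is_weak:
                reduced.append(form)
    return reduced


cops = [[("the", False), ("cops", True)], [("boxed", True)],
        [("in", False)], [("the", False), ("crowd", True)]]
sluggers = [[("the", False), ("sluggers", True)], [("boxed", True)],
            [("in", False), ("the", False), ("crowd", True)]]

print(reducible(cops))      # 'in' is alone in its Φ and is not listed
print(reducible(sluggers))  # 'in' is weak within its Φ and is listed
```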

Similarly in the prosodic processing of

I wonder how much wine there _is_ in the bottle.
Ready I _am_ to help you.
What are you thinking _of_?

the empty head of the underlined function words will be counted as a pos-
sible salience bearer. Through a prosodic restructuring, the function
words assume the strong position in their phrases, and thus become im-
mune to reduction by the monosyllable rule (for one possible form of this
restructuring see Dogil, 1980b; Ladd 1980; Ronat, this volume).

17 This seems also to be true for German and, I would expect, other stress-timed languages
as well.
In the cases presented above the reduction or its lack has been explained
by the constraints on mapping, but there are also constraints specific to
the prosodic component itself that restrict contraction. For instance, if
three function words come together in a phrase, one of them (possibly the
middle one) will be strengthened and thus may not be reduced (cf. Giege-
rich, 1981; Williams, 1980 on similar prosodic transformations).
Prosodic constraints can also explain many peculiarities of another area
typical of fast/casual speech, namely cliticization. I will not discuss any
details of cliticization here (cf. Zwicky, 1977). Let me mention, however,
some prosodic impediments which the MAP can capture. Cliticization is
usually understood as an adjunction of reduced material to the elements
immediately adjacent in the string. However, sometimes cliticization does
not follow reduction. MAP picks out those places in a fairly natural
fashion. One such case is when the reduced word would violate the
phonotactic constraints on syllable structure. Note that MAP defines the
principles of construction and prominence, as well as the syntactic domain,
for each of the prosodic categories. Thus the syllable will also be defined.
The English syllable, for example, is defined as a metrical tree of the
type shown in Figure 4. 18 Now if a reduced word would in any sense violate
this template, it may not get cliticized.

(w) (s) s (s) (w) (w)     Appendix ([+cor])

Figure 4: Metrical tree templates for English syllables.

Even if cliticization is to follow reduction, i.e. there is no violation of
the well-formedness of prosodic structure, the MAP can predict which
structures better undergo cliticization processes. If we assume with
Giegerich (1981: 5) that in English "Every lexical item in a string contains
at least one terminal S, with its sister to its right" we can see why
English has primarily enclitics and why cliticization goes more easily with
final stressed words and monosyllabic lexical items. (Consider from this
viewpoint Zwicky's data in § 2.)

18 Cf. Halle & Vergnaud, 1978, 1980; Kiparsky, 1979; Mohanan, 1979; Selkirk, 1980b, for
other proposals concerning the syllable structure.
Prosodic well-formedness on the level below word (ω), can also explain
some other apparent blocks to reduction. Consider, for instance,
Bolozky's and Zwicky's data on Modern Hebrew, Welsh and English mentioned
in § 2. All of the reductions and lack of reduction there can be explained
by well-formedness constraints on syllable structure (cf. Kiparsky,
1979; Lowenstamm, 1981; Kaye & Lowenstamm, 1982), and, in cases
like 'deflation', 'revised', foot structure (the initial syllables are footed
and thus cannot be reduced; cf. Selkirk, 1980b; Dogil, forthcoming).
Summing up, we may say that MAP, with general principles assumed by
it, provides all necessary and sufficient criteria for the description of fast/
casual speech. This theory also possesses the necessary clarity and preci-
sion (cf. Vergnaud, 1977) so that the linguist who employs it is not forced
to adjust it to those features which are peculiar to the data he is examin-
ing. Quite to the contrary, the MAP model rationalises the special pro-
perties of fast/casual speech, in the sense that they follow directly from
the way the phenomena are represented. This does not presuppose that
MAP is the final stage in research on those phenomena. Socio-,
psycho- and discourse-context features (cf. Dressler, 1975; Postal & Pullum,
1978) play a very important role, and have to be scrutinized in any full ac-
count of speech style variation; however, they may only function as an in-
dependent addition to the basic grammatical description.

References

Anderson, S. (1974). The Organization of Phonology. New York: Academic Press.
Andrews, A. (1978). Remarks on To adjunction. LInq. 9: 261-68.
Aronoff, M. & M.-L. Kean (eds.) (1980). Juncture. Saratoga: Anima Libri.
Bell, A. & J. Hooper (eds.) (1978). Syllables and Segments. New York: North Holland.
Bolozky, S. (1977). Fast speech as a function of tempo in Natural Generative Phonology.
Journal of Linguistics 13: 217-38.
Bresnan, J. (1971). Contraction and the transformational cycle in English. Mimeo.
Butts, R. & J. Hintikka (eds.) (1977). Basic Problems in Methodology and linguistics. Dor-
drecht: Reidel.
Chomsky, N. (1981). Lectures on Government and Binding. Dordrecht: Foris Publications.
Chomsky, N. & M. Halle (1968). The Sound Pattern of English. New York: Harper & Row.
Chomsky, N. & H. Lasnik (1977). Filters and control. LInq. 8: 425-504.
Chomsky, N. & H. Lasnik (1978). A remark on contraction. LInq. 9: 268-74.
Clements, G. (1977). The autosegmental treatment of vowel harmony. In Dressler et al.
(eds.) (1977: 111-121).
Clements, G. (1981). A hierarchical model of tone. In Dressler et al. (eds.) (1981: 69-77).
Dinnsen, D. & F. Eckman (1977). The atomic character of phonological processes. In
Dressler et al. (eds.) (1977: 133-139).
Dogil, G. (1980a). Elementary accent systems. wiener linguistische gazette 24: 1-23. Revised
version published in Dressler et al. (eds.) (1981: 89-101).
Dogil, G. (1980b). Focus marking in Polish. Linguistic Analysis 6: 221-45.
Dogil, G. (forthcoming). Destressing or defooting in English: on deciding between linear
and non-linear phonology. Folia Linguistica Europea.
Donegan, P. (1978). On the natural phonology of vowels. Working Papers in Linguistics 23.
Ohio State University: Columbus.
Donegan, P. & D. Stampe (1978). The syllable in phonological and prosodic structure. In
Bell & Hooper (eds.) (1978: 25-35).
Drachmann, G. (ed.) (1975). Akten der 1. Salzburger Frühlingstagung für Linguistik. Tübingen:
G. Narr Verlag.
Dressler, W. (1972a). Phonologische Schnellsprechregeln in der Wiener Umgangssprache.
wiener linguistische gazette. 1: 1-29.
Dressler, W. (1972b). Allegroregeln rechtfertigen Lentoregeln. Innsbruck: IBS.
Dressler, W. (1975). Methodisches zu Allegroregeln. In Dressler et al. (eds.) (1975:
219-234).
Dressler, W. (1977a). Elements of a polycentristic theory of word formation. wiener
linguistische gazette 15: 13-32.
Dressler, W. (1977b). Grundfragen der Morphonologie. Wien: Verlag der Österreichischen
Akademie der Wissenschaften.
Dressler, W. & F. Mareš (eds.) (1975). Phonologica 1972. München: Fink Verlag.
Dressler, W. & O. Pfeiffer (eds.) (1977). Phonologica 1976. Innsbruck: Innsbrucker Beiträge
zur Sprachwissenschaft.
Dressler, W., O. Pfeiffer & J. Rennison (eds.) (1981). Phonologica 1980. Innsbruck:
Innsbrucker Beiträge zur Sprachwissenschaft.
Fries, W. & K. Pike (1949). Coexistent phonemic systems. Language 25: 29-50.
Gibbon, D. (1981). A new look at intonation syntax and semantics. In A. James and P.
Westney (eds.). New linguistic impulses in foreign language teaching. Tübingen: G. Narr. 71-98.
Gibbon, D. (this volume). Intonation as an adaptive process.
Giegerich, H. (1981). On the nature and scope of metrical theory. Bloomington: IULC.
Gimson, A. (1980). An introduction to the pronunciation of English. London: Arnold.
Goldsmith, J. (1976). Autosegmental phonology. Bloomington: IULC.
Gussmann, E. (1980). Studies in abstract phonology. Cambridge, Mass.: MIT Press.
Halle, M. & J.-R. Vergnaud (1978). Metrical structure in phonology. Ms., MIT.
Halle, M. & J.-R. Vergnaud (1980). Three dimensional phonology. Journal of Linguistic Re-
search 1: 83-106.
Harris, J. W. (1969). Spanish phonology. Cambridge, Mass.: MIT Press.
Harris, J. W. (1980). Nonconcatenative morphology and Spanish plurals. Journal of
Linguistic Research 1: 15-33.
Herok, T. & L. Tonelli (1979). How to describe phonological variation. Papers and Studies in
Contrastive Linguistics X: 41-55.
Hooper, J. (1976). An introduction to Natural Generative Phonology. New York: Academic
Press.
Jaeggli, O. (1980). Remarks on To contraction. LInq. 11.1.
Joos, M. (ed.) (1963). Readings in Linguistics I. New York: American Council of Learned So-
cieties.
Kahn, D. (1976). Syllable Based Generalizations in English Phonology. Bloomington: IULC.
Kaye, J. & J. Lowenstamm (1982). Syllable structure and markedness theory. GLOW, Pisa
(proceedings to appear).
Kenstowicz, M. & Ch. Kisseberth (1977). Topics in Phonological Theory. New York: Academic
Press.
King, H. (1970). On blocking the rule for contraction in English. LInq. 1: 134-36.
Kiparsky, P. (1979). Metrical structure assignment is cyclic. LInq. 10: 421-41.
Kiparsky, P. (1981). Remarks on the metrical structure of the syllable. In Dressler et al.
(eds.) (1981: 245-257).
Koutsoudas, A. (1977). On the necessity of the morphophonemic-allophonic distinction. In
Dressler et al. (eds.) (1977: 121-127).
Ladd, R. (1980). The Structure of Intonational Meaning. Bloomington: IULC.
Lakoff, G. (1970). Global rules. Language 48: 76-87.
Leben, W. (1977). On the interpretive function of phonological rules. In Dressler et al. (eds.)
(1977: 21-29).
Lightfoot, D. (1976). Trace theory and twice moved NP's. LInq. 7: 559-82.
Linell, P. (1977). Morphonology as part of morphology. In Dressler et al. (eds.) (1977:
9-21).
Lowenstamm, J. (1981). On the maximal cluster approach to syllable structure. LInq. 12:
575-604.
Mascaró, J. (1976). Catalan Phonology and the Phonological Cycle. MIT dissertation.
Distributed by IULC.
McCarthy, J. (1979). Formal Problems in Semitic Phonology and Morphology. MIT
dissertation, unpublished.
McCawley, J. (1968). The Phonological Component of a Grammar of Japanese. The Hague:
Mouton.
Mohanan, K. (1979). On syllabicity. In Safir (ed.) (1979: 182-191).
Postal, P. & G. Pullum (1978). Traces and the description of English Complementizer
Contraction. LInq. 9: 1-29.
Postal, P. & G. Pullum (1982). The contraction debate. LInq. 13: 122-138.
Radford, A. (1981). Transformational Syntax: A Student's Guide to Chomsky's Extended Stand-
ard Theory. London: CUP.
Rennison, J. (1981). Bidialektale Phonologie. Wiesbaden: F. Steiner.
Riemsdijk, H. & E. Williams (1981). NP structure. Linguistic Review 1: 171-217.
Ronat, M. (1982). Logical form and prosodic islands. This volume.
Rubach, J. (1981). Cyclic Phonology and Palatalization in Polish and English. Warszawa:
Uniwersytet Warszawski.
Rudes, B. (1976). Lexical representation and variable rules in Natural Generative Phonology.
Glossa 10: 111-150.
Safir, K. (ed.) (1979). Papers on Syllable Structure, Metrical Structure and Harmony processes.
Cambridge, Mass.: MIT.
Selkirk, E. (1972). The Phrase Phonology of English and French. MIT dissertation, distributed
by IULC: Bloomington.
Selkirk, E. (1980a). On Prosodic Structure and its Relation to Syntactic Structure. Bloomington:
IULC.
Selkirk, E. (1980b). The role of prosodic categories in English word stress. LInq. 11:
561-605.
Selkirk, E. (1980c). Prosodic domains in phonology: Sanskrit revisited. In Aronoff & Kean
(eds.) (1980: 107-129).
Skibniewski, L. (1980). Fast speech and syllabification in English. Ms., UAM Poznan.
Stampe, D. (1969). The acquisition of phonetic representation. CLS 5: 443-54.
Stampe, D. (1972). How I Spent my Summer Vacation. Ph.D. dissertation, distributed by
IULC: Bloomington.
Stanley, R. (1967). Redundancy rules in phonology. Language 43: 393-435.
Stockwell, R., J. Bowen & I. Silva-Fuenzalida (1956). Spanish juncture and intonation.
Language 32. Reprinted in Joos (ed.) (1963: 406-18).
Sweet, H. (1890, 1908). The Indispensable Foundation. Repr. by Henderson (ed.) (1971). Lon-
don: Oxford University Press.
Vergnaud, J.-R. (1977). Formal properties of phonological rules. In Butts & Hintikka (eds.)
(1977: 299-318).
Williams, E. (1980). Remarks on stress and anaphora. Journal of Linguistic Research 1.3:
1-17.
Wurzel, W. (1970). Studien zur deutschen Lautstruktur. Studia Grammatica VIII, Berlin.
Zwicky, A. (1970). Auxiliary reduction in English. LInq. 1: 323-36.
Zwicky, A. (1972). On casual speech. CLS 8: 607-15.
Zwicky, A. (1977). On clitics. In Dressler et al. (eds.) (1977: 29-41).
ANTHONY FOX

Subordinating and Co-ordinating Intonation Structures in the Articulation of Discourse

The standard approach to the description of intonation, especially in the
extensive pedagogical tradition of English intonation studies, is to establish
an intonation unit (the 'tone-group' or its terminological equivalent),
to assign to this unit a pattern, usually with certain structural features such
as the 'nucleus', and to give to this pattern a meaning, generally one which
reflects the 'attitude of the speaker'. Though this approach is useful, and
has produced valuable results, it suffers from an important defect as a
model for the investigation of intonation in discourse: it can only deal
with short utterances which are largely isolated from their context. Only
in very few respects does it allow reference to more extended discourse:
certain structural features of the tone-group, notably the location of the
nucleus, are seen as reflecting the 'information' status of the element in
question, as 'new' or 'given', which naturally requires reference to a wider
context; and certain assumed meanings of the patterns themselves, such as
'statement' or 'question', inevitably relate to the function of the utterance
in discourse.
Some recent work on English intonation, such as Brazil (1975, 1978),
Brazil, Coulthard and Johns (1980), and Brown, Currie and Kenworthy
(1980), reflects a greater interest in the role of intonation in discourse.
Features that have been particularly noted as having relevance for dis-
course are the overall pitch-level (Brazil's 'key'), which can relate to the
introduction of new topics (cf. also Brown et al., p. 35), and the choice of
nuclear 'tone', which can reflect the status of the information communi-
cated in the context in question by, in Brazil's terms, either 'proclaiming'
it or 'referring' to it.
This recent work partly overcomes one weakness of traditional studies
by being prepared to look beyond the tone-group, but it shares with these
studies some of the other limitations in its application to discourse. Firstly,
it remains atomistic in the sense that it is content to isolate individual fea-
tures of the intonation pattern (the overall pitch level or the type of nu-
clear pattern) and to ascribe to them a specific meaning ('new topic',
'known information', 'referring', etc.) without assessing the interrelationships
of the features themselves. Secondly, it seeks to identify such a
meaning as something which is external to intonation and to which the
intonation pattern refers.
The present paper explores a different approach. It attempts to go be-
yond the traditional framework by not restricting the description to indi-
vidual tone-groups, but it also deliberately avoids isolating features from
their intonational context. Certain features of the pattern are assessed in
terms of their internal function within an intonation structure rather than
their external reference to specific discourse meanings. Some parts of an
intonation pattern, in other words, establish relationships with other
parts, and their role can be seen as that of creating an intonational structure
which can be projected on to discourse as a whole. Thus, rather than ask-
ing 'what is the discourse function of this or that feature of intonation?'
we should ask 'what is the significance of the choice of this or that intona-
tion structure in this discourse context?'

II

The examination of internal relationships within a more comprehensive
intonational structure is, as we have seen, at variance with the traditional
approach to intonational description. Larger intonation structures are
generally considered only in conjunction with specific grammatical struc-
tures - 'the intonation of sentences containing relative clauses', and so on
- rather than as autonomous entities. Nevertheless, a few pointers in this
direction may occasionally be found in the standard works. Four main ap-
proaches to such matters may be encountered.
1) Patterns are occasionally given meanings which are of an 'internal'
or 'structural' kind rather than of an 'attitudinal' type, for example mean-
ings such as 'continuative' or 'non-final'. Generally, the 'non-final' pattern
is rising and the 'final' falling. This unfortunately does not take us very
far, as it is easy to find final rising patterns and non-final falls.
2) The next step is to look for various recurrent sequences of tone-
groups. Armstrong and Ward (1926) consider the various combinations of
their tunes I and II, and the majority of the other handbooks give lists of
possible sequences. But these lists remain rather arbitrary; there is no dis-
cussion of the nature of the relationships between the tone-groups or the
nature of the structures created.
3) From here one may proceed to classify the types of sequence found.
Palmer (1922) distinguishes 'co-ordinating' and 'subordinating' sequences
based on the identity or otherwise of the tones involved: co-ordinating se-
quences have the same tone in each tone-group, while the tones of subor-
dinating sequences differ. Crystal makes a similar distinction (Crystal and
Quirk, 1964; Crystal, 1969) but on a different basis: for him, subordination
occurs with identical tones differing in pitch range.
4) A more elaborate approach is to recognise a larger intonational en-
tity with various kinds of internal structure. Though there are hints of this
kind throughout the literature, explicit proposals are few. Schubiger
(1958) suggests a unit with a structure analogous to that of the tone-
group: a string of tone-groups where one is 'nuclear'. Several types are re-
cognised with the nuclear tone-group appearing in various positions in the
sequence with either a rising or a falling tone. A more recent attempt in a
more explicit theoretical framework has been made by the present writer
(Fox, 1973) where a 'paratone-group', with various structural relations
within it, has been postulated.
All these proposals suggest quite strongly that the tone-group is not an
isolated entity, but contracts relationships with other tone-groups in se-
quences. Clearly, any investigation of the role of intonation in discourse
must attempt to establish the nature of these relationships and to evaluate
their significance for the structure of discourse.

III

There is evidently a connection between certain co-occurring tone-groups
which justifies our treating the combination of tone-groups as a sequence
and not merely as an arbitrary succession of independent entities. There is
also evidently a relationship of some kind between the nature of the con-
nection and the specific intonation pattern (the type of 'tone') that the
tone-group contains. But an initial problem encountered in discussing
such matters is that it entails establishing some inventory of intonation
patterns for the language, and in the case of English there is little agree-
ment as to what the system of patterns might be. There are good reasons
for this lack of consensus; it stems from the nature of the relationship be-
tween the various patterns. The patterns cannot be reduced to a simple
paradigm of mutually substitutable and systemically equivalent items;
there are affinities between certain types of patterns, and the 'variants' are
not analogous to allophones of phonemes but are meaningfully different.
In short, then, a simple list of 'tones' will not really do.
This is a serious and important problem, but one which is of only se-
condary relevance in the present context. For simplicity, therefore, a crude
binary division will be made initially into those patterns which end with a
low pitch (L) and those which end high or mid (H). No particular theoretical
significance is attached to this division, and further distinctions will be
introduced where appropriate.
Granted such a two-fold categorisation, we can immediately establish,
as the simplest kinds of sequence, the following combinations:
H + L, L + L, L + H, H + H
What kinds of relationships between the tone-groups in such sequences
can be established? The particular relationship that we shall investigate
here is that of dependency, i. e. we can establish 'subordinating' and 'co-or-
dinating' sequences, where the former has a dependent and an independent
tone-group, and the latter only independent tone-groups. Dependency is
in essence a co-occurrence relationship: a tone-group is subordinated to
another if its occurrence depends on the occurrence of that tone-group,
such that it cannot occur alone.
If we examine the above types of sequence from this point of view, we
find that different sequences appear to have different kinds of internal
relationships. The type H + L (where, it will be recalled, H represents any
pattern not ending in a low pitch, e.g. high rise, low rise, level, fall-rise
etc., and L represents any low-ending pattern, e.g. high fall, low fall,
rise-fall) seems to be interpretable as forming a subordinating sequence
where the H pattern is dependent on the L pattern. Thus, in an utterance
such as:
when I get 'back / I'll give you a "ring
(where / indicates a tone-group division and ' and " indicate nuclear pitch
patterns) the first tone-group is subordinated to the second. That this relationship
is not a grammatical one, despite the fact that it parallels the
grammatical dependency in this example, is shown by reversing the order
of the clauses:
I'll give you a 'ring / when I get "back
Provided the same sequence of patterns is used, an H-type in the first
tone-group and an L-type in the second, the first tone-group is subordi-
nate in both cases.
We could attempt to characterise the dependency relations here in terms
of 'information distribution' or 'textual' status: the information conveyed
by the first tone-group is in some sense less significant than that conveyed
by the second. But the intention is that we should for the moment avoid
all such 'external' interpretations. The dependency relation here is in the
first instance a structural relation between tone-groups.
The relations established in the second type of sequence, L + L, appear
to be different from the ones just considered. Here the impression is not
of a dependent tone-group followed by an independent one, but rather of
two independent tone-groups in a co-ordinating relationship. An example
is:
I'll give you a "ring / when I get "back
where the falling patterns could also be rising-falling. Despite the gram-
matical subordination of the second clause, there is no intonational prior-
ity here; the two tone-groups appear as equivalent entities.
The L + H sequence is more complex, and more difficult to analyse. In
fact, the analysis appears to depend on the pattern of the second tone-group.
The H pattern may be a high rise, as, for example, in a typical 'tag
question':
(a) you're from "London / 'aren't you?
or a low rise, as in:
(b) I'm going to "Leeds / on -Friday
or a fall-rise, as in:
(c) we can start a"gain / if you 'like
Examples (a) and (b) seem to differ in respect of the relationships be-
tween the tone-groups. In (b) it seems clear that the final tone-group is
subordinated to the first. Several interpretations of this well-known Eng-
lish pattern are found in the literature, such as 'a fall with a rise in the
tail', or a 'double tonic' with two nuclei, etc. But there is a consensus that
the final rise is in some way an appendix to the fall of subordinate status
(cf. Halliday's term 'minor information point' - Halliday, 1967). Here it is
treated as a sequence of two tone-groups, where the second is subordinate.
It can thus be seen as the reverse of the H + L sequence discussed
above.
This interpretation will not do for example (a), however, which, despite
the apparent similarity to (b), is best seen as a coordinate structure, since
the two parts seem relatively autonomous. It is thus analogous to the L +
L type, and in fact the final high rise can be replaced by a fall in another
type of tag question without affecting the structural relationships. Identity
of pattern in the two tone-groups is thus not essential for coordinate
structures. (Incidentally, the same anomaly applies in the syntactic struc-
ture of tag questions: though syntactically coordinate, these sentences are
exceptional in having a different structure - declarative and interrogative
- in the two coordinate clauses.)
Although example (c) is of a type which is not as frequently discussed as
(b), it seems to fall into the same category as (b), namely it is a subordinat-
ing sequence with a final dependent tone-group.
The remaining type of sequence is H + H. This is the least frequently
encountered of the basic sequences (Armstrong and Ward, 1926, do not
include sequences of 'tune II' among their possible combinations), but it
certainly does occur, as in the following examples:
(a) are you coming 'with me / to the 'pub?
(b) I said I "would / if he "asked me
(c) if I ask you "nicely / will you 'come?
For (a) and (b) the coordinate interpretation seems most plausible, but
(c) is more doubtful and would also be open to a subordinating interpretation,
comparable to the H + L type. Such utterances as this may well be
structurally ambiguous; they are in any case not common.
Summarising the findings so far, it can be claimed that tone-groups in
sequences may be seen as falling into two basic kinds of relationship: a
subordinating relationship, in which one tone-group is dependent on another,
and a co-ordinating relationship, where the tone-groups are linked
but independent. It is important to note that it is tone-groups that are sub-
ordinated or co-ordinated, and not intonation patterns ('tones') as such,
since on the evidence of the above examples there is no absolutely consist-
ent relationship between the pattern used and the status of the tone-group
in which it occurs. It must also be reiterated that these relationships are to
be seen as internal to the intonation structure rather than as reflecting in a
straightforward way any direct discourse function. As will be seen below,
these structures can be projected on to discourse, but this projection must
take into account whole structures rather than individual tone-groups.
It is convenient to introduce a more explicit means of representing the
various structural types of sequences, independently of the pattern used. If
we use 'a' for independent tone-groups and 'b' for dependent ones, the
structures discussed above fall into three types:

ba (H + L, perhaps some instances of H + H)
ab (L + H, where H is a low rise or a fall-rise)
aa (L + L, most instances of H + H, and L + H where H is a high rise)
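
The correspondences just listed can be restated as a small lookup, given here as a minimal sketch in Python rather than as part of the analysis itself. The tone names are those mentioned in the text; the names `tone_class` and `likely_structure` are hypothetical. The sketch returns only the most likely structure, so it ignores those instances of H + H that may be open to a subordinating interpretation, and it rejects combinations the text does not discuss (for example a fall followed by a level tone) rather than guessing at them.

```python
# Hypothetical helper names; the tone inventory follows the examples in the text.
L_TONES = {"high fall", "low fall", "rise-fall"}           # end on a low pitch
H_TONES = {"high rise", "low rise", "level", "fall-rise"}  # end high or mid


def tone_class(tone: str) -> str:
    """Return 'L' or 'H' for a named nuclear tone."""
    if tone in L_TONES:
        return "L"
    if tone in H_TONES:
        return "H"
    raise ValueError(f"unknown tone: {tone!r}")


def likely_structure(first: str, second: str) -> str:
    """Return the most likely structure ('ba', 'ab' or 'aa') for two tone-groups."""
    pair = tone_class(first) + tone_class(second)
    if pair == "HL":
        return "ba"  # dependent tone-group first
    if pair == "LH":
        if second in {"low rise", "fall-rise"}:
            return "ab"  # dependent tone-group last
        if second == "high rise":
            return "aa"  # e.g. the tag-question type
        raise ValueError(f"L + {second!r} is not discussed in the text")
    # L + L, and most instances of H + H, are co-ordinate.
    return "aa"


print(likely_structure("low rise", "high fall"))   # ba
print(likely_structure("high fall", "low rise"))   # ab
print(likely_structure("high fall", "high rise"))  # aa
```

The lookup takes the names of the tones, not just their H or L class, because the L + H case cannot be resolved from the binary division alone.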

The sequences examined so far have been minimal, with only two tone-groups.
Longer sequences do, of course, occur, with a corresponding increase
in the possible combinations of pattern types. Again restricting ourselves
to the basic dichotomy H versus L, sequences of three tone-groups
give us eight possibilities, those of four tone-groups sixteen possibilities,
and so on. In terms of the types of structural relationship contracted by
tone-groups in these sequences, however, the possibilities are more re-
stricted, and longer sequences can be seen as extensions of the basic types.
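
The figures of eight and sixteen follow from the two-way choice at each tone-group, which gives 2^n pattern sequences for n tone-groups. As a brief illustration (the function name `pattern_sequences` is hypothetical), the sequences can be enumerated directly:

```python
from itertools import product


def pattern_sequences(n: int) -> list[str]:
    """All sequences of n tone-groups over the two pattern classes H and L."""
    return ["".join(seq) for seq in product("HL", repeat=n)]


for n in (2, 3, 4):
    print(n, len(pattern_sequences(n)))
# 2 4
# 3 8
# 4 16
```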
One way of expanding the basic types of sequence is to allow more than
one dependent tone-group. Thus the ba type can be extended to bba, bbba
and so on:
On "Sundays / if the "weather's good / and I haven't got too much
"work to do / I go "climbing
Similarly the ab type can be extended to abb, abbb and so on:
I go "climbing / on ,Sundays / if the ,weather's good / and I haven't got
too much ,work to do
Incidentally this example shows the inappropriateness of treating the
fall + low rise combination as a single 'double nucleus' tone-group, since
the rise can be repeated indefinitely (cf. Fox, 1973). These two types of ex-
pansion can also be combined, giving bab, babb, bbabb etc. An example of
type bbabb would be:
On "Sundays / in the "summer / I go "climbing / if the ,weather's good /
and I haven't got too much ,work to do
Sequences may also have more than two independent tone-groups, giving
structures such as aaa, aaaa and so on:
this is "John / my "brother / from "Sheffield
are you 'John / Bill's 'brother / from 'Sheffield?
The 'tag-question' type of co-ordinating structure can also have more
than one L type pattern before the rise:
You're John "Smith / from "Sheffield / 'aren't you?
though repetition of the high rise would be unusual in such structures.
Finally, multiple subordination and co-ordination can be combined in a
single sequence to produce bbaa, baab etc. An example with the structure
bbaaabb would be:
On "Sunday / if the "weather's good / I shall go "climbing / with my
"brother / from "Sheffield / I ,think / if I haven't got too much ,work to
do.
The structural principles underlying longer sequences are thus quite
easily established, and the patterns can be generalised into a formula:

$b_0\,a_1\,b_0$

where subscript numbers indicate the minimum number of instances.
There is evidence, however, that greater complexity is possible, where
co-ordinate independent tone-groups may have tone-groups dependent on
them individually. This results in an interruption of the sequence of struc-
turally equivalent tone-groups, with structures such as baba, babbab and
so on. The direction of dependence of subordinate tone-groups between
independent ones may be either way, i.e. (ab)a or a(ba). Both these pos-
sibilities may be illustrated by the following examples taken from recorded
data:
a(ba) and this is a difficulty which in "creases / as we get more "stu-
dents / coming into the "university
(ab)(ba) it doesn't depend on "seasonal / 'trade / but offers a comfor-
table "home / to a number of "residents

A formula to generalise all these types would be:

$(b_0\,a\,b_0)_1$

This would expand into a, ba, ab, bab, aa, baa, aab, aba, baba etc.
Structures such as this may seem quite complex, but by the standards of
syntactic structure they are extremely simple. In fact, intonation structures
never approach the complexity of syntax, as certain kinds of structuring
seem to be excluded. It is not possible, for example, to embed one tone-
group inside another (unlike the embedding of sentences) or to have dis-
continuity in a tone-group. Even the possibility of more than one level of
dependency - a tone-group subordinated to a dependent tone-group - is
extremely doubtful, since all potential cases are open to a simpler interpretation.
A recorded example is:
because I feel sure if they "did this / the students would .work / for
everything that really "mattered
We could see this as having a structure such as (bb)a, with two levels of
subordination, but a single level, with both subordinate tone-groups de-
pendent on the same independent tone-group, is equally plausible.
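
The two formulae of this section can be read as regular expressions over the symbols a and b, on the assumption that a subscript 0 means 'zero or more' and a subscript 1 'one or more'. The following Python sketch adopts that reading; the names `SIMPLE`, `GENERAL`, `is_simple`, `is_general` and `general_structures` are hypothetical and serve only to check which structures each formula admits.

```python
import re
from itertools import product

# Assumed reading of the subscripts: 0 = zero or more, 1 = one or more.
SIMPLE = re.compile(r"b*a+b*")       # first formula: b0 a1 b0
GENERAL = re.compile(r"(?:b*ab*)+")  # second formula: (b0 a b0)1


def is_simple(structure: str) -> bool:
    """True if the structure fits the first (simpler) formula."""
    return SIMPLE.fullmatch(structure) is not None


def is_general(structure: str) -> bool:
    """True if the structure fits the second (generalised) formula."""
    return GENERAL.fullmatch(structure) is not None


def general_structures(max_len: int) -> list[str]:
    """All structures of up to max_len tone-groups admitted by the general formula."""
    return [
        s
        for n in range(1, max_len + 1)
        for s in ("".join(p) for p in product("ab", repeat=n))
        if is_general(s)
    ]


print(is_simple("bbaaabb"))   # True
print(is_simple("baba"))      # False: dependent tone-groups interrupt the a's
print(is_general("baba"))     # True
print(general_structures(2))  # ['a', 'aa', 'ab', 'ba']
```

Under this reading the generalised formula admits any sequence containing at least one independent tone-group. It does not record the direction of dependence, so that aba is accepted without distinguishing (ab)a from a(ba).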

IV

In the above discussion we have been concerned with establishing relationships
between tone-groups which create intonation structures. The next
step is to consider the role of such structures in the articulation of dis-
course. This is not the same as asking what particular 'tones' can be said
to 'mean' in discourse since, as we have seen, the intonation pattern of a
tone-group is not necessarily a sufficient guide to its structural role. For
this reason, any attempt to give specific discourse functions to 'tones',
such as Brazil's 'referring' and 'proclaiming', is unsatisfactory, as it as-
sumes a consistent discourse function for each tone. But neither can we
assign such a consistent function to specific tone-group categories (de-
pendent and independent): it is intonation structures as a whole that must
be projected on to discourse and which can be said to have significance
within it.
In many cases it is certainly possible to see subordination of tone-
groups as reflecting subordination of the information they contain, but
this is not always the case. As an example, consider one of the classic cases
of intonational 'disambiguation', the 'alternative question'. Utterances
such as 'would you like tea or coffee' are said to be capable of two inter-
pretations according to the intonation pattern used (cf. Lee, 1953). With
two tone-groups of which the first has a rising pattern and the second a
fall (would you like 'tea / or "coffee) the addressee is offered a choice be-
tween two alternatives; but if both tone-groups have a rise (would you like
'tea / or 'coffee) these are merely examples from an open list. In this
standard interpretation the function of intonation is here quite idiosyn-
cratic; this is a distinction that is specific to lists and it cannot be genera-
lised with other cases. Seen from the point of view of intonation structures
in the sense of this paper, generalisation is somewhat easier.
In terms of the present framework the 'choice' version is a subordinat-
ing structure (ba), while the 'example' version can be seen as co-ordinating
(aa). This structural distinction reflects the 'textual' status of the parts: the
co-ordinating structure is open-ended, and may be continued indefinitely,
whereas the subordinating structure suggests two complementary parts,
'non-final' and 'final'. Such an implication is not restricted to lists, but can
occur more widely; subordinating structures, with their hierarchical order-
ing, suggest integration, while co-ordinating structures imply mere concat-
enation.
But it must be noted that it is not possible to give a specific semantic
function to the 'tones' themselves. The rising patterns have a different sta-
tus in each case, and it is not the 'tones' that determine the meaning but
rather their structural roles. Hence other 'tones' which would be structur-
ally equivalent convey exactly the same implication: the rise of the 'choice'
version could be a fall-rise or a level pattern, while any pair of co-ordinate
tone-groups (such as fall + fall, or fall-rise + fall-rise) would preserve
the implication of the 'example' version.
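
The point that the implication depends on structural role, not on the particular 'tone', can be illustrated with a small sketch. It covers only the two-tone-group alternative question and only the tones named in this paragraph; the function name `list_reading` and the set `NON_FINAL_TONES` are hypothetical, and other combinations are rejected rather than interpreted.

```python
# Tones the paragraph names as possible in the dependent (first) tone-group
# of the 'choice' version; hypothetical constant name.
NON_FINAL_TONES = {"rise", "fall-rise", "level"}


def list_reading(first: str, second: str) -> str:
    """Return 'choice' for a subordinating (ba) pair, 'example' for a co-ordinating (aa) pair."""
    if first in NON_FINAL_TONES and second == "fall":
        return "choice"   # ba: two complementary parts, a closed pair
    if first == second:
        return "example"  # aa: open-ended concatenation
    raise ValueError(f"combination not covered by this sketch: {first!r} + {second!r}")


# Structurally equivalent tones give the same reading.
print(list_reading("rise", "fall"))            # choice
print(list_reading("fall-rise", "fall"))       # choice
print(list_reading("rise", "rise"))            # example
print(list_reading("fall-rise", "fall-rise"))  # example
```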
Furthermore, it is clear that what is chosen here is the structure as a
whole rather than individual tone-groups. It is not possible to associate a
subordinate tone-group with 'subordinate information' in a straightfor-
ward way. 'Tea' is in no sense subordinate to 'coffee'; nor is it 'given'
while coffee is 'new'. It would also make little sense to say that the
speaker is 'referring' to tea, but 'proclaiming' coffee. We must therefore
assign significance to the choice of whole structures.
This example shows how a sentence of a specific syntactic type may in-
teract with very general characteristics of intonation structure, resulting in
an apparent syntactic function for intonation. It is, of course, evident that
this is not really a syntactic function of intonation, but a more general dis-
course articulating function which can be open to more specific interpreta-
tion in conjunction with a particular type of sentence. In fact, it is clear
that certain types of sentence may be able to exploit general functions of
intonation in quite specific ways, and that certain sentence types may tend
to be regularly associated with particular intonation structures. This is not
meant to imply that intonation structures occur obligatorily with certain
sentence types; it merely follows from the uncontroversial observation
that both sentence structures and intonation structures carry significance
in discourse, and may often converge. It is thus possible, to a limited ex-
tent at least, to identify some likely correspondences between syntactic
features of utterances and types of intonation structure.
The most common type of structure for utterances containing more than
one tone-group is a subordinating one of the type ba. This structure is
general whatever the syntactic relationships contained within the utter-
ance. Certain types of utterance nevertheless seem to prefer a structure in
which the subordinate tone-group follows, i.e. ab, where the final tone-
group has a low rise or a fall-rise. Particularly susceptible to this kind of
treatment are:

(i) noun phrase tags:
he's a nice "chap / John (is)
(ii) some final comment clauses:
he'll come to "morrow / I should "think
(iii) some final adverbials:
it'll be "fine / to -morrow

These utterance types have in common a final 'weak' element.


Other types of utterance prefer a co-ordinating intonation structure, aa.
These include utterances with the following kinds of element:

(i) relative clauses (usually of the non-defining type if they have a sepa-
rate tone-group):
there's "Smith / who wrote that book on "speech acts
(ii) appositional phrases (also usually non-defining):
I'll introduce you to "John / my "brother
(iii) sentential relative clauses:
I missed the "lecture / which was a "pity
(iv) adverbial comment clauses:
he's "failed / as we ex"pected
(v) tag questions (most), with the same or different patterns in each
tone-group:
he's from "Birmingham / 'isn't he?
he's from "Birmingham / "isn't he?
(vi) unlinked clauses:
I don't "know / I've for"gotten

What these utterance types have in common is an element which is structurally
parallel to all or part of the main sentence.
More than one of these categories may be involved in any one case, re-
sulting in more complex structures, but these structures will follow exactly
the same principles. We may, for example, find utterances such as:
he's going to "Edinburgh / -John is / to-morrow
with both a noun phrase tag and a final adverbial; or
I met "John / your "brother / -yesterday
with both an appositional phrase and an adverbial. Where the criteria for
a co-ordinating structure are satisfied as well as those for a subordinating
structure, we may get subordinate tone-groups which are co-ordinate with
one another:
"John / my "brother / lives in "Kent
he's a nice "chap / -John / your -brother
In both these examples an appositional phrase forms a tone-group which
is co-ordinate with another, but both tone-groups are subordinated to an-
other by virtue of other criteria.
It must be emphasised that these correspondences between types of syn-
tactic element and intonation structure are in no way obligatory but are
simply likely correspondences, motivated by the kind of informational status
that these syntactic elements tend to have. The intonation structures
themselves are by no means restricted to these cases and may occur wher-
ever analogous informational relationships are indicated. For example, an
utterance such as:
there's a "fly / in my "soup
is, by the use of a co-ordinating structure (in this case not motivated by
any syntactic criteria), given an implication of containing two parallel
pieces of information, whereas the same utterance spoken with a subordi-
nating structure, e. g.:
there's a "fly / in my "soup (ba)
or: there's a "fly / in my ^soup (ab)
contains a main centre of information and a dependent one, though dif-
ferently organised in each case.
The importance of intonation structures in discourse thus becomes evi-
dent, with the speaker's being able to structure his utterances so as to re-
flect the status of the various parts as pieces of information.

V

We have so far considered the relationships between tone-groups only in
terms of the dichotomy subordinate/co-ordinate. It is possible to go a
little further by considering different categories of subordinating relation-
ship depending on the intonation pattern (the 'tone') used. Even here,
however, it is more profitable to see these categories in terms of their con-
tribution to the intonation structure itself rather than in terms of any ex-
ternal meanings of an 'attitudinal' kind.
The major type of subordinating relationship gives the structure ba, in
which the patterns are generally H + L. The H-type pattern may be any one
of a variety of different forms, such as high rise, low rise, fall-rise, level, and
an indefinite number of varieties of each. Similarly the ab structure allows
different possibilities in final position, primarily a low rise and a fall-rise.
What differences of meaning can be associated with these differences of
pattern in subordinate tone-groups? In the literature on the subject we
generally find only vague suggestions rather than systematic analyses.
Kingdon (1958), for example, gives 'prelusory' as the meaning of the low
rise in non-final position, while the fall-rise here contains an additional
element of 'insinuation'. Similarly, Armstrong and Ward (1926) suggest
that the fall-rise in this position is a more 'emphatic' form of the rise,
while Palmer (1922) considers it to be 'less aggressive'. Jassem (1952) dis-
tinguishes between the low rise and the level tone here, but his explanation
of their meanings is singularly uninformative: the former indicates that
'something more is to follow', while the latter 'leads on to something else'!
Halliday (1967) is more explicit. For him the fall-rise in a non-final
tone-group is 'unmarked' in a dependent clause; the low rise is 'unmarked'
in a co-ordinate clause, but 'confirmatory' in a dependent clause. Finally,
we may note Brazil's interpretation (cf. Brazil et al., 1980): both rise and
fall-rise are 'referring', but the former is 'more emphatic'. The level tone,
on the other hand, indicates that this is 'not a potential end'.
Few of these remarks are of much help in determining the differences
of meaning between the various subordinating patterns. To try to improve
on this, let us examine a few test sentences with a variety of syntactic
structures (taken to neutralise interference from syntactic relations):
(i) if it rains / we can't go
(ii) we can't go / if it rains
(iii) it's raining / and we can't go
(the nucleus of each tone-group is underlined).
We may systematically apply a number of basic subordinating sequences
of type Η + L to these sentences:
(a) low rise + fall
(b) high rise + fall
(c) fall-rise + fall
(d) level + fall
Evaluation is necessarily subjective, but the following are suggested inter-
pretations.
Pattern (a): this seems to be the most straightforward kind of connec-
tion which simply links the tone-groups together in a subordinating rela-
tionship. We may term this sequence 'associative subordination'.
Pattern (b): though also linking and subordinating the first tone-group
to the second, this sequence seems to convey a rather stronger implication
of connection between the parts. It seems to imply a natural or logical
connection, with the suggestion that the parts belong together as comple-
mentary parts of the utterance. We shall therefore term it 'complementary
subordination'.
Pattern (c): the role of the fall-rise in single tone-groups in English is much
debated, and the consensus is that it often expresses a reservation, contrast,
or concession. There is something of this, too, in its use in a subordinate
tone-group. In contrast with the low rise, this pattern appears, while linking
and subordinating as before, to suggest some sort of opposition or contrast
between the parts of the utterance. An appropriate term to contrast this pattern
with pattern (a) would be 'dissociative subordination'. (The terms 'associative'
and 'dissociative' are taken from Pheby, 1975.)
Pattern (d): this pattern is not particularly common and its precise sig-
nificance is difficult to assess. The implication is, like that of pattern (b),
one of closer association between the parts than mere linkage, and it
might be seen as implying an obvious, almost self-evident connection. For
want of a better term 'close subordination' may be suggested.
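The four suggested labels can be collected in a small lookup table. The following Python sketch is offered only as an illustrative restatement of patterns (a)-(d); the names SUBORDINATION_TYPES and classify_sequence are hypothetical, and the sketch assumes, as in the sequences listed above, that the second tone-group always carries a fall.

```python
# Illustrative lookup only: maps the tone of the first (subordinate)
# tone-group in a basic subordinating sequence to the label suggested
# in the text. All names here are hypothetical.
SUBORDINATION_TYPES = {
    "low rise": "associative",
    "high rise": "complementary",
    "fall-rise": "dissociative",
    "level": "close",
}


def classify_sequence(first_tone, second_tone="fall"):
    """Return the suggested subordination label for a two-tone-group sequence."""
    if second_tone != "fall":
        raise ValueError("only sequences ending in a fall are covered")
    if first_tone not in SUBORDINATION_TYPES:
        raise ValueError(f"no label suggested for {first_tone!r}")
    return SUBORDINATION_TYPES[first_tone] + " subordination"
```

The table records the suggested interpretations only; as the evaluation is subjective, it should not be read as a classification procedure.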
Those experienced in attempting to give labels to intonational meanings
may be rightly suspicious of these 'meanings', but they will also be aware
of the extreme difficulties involved here. But the approach to these mean-
ings offered here, where patterns are assigned functions within the struc-
tural relationships rather than in external terms, seems to offer a stricter
control over the labels.
Again it might be possible to suggest likely correspondences between
these patterns of subordination and certain syntactic configurations, but
they are even less reliable than those given for the subordinating/co-ordinating
distinction. The 'associative' and 'dissociative' types link up with
Halliday's treatment of the low rise as 'unmarked' for co-ordinate clauses
and the fall-rise as 'unmarked' for subordinate clauses, while the 'complementary'
pattern is certainly appropriate for alternative questions involving
closed lists as in the example discussed earlier. But again it must be
said that such correspondences are not at all obligatory and these patterns
may be found in any utterance which requires the appropriate implication.

VI

The notion of 'discourse' adopted so far has been rather narrow, being ef-
fectively limited to long utterances by a single speaker. It is interesting to
consider the possibility of widening the scope of the concepts discussed
here to include more extended discourse involving more than one partici-
pant. Can intonation structures of the kind envisaged here cross the
boundaries between the utterances of different speakers? Since several of
the 'non-final' dependent patterns can also occur in apparently independ-
ent tone-groups, one possibility might be to regard such tone-groups as
forming incomplete structures, to be completed by another speaker.
An obvious example of this kind is provided by the high rising pattern
found regularly in 'yes/no' questions. In terms of our present framework,
such patterns could be seen as independent tone-groups, but we could also
consider them to be simply the first, dependent, part of a structure which
is completed by the reply. Alternatively, and without invoking another
speaker's reply, we could interpret such questions as the first part of a
truncated alternative question, with a suppressed '. . . or not' which would
again complete the structure. This explanation for the rise in questions has
a long history; it was put forward by Coleman (1914), and has recently
been revived, in a somewhat different theoretical framework, by Pope
(1976). The 'alternative question' approach would also perhaps explain
why yes/no questions regularly have a rise, while wh-questions regularly
do not: the latter cannot be supplemented by '. . . or not'.
The fall-rise pattern could perhaps be seen in a similar way. We have al-
ready noted a relationship between the interpretation of the meaning of
this pattern in independent tone-groups as 'reservation', 'contrast', etc.
and its 'dissociative' function in subordinate tone-groups. Again the inde-
pendent use could be regarded as an incomplete structure, as the inde-
pendent first (or second) half of a structure whose other part is provided
by another utterance or speaker.
These suggestions as to the extension of the framework are naturally
somewhat speculative, and their full implications have not been seriously
followed up. But an examination of discourse from this point of view
might well reveal patterning of this structured kind. These speculations
aside, however, this paper has attempted to sketch out a possible approach
to longer utterances in terms of intonation structures. The aim has been to
show how intonation as a structure may contribute to the articulation of
discourse. In this way it is hoped to overcome the limitations of a more
traditional analysis of intonation functions based on isolated tone-groups
and on the assignment of discourse meanings in an atomistic way. The
role of intonational features in discourse cannot be adequately evaluated
unless they are seen in their structural context.

References

Armstrong, L. E. & Ward, I. C. (1926). A Handbook of English Intonation. Cambridge: Heffer.
Brazil, D. C. (1975). Discourse Intonation. Discourse Analysis Monographs 1, Birmingham,
English Language Research, University of Birmingham.
Brazil, D. C. (1978). Discourse Intonation II. Discourse Analysis Monographs 2, Bir-
mingham, English Language Research, University of Birmingham.
Brazil, D., Coulthard, M. & Johns, C. (1980). Discourse Intonation and Language Teaching.
London: Longman.
Brown, G., Currie, K. L. & Kenworthy, J. (1980). Questions of Intonation. London: Croom
Helm.
Coleman, H. O. (1914). 'Intonation and Emphasis'. Miscellanea Phonetica. 1, 6-26. London:
International Phonetic Association.
Crystal, D. (1969). Prosodic Systems and Intonation in English. Cambridge Studies in Linguis-
tics 1. Cambridge: University Press.
Crystal, D. & Quirk, R. (1964). Systems of Prosodic and Paralinguistic Features in English. The
Hague: Mouton.
Fox, A. (1973). 'Tone sequences in English'. Archivum Linguisticum 4, 17-26.
Fox, A. (1978). A comparative study of English and German intonation. Unpublished doctoral
dissertation, University of Edinburgh.
Halliday, M.A.K. (1967). Intonation and grammar in British English. The Hague: Mouton.
Jassem, W. (1952). Intonation of conversational English. Wroclaw.
Kingdon, R. (1958). The groundwork of English intonation. London: Longman.
Lee, W. R. (1953). 'Intonation involving choice and exemplification', mf 99, 2-5; mf 100,
35-8.
Palmer, H. E. (1922). English intonation with systematic exercises. Cambridge: Heffer.
Pheby, J. (1975). Intonation und Grammatik im Deutschen. Berlin: Akademie.
Pope, E. (1976). Questions and answers in English. The Hague: Mouton.
Schubiger, M. (1958). English intonation, its form and function. Tübingen: Niemeyer.
ANNA FUCHS

'Deaccenting' and 'Default Accent'

1. Introduction

1.1. A number of accent placement patterns have of late been discussed in
the literature that pose difficulties in neither meeting the conditions as-
sumed to hold with 'normal' stress (semantically, formally), nor being de-
scribable as 'contrastive', nor fitting in with narrowly semantic theories ac-
cording to which any accent on an element signals a focussing of precisely
that element.
Bresnan's article 'Sentence stress and syntactic transformations' (1971),
as is well known, initiated an extended controversy as to whether accent
placement is determined semantically or syntactically, and, if the latter,
how (Lakoff, 1972; Berman & Szamosi, 1972; Bresnan, 1972; Bolinger,
1972; Stockwell, 1972). The discussion was in the main based on four
types of syntactic construction (in terms of 'surface structure', of course):
two types of attributive construction, namely, 1. N + to + Inf (plans to
leave), 2. relative clauses; 3. wh-questions; 4. subject-predicate constructions.
The detail of the syntactic discussion is immaterial here; I take the
stand that any attempt to describe, let alone 'explain' the accent patterns
on a purely syntactic basis is bound to fail 1 . The dilemma that the discus-
sion addressed, in terms of the traditional 'normal'/'contrastive' scheme,
is the following: each of the constructions, beside the pattern commonly
expected for 'normal stress' (accent on the rightmost non-anaphoric lexi-
cal constituent) also shows an ill-behaved one where the accent occurs
prior to that constituent, but cannot be construed as 'contrastive'.
The main examples used in the controversy have been repeated over and
over again and acquired the usual sad fame; still, for ease of reference, I
give one or two 'standard' specimens per construction (cf. Table 1).

Table 1
     a) expected pattern              b) 'ill-behaved' pattern
        (rightmost placement)
1    plans to leave                   plans to leave
2    the doctor I was telling you     the doctor I was telling you
     about                            about
3    which turn should we take        which turn should we take
4    the sun's shining                the sún's shining
     my umbrella is dirty             my umbrella is dirty

1
See the syntactic counter-arguments, especially in Berman & Szamosi, and the semantic
ones there as well as in Bolinger (1972), Stockwell (1972: 98 ff.); and, above all, Bresnan's
own recourse, in her reply to Lakoff and Berman & Szamosi, to semantic distinctions
('initiatory'/'elicitory' questions, 'concealed partitive', 'topical stress'). The admitted
possibility of dubbing these 'deep syntactic' in my view amounts to a play on words.

The four cases are alike in departing from expected patterns, as
sketched above. Functionally, however, they fall into different categories.
This is reflected in how Bresnan invokes different supplementary semantic
principles (see above, fn. 1); but while her procedure is quite ad hoc
('patching operations', as Bolinger remarks), a systematically functional
analysis shows each of the different cases to be a manifestation of much
more far-reaching regularities.

1.2. Very scant attention is paid to the discourse conditions for different
accent placements by Bresnan, Lakoff, Berman & Szamosi, Stockwell. 2 If
we distinguish between the patterns possible in 'all-new' use and those
that necessarily presuppose 'givenness' of part or all of the construction,
we can account for the alternative placements in half of the cases:
- Accent on the predicate in cases like the sun's shíning, my umbrella is
dírty systematically presupposes the subject's (often, both the subject's
and the predicate's) being 'given' via context/situation, while
the sún's shining, my umbrélla is dirty are patterns usable where no
such 'given'ness is implied. 3
- While which túrn should we take? does not presuppose 'givenness' of
either turn or take, which turn should we táke? explicitly does so either
for turn or for both 4 ; the same applies to the host of analogous examples
adduced above all by Lakoff: cf. Bolinger 1972, Berman & Szamosi,
Ladd, and also Bresnan's own distinction of 'initiatory' vs. 'elicitory'
questions (which is awkward as far as the qualifications given
for the latter are concerned - cf. Bolinger 1972: 642 - but rightly
keeps apart initiatory and non-initiatory - or 'second-instance' -
uses).5

2
The artificiality of their procedures is well illustrated in the introductory footnote to Berman
& Szamosi: "We should like to thank many professors, students, and neighbors for
acting as informants, pronouncing long (and boring) lists of sentences for us" (emphasis mine).
3
cf. Fuchs (1980) with more literature on this point. Without the persistent neglect of this
simple fact, the myth of 'normal' rightmost accentuation could not so easily have installed
itself.
4
Givenness of only the object is pragmatically improbable with this example, but a systematic
possibility for the construction type.

Functionally speaking, then, the question of rightmost or prior accent
placement is of no particular importance. We may reorder our accent pat-
terns according to the criterion just used, in the way shown in Table 2.

Table 2
     'all-new' use possible 6         'given'ness (of part or all)
                                      presupposed
1    plans to léave /
     plans to leave
2    the doctor I was telling you
     about / the doctor I was
     telling you about
3    which túrn should we take        which turn should we táke
4    the sún's shining                the sun's shining

1.3. The patterns that necessarily presuppose some 'given'ness are the
ones that will mainly occupy us in this paper. Their system, however, is re-
lated to that of the others, which will therefore have to be sketched, too.
In so doing, I shall briefly touch upon the question of the alternative pat-
terns under 1) and 2) in Table 2 (extensive treatment is to be found in
Fuchs, ms).
All the patterns listed in the left rubric correspond to what is usually
termed 'normal stress', but they are by no means the only ones possible in
'all-new' use. In fact, that they should so commonly be regarded as 'nor-
mal' is, I think, to quite an extent due to one-sided choice of examples
(discussions of accent placement overwhelmingly are concerned with
one-accent patterns - does 'the' accent go on this 'or' on that element? - ,
while in spontaneous speech pluri-accent patterns abound). In my theory,
the patterns are cases of the phenomenon I call integration (Fuchs 1976,
1980), a concept hopefully capable of providing an explication of what is
usually referred to by 'normal stress', and of reconciling 'semantic' and
'syntactic' approaches as well as showing the unity of 'normal' and 'con-
trastive' accent. I briefly summarize the main points (see also Fuchs, ms.):

5
'Second instance' is borrowed from Bolinger; see, e.g., 1952.
6
The patterns are not confined to such use, cf. below.

Constructions (verb + object, subject + predicate, adjective + noun, coordinations, etc.)
whose immediate constituents are both realized by lexical material and both 'new', 'informationally
relevant', at the given point in discourse, may receive two different treatments: each
IC may be introduced as a separate point of information and, accordingly, accented; or the
whole construction be 'integrated', treated as one 'globally new' unit, by accenting one of the
ICs only - the location of the 'integrative' accent being conventionally fixed.
I call the IC of a construction that is to receive accent in the case of integration its exponent
(the subject in subject-predicate constructions, the object in verb + object constructions,
etc.). The location, of course, often coincides with traditional 'normal stress'.
Accent on the 'exponent' only is systematically ambiguous, the emphasis (or 'focus') sig-
nalled by it has variable scope: both immediate constituents may be 'new', 'informationally
relevant', and integrated; or the emphasis may cover the accented IC only, the other refer-
ring to 'given', 'presupposed' matter. This ambiguity is systematically resolved by recourse to
the context, verbal and/or situational (including the character of the items accented).7
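
The two treatments and the variable scope of the exponent accent can be restated schematically. The Python sketch below is only an illustration under simplifying assumptions: the table EXPONENT lists just the two construction types for which the summary names an exponent, the function names are hypothetical, and no claim is made about constructions or accent placements not mentioned in the summary.

```python
# Hypothetical table: construction type (pair of ICs) -> its exponent,
# i.e. the IC that carries the accent in the case of integration.
# Only the two cases named in the summary are listed.
EXPONENT = {
    ("subject", "predicate"): "subject",
    ("verb", "object"): "object",
}


def all_new_treatments(construction):
    """Accent patterns open to the speaker when both ICs are 'new'."""
    separate = set(construction)            # each IC a separate point of information
    integrated = {EXPONENT[construction]}   # one integrative accent on the exponent
    return [separate, integrated]


def exponent_accent_readings(construction):
    """Scope readings of a pattern with accent on the exponent only."""
    exponent = EXPONENT[construction]
    other = next(ic for ic in construction if ic != exponent)
    return [
        {"new": set(construction), "given": set()},   # integration: both ICs 'new'
        {"new": {exponent}, "given": {other}},        # emphasis on the accented IC only
    ]
```

The second function returns both readings without choosing between them, since the choice is said to depend on the verbal and situational context, which the sketch does not model.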

Thus with all cases in the left rubric above, we are in the presence of integration.
But while for the bulk of integratable syntactic constructions there is
one location of accent designated for the case of integration, the construc-
tions under 1) and 2) pose a special problem in admitting of two place-
ments each. This question has to be treated in a broader framework
(which I attempt in Fuchs, ms).

1.4. All the patterns we have been considering have one accent for two
ICs. In all those in the left rubric in Table 2 that accent may signal focus
either on the accented item only, or on the whole construction. Lack of ac-
cent thus may have two different conditions there (and the two cases must
be distinguished in any analysis; if not, much confusion arises!): the con-
stituent without accent may either be 'given' or (in the case of integration)
fall under the scope of the emphasis, be 'new' itself (part of the 'globally
new' unit). 8
In the right rubric, the situation is the reverse: the accented constituent
may either be 'new', the rest being 'given', or be itself 'given' (part of one
'globally given' unit), a case quite contrary to our current accent place-
ment 'intuitions'.
It is obvious that, to make sense of this, we have to go into the syste-
matics of the patterns that (necessarily) presuppose 'givenness', in more
detail.

7
The phenomenon of variable scope has been used to reduce the 'normal'/'contrastive',
'semantic'/'syntactic' dichotomies also by Ladd. For resemblances and differences between
our approaches see Fuchs (ms).
8
cf. Ladd (1979: 111).

2. 'Default accent'

2.0. Two authors have paid particular attention to the accent phenomena
in sentences presupposing 'givenness': Bolinger and Ladd. Both base their
analyses on semantic considerations; Ladd, beyond this, combines semantic
and 'syntactic' analysis. Both authors set out to explain cases long neglected
in descriptions and not easily accountable for by common notions,
and develop a conception that goes under the name of 'default accent'
with Ladd.

2.1. Ladd introduces his concepts of 'deaccenting' and 'default accent'
with the following examples:
(1) A. Has John read Slaughterhouse-Five?
B. No, John doesn't réad books
(2) A bill was sent to Congress today by President Carter which would
require peanut butter sandwiches to be served at all government
functions. At a press conference today, a group of senators led by
Barry Goldwater of Arizona denounced the measure. Goldwater
said . . .
(3) Harry wants a but his girlfriend would prefer an American
car.
He comments (104 f):

"In each of these examples we see the deaccenting of various items felt as 'given' or 'in the
discourse': books in (1), the measure in (2), car in (3). In each case the deaccented noun has
been somehow referred to or alluded to earlier in the discourse . . ."

and goes on to formulate

"a corollary of deaccenting, which is that in order for an item to be perceived as deaccented,
the accent must fall elsewhere, and it is thus possible for a word to be accented as it were by
default. In (1) we can say that the accent is on read merely to deaccent books. This is not a fo-
cus accent but a default accent. We are not focusing on read; the positive, marked aspect of
the accentual pattern in (1) is not that read is accented, but that books is deaccented. In (2),
we are similarly unable to find a focus on denounce; the accent is there merely to signal the
deaccenting of the measure."

Some further examples (p. 106 - I change the numbering):


(4) I can't imagine what it would be like to be a dentist - but I'm aw-
fully glad there are guys who want to be dentists.
(5) A. Man it's hot! Doesn't feel like it'll cool off till tomorrow at
least.
B. Yeah, they said it would be hot all day.
(6) What happens to male spiders, anyway? Are they like a lot of male
insects [. . . in that the female eats them after mating?]. (The
speaker has implied that spiders are insects. By accenting insects,
the opposite would be implied.)
(7) (a) John washed the car. I was afraid someone élse would do it.
(Implication that I wanted John to do it.)
(b) John washed the car. I was afráid someone else would do it.
(Implication that I wanted to do it myself.)
"Someone else in (b) is deaccented to signal coreferentiality with John; the accent
falls on afraid by default."
Ladd concludes (ibid.):

"In all of these cases, then, the accent falls on an item - read, denounce, be, said, a lot of, afraid
- which provides us with little semantic justification for talking of focus; in each case, how-
ever, there are subsequent items in the sentence - books, the measure, dentists, it would be hot
all day, male insects, someone else - which seem to have some reason in the context for being
deaccented. From all this we conclude that there are two fundamentally different types of
non-neutral accent placement - narrow focus accent, as discussed in the previous section,
and default accent."

For the sake of my argumentation below, I also adduce the remaining ex-
amples from that section (Ladd, 108 f., numbering of the examples mine):
(8) A. Boy, I really have to get moving. My parents are coming today
and I've still got to finish this paper and get the place cleaned up
and get some laundry done and . . . Wow, I don't know if I'm go-
ing to make it.
a) B. Oh, yeah, I forgot about that. What time are you meeting
your parents?
b) B. Oh, yeah, I forgot about that. What time are your parents
arriving?
(9) A. Why don't you have some French Toast?
a) B. I've forgotten how to make French Toast.
b) B. There's nothing to make French Toast out of.
(10) A. I'm a linguist.
B. How many languages do you speak?

2.2. Ladd makes repeated reference to Bolinger; for the phenomenon in
question, however, he only adduces one example from Bolinger 1958 (and
the remark cited above apropos Bresnan's question types). A more exten-
sive treatment of the question by Bolinger is to be found in his 'The phra-
sal verb in English' (1971).
In Bolinger (1958), there are two examples: (11) I am seldom out of my
castle (uncommented as to semantics) and (12) the Defense maintained that
everybody in Rumania worked for the Government ("the speaker refers to
someone who had been denied re-entry to the United States because of
having worked for the Communist government of Rumania"). Bolinger
comments:

"there is no contrast between 'in' and 'out of', nor is there any special attention of any other
kind bestowed on the word in; it merely carries a sentence accent which, if it were to fall any-
where else, might be mistaken as contrastive (146)."
In 1971, under the heading of 'rectification accent' ("yes-no-emphasis on
a verb"), several examples are discussed in a similar way (46f.):
(13) Why aren't you going? - I am going!
(14) Why didn't you make up the beds? - I did make them up!
(15) He deserved to die. Why didn't someone take care of it? - They
pút him to death!
(16) a) I'm not sure about what tó review and what not to review.
b) ( = I'm not sure what I should review and what I should not.)
(17) a) That's the sense in which I meant it!
b) ( = That's the sense in which I did mean it!)
(18) These two things don't always go together.
(19) Why didn't you throw it awáy? - I threw it away. You just weren't
paying attention.
According to Bolinger, the accent here has to fall "on an element that is
semantically barren, or relatively so", on "semantically low-content
words" (auxiliaries, prepositions etc., but also the first element in phrasal
verbs). "Rectification accent" is

"part of the more general phenomenon of de-accenting elements that are repeated or presup-
posed. In I did write it - you didn't write it, the nuclear accent is not shifted to the auxiliary
merely to have it there 9, but in part also to get it off the repeated write. The accent is required
for the yes-no-emphasis, but it would be misleading if it fell on a content word."

So, although he does not use the term 'default accent', Bolinger, like
Ladd, has a conception of accents sometimes being where they are because
they had to be gotten off a repeated or presupposed element (his formula-
tions are somewhat hesitant, though, at least in 1971: "not . . . merely", "in
part also").

2.3. Observations, objections


1. Bolinger's and Ladd's theories are not in complete agreement: Bo-
linger speaks of accent placement by default (without using the term) only
where the accent is on a 'semantically low-content' word (for which he
sometimes has to stretch this notion somewhat 10 ). While Bresnan's which
turn should we take figures as a case of 'default accent' with Ladd (115),
Bolinger (1972: 642) explains it in terms of 'deaccenting', but not as a
placement 'by default': ". . . what is involved is the de-accenting of repeated
elements and the accenting of new elements which is to be found
everywhere . . .". Thus, for Bolinger, the reason for the accent on take is
the common one, the element's being 'new'. 11

9
The procedure commonly assumed to be used for 'yes-no-emphasis', according to Bolinger.
10
In the case of the first element of phrasal verbs, for instance.

2. The mere negative concept of 'default' placement as a corollary of
'deaccenting' is not sufficient to account for the cases so described by Bo-
linger and Ladd. In fact, there are inconsistencies in its application with
both authors:
a) In quite a proportion of Ladd's examples, the accented word is itself
'given': one would accordingly expect it to be deaccented as well, if deac-
centing is semantically-based:

read in (1) is an exact repetition of the lexical part of the verb, as is be in (4); in the beautiful
example (10), in the non-'contrastive' reading where the speaker assumes as common know-
ledge that linguists 'speak (many) languages', speak is implied by the context. For (7) and (8),
context indications are not sufficient to determine with certainty whether the predicate ex-
pressions should be taken as 'given' or 'new': in the former case, the objection made here
would apply; in the latter, the following.

b) In all the others, on the other hand (including (7) and (8), if the
predicates are interpreted as 'new'), the accent is perfectly compatible with
the traditional semantic theory of accent as signalling what is 'new' in dis-
course, without any need for a notion of placement 'by default'. Denounced
in (2), American in (3), said (5), a lot of (6), make/make out of (9)
(as well as the examples of compounds in a later section, p. 120) all illustrate
"the accenting of new elements which is to be found everywhere".12

Although Ladd convincingly argues (97-104) that the common 'normal'/'contrastive' dicho-
tomy is too simple, he obviously succumbs to its suggestiveness here himself (which is often
in fact hard to avoid): since the patterns considered are not 'normal', but their accent cannot
be seen as somehow contrasting the item either, some 'other kind' of accenting must be in-
volved . . . (See the rhetorical "what does afraid contrast with?" p. 106, and the whole argu-
mentation esp. on that page.)

So, strictly speaking, none of Ladd's examples corresponds to his description.
c) Bolinger's specification that accent by default goes on a 'semantically
low-content' word does not provide a basis for determining its exact location,
since there are often several such elements competing (why not I ám
seldom out . . ., don't always go . . . etc.?). Bolinger considers alternative
possibilities to some extent: see (16), (17). Their relative status, however,
is not explained; Bolinger seems to consider them free variants, functionally
equivalent - but are they? Ladd's theory is in principle more specific
with regard to the location of 'default' accent (see below, § 3); still, he
does not take alternative possibilities into account at all (why not doesn't
read books, how many languages do you speak, which would seem very
plausible 'deaccentings' in view of the context givens, after all?).

11
This is imprecise in view of the discourse conditions assumed for this pattern by Bresnan,
namely, a preceding we should take one of these turns (1972: 340). The two sentences are of
course highly improbable as a discourse sequence in direct succession; it remains that take
just as well as turn is assumed to be 'given'.
12
The accent on out (of) may surprise at first sight, but is I think analogous to that in throw
out, throw away, etc.

We will have to determine the status of the alternative possibilities and
search for functional differences corresponding to them. The need for
'positive' criteria is also borne out by the fact that, e.g., the verb and the
object may be accented at the same time, in second instance (see below
§ 3.6.2.). It is impossible in such a case for the accent on the verb form to be
there 'by default', yet its semantic value is the same as in Ladd's examples.

3. Ladd's cases. 'Iteration'.

3.1. Surely we have to leave all the cases where there is perfect justifica-
tion for the accent in traditional semantic terms - case b) above - out of a
discussion of 'default' placement. Since Ladd himself by no means treats
all deaccenting for 'given'ness as resulting in 'default' accent 13 , the amend-
ment should be in line with his thinking.
Although this eliminates a large proportion of Ladd's examples, others
remain that definitely need an explanation, for the reason that the ac-
cented element itself is 'given'.

3.2. It is perhaps consideration of this type of example that leads Ladd to
a statement from which a more specific description might be derived. In a
later section (118 f.), Ladd sets out to account for the precise location of
'default' accent by relating it to his 'focus principle' for the determination
of 'neutral' accent placement. The illustrative example used here is in fact
of the type we are now considering exclusively (John doesn't read books).
Although the 'default' accent here is still viewed as resulting from the
deaccenting of books, which does not fit, Ladd on the other hand speci-
fically posits a 'broad focus' comprising both read and books. He does not
motivate this semantically, and of course it contradicts the rest of the
description - but what it amounts to is that there is no difference between
the two constituents with regard to 'newness'/'givenness' ('focus' is
Ladd's term for the content side correlate of accent, thus 'newness' or
whatever other abbreviating term we choose): the paradoxical fact being
that they are not only both 'given', but also both seem to be 'new'.
Are we to posit a kind of 'mirror image' integration? Again, an accent
that, under proper context conditions, semantically refers to a whole construction
within which there are no differences in 'givenness'/'newness' to
correspond to the difference in accentuation - with a reversal of the accent
pattern as a means to indicate that the 'globally' focussed on construction
is also 'globally given' (and with the same kind of systematic ambiguity
as in integration so far: again, the accent may also signal 'narrow
focus')?

13
cf. 1980: 111, where nineteenth century is treated as deaccented, under certain context
conditions, without there being any question of 'default'.

Ladd considers this ambiguity (which he nicely illustrates with how many languages do you
speak - see the comment, 109) as strong "evidence for the focus/default distinction", and the
discussion of read books in his book seems to point even more strongly in the direction just
outlined (see Ronat this vol.).

The cases here considered strongly invite such a hypothesis, and the fre-
quency in discourse of the construction type considered may seem to con-
firm it again and again (I held it for a long time). But the generalisation is
not warranted. In fact, the description would only fit constructions involv-
ing a verb as one IC, and not all of these. Besides, the second-instance ac-
cent here is always on the verb, regardless of whether the integrative pat-
tern has it on the co-constituent or on the verb itself (for cases of the
latter sort, see below § 3.5.3). This ties in with the possibility mentioned
above of accent on the verb and its co-constituent in second instance,
where evidently the accent on the verb form cannot be a result of 'deac-
centing' the co-constituent, and calls for a 'positive' explanation of the ac-
cent on verb forms in second-instance sentences - in fact, generally.

3.3. At this point, we need to sharpen our notions regarding the semantics
of accent placement: we are still left with the paradox of constituents to be
characterized at the same time as 'given' and as 'new'; and the peculiarities
connected with accentuation within predicates have to be elucidated
against a more general background.
Abbreviating modes of expression are necessary, but the terms current
in the description of accent function, although mostly indispensable at the
moment, are all unclear or misleading in some respect or other (and of
course no-one is fully satisfied with them). 'Focussing' is useful, but rather
metaphoric. 'Relevance' leaves open the question of a point of reference:
relevant to whom, to what? (it has often in fact been misconstrued in a
psychological sense, 'what the speaker feels is important'); and 'new-
ness'/'givenness' cannot be taken to be what accent signals in a literal
manner either.
1. Hearers know what is 'given' in discourse and what is not; in fact,
have to know it, to be able to competently participate: recent work has im-
pressively shown how interactants constantly have to draw on their fund
of common knowledge, in general and as regards the givens of the situation
at hand, to be able to adequately produce and correctly interpret
utterances.14 This kind of knowledge is a prerequisite to communication
rather than something itself to be communicated.
2. The 'relevance' of what a linguistic element communicates is always
relative to some 'issue' (or, with Keenan/Schieffelin, 'question of immedi-
ate concern') as established at that point in the communication, or to be
inferred, 'called up' by the utterance itself. (For much interesting detail,
see Keenan/Schieffelin, 1976.)
3. Accent assignment is not mechanically dependent on factual given-
ness of items in the discourse. We have to distinguish between factual new-
ness/givenness on the one hand and 'newness' ('relevance')/'givenness' as
the signifiés, in the technical sense, of accent placement, on the other.15 A
speaker may insinuate factual givenness by leaving unaccented an item
that he newly introduces, 16 or accent and thereby mark as 'new'/'relevant'
an item that in fact refers to matter given in the discourse.
Factually new items often go without accent (see Fuchs ms.), but there is
a conventional association between factual newness and accent assignment
for certain syntactic categories: thus, roughly formulated, a verb comple-
ment is always expected to be accented if its referent is newly introduced
into the discourse. Leaving it unaccented may insinuate givenness.
Factually given items are susceptible of two treatments: they may be left
unaccented, to signify their givenness, or be accented, to establish them as
a point of relevance/'newness' with respect to the question of immediate
concern at that point. The latter procedure I term iteration.17
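
The four combinations of factual status and accent just described can be laid out schematically. The following Python fragment is only an illustrative sketch: the function name `accent_treatment` and the label strings are assumptions introduced here, and the sketch deliberately ignores accentlessness that results from integration.

```python
def accent_treatment(factually_given: bool, accented: bool) -> str:
    """Label how an item is treated, following the distinctions drawn above.

    The labels are illustrative shorthand, not established terminology.
    """
    if factually_given:
        # Given items: left unaccented they are signified as given;
        # accented, they become a point of relevance ('iteration').
        return "iteration" if accented else "signified as given"
    # New items: accented, they are marked 'new'/'relevant';
    # left unaccented, the speaker may be insinuating givenness.
    return "marked as new" if accented else "insinuated givenness"


for given in (True, False):
    for accent in (True, False):
        print(given, accent, accent_treatment(given, accent))
```

The sketch treats the two dimensions as independent, which is the point of the distinction between factual givenness and 'givenness' as what accent signifies.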

Treating a newly introduced element as given purposely disregards a conventional relation
between accent and factual newness; insinuation of this kind often resembles lying, in that to
work well it must not be advertised. 18 Accenting a given element, on the contrary, is an 'offi-
cial' procedure that has to be ratified by the interactants: it relies not only on the conven-
tional meaning of the accent, but also on the commonality of assumptions regarding the giv-
enness status of elements, to realize its specific functions.
'Factually given' is not a simple notion. Of course it does not reduce to 'having been men-
tioned in preceding discourse'. A specific relationship to the question of relevance at that
point is required, and consensus among the interactants regarding such status. I forego any
detailed discussion.
Chafe (1974: 117 ff.) rather surprisingly, and not without justification, argues that while it
is commonly held that elements 'contrastively' accented are new, in fact they are very often
given. While we certainly do not agree with his conclusion that accenting for 'new informa-
tion' and for 'contrast' are different in principle, the dilemma vanishes as soon as factual
newness and 'newness' as a signifié are properly distinguished.

14 Cf. the complex interpretation of deictics, beautifully illustrated in Fillmore (1976), Haw-
kins (1978) for definiteness, more generally Grice (1975), Labov & Fanshel (1977) and
Keenan & Schieffelin (1976). The systematic relevance of the fact is coming to light in an-
alyses of the most diverse phenomena.
15 Cf. Halliday (1967: 204), Ladd (1979: 128).
16 I am not referring at this point to the accentlessness of a 'new' element resulting through
integration.
17 A term I borrow from Bolinger, who uses it passim (e. g. in 1952), but less technically than
I intend to.
18 For a somewhat different kind of use, see Ladd's doesn't read, trash (1979: 128).

4. The contribution of an element to the content of a sentence (and
thus its role in the communication) is not properly describable unless its
syntactic relations to the other elements of the sentence (which are seman-
tic relations, of an abstract character to be sure) are taken into account.
Accentable elements are differently embedded in such relations depending
on their syntactic category, which is why we have to ask not only what ac-
cent signals in general, but also how the general function is differently
manifested in elements of different syntactic categories.
5. Focussing by accent may relate not only to the 'oppositive' semantic
features of the unit accented, i. e. those defined by possible opposition to
other units of their category19, but also to the 'relational' ones, those de-
fined by the syntagmatic relations they stand in. Where the unit focussed
on is one newly introduced, both kinds of feature are indissociably in the
domain of focus. In iteration, where (at least) the oppositive features are
given, the relational ones alone may become the sole object of focussing. 20

An example: A: May I speak to Peter? B: Peter's gone (he just left half an hour ago, and. . .). The
answer clearly is not intended to contrast Peter to anyone else; what is focussed is the 'the-
matic' relation ('theme' in Halliday's sense, roughly): cf. Peter's gone, where B would simply
be giving some information about the referent introduced. Cf. also: A: You look desperate. B:
My purse is gone. My purse is gone here would be rather perplexing: "Who has been speaking
of purses?".
Sgall et al. (37 f.) speak of the possible 'underlining' of 'grammatemes' (in verb forms) as
opposed to that of lexical elements. This comes quite close to our approach. However,
'grammatical' vs. 'lexical' is not the relevant distinction: for one thing, grammatical elements
often have oppositive meaning features, and focus on them must be distinguished from focus
on their relational features.

6. In order exactly to determine the semantic contribution of accent on
an element, we have to know whether and what other elements in the sen-
tence have accent and what their (factual) givenness/newness status is.

19 In discourse, possible oppositions are limited: not all the members of a syntactic category
are in virtual opposition, but only those within the universe(s) of discourse activated at
that point in the discourse.
20 This is not the only systematic possibility: in certain contexts ('contrast'), the relational
features may be given and the oppositive ones focussed.

3.4. We are in a better position now for a more thorough examination of
accent placement within predicates. In this connection, the question of
possible alternative placements raised in 2.3 will have to be answered. In
fact, all of Ladd's relevant examples (see 3.1) would have allowed of a dif-
ferent placement much more in line with his deaccenting concept in leav-
ing 'given' matter accentless:
(1') John doesn't read books
(4') . . . that there are guys who want to be dentists
(8a') What time are you meeting your parents?
(8b') What time are your parents arriving?
(10') How many languages do you speak?
(for 8a and 8b I here assume the verb to refer to given matter).
The different placements presuppose different discourse conditions
(different 'questions of immediate concern'). Meanings vary systemati-
cally with a variety of factors; the following, however, seems one possible
paraphrase for (1) as against (1'):
(1) "What you are saying implies the possibility of John's reading
books - well, he doesn't"
(1') "You are saying that John reads books - let me tell you that such
isn't the case."21
How are such differences - slight from the point of view of denotation,
but real enough in view of possible interactional consequences - to be ac-
counted for systematically?

3.5. Kinds of oppositive and relational semantic features in predicates. Accent
placements and domains of focus in first-instance and iterating predications.
The following is a sketch, and tentative. I have analyzed a great number, at least a hundred,
of instances of iteration, particularly in predicates, from spontaneous discourse (mainly Ger-
man, but also English), comparing contexts, substituting different accent patterns within
constant contexts or imagining different contexts while keeping the patterns constant; but
more analysis is needed. Although the system sketched evolved from such analysis, the writ-
ten formulation at some point developed its own logic, and I have not been able to recheck
all its implications on an equal number of cases. Also, what I examine in relative detail is the
casuistry of the patterns directly relevant to the argument of the paper only; others are out-
lined at best.
The basic regularities are the same for English and German, but there is an important dif-
ference in morphological realisation that I shall sketch in 3.5.4.

3.5.1. Kinds of oppositive feature


The following (at least) may be distinguished:
a) Features defined by the opposition of the 'lexematic unit' to other
units of that category,
b) features defined by the opposition between different tenses,
c) features defined by the opposition between different genera verbi,
d) features defined by the opposition between different modal auxilia-
ries, if present.
21 Another possible accent pattern would, of course, be John does not read books, and a possi-
ble paraphrase: "you are insisting that John reads books - let me tell you such is not the
case".

3.5.2. Relational features. The more complex the internal buildup of a ver-
bal expression, the more different are the relations established within it
and to elements outside it. Thus a passive predicate 'orients' the action de-
noted differently with respect to the participants than an active one, a mo-
dal auxiliary within the predicate may strongly restrict the degree to which
the action, state etc. denoted is ascribed to the subject, etc. I shall concen-
trate here on two relations that are established by any lexical predication 22
and that I term, provisionally perhaps, 1) connection, 2) ascription.
The two relations go undistinguished in most accounts of "the" func-
tion of predication. Their distinction from each other (as well as from
other semantic features of the verb form), however, clearly shows in that
they may separately constitute the domain of emphasis; cases of this sort
also allow assignment to each of them of a formal locus within the verb
form.
'Connection' corresponds more to what is sometimes called the 'nexus'
function of predication, 'ascription' more to what is called its 'statement'
function. 'Connection' is effected through the bringing together of a (po-
tential) subject and the expression of an action, state, etc., in the form of
an infinitive, a participle, or the 'bare stem' of a finite form; 'ascription' by
the use of a finite form (including imperatives), the prerequisite for insert-
ing the complex created by 'connection' into a higher-level discourse unit,
as a (relatively) independent 'move'.

A distinction between two 'fonctions verbales' beyond the expression of the lexical content is
drawn by Benveniste (1966: 154): 'fonction cohesive', 'fonction assertive'; and Searle's dis-
tinction between 'predication' and 'illocutionary acts' also is relevant here. The three dicho-
tomies have many common elements, but are not parallel. Searle's is more comprehensive
than Benveniste's in sharply dividing the illocutionary aspects from the others. His practical
(not theoretical) neglect of formal linguistic distinctions, however, leads him to disregard
other differences. From the viewpoint of linguistic form-function correlations, Searle's 'pre-
dication' has to be further subdivided (as have the 'illocutionary acts'), and this subdivision
entails a somewhat different formulation of his primary distinction. Searle's term and in
large part his description of 'predication' make it look as though the relation were tied to a
finite verb form, but in fact he also considers other syntactic frames: see the discussion 30 f.
regarding expressions like I promise to come. Searle's recourse to 'deep structure', in order to
instal a subject-predicate relation between the relevant terms here too, amounts to stating a
common semantic relation in both constructions while abstracting from the formal/func-
tional difference. If we take the difference into account as well, we have two relations within
the finite predication, the common one and the one proper to the finite form.
Illocutionary acts, too, fall into two basic categories from the viewpoint of linguistic
form-function correlations: 1. the very narrow system of first-layer, 'forced-choice' speech
actions instituted in the linguistic form: any finite utterance has to take the form of either an
affirmation or a request for action or a request for information; 2. the system of speech ac-
tions building on this first layer, the rules for which always specify a) one of the 'first-layer'
actions, b) social/pragmatic factors of the situation. (See the Labov & Fanshel model of dis-
course analysis, 1977.)

22 I limit the treatment to predications that contain some lexical element, and to finite predi-
cations - extension to the whole system of verbal forms may necessitate some reformula-
tion.

3.5.3. Outline of possible accent placements and domains of focus in first-in-
stance and iterated predications.
A. One-accent patterns
(1) First instance
Let us suppose a 'given' referent in subject role; the predicate introduces a
new activity, state etc.
a) If the predicate consists of a simple form (lexical verbal stem + end-
ing) or a combination of auxiliary/ies plus a form containing the lexical
stem, the accent is always on the verbal stem; it relates to the oppositive
and the relational aspects of the whole expression indissociably. A rough
paraphrase: "With respect to the question of immediate concern, and in
particular the subject referent we have in view, it is of relevance that the
activity, state, etc., lexically denoted by the verb applies - in the man-
ner/to the extent specified by the rest of the sentence, in particular the
mode of ascription, various modalities as expressed by modal auxiliaries
etc., negation . . .".
John's leaving "with respect to the question of immediate concern, and
particularly to John whom we just have in view, it is relevant that (pres-
ent) leaving is the case"
Is John leaving? ". . . it is relevant that it is not known whether . . ."
John's probably leaving. . . etc.
b) Lexically, the activity, state, etc., to be predicated is often not de-
noted (or in fact not denotable in the language) by just a simple verb stem,
and combinations of verb stem plus some 'specifier(s)' are used. Such
combinations may be integrated under one accent. 23 In many construction
types, it is a 'specifying' element that bears this accent: the direct object,
e. g. (den Hund ausführen, read a book), the nonverbal element in 'phrasal
verbs' (ausschlafen, stattfinden, throw out, take place)24; for other construc-
tions, integrative accent placement is on the verb (a fact little noted so far,
by the way, but see Schubiger, 1958: 85): ich schlaf so schlecht in letzter
Zeit; I haven't slept well of late.
The complex thus integrated contracts the same 'relevance' relation-
ships as a lexically simpler predicate.

23 Under the specific conditions that I am just delineating, this will be the case most of the
time.
24 The accent goes on the 'outermost' element to be integrated (see below). - Expressions
with a predicative nominal belong here, too; their behavior in first instance as well as in
iteration shows some characteristic peculiarities, however.

(2) Iteration
The activity, state, etc., expressed by a lexically simple form or a combina-
tion of verb plus 'specifier(s)' is given.
a) Accent may go on any part of the predicate in order to 'contrastively'
focus its oppositive semantic content (provided the element has any such
content). We will not be concerned with this possibility any further.
b) Any of the relational features may become the object of focussing,
which is formally expressed by accenting that part of the expression which
contains its formal locus.
Comprehensive rules determining the formal loci of the two relations
(and of others) are not easily formulated in a few words, in view of the
possible formal diversity and complexity of predicate expressions. For a
concise formulation, we will first have to determine the structural order-
ing of the elements liable to enter into predications. For the moment, I
shall set up a very tentative, and incomplete, 'inner-outer' scale, from left
to right: expression of finiteness - passive auxiliary - verb 'theme' (partici-
ple/infinitive/'bare stem' of simple form) - . . . — specifier 1 (non-verbal
parts of 'phrasal verbs' etc.) - specifier 2 ('inner' complements) - specifier
3 ('outer' complements) - . . .
Ascription may then be said to have the 'innermost' element as its for-
mal locus, connection the next 'outer' one. Since these elements may
either be combined in one word or dissociated, depending on the internal
buildup of the predicate (and there are some differences in this respect be-
tween English and German morphology), different possible accent place-
ments and systematic ambiguities result. However, the formulations above
(which simplify but, I hope, do not distort) should be able to account for
the relevant placement patterns, in English as well as German.
To summarize: ascription is focussed by accenting the form containing
the 'innermost' element of the predicate (the expression of finiteness - but
see note 22). Connection is focussed by accenting the form containing the
next 'outer' element. Where the two elements coincide in one form, inter-
pretation of the accent as a focussing on ascription or on connection syste-
matically depends on context.
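
The summary just given can be restated as a small procedure. The following Python sketch is an illustration only, under stated assumptions: the element labels in `SCALE`, the function name `focus_loci`, and the analysis of each word form into scale elements are introduced here for the purpose, and predicates with several auxiliaries, for which the rules remain to be stated, are not covered.

```python
from typing import Dict, List, Tuple

# Tentative 'inner-outer' scale, innermost first. The labels are
# illustrative names for the positions listed in the text.
SCALE = [
    "finiteness",
    "passive_auxiliary",
    "verb_theme",
    "specifier_1",
    "specifier_2",
    "specifier_3",
]

Form = Tuple[str, List[str]]  # (word form, scale elements it contains)


def focus_loci(predicate: List[Form]) -> Dict[str, object]:
    """Return the forms to accent for focus on ascription and on connection."""
    # Collect the scale elements present, ordered from inner to outer.
    present = sorted({e for _, elems in predicate for e in elems}, key=SCALE.index)
    if len(present) < 2:
        raise ValueError("need at least two scale elements")

    def form_of(element: str) -> str:
        # The word form that contains the given scale element.
        return next(form for form, elems in predicate if element in elems)

    ascription = form_of(present[0])  # form containing the innermost element
    connection = form_of(present[1])  # form containing the next 'outer' element
    return {
        "ascription": ascription,
        "connection": connection,
        # One form carrying both loci: interpretation depends on context.
        "ambiguous": ascription == connection,
    }


# Hypothetical analyses of two predicates discussed in this paper.
english = [("doesn't", ["finiteness"]), ("read", ["verb_theme"]), ("books", ["specifier_2"])]
german = [("dreht", ["finiteness", "verb_theme"])]

print(focus_loci(english))
print(focus_loci(german))
```

On these assumed analyses, John doesn't read books has distinct forms for the two relations (doesn't for ascription, read for connection), whereas the simple German form dreht carries both, so an accent on it is ambiguous out of context.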
A precise account of the functional side of focus on connection or on
ascription would presuppose quite an elaborate systematics of the possible
interactions between the parts of the predicate among each other and with
the other parts of the sentence, under different context conditions. A very
rough, and partial, sketch follows (cf. also § 3.6.2).
Every predicate 'ascribes' anew. 'Ascription' is never 'pure', always in
some mode: request for action, for information, (relative) affirmation (i. e.
including negation, modalisation via auxiliaries etc.). Ascribing a given
event to a participant in focussing on connection makes connection itself the
dimension of relevance; it affords a means to specifically concentrate on
the affirming, questioning, denying etc. of the factuality of the event (under
the aspects/to the extent specified by the rest of the sentence) whose
possible relation with the participant is itself not news.
With focus on ascription, ascription itself is 'called up' as a dimension of
relevance. Since the event-participant relation is 'given in the discourse',
calling up this dimension of relevance results in establishing a relation to
the 'source' of that givenness: prior ascription (in some mode . . .). This
kind of focussing thus affords a metacommunicative device for specifically
referring back to a prior ascription while effecting a new one (in one of
the possible modes, again, and under the aspects/to the extent specified by
the rest of the sentence). 25

There are several quite explicit references to the phenomenon of focussing on relational
aspects of the predication (but without distinction of different relations): Halliday, 1967:
204; Sgall et al., 1973: 37f.; Worth, 1964: 699.

B. Pluri-accent patterns
a) Focus on connection and focus on ascription are combinable within
one predicate expression: see examples (38) and (39) below.
b) They are also (even jointly?) combinable with independent accenting
of 'specifiers' such as an object expression: see examples (36) and (37).

The 'specifier' may in that case be iterated itself; it may also be part of a 'new' predicate, and
bear an integrative accent expressing focus on the 'new' event. - In combination with the de-
signation of a 'new' event, the pragmatic function of focus on ascription is somewhat differ-
ent from what was said above: it may be used to imply the possibility of contradiction, e. g.,
in such cases. These uses are not treated here.

3.5.4. English vs. German. There is a difference in morphology (and per-
haps in part also of conventionally adopted perspectives) between English
and German that I cannot state quite satisfactorily yet. In German, a sim-
ple verb form may be iterated to focus on ascription, while English always
seems to require a two-term verb expression for the purpose: either a
'phrasal verb' (threw it away, cf. German findet statt) or an expression con-
taining an auxiliary: if no other one is present, do has to be used. He speaks
quite a few languages would not be taken up by means of he speaks many
languages (but do you think that alone qualifies him for the job?) except per-
haps by an incompletely bilingual German.
However, iteration of a simple verb form for focus on connection seems
possible in English.
Iterated simple verb forms are thus more ambiguous as to domain of fo-
cus in German than in English.

25 Metacommunicative function is present also in the iterative focussing on the 'thematic' re-
lation illustrated above § 3.3.

3.6. Illustration
3.6.1. Ladd's relevant examples illustrate focus on connection. The rules
given above under A. 2)b) should make it clear why it is only the object or
the predicate noun that goes unaccented although the lexical content of
the verb is no less given (cf. also above § 3.2 regarding Ladd's 'broad fo-
cus').
(1) John doesn't read books "there has just been a question of reading
(a) book(s) in relation to John. Well, this relation does not obtain."
(4) . . . but I'm awfully glad there are guys who want to bé dentists "I
have just spoken about being a dentist, and what it is like (for 'one',
for 'a guy') to be one. Well, I'm glad there are 'ones', 'guys' who
want it to be the case for them."
(10) How many languages do you speak? "You have just given me to un-
derstand that you speak (many) languages. Well, of how many is it
the case?"
The alternatives proposed in 3.4. would be the patterns in order for fo-
cus on ascription (for (4'), obviously, this would require a context where
want to is given, otherwise the pattern could not be so interpreted).
In the examples with focus on connection, there are other 'new' ele-
ments beside the one focussed: we said above that 'ascription' is effected
anew with any finite form while focussed only under certain conditions;
likewise, the negation in (1), want to in (4), how many in (10) are inte-
grated with the focussed-on element, which makes for a particularly nar-
row bond, in terms of discourse articulation, between them and the 'factu-
ality' ('connection') feature. In the alternative versions, the negation and
the interrogative elements are more narrowly tied in that sense to the fea-
ture of ascription (in a certain mode, etc.) 26 .

Even out of context, the alternative versions sound more 'insistent', which is explainable by
their focus on 'ascription'. The specific character of this kind of focus becomes even more
notable when comparing the versions in question to another possibility, accent on not, how
many etc. instead of on the verbal expression. This is the pattern one naively expects in the
case of the predicate's being given as long as the possibility of focussing on relational fea-
tures is not recognized. It is in fact a systematic possibility in this case, as is accent on the ne-
gation/question word and the verb expression, but articulates the discourse in a slightly dif-
ferent manner, 'calls up' other/additional contexts of relevance in comparison to the patterns
here treated.

3.6.2. Examples from spontaneous discourse.


(20) (about a toy supposed to spin around, but which did so only after
a series of attempts:) 's dreht sich! (it's spinning!)
(21) (after several attempts at seeing a bird pointed out to speaker by
interlocutor:) Ich seh ihn! (I see it!)
26 want to in (4') is not counted since, as stated, it would have to be 'given' for the pattern to
be usable as focus on ascription.

(22) also sehn wir uns heut abend? (well then are we meeting tonight -
interlocutor had mentioned, some time before, that she might be
kept from going to a party that night to which both were invited).
(23) Wann kommt denn der Ralf eigentlich? (Say, when is Ralf coming?) A
question 'out of the blue'; however, Ralf was supposed to drop in
some time that afternoon.
(24) A: wie ist das nun/findet die Versammlung statt heute abend? (now
what are we at/is the meeting tonight taking place?)
B: die findet statt/jaa (it is taking place/yes)
(25) (Child has asked his mother to make ravioli for lunch; she half-
promised. After a while he comes into the kitchen, sees her cook-
ing and says:) gelt du kochst Ravioli? (You are making ravioli, aren't
you?)
(26) A: aber der betreffende [a farmer, AF] kommt natürlich für so was
nicht aus Kanada hierher (but the fellow won't come over from
Canada for something like that)
B: ách/so'n Úrlaub/wenn man das damit verbinden kann (oh I
don't know/a holiday/if you can combine both)
A: jaa/Landwirte máchen nicht oft Urlaub (farmers don't often go
on holidays you know)
(27) jetzt hör doch auf (now do stop), jetzt sei doch still (well do be quiet):
a very frequent type - iterated imperatives.
(28) du machst mich nervös! (you do make me nervous) — exclamation of
a teen-age daughter to her mother
(29) Child: wás ist mehr/hundert Pfennig oder ne Mark? (what's more:
a hundred Pfennigs or one Mark?)
Mother: was meinst dú denn (well what do you think)
Child: hundert Pfennig (a hundred Pfennigs)
Mother: nee/hundert Pfennig sind ne Mark (no: hundred Pfennigs
are one Mark)
(30) (referring back to interlocutor's report that her baby fell off the
table) du sag mal wann ist sie denn eigentlich runtergefallen so um
wieviel Uhr/wenn das schon länger her ist. . . (actually when did
she fall down, at what time approximately. If it was some time
ago . . .). Speaker goes on to discuss symptoms of brain concus-
sion.
(31) die haben ihr dann gesagt das sei doch 'ne reine Impúlshandlung
was sie da vorhätt und sie sollt's lassen/und sie hat's dann gelassen
(and those people told her that what she was planning was a mere
impulsive action and that she had better give it up/and she did give
it up. . .)
(32) (to a child fearfully inquiring about the modalities of an imminent
vaccination; after some reassuring information) außerdem bist du
doch auch schon ein paarmal geimpft worden (and besides, you
have been vaccinated a few times before, haven't you? - scil. it's
happened to you before, it's not new to you).
(33) (regarding a man claiming to have suffered injustice at his job) ich
hab neulich auch mit dem M. gesprochen der das ja alles miterlebt
hat/und der hat auch gesagt der sei schlecht behandelt worden (and
I've recently been talking to M., who has witnessed all those things
after all, and he says X. has been treated badly - scil. as he claims).
(34) bist du denn nun in Heidelberg gewesen? (well have you been to
Heidelberg after all? - interlocutor had been planning a trip to H.
but been very unsure whether she would actually go).
(35) (in a discussion) das kannste nicht in die Waagschale werfen/das
kannste nicht in die Waagschale werfen/darauf kommt's überhaupt
nicht an (you can't throw that into the balance/you can't throw
that into the balance/that's not at all what matters)
(36) (speaking about what kinds of accent patterns to collect. Certain,
it turned out during the conversation, could be neglected for the
moment.) Gut, dann láss ich die jetzt erst mal/aber ich sámmel Iter-
ation (ok, so I'll leave those for the moment/but I will sample itera-
tion).
(37) (does Christopher's pair of jeans need washing, in view of an im-
minent visit?) Jeans wäscht man doch nur einmal in der Saison/
oder? (but jeans you only wash once a season, don't you?)
(38) ja und habt ihr ihn denn dann eingeláden? (well and did you invite
him in the end?)
(39) (A and B have been waiting for the postman: they can usually no-
tice it when he passes. After an hour or so of unsuccessful watch-
ing:) A: du ich geh jetzt mal an den Briefkasten. B (sharp protest):
aber 's ist doch keine Post gekommen. (A: Listen, I'm going to the
letter-box. B: But the mail hasn't come!)
(40) Grocer watching customer stow away the things bought: Geht's
so? Customer: Wenn Sie mir ne Tüte geben könnten. . . (G: will
you manage? - standard question at the time plastic bags were gen-
erously given; C: If you could give me a bag . . .)
I have refrained from supplying accents in the English versions, but
hope the reader will do so (cf. § 3.5.4).

In (20), (21), (22), the placement of accent itself would also warrant first-instance interpreta-
tion; the contextual givens only make it clear that what is involved is focus on 'connection'.
The German examples, formally, would admit interpretation in terms of focus on ascription
as well; English is less ambiguous here, we would have to have either it's spinning, I see it, are
we meeting, or it is spinning, I do see it, are we meeting. From (23) through (35), first instance
is manifestly excluded; the only one-accent patterns possible in that case would be accent on
a 'specifier' (the subject in 23, the non-verbal element of the 'phrasal verb' in 24, the object in
25, etc.). Semantically, (23) could be interpreted as focus on connection or on ascription: the
former insisting more on the moment of realisation, the second on the fact that Ralf's coming
has been an issue between the interlocutors. The English version would have to be differ-
ently accented in the two cases: when is... vs. when's... coming. Formally, the same ambi-
guity is given in (25) and (26) (not in the English versions, though), but the larger context
(perhaps not entirely graspable from the extracts) makes for focus on 'connection' ('factual-
ity'); (27), too, leaves two options: emphasis on prior ascription or on realisation (a possible
situation for the latter: interlocutor is hesitating whether to stop a certain activity or not, and
I might give the advice: hör auf). Generally, focus on connection emphasizes the uncertainty
of the event in the perspective of the situation preceding it, while focus on ascription relates to its
relevance from the point of view of the issue at hand (cf. 30!); the reader may test this on most
of the examples, in their actual form and context as well as with possible accent/context sub-
stitutions. Examples (32) (formally, accent on been would be equivalent in the English ver-
sion) and (33), accent on has, illustrate cases where the verb form contains several auxiliaries
and infinitives, the rules for which remain to be stated in detail. Example (35), where verb +
complement form a semantically indissoluble unit, shows particularly clearly how it is the
whole group, not just the complement ('specifier') that is given (in fact, the iterated version is
preceded by its verbatim first-instance equivalent). (36) and (37) show the possibility of com-
bining accent on the verb form and on the 'specifier', (38) and (39) combination of focus on
connection and on ascription - a very elegant, and perfectly common, handling of perspec-
tives. (Note the accent shift in the 'phrasal verb' in (38): eingeláden as against first-instance
éingeladen.) There is of course no compulsion to use this or that kind of iterative focussing in
a given context: the speaker is to a large degree free to take this or that perspective (cf. also
Ladd, 128: ". . . the speaker simply chooses to connect answer to question in a slightly differ-
ent way"); thus in (31), the speaker connects und sie hat's dann gelassen to the issue at hand,
while the context may well lead one to expect emphasis on the uncertainty of the event in the
past situation. 'Rhetorical' exploitation is frequent, particularly in politeness strategies (cf.
Brown/Levinson): in (40), the customer is asking for a plastic bag, but insinuating that the
grocer is about to give her one on his own initiative.

4. Bolinger's cases. Hierarchy of accent assignment.

With those of Ladd's examples that are not explainable by common se-
mantic theory without some additional specifications, the source of the
difficulty, as I have tried to show, lies in the presence of covert 'relational'
meaning features that may be focussed on by accenting the verb form just
as the lexical content may - a source of considerable, though systematic,
ambiguity.
Bolinger's and Ladd's descriptions share the notion of a placement 'by
default'. But their perspectives are different. In fact, as we mentioned
above, Bolinger dismisses Bresnan's what turn should we take as being perfectly
in accordance with the usual semantic account (neglecting the fact
that take is not meant to be 'new' in it), while this category of cases is at
the core of Ladd's concern. In Ladd's approach, the 'default' examples are
analyzed in terms of their deviation from 'normal' and 'contrastive' accen-
tuation; what Bolinger tries to account for is non-'contrastive' accent on a
'low-content' element, a deviant case in terms of his semantic theory (this
emphasis, as we mentioned above, leads him to treat even put him to death,
threw it away etc. on that model). While those cases of Ladd's that we
have been trying to explain are definitely not describable in terms of 'ac-
cent on a low-content element', those of Bolinger's examples that do not
involve accent on a verbal form require an explanation different from the
one given so far. Bolinger rightly invokes the higher-level distribution of
accent as a reason. I should like to speak of 'higher level accent', however,
where he speaks of 'sentence accent' as a factor (a sentence does not ne-
cessarily have one accent, as Bolinger is of course well aware; the accent
on the 'low-content' element may indirectly result from accent attributed
to a constituent a level higher than itself, but lower than sentence level).
And 'dependence on a low-content element' is an overgeneralization.
To substantiate this, we need a descriptive framework that assigns ac-
cent not per sentence, but per constituent (which is in accordance with Bo-
linger's general approach), and in so doing takes exact account of the syn-
tactic hierarchy. Let me sketch such a system in a few lines:27
1) A construction with immediate constituents A and B has the following
possibilities of accent distribution: (i) A + B (ii) Á + B́ (iii) Á + B (iv)
A + B́. The first is used to make mention of the referents involved while
presupposing them as 'given' and as not constituting points of relevance at
that place in the communication; the second, to present each separately as
a point of relevance. One of the one-accent patterns (see Fuchs ms.) may
be used for integration of both ICs into one point of relevance. The integ-
rative pattern is always ambiguous, its other systematically possible use be-
ing focus on the constituent accented only. The non-integrative one-ac-
cent pattern is often limited to use for focus on the accented IC only; see,
however, section 3.4. case 2b) and the analyses below.
2) Whenever an IC that is to receive accent according to 1) is in itself
complex, for the placement of accent within it 1) again applies (with the
obvious exclusion of pattern (i)).
3) This goes on as long as there is further syntactic ramification.
4) Higher-level choices may bar the possibility of some lower-level
choices with respect either to accent placement or to its interpretation. I
mention the principle, but will not go into the question any further here.
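The recursive character of rules 1) to 3) can be made concrete in a short sketch. The following Python fragment is only an illustration under stated assumptions: the names Leaf, Node, PATTERNS and accents are hypothetical, the constituent structure and the pattern labels chosen for the sample sentence are one possible reading rather than the analysis given in the diagrams, and rule 4) (higher-level choices barring lower-level ones) is not modelled.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    """A terminal constituent (a word, or a word group left unanalysed)."""
    word: str


@dataclass
class Node:
    """A construction with two immediate constituents and a chosen pattern."""
    left: "Tree"
    right: "Tree"
    pattern: str  # "i", "ii", "iii" or "iv", as in rule 1)


Tree = Union[Leaf, Node]

# Rule 1): which immediate constituent(s) receive accent under each pattern.
PATTERNS = {
    "i": (False, False),   # no accent
    "ii": (True, True),    # accent on both ICs
    "iii": (True, False),  # accent on the first IC only
    "iv": (False, True),   # accent on the second IC only
}


def accents(tree: Tree, top: bool = True) -> List[str]:
    """Return the terminals on which accent is finally realised.

    Below the top level the function is only ever called on a constituent
    that is to receive accent, so pattern (i) is excluded there (rule 2).
    """
    if isinstance(tree, Leaf):
        return [tree.word]
    if tree.pattern == "i":
        if top:
            return []
        raise ValueError("pattern (i) is excluded inside an accented IC")
    left_accented, right_accented = PATTERNS[tree.pattern]
    realised: List[str] = []
    # Rule 3): descend as long as there is further syntactic ramification.
    if left_accented:
        realised += accents(tree.left, top=False)
    if right_accented:
        realised += accents(tree.right, top=False)
    return realised


# Assumed reading of "everybody in Rumania worked for the government":
# the predicate and "Rumania" are treated as given, so the accent assigned
# to the subject group ends up on "in".
example = Node(
    Node(Leaf("everybody"), Node(Leaf("in"), Leaf("Rumania"), "iii"), "iv"),
    Leaf("worked for the government"),
    "iii",
)
print(accents(example))  # -> ['in']
```

The point of the sketch is that no node consults the sentence as a whole: each level makes its own choice among the four patterns, and an accent on a 'low-content' word falls out of a series of such local choices.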
The following two examples (from Fuchs, 1976) illustrate the diagram-
matic representations used below:
(41) (Speaker has taken a bicycle to the repair shop because the gear
switch does not work. After explaining this) und wenn Sie sich das
Ventil am Vorderrad nochmal anschauen/ich glaube da entweicht
Luft (and if you'd please have a look at the valve on the front
wheel/I think there is air escaping [lit. there escapes air]).

27
More detail in Fuchs 1976 ch. 3, and 1976, 1980 passim. To simplify, I neglect the ques-
tion of the lexical/nonlexical character of the constituents. The concern for the syntactic
hierarchy advocated here has nothing in common of course with Ladd's notion of a hier-
archy of accentability.

(42) das ist also genau das was die Transformationsgrammatik oder
Konstituentengrammatiken durch rekursive Regeln äh erzeugen
(so this is precisely what ['that which'] [the] transformational
grammar or constituent grammars er generate with recursive
rules).
I shall not especially take account here of the treatment of such constituents
as regularly go without accent even when 'new' (ich/I, da/there etc.)
and just put them in their proper place in the diagram without further
marking. Otherwise, 'Á' marks any non-terminal constituent that, on the
respective hierarchical level, is to receive accent, either as the exponent of
an integrated construction or as a point of relevance in its own right. To
mark the former case, I shall enclose the construction in parentheses with
an index. Right indexes N and G mark 'new' and 'given', respectively.
Needless to say, the accent patterns in the examples are such as they are
in virtue of the specific meanings that were to be conveyed in the contexts
of their use. Other patterns could have been chosen, for other purposes.
The hierarchical analysis here proposed differs from other cyclic approaches
in stressing the semantic choices effected at each level.28 Note
that the trees are not intended to reflect linearity.

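(43) [Tree diagram for ich glaube da entweicht Luft; node labels by level, top down:]
(1) ich + Á (N)
(2) (glaube + Á), integrated
(3) da + Á (N)
(4) (entweicht Luft), integrated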

28 The 'tree' representation is thus obviously quite different in interpretation from that used
by Liberman & Prince (1977).
[Tree diagram for example (42); only the lower node labels are recoverable:]
(6) Á oder Á
(7) (durch + Á + erzeugen), integrated
(8) die Trans... / Konstituenten...
(9) (rekursive Regeln), integrated

Those of Bolinger's examples that involve accent on the verbal auxiliary or
on the first part of a phrasal verb are to be accounted for as outlined in
3.4.; all the others are explainable on the basis of the general principle of
"deaccenting what is given and accenting what is new" if the rules of rami-
fication as sketched above are taken into account.

(45) I am seldom out of my castle
[Tree diagram; recoverable node labels, top down: seldom; am; out of + (my castle)G]



(46) everybody in Rumania worked for the government
[Tree diagram; node labels by level, top down: Á + (worked for the government)G; everybody + Á; in + (Rumania)G]

(47) I'm not sure about what to review and what not to review
[Tree diagram; node labels by level, top down: am + Á; (not sure)G + Á; about + Á; Á and Á; what + Á, what + Á; to + (review)G, not + (to review)G]

(48) That's the sense in which I meant it
[Tree diagram; node labels by level, top down: that; is + Á; (the sense)G + Á; which + (I meant it in ...)G]



Provided the IC analyses are basically correct, it would thus seem that we
can explain all the accents in question by stating that the 'low-content' ele-
ment is, at that hierarchical level, the only non-given, i. e. new constituent.
(I have dozens of examples of this kind from spontaneous discourse in
German; so my claims are not limited to the few examples above. Cf. also
Fuchs 1976: 309 f.)
We will thus want to keep the part in Bolinger's explanation that
amounts to saying that the accent results from the interaction of higher-
level stress assignment with the 'givenness' of the co-constituent. That the
accent should be on a 'low-content' element, on the contrary, generalizes
what is only a possible side-effect that strikes us when it occurs because it
so patently contradicts common expectations. The attempt to cover the
iteration cases by the same explanation led to unwarranted semantic classi-
fications: for why should throw in throw away be 'low-content' in compar-
ison to away? For the kind of examples we are treating in this section,
however, the notion of 'accent by default' has some truth: the constituents
accented in them would not receive accent in first-instance use (except if
they were meant to establish separate relevance relationships within the
discourse) were it not for the fact that the dominating constituent has an
accent to be realized somewhere, while their co-constituent is 'given' (and
not meant to be 'iterated') and thus cannot take it.
There is no 'dependence on a low-content element' for the distribution
pattern described to occur, nor is the reason for the accent being where it
is that it "would be misleading if it fell on a content word": For one thing,
the 'given' co-constituent may itself be 'low-content', cf. am liebsten würd'
ich mit dir fahren (I wish I could go with you - to someone who has just
been talking about a projected trip); komm mal zu mir (do come to me).29

29
English: A: ... rather an intricate próblem! B: Yes. But let's not go into it now. A published
example: O'Connor & Arnold, 1973: 226 (I render their 'tone group' notation by
simple accent marks): But hów did you manage it? - There was nothing to it. It couldn't
have been simpler.
For another, accent by this kind of assignment may well be placed on a
'content' word (which is hardly surprising since all that is involved are the
semantic rules for accenting plus syntactic hierarchy conditions):

(50) (Beginning of a telephone conversation:)
A: Wie geht's/árbeitest du viel/oder sind jetzt Semesterferien
(how're you doing/are you working a lot/or is it the break between
terms now)
B: Semesterferien sind ja die Zeit in der man arbeitet (why the breaks
between terms are the very time when one works)

[Tree diagram for B's utterance; node labels by level, top down: Semesterferien + Á; (sind + Á), integrated; die + Á; Zeit + (in der man arbeitet)G]

It will have become clear by now that ... in which I meant it and ... in which I did mean it,
or ... what to review ... and ... what I should review ... are not really equivalent, nor are
any other alternative accent placements, 'low-content' or not. ... the sense in which I meant it,
e. g., may presuppose something like "there has been a question of my meaning 'it' in some
specific sense, and now you are presenting this sense; well: ..."; ... in which I did mean it
specifically presupposes a questioning of the fact. Again, the differences are small from the
point of view of denotation, but real enough interactionally.

5. Conclusion

5.1 We are still at an elementary stage in the analysis of accent, and on the
whole our systems of rules for its use are based on a few simple notions,
just as the data base mostly is poor. Many patterns of accentuation and
their uses have come under attention only quite recently.
The patterns mentioned in this paper do not fit into current models of
description in that their accents are neither interpretable as a foreground-
ing ('contrastive' or otherwise) of the content of the element in the usual
sense, nor can they be described as 'normal', 'neutral' placements in view
of the current rules for those. Their deviation from current expectations
derives from several different basic principles of accent use, however,
which is one reason why attempts to account for them by some one addition
to the model used by the author (Bresnan's proposal, e.g., or the 'default'
notion in either Ladd's or Bolinger's version) cannot be successful.
Bresnan's cases, as we have tried to show, are to be accounted for in three
different ways: a classification of patterns according to whether they ad-
mit of 'all-new' use (1.2); elucidation of the rules for choice of integrative
accent placement (cf. 1.3); analysis of the conditions under which accent
may fall on elements 'given' in the discourse (cf. 1.4). The cases adduced
by Ladd and by Bolinger under somewhat differing notions of accenting
'by default' are not unitary either. Of the former, quite a percentage in
fact need no explanation beyond the usual semantic one (2.3); their inclu-
sion among the cases to be elucidated is probably an unwanted effect of
the 'neutral'/'contrastive' dichotomy, which holds its sway even if one
knows - and formally declares - it to be inadequate (as I know from hav-
ing often been trapped myself). The remaining ones are based on different
general principles: the possibility of focussing 'given' elements, for certain
communicative purposes, on one hand (ch. 3), the principle of semanti-
cally-based choice of accent pattern at each level of the semantic hier-
archy, on the other (ch. 4).

5.2 For our analysis of the 'default' cases, we have had to sharpen current
basic notions in several respects. Most importantly:
1) The terms 'given' and 'new' in the description of the function of ac-
cent do not always designate the same. We have to distinguish between
two facts: factual 'givenness'/'newness' of the element in the discourse (a
notion that does not simply reduce to that of prior mention), which does
not in any mechanical way determine accent placement; and the conven-
tional meaning associated with accent/accentlessness, for which we should
perhaps agree to settle on some other technical term.
2) The semantic function of accentuation is characterized by a high de-
gree of systematic variability with contexts: 'ambiguity' in a somewhat ne-
gative terminology, systematic adaptability to different purposes, more
positively. The choice of an accent pattern is not determined by context
conditions, since it has an autonomous semantic function (although cer-
tain accent patterns may show a high degree of affinity to certain context
types). Its interpretation, on the contrary, is systematically dependent on
context givens of all sorts (cf. Halliday, 1967: 206): word class, syntactic
relations to other elements, accentedness/accentlessness of those ele-
ments, (factual) givenness/newness in the discourse etc.
3) It is not only the semantic content of an element in the usual sense,
as defined by its opposition to other elements of its class, that can be fo-
cussed on by an accent; the 'relational' semantic features as defined by the
syntagmatic relations it stands in, too, figure importantly as possible do-
mains of focus.
4) The syntactic hierarchy of a sentence does not determine accent
choice, but plays an important role as a framework for the choices to be effected.
Accent distribution is chosen, on a semantic basis, at each level of
the syntactic hierarchy anew. At every level, a semantically-based choice is
made between a number of possible patterns.30

'Sentence accent', then, is 'sentence' accent in the sense that the informational focussing ef-
fected by it - which may be bestowed on any of its constituents, and on several at a time - is
fully interpretable only against the background of the relations the constituents stand in
within the sentence. The term must not mislead us into thinking that what an accent func-
tionally relates to is simply the sentence as a unit. If it did (in a 'delimitative' function), the
rules for its placement would in all probability be simpler than they are (for how should the
accent help delimit the unit, otherwise?), and an unalloyed 'syntactic' approach might have
been sufficient to capture them.

5.3 We have long had to contend with 'contrastive' accent as a unit pre-
sumably sui generis, when what is involved is just accent and some 'con-
trastive' use, or context (Bolinger, 1961; Gibbon, 1980). It looks as if 'de-
fault accent' were on its way to becoming hypostatized into another such
'kind of accent'. But, again, although the rules for the use of accent in in-
teraction with the local semantics, syntax, and discourse conditions are in
part rather intricate, and in spite of extensive '-etic' semantic variation,
'-emically', focussing by accent is unitary.
'Default' accent is 'focus' accent. 31

A terminological recommendation: we should, to avoid further hypostatizations, be careful
about the use of labels of the form 'x-accent' (I know it is hard to avoid them altogether).
Conditions of use must not be made into 'kinds of accent', terminologically.32

5.4 In an interesting article, Bearth has recently drawn attention to the
fact that pitch differences are exploited to signal 'information value' in
tone languages, too. The fact in itself, although probably not common
knowledge, should not be too surprising; we know that pitch differences
tend to be functionally interpreted along several dimensions, to be organized
into different functional systems within the same language (see also
Martinet, 1968). What is interesting in the present context is the apparently
far-reaching analogy in the modalities of the interaction with syntactic
patterns. According to Bearth, the 'high-low alternation' associated
with information value mostly takes the form of a modification of inherent
tones. Now these modifications are observed to occur in nominal compounds
and in constructions involving verb + object, e.g., in ways very
much reminiscent of integration (1980: 126, 128). More directly interesting
to our present discussion yet: 'relational' verbal categories seem to be
expressed by separate particles e. g. in Toura, one of the languages of illustration,
in the form of 'predicative markers', and these are susceptible of
the tone treatment according to information value. As is to be expected
from the question Bearth's article addresses, he explicitly compares the
phenomena described to those obtaining, e.g., in English. To the extent,
however, that the facts remain unclear for English, his treatment of their
tone language counterparts is approximate. The more we know about the
regularities of accent placement in our languages, the better we will be
able to solve related problems in typologically different ones.33

30
Four formal patterns are available, but semantically, they correspond to many more possibilities:
separate focus vs. integration, focus on a new element vs. iteration, etc.
31
If one were to speak this sentence, in the context here presupposed, there might well (not:
'would have to') be accents on the subject, on is, and on the predicate noun - and analysis
in terms of 'default' impossible. (Interestingly, while one quite commonly accents many
constituents in a sentence, one usually underscores but one.)
32
If the notion of 'kinds of accent' is fruitful at all, it should certainly be reserved for units
formally distinct, such as Bolinger's A-, B- and C accents. Still, mutatis mutandis the same
objection may apply; the differences between A, B and C ought probably to be described
in the framework of some other more general intonational dimension (cf. Knowles this
volume).

References

Bearth, T. (1980). "Is there a universal correlation between pitch and information value?" In:
Brettschneider, G. & Lehmann, Ch. (eds.). Wege zur Universalienforschung. Tübingen:
Gunter Narr. 124-130.
Benveniste, E. (1966). "La phrase nominale". In: Benveniste, E., Problèmes de linguistique
générale. Paris: Gallimard. 151-167.
Berman, A. & Szamosi, M. (1972). "Observations on sentential stress". Language 48:
304-328.
Bolinger, D. (1952). "Linear Modification". PMLA 67: 1117-44.
Bolinger, D. (1958). "A theory of pitch accent in English". Word 14: 109-149.
Bolinger, D. (1961). "Contrastive accent and contrastive stress". Language 37: 83-96.
Bolinger, D. (1971). The phrasal verb in English. Cambridge, Mass.: MIT Press.
Bolinger, D. (1972). "Accent is predictable (if you're a mind-reader)". Language 48: 633-644.
Bresnan, J. (1971). "Sentence stress and syntactic transformations". Language 47: 257-281.
Bresnan, J. (1972). "Stress and syntax: a reply". Language 48: 326-42.
Brown, P. & Levinson, S. (1978). "Universals in language use: politeness phenomena". In:
Goody, E. N. (ed.). Questions and politeness. Cambridge: Cambridge University Press.
56-289.
Chafe, W. L. (1974). "Language and consciousness". Language 50: 111-33.
Fillmore, C. J. (1976). "Pragmatics and the Description of Discourse". In: Schmidt, S. J.
(ed.). Pragmatik/Pragmatics. Vol. 2. München: Wilhelm Fink. 147-174.
Fuchs, A. (1976). "'Normaler' und 'kontrastiver' Akzent". Lingua 38: 293-312.
Fuchs, A. (1980). "Accented Subjects in 'All-New' Utterances". In: Brettschneider, G. & Lehmann,
Ch. (eds.). Wege zur Universalienforschung. Tübingen: Gunter Narr. 449-461.
Fuchs, A. (ms.) 'Instructions to leave': on integrative ('normal') accent placement.
Gibbon, Dafydd (1980). "A new look at intonation syntax and semantics". In: James, A. &
Westney, P. (eds.). New Linguistic Impulses in Language Teaching. Tübingen: Gunter Narr.
71-98.

33
I thank Harald Fricke, Richard Geiger and Dafydd Gibbon for close critical reading of
the manuscript.

Grice, H. P. (1975). "Logic and conversation". In: Cole, P. & Morgan, J. (eds.). Syntax and
Semantics. Vol. III. New York etc.: Academic Press. 43-59.
Halliday, M. A. K. (1967). "Notes on transitivity and theme in English". Part 2. Journal of
Linguistics 3: 199-244.
Hawkins, J. A. (1978). Definiteness and Indefiniteness. A Study in Reference and Grammaticality
Prediction. London: Croom Helm.
Keenan, E. O. & Schieffelin, B. (1976). "Topic as a discourse notion: a study of topic in the
conversation of children and adults". In: Li, C. N. (ed.). Subject and Topic. New York etc.:
Academic Press. 337-384.
Labov, W. & Fanshel, D. (1977). Therapeutic Discourse. New York etc.: Academic Press.
Ladd, D. R. jr. (1979). "Light and shadow. A study of the syntax and semantics of sentence
accent in English". In: Waugh, L. & van Coetsem, F. (eds.). Contributions to Grammatical
Studies: Semantics and Syntax. Leiden: E. J. Brill. 93-131.
Lakoff, G. (1972). "The global nature of the nuclear stress rule". Language 48: 285-303.
Liberman, M. & Prince, A. (1977). "On Stress and Linguistic Rhythm". Linguistic Inquiry 8:
249-336.
Martinet, A. (1968). "Accent et tons". In: Martinet, A., La Linguistique synchronique, études et
recherches par A. Martinet. Paris: Presses Universitaires de France. 141-161.
O'Connor, J. D. & Arnold, G. F. (1973). Intonation of colloquial English. 2nd, rev. ed. London:
Longman.
Schubiger, M. (1958). English Intonation. Its Form and Function. Tübingen: Niemeyer.
Searle, J. R. (1969). Speech Acts. An Essay in the Philosophy of Language. Cambridge: University
Press.
Sgall, P., Hajičová, E. and Benešová, E. (1973). Topic, Focus and Generative Semantics. Kronberg:
Scriptor.
Stockwell, R. P. (1972). "The role of intonation: reconsiderations and other considerations".
In: Bolinger, D. (ed.). Intonation. Harmondsworth: Penguin Books. 87-109.
Worth, D. S. (1964). "Suprasyntactics". Proceedings of the Ninth International Congress of Linguists.
Den Haag - Paris: Mouton. 698-704.
DAFYDD GIBBON

Intonation as an Adaptive Process

1. Aims and scope of this study

This paper describes a procedural model for English intonation, and ap-
plies it to complex dialogue data. In the first section, descriptive (§ 1.1.)
and heuristic (§ 1.2.) assumptions are discussed. A set of descriptive cate-
gories for English intonation is presented in § 2, followed by an analysis
of complex intonation data in § 3 and an outline of a procedural syntax
for English intonation in § 4. In § 5, some extensions of the notion of
'process' to include 'adaptation to context' are proposed, and in § 6 the
main properties of the theory are briefly evaluated in terms of a dynamic
speaker-hearer model.
The main points developed in the paper concern
(1) the metalocutionary hypothesis (§ 1.2.);
(2) the status of discourse tokens (§ 1.2., § 3);
(3) the articulatory bases of intonation and their relation to perceptual
patterns (§ 2, § 5);
(4) the pulse accent theory (§ 2);
(5) iterative and recursive processes in discourse phonology, organised in
prosodic frames (§ 4);
(6) the adaptation of intonation processes to context, described as a feed-
back system (§ 5);
(7) consequences of this procedural approach with regard to the method-
ological problem of 'fuzziness' in intonation description (§ 2.3., § 6).
A major aim of this approach is to provide conceptual bridges between
phonetic, structural and functional aspects of intonation, and for this rea-
son concepts are adapted from a fairly wide selection of disparate fields,
from the physiology of speech to discourse analysis. Emphasis is laid on
analysing 'data in use' rather than intuitively constructed 'data'.

1.1. Intonation form and function


The view is taken here that the forms of intonation can initially be de-
scribed at a linguistic level sui generis, and that there is an 'interpretation
function' of a quasi-semantic kind mapping intonation patterns on to lo-
cutionary structures at various levels of description. This is a classic view
of the status of intonation, and is in one form or another characteristic of
most major authors from Pike (1945) through Jassem (1952) and Crystal
(1969). I have called this view, in an explicit form, the 'metalocutionary
hypothesis', since intonations mark, inter alia, structural properties of lo-
cutions and their functions in dialogue, acting as a 'suprasegmental' or, se-
mantically speaking, 'metalocutionary' system (Gibbon, 1980).
In addition to the classical component of this view, the following hypo-
thesis is maintained: the metalocutionary interpretation function presup-
poses a level-selection or domain-switching function which selects the
relevant level of locutionary structuring for intonational marking. This
level-selection function defines stylistic or functional variation in intona-
tion, and is sensitive to the discourse context. Locutionary structure is
more complex than the intonational means available for indexing it, and
includes at least the following kinds of structure as values of the level-se-
lection function and thus as domains for prosodic indexing:
(1) Word and syllable structure.
(2) Phrase structure: relations between syncategories and categories, specifiers
and heads of phrases (e. g. in an X-bar syntax), early vs. late linear
position (each paired with weaker and stronger accents, respectively);
constituent boundary indexing.
(3) Semantic structure: operator/operand scope, as with negation, quanti-
fication, degree and focus adverbs (stronger accent on the operand).
(4) Topical (semantic frame) structure: anaphora/contrast, other given/
new relations in topic development (also weak/strong).
(5) Speaker attitudes of modal (knowledge, belief, obligation) and apprai-
sive types (emphasis, pejoration, amelioration).
(6) Discourse organisation: turn-taking processes, speech act sequencing,
indexing of completion/noncompletion of dialogue constituents.
(7) Discourse type (style, genre, register) indexing: both via the selection
function itself and via specific genre markers (e.g. the 'chroma' fea-
ture of § 2.1., § 3).
For instance, in read-aloud citations, level (2) will tend to dominate; in
spoken narrative, (4) is more relevant, while in small-talk, (6) may be
dominant. A case of domain selection between levels (2) and (5) or (6) is
discussed by Bing (this volume) and by Boves & al. (this volume).
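As a rough illustration of the level-selection function, the three tendencies just mentioned can be written as a lookup from discourse type to dominant level. This is a minimal sketch only: the names Level, DOMINANT_LEVEL and select_level are hypothetical, the table records tendencies ('will tend to dominate') rather than fixed rules, and discourse types for which no tendency is stated here are deliberately left unassigned.

```python
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    """The seven kinds of locutionary structure listed in (1) to (7)."""
    WORD_AND_SYLLABLE = 1
    PHRASE = 2
    SEMANTIC = 3
    TOPICAL = 4
    SPEAKER_ATTITUDE = 5
    DISCOURSE_ORGANISATION = 6
    DISCOURSE_TYPE = 7


# Tendencies stated in the text; the string keys are illustrative labels.
DOMINANT_LEVEL = {
    "read-aloud citation": Level.PHRASE,
    "spoken narrative": Level.TOPICAL,
    "small-talk": Level.DISCOURSE_ORGANISATION,
}


def select_level(discourse_type: str) -> Optional[Level]:
    """Return the level that tends to dominate, or None if none is stated."""
    return DOMINANT_LEVEL.get(discourse_type)


print(select_level("spoken narrative").name)  # -> TOPICAL
print(select_level("lecture"))                # -> None
```

A fuller model would presumably let the selection vary within a single discourse, since the function is said to be sensitive to the discourse context; the table form is chosen here only to keep the idea of 'domain switching' visible.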
This descriptive background is stated here in nuce (cf. also Gibbon,
1982) since, although the body of the paper is more concerned with the
formal organisation of intonation processes, it presupposes this back-
ground.

1.2. Intonation and data complexity


The research strategy underlying the work described in this paper in-
volves, primarily, finding methods of systematically describing complex
intonation data, in particular discourse tokens paired with context specifi-
cations. The strategy proceeds from an informal 'interpretation' of forms
and functions of such data in the direction of a more precise formal de-
scription; an important feature of the strategy is that it is necessary to pro-
ceed along a broad front, taking form and function description along
roughly at the same pace.
It has proved heuristically useful to distinguish between two data types
and to try and combine use of these types in the descriptive cycle. These
two types centre around concepts such as paradigmatic, static, structural,
classificatory on the one hand, and syntagmatic, dynamic, procedural, raw on
the other. Some of these words may seem odd partners, and there is syste-
matic overlap between the groups, but the data types, which I shall refer
to here as 'P-data' and 'S-data', are real enough in practice. P-data are ed-
ited productions of sentences or sentence constituents, preplanned, read as
citation forms (i.e. as unimaginatively as possible), and paired with para-
digmatic metalinguistic judgments of constituency type ('sentencehood',
'intonation group') or more complex equivalences ('ambiguity', 'substitutability',
'paraphrase'); these subsentential paradigmatic judgments are often
supplemented by sentential functional, syntagmatic judgments on anaphora,
subjecthood, 'focus-of'. S-data are discourse tokens with context
specifications ('context' in the sense of 'cotext' as well as the more general
sense), from a variety of registers along scales such as 'most spontaneous'
to 'most rehearsed'; the context specifications are syntagmatic metalinguistic
judgments on function in discourse, and paradigmatic judgments on
constituent types are regarded as derived from these (cf. also Gibbon,
1976a: 44f.).
In developing precise descriptions of selected points, P-data are indis-
pensable but they embody certain heuristic assumptions:
(1) The style/register of pre-planned utterances read and judged as cita-
tion forms is representative of all speech and allows access to the un-
derlying 'language'.
(2) The 'clear case' technique based on straightforward P-data judgments
allows quasi-inductive 'predictions' to be made about less clear cases
on the basis of an exact theory, providing a suitable strategy for dis-
covering the underlying 'language'.
Both of these assumptions are, however, questionable. To avoid the restrictions
which they bring, a theory, whether developed using P-data or
S-data or both, must be tested continually on S-data in order to falsify ex-
isting descriptions by extending the data base. One valuable contrary tech-
nique to (2) is, in fact, the description of easily isolated, conspicuous, but
by no means 'clear' cases; continued interest in 'call contours' (cf. Gibbon,
1976: Ch. 4; Ladd, 1978), at first sight a peripheral issue, is a case in point.
This technique is, no doubt for psychological rather than logical reasons,
widespread but unsung in all branches of linguistics. A reason for using
S-data, counter to (1), is to account for 'functional' properties of speech:
'functionalistic' need not mean 'seeking teleological explanations', it can
also mean a descriptive approach with the sense of 'embedded in a contextual
matrix'. In this sense, 'function' is no less a formal or structural notion
than that of grammatical function. In a process-oriented approach,
S-data therefore have more than a mere alibi role in linguistic heuristics.
Within the present research strategy, S-data have also made it desirable
to incorporate into the description as coherent an account of the process
properties of speech as possible; these include the temporal organisation
and the functional or stylistic variation of speech. If intonation is con-
ceived as the temporal organisation of pitch in speech, not an over-con-
troversial conception, then P-data at once become highly suspect where
intonation description is concerned. Likewise, some of the very functions
of intonation are to mark styles or speech registers, again restricting the
utility of P-data.
In view of S-data judgments on temporal organisation and functional
variation, distinctions such as language/speech, competence/performance,
programme/execution seem too simple to be of much relevance. Where a
'real-time job' (e.g. speaking) is involved, it seems reasonable to assume
that any programme will take a real-time environment into consideration.
P-data, however, may convey an impression of a-temporality, owing to
their pre-edited nature. This study is a step in the direction of a process-
oriented explanation.

2. Descriptive categories for English intonation

The descriptive categories required for the analysis of English intonation
are, fundamentally, dynamic relations between articulations and percepts;
a fully described category would be a pair (articulation, percept). The term
'percept' refers to the user's perception both of his own productions and
of incoming signals; in the first case they include proprioceptive as well as
auditory percepts (cf. also § 5 and § 6). The articulations are thoracic and
laryngeal gestures, and the percepts are essentially pitch height and trajec-
tory gestalt impressions; temporal patterning of intensity plays a related
role. Presumably, the relations containing these pairs are learned during
the earliest stages of speech acquisition. It should be noted that the rela-
tion between the phonological features outlined here (e.g. Cricothyroid
pulse) and the physiological reality (e. g. the plurality of factors affecting
cricothyroid muscle operation) is not a simple one; the categories are no
less abstract than the segmentally relevant articulatory categories (e.g.
Coronal, Voicing, & c.), and are mnemonically convenient abbreviations
for complex activities (cf. Hardcastle, 1975).
The relationship between articulations and percepts is syntagmatic, part
of a process sequence within a feedback cycle, not a static 'correlation'. As
Intonation as an Adaptive Process 169

an expository aid, however, indications of 'correlations' will be provided,
using the symbol '='.
Articulatory categories will be in the foreground of attention, since they
allow a simpler overall description. Three kinds of descriptive category
will be distinguished: primitives (§ 2.1.), complex processes (§ 2.2.), and
strategies (§ 2.3.).

2.1. Primitives
The primitives are local (short-scope; roughly speaking: word-oriented)
and global (long-scope; roughly speaking: phrase/sentence/text-
oriented).
The local primitives are accent pulses and their modifications (some
modifications being reflexes of global features determined by construc-
tions or strategies).
(1) Accent pulses. Accents are conceived as laryngeal pulses. The larynx
is regarded as a complex elastic body whose steady state with respect to
vocal cord tension is altered by 'stretching' ('Cricothyroid') and 'com-
pressing' ('Sternohyoid') pulses. After a pulse, the larynx returns (other
things being equal) to its pre-pulse state. The auditory reflexes of these
modifications are a pitch change in one direction on the leading flank of
the pulse, followed by a change in the reverse direction after the pulse.
The articulatory 'correlates' are conceptually simpler than the auditory
patterning, and are presupposed by the latter. They are more abstract,
however, from an empirical point of view. Aspects of the pulse theory of
accent are shown pictorially in Figure 1.
The two phonological accent types are (note that the names are not in-
tended to imply a simple association with single muscles):
i. Cricothyroid pulse (= upward pitch movement), denoted by '↑';
ii. Sternohyoid pulse (= downward pitch movement), denoted by '↓'.
The ↑ pulse can be felt with a finger as a narrowing of the intercartilage
ridge at the front of the larynx, the ↓ pulse sometimes as a lowering of the
entire larynx. Not all short-range pitch movements are functions of laryngeal
pulses. Pitch perturbations of supraglottal origin, the use of subglottal
mechanisms, and other factors, affect vocal cord stretching and compression.
Bolinger's Accents A and C could be partially explicated in these
terms as ↑ and ↓ respectively; the ensuing fall or rise is a natural auditory
reflex of the post-pulse relaxation predicted for unmodified cases by the
laryngeal pulse theory. Other more complex forms of accent are due to
pulse modifications. Similarities with Lieberman's theory (1967) are evident,
at least for the ↑ pulse.
(2) Accent modifications. The central accent modification is pulse amplitude
(= degree of pitch change, = Δf₀ (i.e. change in fundamental frequency),
= prominence), which realises accentual gradation. A second set
of accent modifications affects pulse timing relative to syllable structure:
Pulses with normal fade (cf. Bolinger's A and C)
. . . plus delayed triggering (cf. Kingdon's delayed fall)
. . . plus slow rise time (cf. rise-fall)
. . . plus delayed triggering and slow rise time (cf. 'emphatic rise-prefixes')
. . . plus tension increase (cf. 'list' & 'interrogative' terminals)
. . . plus equilibration (cf. 'call' contours)
(Axis label: Duration of accented syllables)

Figure 1: Schematic outline of pulse types and modifications (baseline modifications not
shown); comments apply mainly to rise pulses.

i. pulse leading flank:
a. delayed triggering (syllable-initial, post-syllable-onset, post-syl-
lable-nucleus, post-syllable), as in Kingdon's delayed rise-fall, Bo-
linger's variant of Accent A (characteristic of some dialect and
style variation and some types of emphasis, interacting here with
syllable onset timing features which produce fortition, e. g.
stronger aspiration);
b. slow rise time involving syllable lengthening (accounting for some
rise-fall tones, Armstrong & Ward's emphatic rise-prefix, King-
don's 'homosyllabic prehead');
ii. pulse trailing flank:
a. fade (the neutral case - cf. remarks above);
b. cricothyroid tension increase (= rising pitch, ≠ pulse);
c. laryngeal equilibration ( = level tone, cf. 'chroma', Gibbon, 1976a,
b; 'stylization', Chao, 1956; Ladd, 1978; Brazil's o tone, 1978 &
this volume).
It seems plausible to postulate a more abstract category of accentual
pulse, with pulse polarity as a modification, e.g. with ↑ as unmarked and
↓ as inverted, i.e. marked pulse. Evidence for this may be seen in Bing's
discussion of equivalences between Bolinger's A and C accents (1980), as
well as in the peakline-baseline complementation feature discussed below
under (2) v.
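
The pulse conception of accent and its modifications can be made concrete with a small numerical sketch. The following Python fragment is only an illustration under stated assumptions, not part of the theory: the function name `pulse_offsets`, the frame-based time axis, the linear leading flank, the exponential fade and all default parameter values are hypothetical choices made here for exposition. It returns the pitch offset from the steady state produced by a single pulse, with polarity, amplitude, delayed triggering, rise time and the three trailing-flank types (fade, tension increase, equilibration) as parameters; like Figure 1, the tensing case is meant mainly for rise pulses.

```python
import math

def pulse_offsets(n_frames, onset, amplitude, polarity=+1, delay=0,
                  rise=3, trailing="fade", decay=0.5, tension=2.0):
    """Pitch offsets (arbitrary units) from the steady state for one pulse.

    polarity: +1 for an upward pulse, -1 for a downward pulse.
    delay:    frames by which triggering is delayed after the syllable onset.
    rise:     number of frames in the leading flank (larger = slower rise).
    trailing: 'fade', 'tensing' or 'equilibration' (assumed labels).
    """
    offsets = [0.0] * n_frames
    start = onset + delay
    for t in range(start, n_frames):
        k = t - start
        if k < rise:
            # leading flank: movement away from the pre-pulse state
            value = amplitude * (k + 1) / rise
        elif trailing == "fade":
            # neutral case: relaxation back to the pre-pulse state
            value = amplitude * math.exp(-decay * (k - rise + 1))
        elif trailing == "equilibration":
            # level tone: the pulse height is sustained
            value = amplitude
        elif trailing == "tensing":
            # tension increase: pitch continues to move after the pulse
            value = amplitude + tension * (k - rise + 1)
        else:
            raise ValueError(f"unknown trailing flank: {trailing}")
        offsets[t] = polarity * value
    return offsets

fade = pulse_offsets(20, onset=2, amplitude=30.0)
level = pulse_offsets(20, onset=2, amplitude=30.0, trailing="equilibration")
```

In this sketch the neutral pulse peaks at the end of its leading flank and then decays, whereas the equilibrated pulse holds its peak value to the end of the stretch.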
The global primitives can be divided into boundary features and base-
line features: they are properties of complex processes (cf. § 2.2., and the
γ-frames and π-frames of § 4.).
(1) Boundary features. The boundary features mark the beginning and
end of laryngeal pulse trains:
i. Initial boundary: a complex function of Anacrusis/Prehead and
Head/Onset height specification, where the function of Head height
(Crystal, 1971; cf. also Trim, 1964: 378; Brazil's 'key'; Lehiste's 'para-
graph' boundary marking, 1975; Brown & al.'s 'initial stressed peaks',
1980: 136; cf. also Ozga, 1980) appears to be primary, strategically
modifiable by Prehead height (cf. Bing's initial boundary tone, 1980).
In articulatory terms, Head height appears to be contextually deter-
mined to some extent by subglottal conditions.
ii. Final boundary, with several different feature types:
a. Lengthening of accented syllable and other timing factors involv-
ing delay, such as 'terminal pause';
b. Pulse mirroring: a sequence of [α pulse] ^ [−α pulse] pulses (the
use of square brackets and '−α' is that of generative phonology;
'^' means concatenation), with the second pulse before the soonest
segment which is next lowest in sonority in the subsequent unaccented
stretch (cf. Liberman's similar explanation of tone distribution
in 'call contours', 1978). This also allows a generalisation
over call contour phenomena and some other contour types. If the
specification is [α = ↑], then the following inversion, [−α], or ↓,
can be explained functionally as accelerating the natural post-pulse
decay which already receives perceptual prominence by virtue of
the syllable lengthening function (cf. also § 2.3., the Principle of
Sequential Pitch Contrast).
c. Cricothyroid tensing ( = pitch rise).
d. Larynx equilibration ( = pitch sustention).
The occurrence of pulse mirroring, tensing and equilibration in both local
and global functions may be seen as an indication of a functional ambivalence
of 'level' assignment which is characteristic of prosodic patterning
(cf. § 2.3.).
(2) Baseline features. Throughout the history of English intonation de-
scription, indications are to be found that two kinds of 'baseline' are in-
volved: a 'baseline proper', a virtual pitch value toward which unaccented
syllables tend (depending on their relative proximity to a pulse), and a line
along which auditory pitch peaks tend to be aligned (cf. Bolinger's 'tangent',
or Body/Head contours). I shall call the former baseline and the latter
peakline. The most similar recent phonological uses of this distinction
are by Brown & al. (1980) and Bing (1980). Given a baseline and a peak-
line, a number of global relational properties of prosodic patterning can
be defined over (peakline, baseline) pairs:
i. Bandwidth ('span', 'range') is a simple peakline-baseline relation defined
as pitch difference (= Δf₀).
ii. Peakline slope, [α Δpeakline] (i.e. type of change in peakline: up,
down, level), with downdrift as the neutral case and a rise (cf. Pal-
mer's 'scandent head') or more complex tangents as a marked case.
Whether peakline slope induces baseline slope (or vice versa) is an
open question. Three main kinds of 'level' peakline slope may be dis-
tinguished:
a. low, as in relaxed conversational reporting style, often taken as the
standard American English pattern but also characteristic of anal-
ogous styles in other dialects of West Germanic languages;
b. sustained mid, as in one speech genre of overt narration (as in E(q)
of § 3); this feature is similar to the laryngeal equilibration noted
twice above (cf. § 2.3.; and iv, below);
c. high pitch, as in 'uncontrolled', excited speech.
iii. Peakline-baseline convergence, [peakline →ₚ baseline], again of two
types:
a. an unmarked type, the 'trailing off' which is presumably allied to
neutral downslope and may not need to be postulated separately
from this;
b. sustained convergence, as in mid-discourse 'narrative tags' (cf. § 3,
§4).
iv. Peakline-baseline 'identity', [peakline =ₚ baseline]: this is strict
'chroma' or 'stylization', which occurs in its most extreme form in
chants and song (Chao, 1956), and is entirely dependent on larynx
equilibration supported by negatively phased auditory feedback (cf.
§ 5, § 6).
v. Peakline-baseline complementation, in which relative peakline-base-
line heights are inverted; this is an interesting theoretical possibility
with considerable descriptive potential. Figure 2 illustrates a recorded
example, in which the upper frame is a train of ↓ pulses (with 'upward'
relaxation indicated by rising lines), and the lower consists of ↑
pulses (with 'downward' relaxation indicated by falling lines). This
feature provides a device for describing certain features of style and
dialect variation in a unified fashion, and distinguishes some global
uses of ↑/↓ alternation from the local uses already noted, with cases
such as the following (where α <ₚ β means 'α is lower in pitch than β'
and α >ₚ β means 'α is higher in pitch than β'):
a. Edinburgh dialect: [peakline >ₚ baseline], Glasgow dialect: [peakline
<ₚ baseline] (Brown & al., 1980: 19);
b. North German dialects: [peakline >ₚ baseline], South German dialects:
[peakline <ₚ baseline] (an oversimplification, but cf. v. Essen's
"Zickzackmelodie", 1956);
c. In standard English dialects, child-adult and adult-child speech,
with [peakline <ₚ baseline], often functionally alternating with
[peakline >ₚ baseline] (cf. Figure 1).
d. As a stylistic discourse strategy in German (cf. Gibbon & Selting,
1983).

[Upper panel: π-frame, [peakline <ₚ baseline]; line labels from top to bottom: BASELINE, PEAKLINE;
locution: "young hedgehog trundled along through the leaves in the greenstuff in the wood . ."]
[Lower panel: π-frame, [peakline >ₚ baseline]; line labels from top to bottom: PEAKLINE, BASELINE;
locution: "he'd never been outside of the wood before"]

Figure 2: Use of different peakline-baseline specifications to mark stages in the development
of an adult-to-child narrative (from a BBC Children's Hour broadcast).
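
The relational properties defined over (peakline, baseline) pairs can be illustrated computationally. The following Python sketch rests on assumptions introduced here for illustration only: the two lines are given as equally long lists of pitch values, and the function names (`bandwidth`, `slope`, `classify`), the tolerance `tol` and the criterion for convergence (final bandwidth less than half the initial bandwidth) are hypothetical, not taken from the description above.

```python
def bandwidth(peakline, baseline):
    """Pitch difference between peakline and baseline at each point."""
    return [p - b for p, b in zip(peakline, baseline)]

def slope(line, tol=1.0):
    """Classify the overall change of a line as 'up', 'down' or 'level'."""
    delta = line[-1] - line[0]
    if delta > tol:
        return "up"
    if delta < -tol:
        return "down"
    return "level"

def classify(peakline, baseline, tol=1.0):
    """Name the peakline-baseline relation (labels are assumptions)."""
    widths = bandwidth(peakline, baseline)
    if all(abs(w) <= tol for w in widths):
        return "identity"          # peakline and baseline coincide
    if all(w < -tol for w in widths):
        return "complementation"   # relative heights inverted
    if abs(widths[-1]) < abs(widths[0]) / 2:
        return "convergence"       # bandwidth narrows towards the end
    return "plain"

peaks = [200.0, 160.0, 130.0, 110.0]
base = [100.0, 100.0, 100.0, 100.0]
relation = classify(peaks, base)   # narrowing bandwidth: 'convergence'
```

A frame such as the upper one in Figure 2 would, on these assumptions, be classified as complementation, since every peakline value lies below the corresponding baseline value.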

2.2. Complex processes


The problem of representing intonation patterns in formal terms is still
unsolved, though a wide range of suggestions has appeared in recent
years. In the present theory, the line taken is different again, and uses outline
augmented transition networks to define construction types as processes.
In order to clarify the distinction between two fundamental positions
with regard to notational systems, it is expedient to distinguish between
categorial and procedural explications of intonation structures.
For some aspects of intonation structure, a representation of depend-
ency relations (e.g. head/satellite) is required (cf. § 1.1., especially accen-
tuation). For this purpose a temporally uninterpreted phrase structure or
categorial syntax would be appropriate. For instance, if α is an Intonation
Group, and β is a Nucleus, then α/β is the category of Head ( = Preto-
nic); if γ is a Head ( = Onset), then γ\(α/β) is Kingdon's category of
Body, i.e. the sequence of accented and unaccented syllables between
Head ( = Onset) and Nucleus. Semantic interpretations (e.g. in terms of
'topic' and 'comment' or more structured explications of such notions) can
then be built constructivistically on these categories; cf. remarks above in
§ 1.1. on semantic interpretation. The integer notation of Chomsky &
Halle (1968) and particularly the (s, w) notation of autosegmental and
metrical phonological theories can possibly be re-analysed as approxima-
tions to categorial syntaxes for intonation.
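
The categorial definitions given above can be checked mechanically. The following Python sketch is an illustration only: the class `Cat`, the functions `forward` and `backward`, and the category names are hypothetical and introduced here; the slash conventions assumed are that X/Y combines with a following Y to give X, and that Y\X combines with a preceding Y to give X.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Cat:
    """An atomic category (name only) or a functor category."""
    name: str = ""
    result: Optional["Cat"] = None
    arg: Optional["Cat"] = None
    slash: str = ""   # "/" seeks its argument to the right, "\\" to the left

def forward(functor, right):
    """X/Y followed by Y yields X; otherwise None."""
    if functor.slash == "/" and functor.arg == right:
        return functor.result
    return None

def backward(left, functor):
    """Y followed by Y\\X yields X; otherwise None."""
    if functor.slash == "\\" and functor.arg == left:
        return functor.result
    return None

GROUP = Cat("IntonationGroup")                       # α
NUCLEUS = Cat("Nucleus")                             # β
ONSET = Cat("Onset")                                 # γ
PRETONIC = Cat(result=GROUP, arg=NUCLEUS, slash="/")   # α/β
BODY = Cat(result=PRETONIC, arg=ONSET, slash="\\")     # γ\(α/β)

pretonic = backward(ONSET, BODY)      # Onset + Body gives the pretonic category
group = forward(pretonic, NUCLEUS)    # pretonic + Nucleus gives the Intonation Group
```

On these assumptions, an Onset followed by a Body reduces to α/β, which in turn combines with a Nucleus to give an Intonation Group; no temporal interpretation is involved.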
However, this kind of syntax seems likely to be too rigid for much of
the prosodic patterning found in English. There are other formal proper-
ties of intonation constructions for which a procedurally interpreted auto-
nomous rewrite grammar or a transition network are more suggestive.
The latter approach will be taken here. A distinction will be made between
fixed, iterative and recursive processes. The thesis is put forward here that
all three process types are required to describe the prosodic patterning of
discourse. They are not necessarily all required for the description of P-
data.
Fixed processes are items consisting of fixed sequences of global and local
specifications. They are found in the stereotyped patterns associated
with pragmatic idioms (greeting, interjections, exclamations, & c.), and in
some theories they are postulated as the only type, enumerable in an
intonation lexicon (Liberman, 1978). They will not be considered here.

Levels of analysis                    Structures postulated at each level
1. Major accentual pattern, 'bar':    (head - (break(s))) - nucleus
2. Accentual sequencing:              a - . . . - a - . . . - a - . . . - a
3. Syllabic sequencing:               s - . . . - s - . . . - s - . . . - s - . . . - s - . . . - s
[indices on a and s not legible in the source]

Figure 3: Symbolisation of levels of analysis in partial explication of Klinghardt & Klemm
(1920), from Gibbon, 1976: 106 (the bracketed plural - '(s)' - is used to denote an
optionally iterative category).
Iterative processes have frequently been postulated for the description
of accent sequences in more or less explicit fashion for many decades. In a
previous critical survey of research I represented one iterative property of
'Tone Groups' in the diagrammatic form of Figure 3; in Fox (this volume)
and Halliday (1967), this iterative property is made explicit. Simple tran-
sition networks formulated as finite state machines may be used to de-
scribe this property of intonation patterning. Since 1978 I have been using
transition networks for instruction purposes to synthesise 'stylisations' of
pitch accent sequences in English (in a sense approximating to the usage
of 't Hart) with a microcomputer (cf. Gibbon, 1981b); networks and tones
are interactively definable by the operator. Several approaches have used
similar notions (Reich, 1969; 't Hart & Cohen, 1976; Gibbon, 1981a; Pierrehumbert,
1980). It is evident that a formal explication of rhythm,
among other things, will have to rely heavily on an iterative principle at
some point, whether this is understood, in real terms, as a cerebral 'clock'
frequency or as a perceptual scanning and pattern-recognition principle
(cf. Neisser, 1967).
In a previous study I have shown that in German at least two levels of
iterative process can be identified in complex expository discourse (Gibbon,
1981a). At the lowest prominence level, there are sequences such as r′,
r r′, r r r′, . . . with various explainable 'exceptions', where r symbolises a ↑
accent and r′ symbolises a more prominent ↑ accent (i.e. with higher pulse
amplitude or following cricothyroid tensing). At the next prominence level,
sequences such as f, r f, r r f, . . . (with intervening lower prominence
pulses) occur, where r has the same meaning and f denotes a ↑ pulse with
post-pulse syllable lengthening and, optionally, leading flank modifications.
This structure is generated by the network represented in Figure 4
(cf. also Figure 5).

Figure 4: Outlines of augmented transition networks for the German data from Gibbon
(1981).
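
The iterative property of each prominence level can be sketched as a two-state transition network. The Python fragment below is an illustrative assumption, not the network of Figure 4: the function name `accepts`, the state names and the ASCII labels ("r" for a plain accent, "R" for the more prominent accent closing a lower-level sequence, "f" for the lengthened pulse closing a higher-level sequence) are hypothetical.

```python
def accepts(symbols, plain, closing):
    """True if symbols consist of zero or more 'plain' items plus one 'closing' item."""
    state = "open"
    for symbol in symbols:
        if state == "open" and symbol == plain:
            state = "open"       # iterate: stay in the same state
        elif state == "open" and symbol == closing:
            state = "closed"     # the closing accent ends the sequence
        else:
            return False         # nothing may follow the closing accent
    return state == "closed"

# Lower prominence level: plain accents closed by a more prominent accent.
lower_ok = accepts(["r", "r", "R"], plain="r", closing="R")
# Next prominence level: the same network shape with different labels.
higher_ok = accepts(["r", "r", "f"], plain="r", closing="f")
```

The same network shape serves both levels; only the labels on the arcs differ.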
Each of these levels is obviously right-regular. However, the similarity
between the levels suggests a limited degree of recursion as well as itera-
tion. This suggestion makes for 'flatter' structures than binary trees with
(s, w) pairs, where this distinction is not made; a result which also has intu-
itive appeal. Complex prosodic patterning in S-data therefore points to-
wards a necessity for combining the two kinds of 'control flow'. Figure 5
shows a simplified network for some aspects of the data discussed above
(A), and a generalisation of this network (B) which uses recursion, taking
an initial maximum or 'nuclear' prominence assignment (e.g. the integer
1) as input and then choosing between pulse output or calling itself with
reduced prominence value. On completion of iteration, a terminal pulse
with the currently specified value of β added to a minimum pulse value α
is generated. A complex analysis and more highly structured model is discussed
in § 4; for this model, two types of complex process dominating accent
sequences are postulated: γ-frames and π-frames, concerned with the
global specification of baseline and boundary features, respectively.

Figure 5: A (top). Iterative network equivalent to Σ2 and Σ3 of Figure 3. Higher values of α
are interpreted here as higher pulse prominence (a function of pulse amplitude and
polarity).
B (bottom). Generalisation of the above by including recursion. The current value
of β, a pulse prominence increment value, is passed to the network as a parameter
on each call. Lower values of β are interpreted as higher pulse prominence.
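
The combination of iteration and recursion described for network B can be sketched in a few lines. The following Python fragment is a hedged reading of the prose description, not a transcription of the figure: the function name `network`, the representation of the choices as a nested list `plan` (each element standing for one recursive call, the empty list standing for immediate pulse output) and the default values of α and β are assumptions made here.

```python
def network(plan, beta=1, alpha=0):
    """Return pulse prominence values; lower values mean higher prominence.

    plan:  nested list; each element is the plan of one recursive call.
    beta:  prominence value passed as a parameter on each call.
    alpha: minimum pulse value to which beta is added.
    """
    pulses = []
    for sub_plan in plan:
        # recursion: the network calls itself with reduced prominence
        pulses.extend(network(sub_plan, beta + 1, alpha))
    # on completion of iteration: terminal pulse with value alpha + beta
    pulses.append(alpha + beta)
    return pulses

flat = network([[], []])                 # two weaker pulses, then the nuclear pulse
nested = network([[[], []], [[]]])       # two levels of embedding
```

On this reading, `network([[], []])` yields the values 2, 2, 1, i.e. two less prominent pulses followed by a terminal pulse of maximum prominence, and the nested plan yields 3, 3, 2, 3, 2, 1, which corresponds to the kind of two-level patterning discussed above without requiring binary branching.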
2.3. Strategies
The preceding discussion has shown several points at which similar processes
appear to be in operation at different levels of structure. One relatively
trivial set of examples encompasses 'relaxation' after a ↑ pulse and
the overall downslope or downdrift of the peakline; in this set, various
articulatory mechanisms contribute towards producing an auditory impression
of falling pitch. A perceptually or functionally oriented principle applying
here might be a principle of cooperation of means. Another case could be
the use of laryngeal and subglottal gestures with timing modifications to
intensify a particular auditory effect such as a pitch rise. On the other
hand, the use of a low Prehead to modify a high Head, producing a complex
initial boundary tone, or of wide pitch bandwidth (peakline minus
baseline), or of alternating ↑ and ↓ pulses, or of ↑↓ pulse pairs to accelerate
a pitch fall at terminal boundaries, points to the operation of another
perceptual principle: a principle of sequential pitch contrast. Finally - contrasting
with the last-named principle - in level tones, in sustained-peakline
speech, and in stereotypic speech involving [peakline =ₚ baseline]
specifications, a principle of equilibration appears to be in operation, with
the two laryngeal functions operating antagonistically and under the stabilising
influence of negatively phased auditory feedback: a pitch drop triggers
the ↑ mechanism, and a pitch rise triggers the ↓ mechanism (and/or
the respective subglottal mechanisms). The effect is to counteract accentual
overshoot (for ↑) or undershoot (for ↓), as well as to work against relaxation
and downdrift effects. In a previous study of 'call contours'
(1976a: Chap. 4.3.) I called this functional feature of discourse phonology
'chroma', taking the term from Bachem (1950); see also § 5 below.
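
The principle of equilibration can be illustrated as a simple negative feedback loop. The Python sketch below is a deliberately crude illustration under assumptions made here: the function name `sustain`, the constant per-frame `drift` standing for relaxation and downdrift, and the proportional `gain` standing for the corrective laryngeal adjustment are all hypothetical, and no claim is made about actual physiological values.

```python
def sustain(target, n_frames, drift=-2.0, gain=0.0):
    """Pitch track for a tone held at 'target' against downward drift.

    drift: pitch change per frame due to relaxation/downdrift.
    gain:  strength of the feedback correction (0.0 = no feedback).
    """
    pitch = target
    track = []
    for _ in range(n_frames):
        pitch += drift                 # relaxation pulls the pitch down
        error = target - pitch         # auditory comparison with the target
        pitch += gain * error          # a drop triggers an upward correction,
                                       # a rise a downward correction
        track.append(pitch)
    return track

unregulated = sustain(200.0, 10)             # falls steadily
regulated = sustain(200.0, 50, gain=0.8)     # settles just below the target
```

Without feedback the pitch in this sketch simply declines; with feedback it settles at a level close to the target, which corresponds to the sustained pitch of level tones and 'chroma'.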
It is not clear how the notion of strategy is to be integrated into the
overall description; the ideas behind the notion are simple, however. First,
strategies are intended to represent the complementary functional status
of perceptual as against articulatory processes within a dynamic feedback
cycle. Second, they represent an attempt to generalise over processes
which are phonologically different in structure but related in function.
Third, if considered without a hypothesis on underlying articulatory pro-
cesses, they are likely to lead to indeterminacy and uncontrolled variety in
descriptions, since arbitrary decisions have then to be made in choosing
between perceptual models. That there is indeed little clarity in this field is
perhaps informal support for the necessity of a dynamic production-per-
ception relation such as that postulated here.

3. Intonation in dialogue structure

The metalocutionary hypothesis outlined in § 1.1. is part of a theory of intonation
semantics pertaining to properties of locutions which are indexed
by intonational processes. The present section is an attempt to combine
this approach to intonation semantics with the phonological intonation
theory developed in § 2, by modelling selected aspects of the intonation of
a discourse token, and sketching their relationship with - in this case - semantic
aspects of dialogue progression (mainly levels (4) and (6) of § 1.1.).
The semantic mapping itself is not explicitly modelled.
The discourse from which the present examples are taken is on a commercially
available recording, which readers should consult (Crystal &
Davy, 1975, Extract 4; the accompanying book contains a tonetic transcription).
It is an extract from an informal tale-swapping session among
acquaintances, in which overall semantic coherence is provided by a single
superordinate frame: pigs, experiences with pigs, and pejorative appraisal
of these experiences. The extract consists of three anecdotes, each with its
own distinct subframe, with bridging dialogue; only the first will be considered
in detail. The anecdotes and bridging portions constitute two
kinds of discourse:
(1) Narrative within a 'stable' semantic subframe, with a single, long, un-
disputed narrative turn.
(2) Subframe negotiation between anecdotes with 'unstable' turntaking
patterns which include considerable turn-overlap.
This patterning suggests that an appropriate model for such discourse
will have to include a self-regulating, context-testing principle to account
for the alternation of 'stable' and 'unstable' group interaction states (cf.
also § 6). The first of these state types will be examined here, using Anecdote 1.
The 'stable' semantic subframe on which it is based will be described,
followed by a partial account of its intonational marking in terms
of the categories outlined in § 2.

3.1. A 'stable' narrative contribution to dialogue


In a previous study (1981a: 93 ff.) I have described how intonation pat-
terns mark the more or less cooperative development of topical structures.
Development occurs in several stages or substages within a previously ne-
gotiated semantic frame or subframe. Depending on discourse type,
frames and subframes may hold for one turn or a whole sequence, and although
the turn-taking structure of the present example is quite different,
the principle of adherence to a negotiated semantic frame or subframe still
holds. To provide a context for the present discussion, the anecdote is re-
produced here in a purely orthographic transcription; for more detail,
readers should consult the tapes and the published tonetic transcription.
A. Oh, and one pig died, because it ate too much.
B. Oh really?
A. Oh it was revolting. Oh, they were terrible, the pigs.
C. Oh . . .
A. They made a dreadful row in the morning when it was feeding time.
And one pig, it was erm a young pig, about that size, you know, middling,
and erm it was dead, and it was lying there. I'd never seen a dead
pig before. Absolutely stiff.
B. Di- the children saw it, did they?
A. Oh they were engrossed, you know.
C. Oh yes!
A. It was marvellous, erm they thought this was wonderful. And erm they
asked why it was dead, and er the farmer apparently didn't want his
wife to know, because he'd overfed them before and she'd been furi-
ous - and of course he was trying to keep it from her. But all the kids
were agog about this dead pig, and he was telling them not to tell the
farmer's wife
D. Yeah.
A. and all this . . . So this pig was absolutely dead, so they put it on - they
have a sort of smouldering heap that smoulders all the time - so they
went to burn the pig, and all the kids (laughter) were hanging over the
gate watching this pig. And they were very taken that the pig had died
because it had eaten too much, you know.
D. What a marvellous death!
B. A moral in that somewhere!
This anecdote elaborates a semantic frame with the structure "(p because
q) & r", where:
p stands for "one pig died",
q stands for "it ate too much",
r stands for an appraisal (perhaps to be glossed as 'fascinated pejoration')
of "p because q" or its constituents.
Topical development within this frame is as follows:
Stage 1 (statement of the frame): "one pig died because it ate too much".
Stage 2 (elaboration of the frame as a function of the elaboration of its constituents,
in the following order):
E₁(p): "and one pig it was erm a young pig . . ."
E(r): "oh they were engrossed you know . . ."
E(q): "they asked why it was dead . . ."
E₂(p): "so this pig was absolutely dead . . .".
Stage 3 (restatement of the frame):
r: "and they were very taken that"
p: "the pig had died" because
q: "it had eaten too much".
This development is marked in several ways. First, it is very noticeable
that turn-taking is highly restricted, limited to a clarification and a few in-
terjections; this contrasts strongly with the following section of the dia-
logue, in which a new semantic frame is negotiated. Second, topical devel-
opment in the anecdote is marked by conjunction patterns. Of particular
interest are the largely asyndetic sequences associated with r, coordination
as a mark of improvisation in E(q) - cf. and er, because, and, and of course
(≈ therefore), but, and - and, most notably, the "restart" signals, after
breaking off an entangled construction, in the form of three occurrences of
so. Third, developments in topical structure are marked by different kinds
of prosodic frame; these structures will be informally discussed in the next
subsection and analysed in § 4.

3.2. Preliminary observations on pitch patterning


The intonation patterns in Anecdote 1 offer a number of interesting features
which are not captured well by a linear, in-text tonetic transcription.
The main points to notice for present purposes are the following:
(1) Pause organisation is largely independent of the organisation of
long pitch stretches;
(2) There are significant relations between long pitch patterns and ele-
ments of discourse structure, such as high pitch Heads coupled with
downslope (cf. the beginning and end of the anecdote), pitch lowering rel-
ative to neighbouring sequences (e. g. oh they were terrible the pigs), or the
tendency towards pitch sustention at a mid level (following didn't want his
wife to know) both on level and on longer stretches within E(q).
(3) A number of 'special effects' such as higher pitch, rise-fall tones,
mark lexically appraisive items such as oh, terrible, engrossed, marvellous.
One problem associated with in-text notations is that they embody a
large number of implicit judgments about the phonetic properties of the
environments of accentual sequences as well as of particular syllables, and
possibly also judgments about properties of accompanying locutions. The
following three excerpts show several different kinds of 'onset', whose dif-
ferences do not come out in such transcriptions, and whose similarities are
not at all evident on either perceptual or distributional grounds (| = 'onset',
↑ = 'booster', not pulse accent in the sense of § 2; cf. Crystal &
Davy, 1975):
- |and erm it was |dead (a pause separates the 'onset' from the rest
of the tone group);
- erm they |thought this was |wonderful ('thought' is part of a very
rapid mid-to-low sequence - a long prehead with wonderful as onset
and nucleus?);
- and er · the |farmer · ap↑parently · ↑didn't want his ↑wife to know (first
two accented syllables rhythmically separate from the rest - perhaps
three onsets, the first two as 'orphans' - i.e. not part of a completed
tone group - and the third, didn't, higher in pitch than par, as onset of
the completed tone group?).
On the other hand, the perceptual and distributional similarities be-
tween onsets and boosters, as well as their relationship with pausal struc-
ture (cf. the third example) suggest that rigid application of full tone
group structure to all occurrences of accent sequences in tokens is wrong,
and that a less Procrustean approach which allows a linear development
with the possibility of uncompleted sequences, on the one hand, and includes
non-accentual criteria (such as overall pitch height and pitch bandwidth)
for various different kinds of prosodic grouping on the other, does
more justice to the facts. A systematic description along these lines is at-
tempted in § 4, where sets of representations of quasi-autonomous levels
of organisation (following the descriptive strategy of Jassem & Gibbon,
1980, Fig. 3) are used, rather than in-text, single-level representation
strategies; the latter are useful for many purposes, but often unrevealing.

[Diagram. PROSODIC FRAMES: a π-frame (peakline >ₚ baseline, high initial pitch, Δp = wide,
downslope) and two γ-frames (each: downslope, convergence). ACCENT: polarity and modification
labels (increased amplitude, long pulse; [high], [long syll.]). LOCUTION: phonetic transcription
of "one pig died because it ate too much".]

Figure 6: Prosodic frames and selected specifications of selected primitives for "p because q"
(one pig died because it ate too much) in the Crystal & Davy corpus extract (see text).
Figure 7: Prosodic frames and selected specifications for E₂(p) from the Crystal & Davy
corpus extract, showing 'warm' restart ('warm' since existing top-level π-frame
specifications are used). Mapping of prosodic categories on to the locution is sym-
bolised by parentheses and dotted lines; forced closure of open γ-frames after the
restart is symbolised by a large square bracket.

4. Prosodic frames in discourse

At the risk of being accused of reading too much into a limited set of
examples, I shall restrict attention to "p because q" in Stage 1 and Ei(p)
of Stage 2 in the dialogue concerned. It would scarcely be possible to
cover more detail in the space of a single article. The prosodic structures
of the relevant extracts are represented in Figures 6 and 7. The main
aspects of an appropriate syntax for such structures are shown in Figure 7.
The nodes π and γ are interpreted as organisational frames; the processes
which operate within these frames have access to features denoted by the
'multilabels' (Eikmeyer, 1980) which are attached to nodes in the figures.
The overall picture conveyed by the figures is relatively complex; however,
at each level organisation is relatively simple, as Figure 8 shows. The
principles underlying the interrelationships between levels are also simple,
though the combinatory possibilities are large, leading to complex
structures in actual use. The apparent complexity of specific structures is
perhaps an artefact of the analysis, a kind of 'optical illusion' to which
all analytic description is prone. Conditions on mapping between
specifications at different levels of embedding cannot be discussed here.

π-frames
Observable features associated with π-frames are:
(1) Long-scope baseline feature specification, in particular [peakline >p
baseline], downslope, peakline-baseline convergence.
(2) High initial peakline value (ultimately manifested at βi); cf. the
'initial boundary features' of § 2.1. above. By this criterion, π corresponds
approximately to Fox's "paratone group" (1973) or Brazil's "pitch se-
quence" (1978).
(3) In Figure 6, peakline modifications can be observed in two cases; in
each case, the category with altered peakline is marked as a π-frame.

γ-frames
Observable criteria for γ-frames are chiefly boundary criteria (see below),
though the global π-frame specifications (e.g. downslope) are also accessed
in the internal organisation of γ-frames; these also show internal
downslope (to a lesser extent than the longer γ-frame sequences within
π-frames). In different terminology, the higher-level specifications
'percolate' down to the lower level. The superordinate π-frame peakline may
be determined heuristically from the heights of Heads of successive
γ-frames.
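
The heuristic just mentioned, estimating the π-frame peakline from the Heads of successive γ-frames, might be sketched as a straight-line fit. This is only an illustration under stated assumptions: the text does not say how the estimate is made, so the least-squares fit, the function name `fit_peakline` and the units (seconds, semitones) are all hypothetical.

```python
def fit_peakline(head_times, head_heights):
    """Least-squares line through Head heights (an assumed estimator).

    head_times   -- time of each gamma-frame Head, e.g. in seconds
    head_heights -- pitch height of each Head, e.g. in semitones
    Returns (slope, intercept); a negative slope corresponds to downslope.
    """
    n = len(head_times)
    if n < 2 or n != len(head_heights):
        raise ValueError("need at least two Heads, one height per time")
    mean_t = sum(head_times) / n
    mean_h = sum(head_heights) / n
    # Sum of squared deviations of the time values
    sxx = sum((t - mean_t) ** 2 for t in head_times)
    if sxx == 0:
        raise ValueError("Head times must not all be identical")
    # Sum of cross-products of time and height deviations
    sxy = sum((t - mean_t) * (h - mean_h)
              for t, h in zip(head_times, head_heights))
    slope = sxy / sxx
    intercept = mean_h - slope * mean_t
    return slope, intercept


if __name__ == "__main__":
    # Three invented Heads, each lower than the one before
    print(fit_peakline([0.0, 1.0, 2.0], [12.0, 10.0, 8.0]))
```

A marked jump of a Head above or below the fitted line would then be a candidate signal of a peakline change, though any threshold for that would have to be set empirically.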

Recursion and iteration


The π-frames are recursive with a peakline change at each recursion; this
is an empirical claim based on observation of peakline (especially Head
height) patterning. Three cases of π-frame recursion are shown in Figure
6 (the recursion analysis is also reflected in the parallel organisation of
locutionary structure at these points: discontinuities with embedding also
occur). Exit from recursion is either regular (perhaps here in the
parenthetic you know?), with return to the previous peakline, or by
'purging', i.e. an inelegant or 'brute force' return to the highest level,
involving re-initialization to the original peakline value, not a
continuation of the immediate pre-interruption value (cf. and erm in Figure
6, also marked by slow timing). A similar event occurs at the locutionary
level: a kind of 'left dislocation' with an anaphoric ('resumptive') subject
it which may be interpreted procedurally in the same way. A further
non-prosodic signal of a similar process is the use of so, already noted in
§ 3.1.

[Figure 8 (network diagrams not reproduced).]

Figure 8: Networks for description of structures represented in Figures 5 and 6. Note the
two kinds of recursion of π-frames: right-branching, in Σπ, and centre-embedding,
in Σα (cf. "narrative tag" and parenthesis marking, respectively). Application to
other data types will necessitate specifying more end states, inter alia. Details of
ATN tests and actions based on § 2 are not included; a full phonetic interpretation
requires such specifications.

The γ-frames are not recursive, but iterative within π-frames. The π/γ-
frame distinction is obscured in the customary p-data, the speech style of
reading sentences aloud, since π-frames and γ-frames tend to be co-extensive
in such data. Within γ-frames, pulses occur in a lower level iterative
control structure. In previous intonation studies concerned with these
questions this distinction between recursive and iterative control flow at
different levels has not been made. The networks shown in Figure 8
represent the levels concerned. The highest level system, Σπ, first defines
the global specification and then calls the next system, Σγ. On return to
the highest level, recursion to Σπ itself is possible, yielding 'narrative
tags' or similar marking structures. The Σγ system generates initial
boundary specifications, transfers control to the accentual rhythm system,
Σα, and on return generates terminal boundary specifications; it has the
option of iterating (looping). The lowest level system, Σα, iteratively
generates accentual pulses or, alternatively, recurs to Σπ to generate
parenthesis markings. Accent iteration explicates the notion of 'subjective
rhythm'.
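
The control flow just described can be made concrete in a short sketch. It is a simplification under stated assumptions, not the networks of Figure 8: the ATN tests and actions are omitted, the structure to be traversed is supplied in advance as nested data, and the class names `Pi` and `Gamma`, the event labels and the peakline values are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Event = Tuple[str, str]


@dataclass
class Pi:
    """A pi-frame: global specification, gamma-frames, optional tag."""
    peakline: str                    # hypothetical label for the global spec
    gammas: List["Gamma"]
    tag: Optional["Pi"] = None       # right-branching recursion (narrative tag)


@dataclass
class Gamma:
    """A gamma-frame: pulses (syllables) or embedded pi-frames (parentheses)."""
    items: List[Union[str, Pi]]


def sigma_pi(frame: Pi, out: List[Event]) -> None:
    out.append(("specs", frame.peakline))   # define the global specification
    sigma_gamma(frame.gammas, out)          # call the next system
    if frame.tag is not None:               # on return, optional recursion
        sigma_pi(frame.tag, out)


def sigma_gamma(gammas: List[Gamma], out: List[Event]) -> None:
    for gamma in gammas:                    # iteration, not recursion
        out.append(("boundary", "initial"))
        sigma_alpha(gamma.items, out)
        out.append(("boundary", "terminal"))


def sigma_alpha(items: List[Union[str, Pi]], out: List[Event]) -> None:
    for item in items:                      # iterate over accentual pulses
        if isinstance(item, Pi):
            sigma_pi(item, out)             # centre-embedded parenthesis
        else:
            out.append(("pulse", item))


if __name__ == "__main__":
    plan = Pi("high", [
        Gamma(["pig", "died"]),
        Gamma(["ate", Pi("mid", [Gamma(["know"])]), "much"]),
    ])
    events: List[Event] = []
    sigma_pi(plan, events)
    for event in events:
        print(event)
```

In this sketch the embedded frame opens and closes inside the second γ-frame, which is what distinguishes centre-embedding in Σα from the right-branching `tag` handled in Σπ.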

Boundaries
The boundaries of a γ-frame are designated βi and βt. The initial boundary
of a γ-frame is, in the illustrations, marked by a ↑ pulse of high
amplitude; if the γ-frame is, in turn, π-frame-initial, then the pulse is
higher still in amplitude.
A criterion for the initial boundary of a γ-frame is, in a large number of
cases, Anacrusis ('A' in the figures), i.e. a sequence of unstressed
syllables spoken more rapidly than elsewhere and not in the general rhythmic
pattern (Jassem & Gibbon, 1980; Jassem, this volume). Anacrusis is taken to
mark the initial unstressed segment of a γ-frame before the Head (the first
accented syllable of a γ-frame). The initial boundary is complex, not
correlating with a single boundary tone, but consisting of both Anacrusis
(where present) and Head. The unmarked βi has a mid-pitch Anacrusis and a
↑ pulse (= high Head, Onset). In one fairly common marked case (cf. the
fixed parenthetic expression you know in Ei(p)), Anacrusis is high, followed
by a ↓ pulse (= low Head), showing the Principle of Sequential Pitch
Contrast. Anacrusis is an important category, both heuristically and
theoretically. Its phonetic properties, as a fast, arhythmic series of
unstressed syllables (Jassem) and separated by a pitch level change from the
preceding group (Crystal), signal the start of a new γ-frame. Three special
properties of Anacrusis are illustrated in the present corpus extract:
(1) It operates together with the category Head to form a complex boundary
signal βi; the neutral case is lax, i.e. baseline-level Anacrusis with ↑
Head. Tense (= high pitch) Anacrusis also occurs, either with ↑, forming an
overall tense βi, or with ↓ (as in you know), creating a conspicuous
boundary effect by means of the Principle of Sequential Pitch Contrast
(§ 2.3.).
(2) It is not necessarily separated from the preceding γ-frame by a pause;
in several cases, it follows the preceding βt extremely rapidly (marked as
[fast] in the figures).
(3) It is also independent of, and may be interrupted by, minor hesitation
stretches, filled or unfilled. A criterion for continuity of the Anacrusis
in such cases is continuity of pitch contour near baseline level. It has
already been noted that the hesitation type and erm has different discourse
status.
The γ-frame terminal boundary βt has different properties from βi; the
traditional terminology of Nucleus and Tail is adopted here for exposition,
but not used in the figures. The neutral case of a βt is simply a ↑ pulse
followed by a fade at all levels: π, γ and pulse level. The fade constitutes
an unmarked Tail. Marked cases are of several kinds:
(1) An [α Accent] [−α Accent] sequence on Nucleus and Tail, respectively
(= steep falls and steep rises).
(2) [tense] or [tensing] Tail.
(3) Tail with laryngeal equilibration (= 'chroma', 'stylization'). Note
that (1) and (3) may combine. If so, double-fall or double-rise 'call
contours' are produced; (3) alone results in a level contour. In the former
case, the [−α Accent] contributes to the Sequential Pitch Contrast
'conspiracy' which in the neutral case applies automatically through the
cumulative effect of 'fade' at all levels.

Nucleus/Tonic
Recent investigations (Brown et al., 1980) have cast doubt on the validity
of the traditional notion of 'Nucleus' or 'Tonic', with results which
indicate that impressions of 'nuclear prominence' are a function of
lexico-contextual information (including contrastive contexts) and of
sequence-final position per se, and are not elementary prosodic judgments.
Investigations of Head status have tended to point in the other direction,
which may mean that the so-called Nucleus is not as well-defined a category
as Head in prosodic frames - despite traditional descriptive priorities.
The γ-frames are frequently interrupted and re-started in 'real-time
speech'; a well-defined initial boundary is, perhaps for this reason alone,
a more important point of orientation than a final boundary. When γ-frames
are concluded after a recursion, either the peakline before the recursion is
recovered, or a new Head is established. In either case, the Head is the
determining category, not the Nucleus.

Accentual pulses
The π and γ systems illustrated in the figures have a variety of interesting
properties. The pulse accent sequences themselves belong to a system of
rhythm generation which includes secondary accents. The pulses are
adapted to the current context in three main ways:
(1) position in the peakline-baseline vector defined in the dominating
π-frame or frames, and by the boundaries βi and βt in the dominating
γ-frame(s);
(2) semantic conditions determined by the current locutionary constituent,
contrastive contexts being associated with increased pulse amplitude
(= higher/lower deviation from the baseline) and syllable lengthening,
and appraisive contexts with longer leading flank rise time together with
syllable lengthening, yielding certain kinds of 'emphatic' 'rise prefix'
including delayed rises, rise-falls, 'homosyllabic preheads';
(3) in other styles such as reading aloud, syntactic structure (which
provides constraints of detail in other styles, too) may be the main
available context, in which case pulse amplitude is mainly determined by
locutionary specifier-head and other operator-operand relations,
approximately as postulated in conventional P-data-based 'sentence stress'
descriptions (cf. also Brazil, this volume).
Although the modifications to accents occur locally, they are determined by
global context specifications which are accessible to pulse routines; the
pulses themselves do not necessarily have to be specified even for
polarity, if the global [baseline >/<p peakline] feature is accessed. Local
specifications such as (1), (2) and (3) above can override the global
specifications, but local specifications are marked specifications. This
applies in particular to pulses in non-Head and non-Nucleus environments,
which do not have any direct affiliation to frame categories in the figures.
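
The relation between unspecified pulses, global specifications and marked local overrides resembles a scoped lookup, which can be sketched as follows. This is an analogy rather than the author's formalism: the feature names (`polarity`, `slope`, `boundary`) and their values are hypothetical, and Python's standard `collections.ChainMap` merely stands in for whatever access mechanism the pulse routines would use.

```python
from collections import ChainMap


def resolve_pulse(local_spec, gamma_spec, pi_spec):
    """Return (features, marked) for one pulse.

    Lookup order is local, then gamma-frame, then pi-frame, so a local
    specification overrides a global one. A pulse counts as marked only
    if it carries any local specification at all.
    """
    features = dict(ChainMap(local_spec, gamma_spec, pi_spec))
    marked = bool(local_spec)
    return features, marked


if __name__ == "__main__":
    pi_spec = {"polarity": "up", "slope": "down"}   # invented global values
    gamma_spec = {"boundary": "initial"}

    # Unmarked pulse: polarity is inherited, not specified locally
    print(resolve_pulse({}, gamma_spec, pi_spec))

    # Marked pulse: a local specification overrides the global polarity
    print(resolve_pulse({"polarity": "down"}, gamma_spec, pi_spec))
```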
These conditions on pulse accent sequences may be restated in the following
terms: this approach implies that there is no actual accent type Head or
Nucleus; Head and Nucleus are positions in γ-frames, i.e. functional terms
analogous to 'subject' and 'object', etc., in sentence syntax; they are
associated with 'syncategorematic' position-marking systems βi and βt,
whose phonetic correlates (boundary tones, pulse modifications) may be said
to have a theoretical status within γ-frames analogous to case inflexions
within sentences.

5. Adaptation as context sensitivity

Towards the end of § 4 the 'context-sensitivity' of pulse modification was
discussed, and described in terms of three kinds of context: position in
π-frames and γ-frames, relation to locutionary context, and relation to
other aspects of context. Pulse modification was said to be based on local
look-up of global values. Some of the context values involved in such
'prosodic handshaking' will be discussed in this section and followed by a
more general interpretation in terms of a feedback cycle in § 6.
(1) The association of peakline-baseline features with π-frames has already
been discussed. Although these were represented as being assigned to the
current π-frame, they may be even more general 'genre' or 'style' features,
such as the [peakline >p baseline] feature in adult-child speech styles.
Another genre-marking feature of this kind is [peakline =p baseline], a
manifestation of the larynx equilibration strategy used in some narrative
styles with repetitive character (cf. Ladd, 1978, on 'stylization'). This
contour flattening effect occurs in the discourse discussed in § 3, during
E(q) of Stage 2; it appears auditorily as a mid, narrow bandwidth pitch
contour and in the use of level tones on dead, know, before, wife. The
context specification table necessary for feature assignments of this kind
must contain information on the structure of the current discourse
participant and channel configuration; a suggestion concerning the context
information involved in such 'call contours' is given in Gibbon (1976), and
a more elaborate contextual framework of the kind required is used in a
different descriptive application in Gibbon (1981).
(2) The instances of π-frame recursion described in § 4 appear to be
triggered by a context-lookup connected with assessments made by the
speaker about the amount of information the addressee already has (cf.
the parenthetical you know) or the amount of background information he
still needs in order to be able to grasp the main point of the narrative (cf.
the 'narrative tags' such as those associated with the first recursion in
Figure 6 or the part of sequence E(q) in Stage 2 which ends in . . . and all
this). The narrative tags involve mid pitch-range peakline-baseline
convergence and an auditory impression of fading intensity over at least one
π-frame.
(3) The preceding two instances involved context look-up at the π-frame
level. Anecdote 1 in this discourse does not show as much variation at
π-frame level as Anecdote 2 (not shown), in which far more occurrences of
pulses and associated sequential pitch contrast mechanisms ('upward'
laxing, 'final' rising pitch) occur; it also includes more low level
iterative structures similar to those mentioned in § 2 for a German
dialogue: complete sequences of accents in γ-frames end in Nuclei; the
γ-frames themselves are in sequences within a π-frame, and the last γ-frame
in such sequences ends in a more prominent Nucleus. In particular,
'rise-tags', i.e. upward relaxation after a final pitch-lowering mechanism,
as marks of subordination to higher-level structures (cf. Gibbon, 1980a: 86,
and Bing, this volume), are more common in Anecdote 2.
(4) Local adaptation to lexical properties of the current locutionary
constituent is shown in a number of cases in the dialogue; in particular,
appraisive expressions such as oh, revolting, terrible, engrossed, marvellous,
wonderful, furious are associated with increased pulse amplitude and/or
pulse leading flank delay features which typically yield auditory rise-fall
contours.
(5) The 'canonical' Armstrong & Ward Tune 1 type of contour, with its
typical pronounced downslope from a high pulse amplitude at Head posi-
tion, is also used in adaptation to discourse structure. It occurs, with mi-
nor variations, in the initial and final stages of the anecdotes, the explicit
statements of the semantic frame on which the anecdote is built; cf. the
following (in the Crystal & Davy transcription):
Anecdote 1: oh and lone pig DIED| because it ATE too much| . . . the ↑pig
had \DIED| be I cause it had Ϊ-ΑΤΕΝ too much| you I KNOW|
Anecdote 3: all though a ↑friend of ÖURS| who · was I so · ↑passionately
|FÖND of PIGS| . . .
and would I lean over and |TALK to them| |FÖNDLY|
I WÖULDN'T he|
This contour also occurs in mid-narrative, possibly as the unmarked
contour with no locally effective contextual adaptation. If this is the case,
there may be an argument for accepting such contours 'normal intonation'
judgments on P-data; in any case, it is hardly surprising that this tune
should have something approaching genre-marking status for formal
reading aloud of written texts, e. g. in news bulletins (where each sentence
may be a statement of a new semantic frame) or in P-data (isolated sen-
tences or standardised brief dialogue exchanges). It has close analogues in
other West Germanic languages, such as the contours described by v. Essen
(1956) for German, or the 'hat contour' of 't Hart for Dutch.

6. On the descriptive potential of a process orientation

A large number of open questions remain. One of these is whether the
pulses themselves have a reasonably simple relationship with physiological
reality, or whether they are best regarded as abstract phonological fea-
tures. Internal descriptive evidence for their validity is given by their status
within a detailed and consistent descriptive system; external evidence from
the physiological phonetic literature indicates that the suggestion may not
be far from the truth (cf. the summary in Ohala, 1978). Another point is
the empirical validity of the distinction between recursive and iterative so-
lutions to the descriptive problems, in particular the validity of the distinc-
tion between π-frames and γ-frames. However, whatever the most plausi-
ble models for formal and substantive properties of intonation might be, it
will almost certainly be necessary to turn to procedurally oriented models
if ways out of some of the old impasses are to be found. One way of solv-
ing the hoary 'levels vs. contours' controversy is to find a third category
which is neither the one nor the other; this has been done here for part of
the English intonation system with the pulse accent thesis. A similar solu-
tion to the problem of baseline definition is the distinction between base-
line and peakline, which offers descriptive advantages in several areas.
The claim that a descriptively useful distinction can be made between
iterative and recursive properties of intonation processes was put to the
test in § 4 as a means of explaining pitch bandwidth narrowing and pitch
height 'restart' patterns.

[Figure 9 (network diagram not reproduced).]

Figure 9: Outline of a feedback network as required in the implementation of each level of
the systems shown in Figure 7.

The strategies discussed in § 2.2.1, if they have any reality in speech
processes, help to explain why intonation analysis on a perceptual basis
appears over-simple and obscures distinctions between the different
procedural factors which contribute to a perceived or measured pitch
function. On the other hand, the strategy concept also tends to make any
intonation hypothesis based on articulatory features difficult to falsify or
confirm. A further implication of the auditorily based overriding
strategies, in particular that of larynx equilibration under the stabilising
influence of negatively phased auditory feedback, is that they control a
functional adaptation cycle which is sensitive to situational and
locutionary context, but is also operative within the phonological processes
of intonation as a dynamic interaction of articulatory and auditory factors.
This interactive cycle is outlined in the network of Figure 9. The indices
symbolise stages in temporal operation. The full cycle has a planning stage,
an execution stage, as well as checking and (as required) correcting stages.
In this aspect of the present approach lies the motivation for suggesting in
§ 2 above that the fundamental units of intonation organisation, and
possibly of all phonological organisation, are (articulation, percept)
pairs. An advantage of this conception is that it systematically rejects a
'speaker/hearer' idealisation for discourse phonology, preferring a dynamic
combination of speaker and hearer perspectives.
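
The cycle of planning, execution, checking and correcting can be illustrated with a toy negative-feedback loop. Everything specific in it is an assumption made for the illustration: the proportional correction rule, the `gain` and `tolerance` parameters, and the `produce` function standing in for articulation plus auditory perception. None of these is taken from Figure 9.

```python
def feedback_cycle(target, produce, gain=0.5, tolerance=0.1, max_steps=20):
    """Run a plan / execute / check / correct loop.

    target  -- intended pitch value (arbitrary units)
    produce -- hypothetical function mapping a command to a percept
    Returns the list of (articulation, percept) pairs that were tried.
    """
    command = target                      # planning stage
    history = []
    for _ in range(max_steps):
        percept = produce(command)        # execution stage, then audition
        history.append((command, percept))
        error = percept - target          # checking stage
        if abs(error) <= tolerance:
            break
        command -= gain * error           # correcting stage (negative feedback)
    return history


if __name__ == "__main__":
    # Invented example: articulation overshoots every command by 2 units
    for pair in feedback_cycle(10.0, lambda command: command + 2.0):
        print(pair)
```

Each entry in the returned history is an (articulation, percept) pair, which is the point of contact with the claim about fundamental units made above.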
To summarise: a process-oriented model of intonation for English has
been proposed, which incorporates a recursive category, the π-frame, an
iterative category, the γ-frame, and a conception of accents as articulatory
pulses. These were applied to the description of discourse data, and
formulated as a system of transition networks which take contextual
feedback at phonological and functional levels into account. Much is still
speculative; there are gaps in the theory, and new kinds of empirical
evidence will be needed in order to test the claims. But if it suggests ways
of combining the various lines of analysis required in the study of
discourse phonology, even if it does not turn out to be viable in detail,
the theory will have served a useful purpose.

Bibliography

Abercrombie, D. et al. (eds.) (1964). In Honour of Daniel Jones. London: Longman.
Bachem, A. (1950). Tone height and tone chroma as two different pitch qualities. Acta
Psychologica 7, pp. 80-88.
Bing, J. M. (1980). Aspects of English Prosody. Bloomington: IULC. (Diss. Amherst 1979).
Bing, J. M. (this volume). A discourse domain identified by intonation.
Brazil, D. (1975). Discourse Intonation. Discourse Analysis Monographs No. 1. English Lan-
guage Research, Birmingham University.
Brazil, D. (1978). Discourse Intonation II. Discourse Analysis Monographs No. 2. English Lan-
guage Research, Birmingham University.
Brazil, D. (this volume). The intonation of sentences read aloud.
Brown, G. et al. (1980). Questions of Intonation. London, Croom Helm.
Chao, Y. R. (1956). Tone, intonation, singsong, chanting, recitative, tonal composition and
atonal composition in Chinese. In Halle, M., Lunt, H. and McLean, H. (eds.), pp. 52-59.
Chitoran, D. (ed.) (1976). Second international conference of English contrastive projects. Bu-
charest, U. P.
Chomsky, N. and Halle, M. (1968). The Sound Pattern of English. New York, Harper & Row.
Crystal, D. (1969). Prosodic Systems and Intonation in English. Cambridge, C.U.P.
Crystal, D. (1971). Relative and absolute in intonation analysis. Journal of the International
Phonetic Association 1: 17-28.
Crystal, D. and Davy, D. (1975). Advanced Conversational English. London: Longman.
Dechert, H. W. and Raupach, M., eds. (1980). Temporal Variables in Speech: Studies in
Honour of Frieda Goldman-Eisler. The Hague, Mouton.
Eikmeyer, H.-J. (1980). Transformationsgrammatiken mit Multilabels. Definition und
Anwendungsmöglichkeiten. Hamburg, Buske.
v. Essen, O. (1956). Grundzüge der hochdeutschen Satzintonation. Ratingen, Henn.
Fox, A. (1973). Tone sequences in English. Archivum Linguisticum 17-26.
Fromkin, V. A., ed. (1978). Tone. A Linguistic Survey. New York, Academic Press.
Gibbon, D. (1976a). Perspectives of Intonation Analysis. Bern, Lang.
Gibbon, D. (1976b). Performatory categories in contrastive intonation analysis. In Chitoran,
D. (ed.) pp. 145-156.
Gibbon, D. (1981a). A New Look at Intonation Syntax and Semantics. In James, A. and
Westney, P. (eds.) pp. 71-98.
Gibbon, D. (1981b). Forschungsprojekt Propädeutik der Phonetik. Erster Teilbericht; Digitale
Frequenzanzeige mit einem Mikrocomputer zur Visualisierung der Sprachmelodie. Fachhoch-
schule Köln, Fachbereich Sprachen, März 1981.
Gibbon, D. (1981c). Idiomaticity and functional variation: A case study of international
amateur radio talk. Language in Society 10: 21-42.
Gibbon, D. (1982). Intonation in Context. An Essay on Metalocutionary Deixis. In Rauh, G.
(ed.), pp. 195-218.
Gibbon, D. & M. Selting (1983). Intonation und die Strukturierung eines Diskurses.
Zeitschrift für Literaturwissenschaft und Linguistik, LILI 49.
Halle, M., Lunt, H. and McLean, H., eds. (1956). For Roman Jakobson. The Hague, Mouton.
Halliday, M. A. K. (1967). Intonation and Grammar in British English. The Hague, Mouton.
't Hart, J. and Collier, R. (1975). Integrating different levels of intonation analysis. Journal of
Phonetics 3: 235-255.
James, A. and Westney, P., eds. (1981). New Linguistic Impulses in Foreign Language Teach-
ing. Tübingen, Narr.
Jassem, W. (1952). Intonation of Conversational English (Educated Southern British), as:
Travaux de la Société des Sciences et des Lettres de Wrocław, Series A, No. 45, Wrocław.
Jassem, W. and Gibbon, D. (1980). Re-defining English stress. Journal of the International
Phonetic Association 10: pp. 2-16.
Jassem, W., Hill, D. R., Witten, I. H. (this volume). Isochrony in English speech: Its statisti-
cal validity and linguistic relevance.
Klinghardt, H. and Klemm, G. (1920). Übungen im englischen Tonfall. Cöthen.
Ladd, D. R. (1978). The Structure of Intonational Meaning. Diss., Cornell University.
Lehiste, I. (1975). The phonetic structure of paragraphs. In Nooteboom, S. G. and Cohen,
A. (eds.), pp. 195-203.
Liberman, M. Y. (1978). The Intonational System of English. Bloomington, IULC.
Lieberman, P. (1967). Intonation, Perception, and Language. Cambridge, Mass., M.I.T. Press.
Neisser, U. (1967). Cognitive Psychology. Englewood Cliffs, N. J., Prentice-Hall.
Nooteboom, S. G. and Cohen, A. (eds.) (1975). Structure and Process in Speech Perception.
Heidelberg/New York, Springer.
Ohala, J. J. (1978). Production of Tone. In Fromkin, V. A. (ed.), pp. 5-40.
Ozga, J. (1980). A functional analysis of some pause and pitch step-up combinations. In
Dechert, H. W. and Raupach, M. (eds.), pp. 221-225.
Pierrehumbert, J. (1980). The Phonology and Phonetics of English Intonation. Diss., M.I.T.
Pike, K. L. (1945). The Intonation of American English. Ann Arbor, The University of Michi-
gan Press.
Rauh, G., ed. (1982). Essays on Deixis. Tübingen, Narr.
Reich, P. A. (1969). The finiteness of natural language. Language 45: 831-843.
Trim, J. L. M. (1964). Tonetic stress-marks for German. In Abercrombie, D. et al. (eds.), pp.
374-383.
J. 't Hart

A Phonetic Approach to Intonation: from Pitch Contours to Intonation Patterns

Introduction

This contribution is the result of extensive experimental research on
speech pitch phenomena in Dutch and British English. It was meant to fill
a gap in this collection since a number of results of this research do not
appear to have found a wide response in linguistic circles. Nevertheless,
they may be of considerable importance to those involved in the study of
spoken language and, more specifically, in the study of the possible com-
municative functions of speech prosody. Now, before one can start trying
to find such answers to questions about communicative functions of in-
tonation as can be considered suitable for a generalization, it is necessary
that the intonation system of the language should be clearly described. As
yet, however, no linguistic approach to intonation has dealt with that
problem in a satisfactory way.
In phonetics, the primary task of which is to give an adequate descrip-
tion of everything present in the speech signal in connection with the way
it is produced and so far as it is perceived, it is considered that speech
pitch is an essential part of the sound image, which should therefore be
mapped out to its full extent. Unfortunately, the intuitive judgments by
linguists, otherwise a rich source to draw from, must fail to produce reli-
able data when it concerns phenomena beyond the reach of any normal
being's observation capacity. Such is the case with the very intricate and
deceptive phenomena in speech pitch. Even phoneticians, less normal be-
ings in the sense of possessing (possibly) specially trained hearing, fail
equally to give adequate transcriptions of these phenomena as long as they
continue to trust their unaided ears.

Measurement of the fundamental frequency

Consequently, a phonetic study of intonation begins with measurement of
the course of the fundamental frequency in large numbers of speech
utterances. The next step is then to try to discover some systematic
regularities in the resulting recordings. However, the results of such
measurements are extremely difficult to interpret. The main problem is that
these measurements typically show a great variety of irregularities,
amongst which it is almost impossible to perceive any "speech melody". In
the early sixties there was every reason to suspect that the instrument used
to measure the fundamental frequency was the main source of the
irregularities. Recently, however, instruments (or computer programs
designed for that purpose) have been developed which perform these
measurements almost perfectly, and they show all the more accurately that
the vocal cords themselves are responsible for the irregularities. See for
instance Fig. 1.

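[Figure 1 (plot not reproduced): fundamental frequency in Hz (axis marked
from 50 to 400) against time in s (0 to 2) for the utterance "Alan's in
Cambridge studying botany".]

Figure 1: Result of measurement of the fundamental frequency in a short English sentence.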

It seems impossible simply to collect such data from a large number of
speech utterances, to feed them into a computer and have them processed
in some way or other, in order to sift out the systematic, regularly
recurring phenomena. That, however, was to be the next step mentioned
above. The only way out seems to be to introduce an intermediate step.

An important assumption

Considerations that ultimately have led to the conception of such an
intermediate step are the following. From a physiological point of view it
is unthinkable that all the variations observed are the results of an active
neuro-muscular control. Consequently, it would be useless for a speaker
to have any conscious intentions to produce them. Moreover, it is
counter-intuitive to imagine that any speaker intends to produce sounds and
sound variations that might not be perceptible. And, it is true, from a
psychophysical point of view there is ample evidence to consider it highly
improbable that the ear is capable of following these minute, quick
variations ('t Hart, 1976, 't Hart, 1977, 't Hart, 1981). The assumption was
therefore put forward (Cohen and 't Hart, 1967) that the F0 curve is a
superposition of pitch movements relevant to the perception of the speech
melody, and of other variations, that are merely ascribable to the
irregularities in the oscillation behaviour of the vocal cords. It was
further argued that "the relevant pitch movements are related to
corresponding activities on the part of the speaker. These are assumed to be
characterized by discrete commands to the laryngeal musculature, and should
be recoverable as so many discrete events in the resulting pitch contours".
Although this does not solve the problem, the important advantage of
this formulation is that it shows two possible ways of tackling it: either we
should extract the discrete commands to the larynx muscles, or we should
extract the perceptually relevant pitch movements. The latter course was
chosen. The requirement can be reformulated as follows: try to draw a
graph on top of the F0 curve which should represent the course of pitch as
a function of time as it is perceived by the listener. This is the essence of
the stylization method, the intermediate step which was said to seem the
only way out.

The stylization method

But how can it be done? It is obvious that it is not enough to draw some
smoothed line that would constitute only visually a satisfactory solution:
we shall have to prove that it is representative of the perceived pitch con-
tour. The result of the stylization must therefore be made audible. This is
possible with a special instrument, the Intonator (Willems, 1966), by
means of which the original course of F0 is replaced by an artificial contour
constructed by the experimenter. Nowadays a more flexible computer
program has superseded the hardware Intonator
(Vogten and Willems, 1977). The construction of the artificial contour is
guided by two criteria: it should contain the smallest number of straight
line segments (in the plane of the logarithm of F0 as a function of time),
but it should sound indistinguishable from the original. This is achieved
by trial and error. One could, for instance, start with only one horizontal
line, at the average of F0 over the utterance. The resultant sound is ridiculous
and by no means the same as the original. It is somewhat better to give
this line a slight downward tilt from beginning to end, in order to account
for the slowly falling F0 observable in the measurement, the declination.
Obviously, this is still not an auditorily acceptable approximation of
the original: the artificial contour clearly lacks a number of major movements
up from and down to the lower declination line. If these are added
and carefully timed, the resulting contour will eventually become indistin-
guishable from the original, at least to the ear of the experimenter. The
perceptual equality is later checked in a listening experiment with several
tens of native listeners. See Fig. 2.
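
The first two trial steps described above (a horizontal line at the average, then a tilted line) can be illustrated numerically. What follows is only a minimal sketch under stated assumptions: the contour is synthetic, the semitone scale stands in for the logarithm of F0, and a least-squares fit replaces the auditory judgment that the method actually relies on. All function names and numerical values are hypothetical and are not taken from the Intonator or from the computer program that superseded it.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Map F0 in Hz onto a logarithmic (semitone) scale relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def flat_line(times, contour):
    """Trial 1: a single horizontal line at the utterance average."""
    mean = sum(contour) / len(contour)
    return [mean] * len(times)


def tilted_line(times, contour):
    """Trial 2: a single straight line with a tilt (least-squares fit).

    Returns (slope in semitones per second, fitted values).
    The least-squares criterion is an assumption of this sketch; the
    text itself uses trial and error judged by ear.
    """
    n = len(times)
    t_mean = sum(times) / n
    c_mean = sum(contour) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (c - c_mean) for t, c in zip(times, contour))
    slope = sxy / sxx
    return slope, [c_mean + slope * (t - t_mean) for t in times]


def rms_error(contour, model):
    """Root-mean-square distance between contour and model, in semitones."""
    squared = sum((c - m) ** 2 for c, m in zip(contour, model))
    return math.sqrt(squared / len(contour))


# Hypothetical 2-second contour sampled every 10 ms: a declination from
# 130 Hz down to 100 Hz, plus one 6-semitone excursion from 0.5 s to 0.8 s.
times = [i * 0.01 for i in range(201)]
start_st = hz_to_semitones(130.0)
contour_st = [
    start_st * (1.0 - t / 2.0) + (6.0 if 50 <= i <= 80 else 0.0)
    for i, t in enumerate(times)
]

flat = flat_line(times, contour_st)
slope, tilted = tilted_line(times, contour_st)

# The part of the contour that the tilted line leaves unexplained shows
# where a major pitch movement still has to be added.
residual = [c - m for c, m in zip(contour_st, tilted)]
peak_index = max(range(len(residual)), key=residual.__getitem__)

print(f"flat line   RMS error: {rms_error(contour_st, flat):.2f} st")
print(f"tilted line RMS error: {rms_error(contour_st, tilted):.2f} st")
print(f"tilt: {slope:.2f} st/s, largest residual at t = {times[peak_index]:.2f} s")
```

The tilted line fits the synthetic contour better than the horizontal one, yet the residual still peaks inside the excursion, which corresponds to the observation that the major movements must be added before the contour can sound like the original. A numerical error measure of this kind cannot replace the listening test.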

Figure 2: Solid line: result of a stylization in such a way that no difference is audible between
the stylized contour and the original (dotted curve). [Axes: F0 in Hz, 50 to 400, against time in s, 0.5 to 2.]

The next step is standardization, i.e. to provide the various movements
with standard specifications relating to their size, slope, duration, and po-
sition in the syllable.
Contours built with standardized pitch movements most often sound
slightly different from the corresponding originals, although they are essentially
the same, or "perceptually equivalent". Consequently, they can
be, and in an appropriate experiment have been shown to be, fully acceptable
as representatives of normal intonation in the language concerned.
The purpose of standardization is, of course, to obtain a generalization:
whenever a certain movement is found to figure in, for example, the first
accented syllable of an utterance, and subsequently, it turns out that ex-
actly the same movement can be used for the first accented syllable of
large numbers of other utterances, the implication is that one of the syste-
matic regularities aimed at has been established. Also, if standard move-
ments of certain types which are appropriate to be used in utterance no. 1
are applied to construct a stylized contour for utterance no. 2, but fail to
yield the desired perceptual equivalence, it implies that the original of no.
2 is a realization of a different intonation pattern, in which (some of) the
perceptually relevant pitch movements are of different types from those
fruitfully used in no. 1. The psychological reality of the very existence of
such categories of intonation, the intonation patterns, has been corrobo-
rated in separate experiments, specially designed for that purpose (Collier
and 't Hart, 1972; Collier, 1975b).

Results

This approach has now produced a sufficiently complete description of all
main melodic shapes of Dutch intonation. To check it we designed a gen-
erative system, called a grammar of Dutch intonation, which has the status
of a competence model, and thus should be able to generate all and only
the well-formed, acceptable pitch contours used in the language. That
grammar was subsequently confronted with a large corpus consisting of
seven fragments, each of ten minutes' length, of spontaneous and quasi-
spontaneous conversations (from theatre plays), and seen to be capable of
covering 95 % of the observed intonational phenomena ('t Hart and Col-
lier, 1975).
More recently, a start has been made on applying the same methods to
the study of British English intonation. For one thing, it was felt desirable
to try to show that the stylization method is not merely applicable to the
description of the intonation system of just one language, which happens
to be Dutch, but that it can be used for other languages as well, although
these may have obviously different intonation systems.
Meanwhile, the intonation of American English had been treated in the
same way by Maeda (1976), who had been personally introduced to the
method by Collier. Maeda's rules are now being applied in a text-to-speech
conversion machine (Klatt, 1980). The other consideration was
that English is learnt as a second language by many more people than
Dutch is, and although a number of well-known courses on English intonation
do exist, they are all based on impressionistic observations only. It
has recently been shown that the resulting descriptions can lead to unac-
ceptable recipes (De Pijper, 1980).

Functions of intonation

The phonetic approach described has resulted in the first place in a purely
melodic description of intonational phenomena and was not primarily
meant to be used for studying possible functions of intonation.
Nevertheless a number of functions have been encountered, and some
of them have been verified experimentally.
First of all, there is the prominence-lending function of some pitch
movements, which have to meet special requirements, particularly with re-
spect to their position within the syllable, in order to yield the so-called
pitch accents, the most outstanding accents in a speech utterance. It is pre-
cisely the manipulation of artificial pitch contours that enables the experi-
menter to show that it is the location of these movements alone which,
eventually, determines the judgment of listeners as to where the most im-
portant accents are situated (Van Katwijk and Govaert, 1967; Collier,
1970; Van Katwijk, 1974; this was also demonstrated for a German sentence
at the DGfS Meeting in Berlin; see 't Hart, 1980).
Secondly, there is the intonational marking of major syntactic bounda-
ries by other types of pitch movements. Again, working with artificial
pitch contours has made it possible to study this experimentally for the
first time (Collier and 't Hart, 1975).
This does not mean that all the problems of accentuation and syntactic
demarcation have been solved. Despite numerous claims to the contrary, it
is still not fully understood which factors really determine where in a sen-
tence, given its wider context and situation, the pitch accents should be lo-
cated, and, perhaps even more important, which words have to be de-ac-
cented (Nooteboom, et al., 1980). Neither have we solved the problem of
whether a certain major syntactic boundary should be marked intonation-
ally or not (De Rooij, 1979).
Less is known about the possible function of a speaker's choice of a par-
ticular intonation pattern, or, to put it differently, about what determines
this choice. In all probability, it is one of the functions of the intonation
pattern to help reduce the number of possible interpretations that can be
given to what is said, but only in interaction with its verbal context and the
situation. And it is hard to see how the speaker intentionally chooses his
pattern precisely for this purpose. It is even harder to see how a speech
scientist could design appropriate experiments to find answers to these
very complicated questions. What constitutes an extra complication in the
'meaning-of-intonation' issue is that there are at least ten times as many
'meanings' - or rather implications, or interpretative possibilities - as there
are different intonation patterns, so that a given pattern must at least
correspond to a set of implications rather than to only one.

Theoretical implications

The main aim of the phonetic approach to intonation has been to give an
explicit, and if possible, exhaustive description of the various melodic pos-
sibilities in the given language. Attempts to develop from this a theoretical
framework would not have been essential. Meanwhile, however, the anal-
ysis of such a vast amount of material over a long period of time almost
automatically leads to speculations about a number of theoretical issues,
leading to such questions as: How could the programming of intonation
be executed? What is the interaction of intonation and accentuation? Is
there physiological evidence in support of the basic assumption mentioned
above? What is the relationship between these results and those reported
earlier, in the various disciplines? Is it possible to reconcile the abstract
and global aspects of intonation as it functions in communication with the
concrete and atomistic features found in an acoustic analysis?
The most important point with respect to the programming of intona-
tion is that it is not necessary to suppose that the complete surface struc-
ture of the phonological component should be 'ready' before one can be-
gin with the 'derivation' of the corresponding pitch contour. One might
put it even more strongly and say that such a position is wrong; it would
inevitably lead to only one possible, and thus acceptable, pitch contour for
a given string of words. This is contradicted by the outcome of our experi-
ments.
It can be shown in a flow-chart (Collier, 1972) that the (still abstract) in-
tonation pattern is chosen at a very early stage, and that the actual contour
can be realized by answering simple and concrete questions pertaining to the
choice of pattern. In addition the speaker needs to be aware that a small num-
ber of words or syllables ahead a pitch accent has to be made, or a syntactic
boundary has to be marked. One less simple question must be answered in a
frequently occurring pattern of Dutch, viz. whether the next pitch accent to
be made is the penultimate of the utterance (or clause), because if it is, then
special measures are desirable, in order to fulfill the communicational func-
tion of announcing that the end of the utterance is coming near. And indeed,
in some cases the speaker has no opportunity to deal with that somewhat
more difficult question, and this results in the execution of a less compli-
cated, and less informative variant of the same pattern. This so-called emerg-
ency contour ('t Hart, 1972) is observable for instance in radio-broadcast
eye-witness accounts of soccer matches.
The same principle, viz. of a subordination of the choice of the kind of
accent-lending pitch movements to the kind of intonation pattern that is
to be realized, is explained in a different and perhaps more elaborate way
in a paper on the interaction of accentuation and intonation ('t Hart and
Collier, 1979). It confronts a principle, viz.:
"P1 Those pitch movements that co-occur with prominent syllables are
entirely and exclusively related to accentuation, the remaining pitch
movements on the contour are associated with intonation"
with a number of facts, the main one being the possibility of intonating a
given utterance according to different patterns. P1 "does not account for
the choice of the accent-lending pitch movements. There is nothing in the
nature of the accents themselves that would predict this choice". An al-
ternative principle:
"P2 (a) The nature and the order of all pitch movements in an utterance
are determined by the intonation pattern.
(b) Among the pitch movements of any intonation pattern there is
at least one which possesses such phonetic properties as are ne-
cessary for bringing about a pitch accent.
(c) The location of the accent-lending pitch movement(s) is deter-
mined by the position of the words that carry sentence stress,
and, more specifically, by the position of the lexically accented
syllable in each of these words"
is shown to find empirical support, so that P1 must be rejected.
Physiological evidence in support of the assumption that the perceptu-
ally relevant pitch movements are related to corresponding activities on
the part of the speaker is provided by measurements of the electromyogra-
phic signal from the musculus cricothyroideus during the production of
various different intonation patterns, with pitch accents on varying syllables
within the utterances (Collier, 1975a). The contraction and relaxation
of this muscle are thus clearly visible as discrete events and are shown
to be the main cause of the changes of fundamental frequency upward and
downward respectively.
As was implied in the Introduction, one should not be very optimistic
about the relationship between these experimentally found phonetic re-
sults and impressionistically based linguistic conceptions, let alone that the
former would constitute the 'phonetic correlates' of the latter. Collier
(1974) is no longer alone in this criticism, witness recent investigations by
Brown, Currie and others in Edinburgh (Brown et al., 1980, Currie, 1980,
1981). Strangely enough, no-one as yet has even taken the trouble, as far
as I know, to react to Collier's sharp and provocative paper, which he deliberately
published in a well-known linguistic journal.
It has become clear to us that attempting to find the relationship be-
tween the abstract mental categories which are the intonation patterns and
their concrete realizations in the form of F0 curves cannot be successful if
the attempt is made in one stroke ('t Hart and Collier, 1975). Therefore,
we have considered that a 'perceptual detour' must be made in three steps:
(1) from the concrete and global F0 curve to the concrete and atomistic
pitch movements, in an attempt to design a proper coding system;
(2) from these to the stylized pitch contours (via the grammar), again global
and concrete; (3) since these are perceptually equivalent to the F0
curves, the attempt to find the relation of the latter with the intonation
pattern can be replaced by an attempt to find it between the stylized pitch
contours and the abstract, global intonation patterns. And this problem
can be tackled in an experimental way.
The theoretical implication is that this detour enables us to make explicit what
belongs to the intonational competence shared by speakers and listeners.
There is no reason to suspect that application of the stylization method is
restricted to only a few languages. It has already been shown to work for
languages other than Dutch, also independently of the "Dutch school"
(Vaissière, 1971, for French; Fujisaki and Sudo, 1971, for Japanese).
The basic assumption about the link between the intonational gestures
on the part of the speaker and the extraction of the perceptually relevant
pitch movements by the listener, which is based on their shared implicit
knowledge of the intonation system of their language, must be considered
valid for users of all languages.

References

Brown, G., K. L. Currie and J. Kenworthy (1980). Questions of Intonation. Croom Helm.
Cohen, A. and J. 't Hart (1967). On the anatomy of intonation. Lingua 19/2: 177-192.
Collier, R. (1970). The optimum position of prominence-lending pitch rises. IPO Annual
Progress Report 5: 81-85.
Collier, R. (1972). From pitch to intonation. Unpubl. doct. diss. University of Louvain.
Collier, R. (1974). Intonation from a structural linguistic viewpoint: a criticism. Linguistics
129: 5-28.
Collier, R. (1975a). Physiological correlates of intonation patterns. J. Acoust. Soc. Am. 58/1:
249-255.
Collier, R. (1975b). Perceptual and linguistic tolerance in intonation. IRAL XIII/4: 293-308.
Collier, R. and J. 't Hart (1972). Perceptual experiments on Dutch intonation. Proc. VIIth
Int. Congr. Phon. Sci., 1971, Montreal, The Hague, Paris: Mouton.
Collier, R. and J. 't Hart (1975). The role of intonation in speech perception. In A. Cohen
and S. G. Nooteboom (eds.). Structure and Process in Speech Perception. Berlin, Heidelberg,
New York: 107-123.
Currie, Karen L. (1980). An initial "Search for Tonics". Language and Speech 23/4: 329-350.
Currie, Karen L. (1981). Further experiments in the "Search for Tonics". Language and
Speech 24/1: 1-28.
Fujisaki, H. and H. Sudo (1971). Synthesis by rule of prosodic features of connected Japanese.
Proc. VIIth Int. Congr. Acoust. Budapest 1971: 133-136.
't Hart, J. (1972). Intonational Rhyme. In Romportl, M. and P. Janota (eds.). Acta Universitatis
Carolinae, Philologica 1. Phonetica Pragensia III: 105-109.
't Hart, J. (1976). Psychoacoustic backgrounds of pitch contour stylization. IPO Annual
Progress Report 11: 11-19.
't Hart, J. (1977). Vers une base psychophonétique de la stylisation intonative. Actes des 8èmes
Journées d'études sur la parole, Aix-en-Provence. Vol. 1: 167-173.
't Hart, J. (1980). Synthese stilisierter Tonhöhenkonturen als Methode zur Analyse der Intonation.
Linguistische Arbeiten und Berichte der FU Berlin.
't Hart, J. (1981). Differential sensitivity to pitch distance, particularly in speech. J. Acoust.
Soc. Am. 69/3: 811-821.
't Hart, J. and R. Collier (1975). Integrating different levels of intonation analysis. J. of
Phonetics 3: 235-255.
't Hart, J. and R. Collier (1979). On the interaction of accentuation and intonation in Dutch.
Proc. IXth Int. Congr. Phon. Sci., Copenhagen. Vol. II: 395-402.
Van Katwijk, A. (1974). Accentuation in Dutch, an experimental linguistic study. Doct. diss.
University of Utrecht, Van Gorcum and Comp.: Assen.
Van Katwijk, A. and G. A. Govaert (1967). Prominence as a function of the location of pitch
movement. IPO Annual Progress Report 2: 115-117.
Klatt, D. H. (1980). Real-time speech synthesis by rule. J. Acoust. Soc. Am. Suppl. 1, Vol. 68,
S18 (A).
Maeda, S. (1976). A characterization of American English intonation. Ph. D. Thesis: M.I.T.
Nooteboom, S. G., T. Kruyt and J. Terken (1980). What speakers and listeners do with pitch
accents: some explorations. Nordic Prosody II. Proc. Second Symposium on Prosody in the
Nordic Languages. Trondheim: Tapir.
de Pijper, J. R. (1980). A melodical model of British English intonation. IPO Annual Progress
Report 15: 54-58.
de Rooij, J. J. (1979). Speech punctuation, an acoustic and perceptual study of some aspects of
speech prosody in Dutch. Unpubl. doct. diss.: University of Utrecht.
Vaissière, J. (1971). Contribution à la Synthèse par Règle du Français. Thèse de 3ème cycle,
Univ. de Grenoble.
Vogten, L. L. M. and L. F. Willems (1977). The Formator: a speech analysis-synthesis system
based on formant extraction from linear prediction coefficients. IPO Annual Progress Re-
port 12: 47-54.
Willems, L. F. (1966). The Intonator. IPO Annual Progress Report 1: 123-125.
W. JASSEM, D. R. HILL, I. H. WITTEN

Isochrony in English Speech: its Statistical Validity and Linguistic Relevance

1. Introduction

In her review of recent work on rhythm in English speech, Lehiste (1977)
shows that there has been disappointingly little agreement among special-
ists as to the validity of the isochrony principle, ranging from fairly strong
textbook affirmation, such as the oft-cited description by Pike (1945: 34)
or a more up-to-date formulation by Ladefoged (1975: 102-103) through
doubt and scepticism (e.g. O'Connor, 1965; Bolinger, 1965; Uldall, 1971)
to downright rejection (e.g. Lea, 1974; Shen and Paterson, 1962). Her
own experiments lead Lehiste (1973; 1975; 1977) to the conclusion that
". . . there were some aspects of the data that spoke for the presence of
isochrony, and other aspects that spoke against it" (1977: 256). According
to Lehiste, isochrony is most evident at the perceptual level, but seeing
that the sensation of rhythmicality must reflect some properties of the sig-
nal, she also made measurements of the sound wave and produced evi-
dence in favour of a certain measure of isochrony, which she related to
syntax. The relation is an inverse one in the sense that if the (tendency
for) isochrony is destroyed by "an increase in the interstress interval",
then this is a sign of the presence of a syntactic boundary. One might per-
haps re-interpret Lehiste's conclusions in a direct sense as assigning to
isochrony a function of internal syntactic cohesion. One of the major mer-
its of the extensive treatise on the rhythm of spoken English by Adams
(1979) is a most exhaustive historical survey of previous work in the area,
with laudable emphasis on the widely overlooked ideas of the early phone-
ticians. It transpires that much of the present-day thinking and argumen-
tation on the rhythmicality of spoken English was anticipated in the pre-
ceding centuries, beginning with John Hart and Joshua Steele, the latter
having strongly influenced Abercrombie's (1964; 1973) theory, which will
figure prominently further on in this paper. Also, in direct relation to one
particular problem considered here, John Hart (Adams, 1979: 22-23) related
'accent' (equivalent to the early 20th century 'stress') to 'melody and
rhythmus' rather than loudness, and he was aware of rhythmical (accentual)
units of speech such as of an apple, from the Citie, "rediscovered"
30 years ago by one of the present writers (Jassem, 1949; 1952).
It is also noteworthy that a difference of opinion as to whether pauses
in speech should be counted in the rhythmical structure of English can be
traced to the late 18th and early 19th century, J. Steele supporting the
pause and J. Odell opposing it (Adams, 1979: 29-30).

2. The Theory

The general problem is so well-known that it will scarcely be necessary to
expand on it. Hardly any present-day textbook of English phonetics (or
phonology) fails to mention rhythmicality as reflected in the (approximate)
isochrony of 'interstress intervals'. Some of the major, more specific
questions are:
1. What are the relations between rhythmicality (and isochrony) in the
acoustic signal and rhythmicality as a percept?
2. What are the appropriate measurement procedures to be applied to
the speech wave which would test the hypothesis of rhythmicality?
3. What is the experimental design of appropriate perceptual tests?
4. How are the results of measurements and perceptual tests to be evaluated?
5. If the 'isochrony effect' is present, does it primarily affect the length
of the entire syllables, or the duration of the constituent segments
(phones)?
6. If isochrony is primarily reflected in the length of entire syllables,
does it affect all the syllables of the rhythm unit ('bar', 'foot', or whatever)
or only the accented ('stressed') syllables?
7. If isochrony is primarily reflected in the duration of the phonetic
segments (phones), does it affect all the segments, or some of them only,
and how?
8. Is rhythmicality equally effective in all styles of speech, or is it a de-
pendent variable, being, for instance, more evident in verse reading, less
so in prose reading, slow and deliberate speech, and perhaps least evident
in fast, casual discourse?
9. Is rhythmicality an effect of accent, or is accent the effect of rhyth-
micality?
9a. If accent is the cause rather than the effect of rhythm, then which
phonetic attributes are relevant for accentuation? 1
10. If there is an isochrony effect, then (a) how can it be quantified, (b)
how can the quantitative statement be used to describe the strength of the
effect, and (c) how strong has the effect to be in order to be treated as
phonologically relevant?
11. What are the relations, if any, between the rhythm units and syntax
or morphology?

1
9a is only included here because the question has often been begged. But the answer is in
fact a crucial premise in the theory of isochrony and rhythm. This has most fully been realized
by Adams (1979), see especially her introduction and Chapter 6.

The investigation reported on below proposes a method which may give
partial answers to points 2 and 4. It also assumes a hypothesis related to
point 7 and tests it statistically, leading to a possible answer to 10 a and b.
Reference will also be made to point 11.

3. Two Specific Theories of English Speech Rhythm

Two specific theories of the rhythm of spoken English have been pro-
posed. The one, which will here be referred to as (A), was first put for-
ward by Abercrombie (1964, 1973), and the other, referred to as (B), by
Jassem (1952) (slightly modified in Jassem, 1980 and 1981).2
Apart from the fairly obvious postulate that both theories submit, viz.
one of a tendency towards equality of interstress intervals, they have one
important premise in common. They do not start off with any higher-or-
der syntactic units of which the rhythm units would be constituents. In
fact, the 'beats', or 'bars', or 'feet', etc. of either theory, though possibly
correlated with syntactic entities, are independent of them.3
Abercrombie's theory was further developed by Halliday (1970) and
Witten (1977). The two theories may be summarized as follows:

(A) Abercrombie
1. The rhythm unit, called FOOT, always begins with a stressed syl-
lable, consequently any unstressed syllable follows a stressed one within
the same Foot. All unstressed syllables may therefore be described as post-
accentual (or postictic).
2. If any utterance begins with an unstressed syllable, a silent stress is
posited, this being an abstraction manifested as zero sound, i. e. not mate-
rialized objectively, though real psychologically (subjectively).
3. A disyllabic foot is triple-timed and may be represented by one of
the following structures: ∪ – (short-long), η η (medium-medium), or
– ∪ (long-short). The original version of the theory did not discuss feet of
more than two syllables, and this part of the theory was supplied by Witten
(1977). This mora-based structure of rhythm was not insisted on by
Halliday (1970).

2
Jassem 1980 and 1981 are recent editions of earlier works first published in 1954 and 1962
respectively. The modifications of the original 1952 version of (B) contained in these
works were proposed long before the results of the present investigation were available,
though they are largely borne out by it.
3
In this, as in other aspects, the positions of both authors are drastically different from those
assumed by the proponents of any variety of Generative Phonology. But it may be interesting
to note that Jassem's 'rhythm units' seem to coincide with Chomsky and Halle's 'phonological
words'. The locution (utterance, tone-group, sentence, etc.) The book was in an
unlikely place is analysed into three 'phonological words' (Chomsky and Halle 1968:
367-368): the-book was-in-an-unlikely place. Exactly the same division is obtained by applying
the principles expounded in Jassem 1952. Abercrombie's interpretation would be entirely
different: the | book-was-in-an-un- | likely | place.

4. The internal rhythmic structure of a Foot is inherently related to its
segmental structure, e.g. (C)V1CV(C) 4 and (C)V2(C)V(C) both produce
η η, etc. Abercrombie adds, however, that "the phonematic structure of
the syllable may ... at times be quite irrelevant" (1964: 217).
5. There are certain relations between rhythm and syntax. For instance,
"... the quantities depend on the presence of a word boundary" (ibid.,
p. 219). There is a rhythmic difference between a one-syllable word fol-
lowed by an unstressed syllable of a word which is not directly related
syntactically, e. g., (take) Grey to (London) as opposed to Greater (London)
- and a word followed by an enclitic, e. g. take it, tell him. It is pointed out
that enclitic treatment of monosyllables is "not entirely clear" (p. 221),
some other cases of "rhythmic linking" being piece of, may there (be), etc.5
6. Stress is assumed (see, e.g., Abercrombie, 1967: 35; Ladefoged,
1975: 222) to be increased effort (or energy). The notion of stress is pri-
mary in relation to the notion of rhythm.

(B) Jassem
1. English speech consists of two kinds of rhythm units: (a) Narrow
Rhythm Units (NRU) and (b) Anacruses (ANA). For a given tempo, the
length of a narrow rhythm unit depends on the number of syllables. This
length is a constant for a monosyllabic rhythm unit and a given tempo,
and may be denoted by Y. As the number of syllables in a narrow rhythm
unit increases, the length of the narrow rhythm unit (NRU) also increases,
but not proportionately. A two-syllable NRU is longer than a monosyllabic
one, but its length is distinctly less than 2Y. A three-syllable NRU is also longer
than a two-syllable one, but its length is significantly less than 3Y. The
length of longer rhythm units is determined analogously.
2. Individual syllables within a multisyllable NRU tend to be of equal
length, i.e., the complete length of a polysyllabic NRU tends to be somewhat
equally divided among the constituent syllables.6

4
V1 = short vowel, V2 = long vowel or diphthong, C = consonant.
5
As the phonematic structure of the Foot is only sometimes relevant to its internal rhythmi-
cal structure, and the relations between syntax and rhythm are not always clear, it is not
possible, within the framework of theory (A), to deduce the rhythm of an utterance from
its phonemic transcription. Nor does it seem to be possible to make simple additions to a
transcription so as to indicate the internal rhythmical structure of the Foot.
6
It follows from (1) and (2) that the relative lengths of NRUs and their constituent syllables
may be graphically represented like this:

3. 'Stress', which is now termed ACCENT (Jassem and Gibbon 1980),
is the effect of the temporal organization of utterances. A complete mono-
syllabic utterance is accented by definition. It is the tendency described
above under (2) that is the basis of accent: The only or the first syllable of a
Narrow Rhythm Unit is accented.7
4. The duration of phones is necessarily affected by variations in the
length of syllables. Thus /i/ is longer in read than in reading and this, in
turn, is slightly longer than /i/ in reading it, at least in fairly slow speech.8
It is not yet known how and to what extent rhythm affects the individual
phonemes or phoneme types.
5. Besides the NRUs discussed above, an utterance may include a syl-
lable, or a sequence of syllables, which is characterized by being as short
as possible, i. e. as short as is compatible with sufficiently distinct articula-
tion of the constituent phones. Such a syllable, or syllables, constitute the
ANACRUSIS (ANA). The length of an ANA, consequently, tends to be
proportionate to the number of the constituent syllables, or perhaps more
directly, the number of constituent phones. The ANA, if present in an utterance,
always precedes an NRU and belongs to that NRU. An NRU, together
with a preceding ANA (if any), forms a TOTAL RHYTHM UNIT
(TRU).
6. The rhythm of English speech is a phonetic phenomenon and is de-
termined on purely phonetic principles with no recourse to any other level
of analysis-synthesis, such as grammar or semantics. But there are interre-

1 syllable ι 1
2 syllables . 11 ρ
3 syllables ι 11 ι. .
4 syllables. ,ι „ ,, ι
etc.
7
It is assumed that accent is perceived in longer utterances due to the durational variation of
syllables, e.g., in David's fighting him now. But one might justifiably ask how the (rhythmic)
accent can be determined, and perceived, in an utterance like Dinner's ready, where all the
syllables tend to be of equal length. The answer is that accent, like all phonetic, phonological
and other linguistic entities, is recognized, in the speech signal, by reference to internalized
(memorized) patterns. It is assumed that patterns like those shown in fn. 6 above are
remembered, together with a quasi-absolute 'beat' length, typical for the given speech tempo.
(Mental traces for various quasi-absolute beat lengths are necessary elements of a conductor's
musical memory.) Thus, dinner's ready is heard as two NRUs of two syllables each, rather than
four one-syllable NRUs, because these would be almost twice as long, as in Jack bought four
dogs. It should also be noted that speech perception is 'heterarchical', with parallel signal
processing mechanisms active at several levels, and with feedback. These complex processes
concur in the resolution of possible ambiguities.
8
Cf. Ladefoged's examples: speed, speedy, speedily (1975, p. 103).
208 W. Jassem, D. R. Hill, I. H. Witten

lations between the rhythmical structure and syntax. These interrelations


are described in detail, though perhaps still not fully, in Jassem, 1952.
Suffice it to say here that words belonging to a TRU usually form syntactic
entities, and phraseology often disrupts a simple rhythm-syntax relation.
In the minute hand of my watch, of is in ANA (is proclitic): (the minute
hand) of my watch, but in a kind of fruit, of belongs to one rhythm unit
together with kind: a kind of fruit. 9
7. The description of rhythm as explained under 1 to 5 above may very
simply be incorporated into a phonemic transcription of General British
English (RP), as proposed by Jassem (1949), by observing that (1) the ac-
cented syllable is preceded by an accent mark (tonal, e.g. [„], [J etc.)
or atonal-rhythmic [.], or general ['], (2) TRUs are separated by spaces,
(3) an NRU, by implication, extends between an accent mark and the
following space, and (4) the length of the syllables of the NRU, and the
length of the ANA, are determined by rules 1, 2 and 5 above. Thus,
Abercrombie's (1964: 216) | This is the | house that | John built | is,
according to (B), This is the house that John built and is transcribed
/'ðɪsɪz ðə'haʊs ðət dʒɒn'bɪlt/. 10
A very essential difference between the two conceptions of rhythm in
spoken English is the treatment of the unaccented syllables, which - ac-
cording to (A) - always follow the accented syllable. According to (B),
they are either preaccentual (preictic) or postaccentual (postictic), the two
categories being subject to fundamentally different rhythmic patterning.
If isochrony makes any sense at all, the length of the accented syllables
must be related to the number of unaccented syllables within the same
construct (foot, rhythm unit, or whatever). According to (A), in
John's pleased
John was pleased
John would be pleased
John would have been pleased
John would have been extremely pleased
the syllable /dʒɒn/ should get gradually shorter to accommodate the
unaccented (unstressed) syllables. According to (B), the length of /dʒɒn/ in
these utterances would not be significantly affected by the presence or
absence of any following unaccented syllable(s) in our example because the
TRU ends with /n/. 11

9
Cf. such unconventional spellings as kinder (kind of), sorta (sort of), cupper (cup of), etc.
10
Further examples of this type of transcription may be found in Jassem 1949, 1952, 1980
and 1981, and also in O'Connor 1967.
11
According to yet another theory of English speech rhythm proposed by Allen (1968), one
unstressed syllable following a stressed one does not affect the length of the interstress in-
terval, but a further increase of the number of unstressed syllables results in an increase of
the total length of that interval. The isochrony effect is maintained by a "negative correla-
Isochrony in English Speech 209

4. Syllable or phone?

The smallest unit of speech rhythm is usually assumed to be the syllable
and this is certainly appropriate for poetry. Describing English rhythm,
Jones (1976: 238-244) uses musical notation with note lengths indicated in
the same manner as in usual musical scores. But the length of the note
does not here refer to the duration of the syllable but rather to the time
from one vowel to the next. All the same, Jones does speak of syllable
length in this connection (cf., e.g., 1976: 238ff.). In Jassem's formulation
of English rhythm (1949; 1952) it was the duration of the syllable that was
assumed to be regulated by the isochrony principle. Abercrombie (1964)
distinctly refers to syllable length. But within this theory, the length of the
syllable is, partly at least, determined by the component phones (cf. above,
Sec. 3). In 1965, O'Connor wrote ". . . this contention [the isochrony
principle, W. J.] has never been satisfactorily tested and an investigation of
the durational aspect of speech might throw light on this, and on the gen-
eral pattern of rhythm in English" (O'Connor, 1965: 11). The same au-
thor, three years later (O'Connor, 1968), pointed out that if the syllable
was an elementary unit of rhythm, then, in a fixed frame, it should main-
tain its length irrespective of the number of constituent phones, so that the
duration of the component segments should be ". . . in more or less in-
verse proportion to their number." (O'Connor, 1968: 1). Although the ex-
periment performed by O'Connor was limited in scope, the results are of
considerable significance. It was found that, in a fixed rhythmical frame,
the length of the syllable increased quite consistently with the number of
the constituent phones. However, "The duration is in any case not directly
proportional to the number of segments; there is therefore a compressive
tendency which might correspond to isochrony mentioned in phonetic lit-
erature" (ibid, p. 3).
In order to measure the duration of any entity in the speech signal, it is
obviously necessary to know where, along the time axis, the entity begins
and ends. In a relatively simple case like the one investigated by O'Con-
nor 12 there is no problem about the beginning and the end of the syllable
in question. But there are cases in English where there is "no point of syl-

tion between the length of a given unstressed syllable and the number of syllables in the
interstress interval" (p. 52); " . . . as we add more unstressed syllables, the interval gets
longer, but the longer it gets, the more it resists any further increase in length" (p. 53).
However: "Pre-clitics are generally shorter than post-clitics and they may undergo differ-
ent kinds of changes because of these intrinsic differences" (ibid.). Lehiste (1972) also
noted the fact that an unstressed syllable may "shorten" a preceding stressed syllable, or
very nearly fail to do so - according to whether it belongs to the same word or not. The
loss of the 'shortening effect' is particularly noticeable between subject and predicate.
12
Syllables consisting of between 3 and 9 phones were embedded in a fixed frame "Take
Park", e.g., 'Take / s e s / Park', etc.

lable division" (Hockett, 1955: 52) because two peaks are joined by an in-
terlude. A syllabification like /smɔl-ə/ (smaller) is morphophonemic,
not phonetic or phonemic. China and finer are perfect rhymes in Standard
British and there is no phonological ground for treating them differently
at the level of phone-(phoneme)-to-syllable synthesis. In fact /n/ is
an interlude in both words. Such cases make it very difficult, if at all
possible, to measure the length of English syllables in connected speech.
But the boundaries between segments (and phones) can, at least in principle,
always be located in spectrograms. It is therefore preferable to examine
rhythm units, such as Feet or NRUs, or whatever, as sequences of seg-
ments rather than sequences of syllables. Following O'Connor's train of
thought, it seems appropriate to investigate the relations between the
length of a rhythm unit and that of the constituent phones.

5. The Experiment

It will have been gathered from the preceding sections that much more re-
search is needed, both at the speech wave and the perception levels, before
a completely reliable description of English speech rhythm and the under-
lying principle of isochrony can be formulated. The main aim of the pres-
ent experiment is (a) to test a hypothesis on the reality of isochrony in the
speech signal on the basis of some reasonably representative material, and
(b) to see whether there is any statistically justifiable reason for preferring
one of the two specific theories presented in Sec. 3 over the other. Interre-
lations between rhythm and syntax in the light of the two theories will
also come under discussion.
Considering the complexity of the entire problem of rhythm and iso-
chrony in English, outlined above in Sec. 2, the present study can only
hope to be one step in the direction of a complete solution.
Study Units Nos. 30 and 39 of M. A. K. Halliday: A Course in Spoken
English: Intonation (1970) have been selected because the recorded materi-
als are readily available commercially, so that - if necessary - measure-
ments or other experiments may be made on the same material by other
interested specialists. The pronunciation of the texts seemed to the present
authors to be a good compromise between careful and deliberate, and cas-
ual and natural. Since a reliable automatic method of segmentation has
not yet been devised, it was necessary to make the measurements visually
from conventional spectrograms. The duration of individual phonetic seg-
ments was measured with an accuracy of about 5 ms and all the measure-
ments were double-checked by at least two of the authors. The incidence
of 'stress' is marked in the printed texts, but was checked by the authors in
careful, though informal, listening tests and was confirmed, except for
one or two cases.

6. Phones and Rhythm Units: Basic Data

After extensive preliminary statistical testing, the phones were divided into
18 classes, as follows:
F - flaps and initial voiceless lenis stops,
D - the weak-friction lenis fricatives /ð, v/,
G - the non-syllabic vocoids /w, j, r/,
E - the checked non-open vowels /e, i, ə, ʊ, ʌ/,
B - non-initial lenis stops,
N - non-syllabic nasals,
H - the aspirate and the initial voiceless fortis aspirated stops,
K - fortis unaspirated stops,
Z - the heavy-friction lenis fricatives /z, ʒ/,
SC - syllabic contoids,
KH - aspirated unaccented fortis stops,
S - fortis fricatives,
AFV - the lenis affricates /dʒ, dr/,
O - the close unchecked and the open checked vowels /i:, u:, æ(:), ɒ/,
KHA - accented fortis stops,
AF - the fortis affricates /tʃ, tr/,
A - the mid and open unchecked monophthongs /ɜ:, ɑ:, ɔ:/ and the
diphthongs,
FTH - aspirated final fortis stops.
It will be noted that the classes include types of phones rather than types
of phonemes. Preliminary statistical testing revealed that there are syste-
matic durational differences between allophones of one phoneme, whilst
phones belonging to different phonemes may be of the same duration.
Initially, a simplifying assumption was made that the mean durations of
the phone types may be calculated irrespective of the two kinds of rhythm
units posited under theory (B).
Table 1 presents the mean durations, with the variances, standard devia-
tions and coefficients of variability.
The differences between the means need not necessarily be all statisti-
cally significant. An analysis of variance showed that at least some of the
means were different at α = .01, as shown in Table 2.
The null hypothesis on equality of all means being rejected at α = .01,
a means-clustering analysis as proposed by Gabriel (1964) was performed.
This analysis groups together those of the means that do not differ signifi-
cantly. It is based on the principle of minimum within-group variance with
maximum between-group variance. This test led (after the 5th step) to the
following grouping:
(F) (D) (G E B N) (H K Z SC) (KH S AFV O) (KHA AF A FTH)
The differences between the means within groups have not been shown

Table 1: Phone class duration

rank  class  mean    variance  st. dev.  coeff. of var.
             (ms)    (ms²)     (ms)      (%)

 1    F       16.6     32.5      5.7     34.3
 2    D       47.6    303.2     17.4     36.6
 3    G       56.7    500.7     22.4     39.5
 4    E       59.2    643.1     25.4     42.9
 5    B       60.2    486.5     22.1     36.7
 6    N       61.8    809.4     28.4     46.0
 7    H       64.9   1165.6     34.1     52.5
 8    K       65.3    523.9     22.9     35.1
 9    Z       70.3   1144.5     33.8     48.1
10    SC      76.2    685.3     26.2     34.3
11    KH      85.5    538.8     23.2     27.2
12    S       87.9   1157.9     34.0     38.7
13    AFV     93.9    832.9     28.9     30.7
14    O       96.3   1550.6     39.4     40.9
15    KHA    117.2   1074.8     32.8     28.0
16    AF     126.9    835.5     28.9     22.8
17    A      132.1   2971.7     54.5     41.3
18    FTH    137.5    709.1     26.2     19.4

Table 2: One-way analysis of variance for phone class duration

variance  SS       df    MS       F calculated  F.05  F.01

between   1636860  17    96286.1  98.985        1.62  1.96
error     2413350  2481  972.1
total     4050210  2498  972.7

to be significant at α = .05, whilst any difference between any two means


not belonging to one group is significant at α = .05.
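
The F ratio reported in Table 2 can be checked from the sums of squares and degrees of freedom alone. The following Python sketch is offered only as an illustration: the variable names are ours, and the inputs are the Table 2 figures as printed.

```python
# One-way ANOVA F ratio recomputed from the Table 2 sums of squares.
ss_between, df_between = 1636860.0, 17   # between phone classes
ss_error, df_error = 2413350.0, 2481     # within classes (error)

ms_between = ss_between / df_between     # mean square between classes
ms_error = ss_error / df_error           # mean square error
f_ratio = ms_between / ms_error          # compare with critical F at alpha = .01

print(f"MS between = {ms_between:.1f}")
print(f"MS error   = {ms_error:.1f}")
print(f"F          = {f_ratio:.3f}")
```

A ratio of this size lies far above the tabulated critical value of 1.96, which is why the null hypothesis of equal means is rejected.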
The phone duration data were subsequently broken down according to
the type of rhythm unit, as shown in Table 3.
The small differences between the FOOT means in Table 3 and the
corresponding means in Table 1 are due to slightly different data included
in the calculations (e.g., the phones in Feet with 'silent stress' were
disregarded in the calculations for Foot means).
The following observations can be made on inspection of the figures in
Table 3:

Table 3: Mean durations of phone classes in rhythm units (ms). Size of class in parentheses.

phone class   FOOT             NRU              ANA

F             (17)    16.8     (9)     18.5     (10)    14.5
D             (135)   48.4     (73)    49.8     (74)    45.2
G             (193)   56.2     (166)   57.9     (37)    43.2
E             (513)   59.8     (303)   66.0     (244)   50.1
B             (131)   60.4     (108)   60.5     (26)    59.6
N             (259)   62.2     (208)   62.9     (65)    60.5
H             (37)    65.8     (33)    69.3     (5)     35.6
K             (179)   66.1     (144)   68.5     (43)    55.4
Z             (69)    73.2     (57)    77.2     (20)    55.4
SC            (44)    75.9     (38)    77.6     (6)     65.0
KH            (49)    86.3     (37)    88.5     (12)    77.4
S             (165)   90.4     (144)   93.4     (27)    65.5
AFV           (18)    93.8     (16)    95.3     (1)     80.6
O             (141)   97.3     (123)  100.0     (25)    78.8
KHA           (41)   118.3     (42)   118.0     -       -
AF            (30)   127.7     (30)   127.7     -       -
A             (231)  134.2     (199)  139.9     (47)   100.2
FTH           (6)    137.5     (5)    136.5     -       -

Table 4: Relative durations of the phone classes (NRU = 1.0)

phone class   FOOT    ANA

F             .907    .785
D             .972    .907
G             .970    .755
E             .907    .759
B             .999    .985
N             .989    .962
H             .948    .514
K             .965    .809
Z             .949    .718
SC            .978    .838
KH            .975    .874
S             .967    .701
AFV           .985    .846
O             .973    .788
KHA          1.000    -
AF           1.000    -
A             .959    .717
FTH          1.010    -

(1) With but a few exceptions, the ranking of the means is the same
within each of the three sets of data (and the same as that in Table 1).
(2) Most of the shifts of ranking occur in the ANA column, in which
the means are less reliable because of smaller sizes of the classes.
(3) With very few exceptions, whenever the three means are available,
the ANA mean is the smallest and the NRU mean the largest. Table 4
shows the relative means assuming that for each phone class the duration
in NRU is equal to unity.
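
The normalization behind Table 4 is a simple division of each class mean by the corresponding NRU mean. A minimal Python sketch follows, using four rows of Table 3 as input; the function and variable names are ours, and because the inputs are the rounded means, the third decimal may differ slightly from the printed Table 4 values.

```python
# Mean durations (ms) from Table 3 for a few phone classes: (FOOT, NRU, ANA).
table3 = {
    "D": (48.4, 49.8, 45.2),
    "B": (60.4, 60.5, 59.6),
    "H": (65.8, 69.3, 35.6),
    "A": (134.2, 139.9, 100.2),
}

def relative_to_nru(foot, nru, ana):
    """Return (FOOT/NRU, ANA/NRU), i.e. durations with NRU set to 1.0."""
    return foot / nru, ana / nru

relative = {cls: relative_to_nru(*means) for cls, means in table3.items()}
for cls, (foot_rel, ana_rel) in relative.items():
    print(f"{cls}: FOOT {foot_rel:.3f}  ANA {ana_rel:.3f}")
```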
Table 5 includes mean values that are critical for the problem at issue
here.

Table 5: Statistical characteristics of rhythm units

type of       grand mean of    mean rhythm     mean number     mean rate
rhythm unit   phone duration   unit duration   of phones       (phones per
              (ms)             (ms)            (mean length)   second)

FOOT          75.32            427             5.7             13.3
NRU           80.90            339             4.2             12.4
ANA           56.57            162             2.9             17.7

The data are not sufficient for a hypothesis to be put forward as to
whether the distinction NRU vs. ANA affects the different phone classes
in the same way, i.e., as to possible interaction between NRU/ANA and
the phone classes. There is no indication, for instance, that vowels might
be subject to a more drastic 'shortening' in ANA. But the average duration
of NRU is more than twice the average duration of ANA. The number of
phones in ANA is, on average, 0.69 of the number of phones in NRU,
whilst the average duration of ANA is 0.48 of that of NRU. In fact, the
average phone duration in ANA is 0.7 of the average phone duration in NRU.
This difference is significant at α = .001 and is tantamount to the
phoneme rate being 1.43 times faster in ANA than in NRU.
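
The ratios quoted in this paragraph follow directly from Table 5. The short Python sketch below recomputes them; the dictionary layout and names are ours and serve only as an illustration.

```python
# Table 5 values: grand mean phone duration (ms), mean unit duration (ms),
# and mean number of phones per unit.
nru = {"phone_ms": 80.90, "unit_ms": 339.0, "n_phones": 4.2}
ana = {"phone_ms": 56.57, "unit_ms": 162.0, "n_phones": 2.9}

size_ratio = ana["n_phones"] / nru["n_phones"]    # relative size of ANA
unit_ratio = ana["unit_ms"] / nru["unit_ms"]      # relative duration of ANA
phone_ratio = ana["phone_ms"] / nru["phone_ms"]   # relative phone duration
# Rate is the reciprocal of mean phone duration (phones per second).
rate_ratio = (1000.0 / ana["phone_ms"]) / (1000.0 / nru["phone_ms"])

print(f"size {size_ratio:.2f}, unit duration {unit_ratio:.2f}, "
      f"phone duration {phone_ratio:.2f}, rate {rate_ratio:.2f}")
```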

7. Rhythm Models

7.1. A model with two variables


Let us denote the duration of any rhythm unit by d, its 'size', expressed as
the number of constituent phonemes, by n, and the duration of each of the
constituent phonemes, by p. We ignore here the differences due to mem-
bership in different phone classes. By definition, d = np. Under an as-
sumption of functional relationship between d and n, we may consider two
extreme cases:

(A) No isochrony: $p = \text{const.} = d/n$, and
(B) Strict isochrony: $d = \text{const.} = np$.
In both cases we are assuming that d is a function of n. The two cases
are illustrated in Fig. 1.

[Figure 1: d as a function of n, the number of phones in the unit.]

The functions in Fig. 1 have physical sense only for n = {real positive
integer}. The upper limit of n is not known.
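
The two extreme cases can be written out as functions of n. The Python sketch below is only an illustration; the constants (80 ms per phone, 320 ms per unit) are hypothetical round numbers, not measured values.

```python
def duration_no_isochrony(n, p):
    """Case (A): every phone keeps a constant duration p, so d = n * p."""
    return n * p

def phone_duration_strict_isochrony(n, d):
    """Case (B): the unit duration d is constant, so each phone lasts d / n."""
    return d / n

p_const = 80.0   # hypothetical constant phone duration (ms)
d_const = 320.0  # hypothetical constant unit duration (ms)

for n in range(1, 7):
    d_a = duration_no_isochrony(n, p_const)
    p_b = phone_duration_strict_isochrony(n, d_const)
    print(f"n = {n}: (A) d = {d_a:.0f} ms   (B) p = {p_b:.1f} ms")
```

Under (A) the unit grows linearly with its size; under (B) the phones are compressed in inverse proportion to their number.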
In reality, there is no functional relationship between d and n, at least
because (a) there are systematic differences in the duration of phones, and
rhythm units of a given size may consist of different phone classes, thus
differing in duration, and (b) there is a certain measure of random
variation even if the rhythm units are of the same size and have the same
structure in terms of classes to which the constituent phones belong. There
are no doubt other sources of variation, e.g., the position in the utterance
(utterance-final rhythm units may tend to be longer). Consequently, a
realistic model of isochrony is a regression model in which d = a + bn, i.e.,
one in which the duration of the rhythm unit is estimated from its size on
the basis of a best-fitting analytical relationship between the two variables,

Figure 2: A regression model of the relationship between d, the duration of a rhythm unit,
and n, the number of phones in the unit.

Figure 3: Regression coefficient b = 0. Values of d cluster closely about a 'strict isochrony'
line.

Figure 4: Regression coefficient b = 0. Values of d completely random.

as shown in Fig. 2. Note that if the regression coefficient b = 0, then
isochrony is indefinite, as shown in Figs. 3 and 4.
We shall consider two models with two variables. In one of these, d will
be estimated from the sum of the mean durations of the constituent phone
classes. This sum will be denoted by $\hat{d}$.

7.2. A model with three variables


Another possibility that has to be investigated is that d, the duration of the
rhythm unit, depends both on $\hat{d}$, the cumulative mean duration of the
constituent phone classes, and on their number n. The value of $\hat{d}$ may
be expected to correlate highly with n, yet the interrelations between
$\hat{d}$ and n may be such that a better estimate of d is obtained if both
$\hat{d}$ and n are assumed to have an effect on d. In a general form, this
multiple association is expressed by the regression equation
$d = a + b\hat{d} + cn$.

8. Analysis of Regression

8.1. Two variables: d and n


8.1.1. Linear regression
In a bivariate regression model, d is first made to depend on n only. For
no isochronism, the regression line can be made to pass through the origin
and form a 45° angle with the abscissa by performing a linear transformation
of both axes, with $(d - \bar{d})/p$ on the ordinate and $(n - \bar{n})$ on
the abscissa, where $\bar{d}$ and $\bar{n}$ are the means of d and n. The
regression equation then takes the form
$$\frac{d - \bar{d}}{p} = a + b\,(n - \bar{n}). \qquad (1)$$
Ideally, under this transformation, a should equal zero, but in reality
a ≈ 0 because p is not calculated from exactly the same raw data as d. As
mentioned before, some rhythm units had to be left out because their
duration was not measurable. Also, in a few cases, the total duration of the
rhythm unit could be measured, though segmentation was doubtful, so the
individual values of p were not measured. It will be seen that the resultant
discrepancy is entirely negligible.

Table 6

(x = $n - \bar{n}$; y = $(d - \bar{d})/p$)

        mean x  mean y  σ(x)   σ(y)   r      r²·100%  a        b      arctg b

FOOT    0.00    0.00    2.21   1.98   0.72   51.7     -0.0026  0.644  32.8°
NRU     0.00    0.00    1.63   1.47   0.62   39.0     -0.0018  0.561  29.3°
ANA     0.00    0.00    1.53   1.68   0.83   69.6     -0.0036  0.916  42.5°

Table 6 gives the results of an analysis of regression with the variables d
and n, as expressed by eq. (1).
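
The slope and angle columns of Table 6 are tied to the other columns by the usual least-squares identity b = r·σ(y)/σ(x), with the angle given by arctg b. The Python sketch below checks this on the printed figures; the names are ours, and small discrepancies in the third decimal are to be expected because the inputs are rounded.

```python
import math

# Table 6: sigma of (n - n_mean), sigma of (d - d_mean)/p, correlation r, printed b.
table6 = {
    "FOOT": (2.21, 1.98, 0.72, 0.644),
    "NRU": (1.63, 1.47, 0.62, 0.561),
    "ANA": (1.53, 1.68, 0.83, 0.916),
}

def slope_from_r(sigma_x, sigma_y, r):
    """Least-squares slope of y on x: b = r * sigma_y / sigma_x."""
    return r * sigma_y / sigma_x

def angle_deg(b):
    """Angle of the regression line with the abscissa, in degrees."""
    return math.degrees(math.atan(b))

for unit, (sx, sy, r, b_printed) in table6.items():
    b = slope_from_r(sx, sy, r)
    print(f"{unit}: b = {b:.3f} (printed {b_printed}), "
          f"angle = {angle_deg(b_printed):.1f} deg")
```

A slope of 1 (45°) corresponds to no isochrony and a slope of 0 to strict isochrony, so the angle gives a direct reading of how far each unit type departs from either extreme.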
The magnitude that is most directly related to isochrony is either the re-
gression coefficient b or the corresponding arctg b. The following conclu-
sions can be drawn from Table 6:
(1) The regression coefficient for ANA is quite close to unity, and the
corresponding arctg b, i.e., the angle of the regression line with the
abscissa, is close to 45°. Consequently, there is very little isochrony in
ANA.
(2) The NRU has a coefficient of regression which is little more than half
(exactly .613) the coefficient of regression for ANA, and the angle of the
regression line for NRU is 0.69 the angle for ANA. 13 There is a distinct
tendency towards isochrony in NRU, though it is not very close to strict
isochrony.
(3) The regression coefficient for FOOT is intermediate, but closer to
that of NRU.
(4) The coefficient of determination r²·100% indicates that though
there is distinctly more variance unaccounted for in the regression of
NRU than in the regression for ANA, both may be considered as satisfactory
in the sense of their predictive power. The fact that the coefficient of
determination is high for ANA and lower for NRU makes it plausible that
there is a factor which is active in the latter but absent in the former.
Probably this factor is the distinction between final and nonfinal position.
The ANA can, by definition, only stand in a nonfinal position. The
coefficient of determination for the FOOT is intermediate between the other
two and indicates that theory (A) is not, in a statistical sense,
unacceptable. But it is shown to obliterate a distinction which is
statistically very highly significant (viz. that between ANAs and NRUs). The
isochrony effect is shown in Fig. 5.

13
Cf. above, Sec. 6, on the relative mean durations of NRU and ANA.

8.1.2. Quadratic regression


Table 7 gives the results of a quadratic regression of d on n and n². After
normalization of the variables, the regression equation here is of the form
$$\frac{d - \bar{d}}{p} = a + b\,(n - \bar{n}) + c\,(n - \bar{n})^2. \qquad (2)$$

Table 7

(x = $n - \bar{n}$; y = $(d - \bar{d})/p$)

        mean x²  σ(x²)  r(x, x²)  r(x, y)  r(x², y)  a       b      c       R²·100%

FOOT    4.90     7.14   0.29      0.72     0.27      -0.141  0.619  0.0282  52.7
NRU     2.67     4.52   0.37      0.62     0.47      -0.078  0.523  0.0287  39.6
ANA     2.34     3.81   0.55      0.83     0.56      0.117   0.844  0.0515  70.3

It can be seen from Table 7 that the coefficients of determination for all
three types of rhythm unit are marginally better than in the linear model,
which is more directly interpretable. Therefore, there is very little, if any-
thing, to be gained from a quadratic regression model.

8.2. Two variables: d and $\hat{d}$


It is also possible to estimate the duration of a rhythm unit from the
cumulative average duration of the constituent phones. In other words, we
now take the mean duration of each phone in the rhythm unit, as shown
in Table 3 in the appropriate column, add the figures, obtaining $\hat{d}$,
and estimate d from this. If there is no isochrony, then, on an average,
$d = \hat{d}$. In the case of strict isochrony, d is constant, so there must
be a 'coefficient of compression'. We have found it convenient to express
the relation between d and $\hat{d}$ by
$$\frac{d - \bar{d}}{p} = a + b\,\frac{\hat{d}}{p} \qquad (3)$$
because, again, the regression coefficient and its corresponding angle are
easily interpretable.
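
The cumulative mean duration of a unit (written $\hat{d}$ here) is obtained by looking up each constituent phone's class in the appropriate column of Table 3 and summing. A minimal Python sketch follows; the class means are the NRU figures from Table 3, whereas the four-phone unit itself is a hypothetical example and the names are ours.

```python
# NRU-column mean durations (ms) from Table 3 for a few phone classes.
nru_means = {"S": 93.4, "K": 68.5, "A": 139.9, "N": 62.9, "E": 66.0}

def cumulative_mean_duration(classes, means):
    """Sum the class mean durations of the phones in a unit (d-hat)."""
    return sum(means[c] for c in classes)

unit_classes = ["S", "K", "A", "N"]  # hypothetical NRU, one class label per phone
d_hat = cumulative_mean_duration(unit_classes, nru_means)
p_nru = 80.90                        # grand mean phone duration in NRU (Table 5)

print(f"d_hat = {d_hat:.1f} ms, d_hat / p = {d_hat / p_nru:.2f}")
```

Dividing by p expresses the sum in units of the average phone, which is the form in which it enters eq. (3).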

Table 8

(x = $\hat{d}/p$)

        mean x  σ(x)   r      r²·100%  a      b      arctg b

FOOT    5.74    2.09   0.80   63.8     -4.35  0.758  37.1°
NRU     4.25    1.48   0.73   52.8     -3.07  0.724  35.9°
ANA     2.92    1.58   0.90   80.4     -2.78  0.951  43.6°

The results of the analysis of regression are contained in Table 8. As the
means for $\hat{d}$ are here normalized by p, i.e., $p_{FOOT}$, $p_{NRU}$
and $p_{ANA}$ respectively, they represent in fact the mean number of phones
in the different rhythm units. As in Table 6, the correlation coefficient is
highest for ANA and smallest for NRU, but each is distinctly higher than its
counterpart in Table 6. The values of the coefficients of determination are
therefore also each better. Thus, by taking the means for the various phone
classes which go to make $\hat{d}$, we have accounted for part of the
variance still unaccounted for in the previous bivariate models. Fig. 6
shows the isochrony effect.

8.3. Regression with three variables


Even though the simple regression models are quite satisfactory as judged
by the high coefficients of determination, it is tempting to see whether
some further improvement might not be achieved by predicting d from
both $\hat{d}$ (the sum of the class mean durations) and n.

Figure 6: Linear regression of d on $\hat{d}$ in Foot, NRU and Ana according to Table 8.

Figure 7: Linear regression of d on $\hat{d}$. Regression lines brought to a common origin.



Table 9

(x = $\hat{d}/p$; y = $d/p$)

        mean n  mean y  mean x  σ(n)   σ(x)   r(n, x)  r(n, y)  r(y, x)  a      b       c      r²·100%

FOOT    5.76    5.76    5.74    2.21   2.09   0.93     0.72     0.80     1.33   -0.187  0.942  64.4
NRU     4.25    4.25    4.25    1.63   1.48   0.91     0.62     0.73     1.11   -0.221  0.950  53.8
ANA     2.94    2.92    2.92    1.53   1.58   0.96     0.83     0.90     0.17   -0.378  1.302  81.3

The following linear regression was tested:
$$\frac{d}{p} = a + b\,\frac{\hat{d}}{p} + c\,n, \qquad (4)$$
where $\hat{d}$ is the sum of the mean durations of the constituent phone
classes, and Table 9 gives the results of the regression analysis; cf. also
Fig. 7. Ideally, the three means, n, $d/p$ and $\hat{d}/p$, should be equal.
Again, the slight differences are due to small differences in the raw data.
The correlation coefficients for (n, $\hat{d}/p$) are naturally very high.
Both for (n, $d/p$) and ($d/p$, $\hat{d}/p$) the correlation coefficients
are highest for ANA and lowest for NRU, but all values are high. As can be
seen from the values of the coefficients of determination, the three-variate
model accounts for more variance than any of the bivariate models, though it
is only marginally better than that for d and $\hat{d}$.

9. Some Linguistic Considerations

It was explained in Jassem (1949; 1952) that if the model of rhythm there
proposed is accepted, then rhythm can very simply be indicated in the
phonemic transcription of a running English text by observing the rules
quoted above in Section 2. The beginning of Unit 30 in Halliday (1970)
transcribed according to the rules reads as follows:
[izöaet tjiz tcö'it o'izit ta'pffidin s'maffls traepll ifaid'n3ö)n altar a'bacot
öa'wedu) aidav'sent a'keibl]
The beginning of Unit 39 reads:
[ai'laik Öast'p3im baiöaeti'levnj3r scold os'treihan boi didjcD silt]
Is that /'izöaet/, cheese / ' t j i z / , earlier / sliar/ are examples of NRUs
that are co-extensive with TRUs because they include no anacrusis. To eat
/teo'it/, or is it/o izit/, to put in /ta'poadin/, if Id known /ifaidn3(Dn/ are
examples of TRUs that begin with an anacrusis.
No attempt has ever been made to indicate rhythm as described by
Abercrombie (1967) and Witten (1977) in a running transcription of an
English text.
As mentioned earlier, both models are independent of syntax, but both
admit interrelations between syntax and rhythm. Abercrombie's model, as
applied by Halliday, sometimes results in very peculiar tone groups such
as //if I'd known earlier about the wedding I'd have//sent a cable//. The

first tone group includes the subordinate clause plus the subject and part
of the predicate of the main clause, the remainder of which forms the sec-
ond tone group. Many peculiar tone group boundaries may be found in
Halliday 1970. Here are some more examples: //he was grey and he was
woolly and his//pride was inordinate, he danced on a sandbank in the//middle
of Australia and he//went to the Big God Ngong// (p. 121). //on the Isle
of Man you can//still ride in a horse-drawn tram// (p. 117). Such tone
group boundaries, strange from the syntactical point of view, are due to
the assumption that unstressed (unaccented) syllables always belong to the
same rhythm unit as the preceding stressed (accented) syllables and to
the assumption that silence (pause) is a marker of a tone-group boundary. 14
Model (B) of English rhythm has a simpler relation to syntax and
does not result in such disconcerting discrepancies between the
phonological and the syntactical structure of running speech.

10. Summary and Conclusions

A statistical method has been applied to express isochrony in quantitative


terms, and an attempt has been made to find isochrony in the acoustic
speech signal. It was assumed that if isochrony was at all detectable in the
speech wave, it should affect the duration of the phones which constitute
the rhythm units.
Tape recordings of continuous, naturally spoken General British Eng-
lish ('RP'), consisting of a total of almost 2500 successive phones served as
experimental material. The duration of the phones was measured spec-
trographically and the phones were grouped into classes according to their
mean duration.
Two theories of English rhythm were tested: Abercrombie's, called (A),
which postulates one type of quasi-isochronous rhythm unit, the FOOT,
and Jassem's, called (B), which posits two types, viz. ANACRUSIS with
no isochrony, and NARROW RHYTHM UNIT, which tends towards
isochrony. Four regression models were applied, making the duration of a
rhythm unit depend (a) linearly on the number of phones in the unit, (b)
curvilinearly on the number of phones in the unit, (c) on the sum of the
mean durations of the phones in the unit and (d) on both the sum of the
mean durations of the phones in the unit and their number. The results of
the regression analysis show that in all models the tendency towards
isochrony is minimal in ANACRUSIS and quite distinct, if not very
strong, in the NARROW RHYTHM UNIT. Isochrony is also present in
the FEET, but the FOOT averages out and obliterates the distinction
between ANACRUSIS and NARROW RHYTHM UNIT, which is shown to
be statistically very highly significant.

14
On a distributional view of the tone-group, see Jassem 1978.
In keeping with theory (B), rhythm and isochrony can be very simply
indicated in running transcription of English text, which does not appear to
be possible within theory (A). This is of particular importance for
computer-controlled speech synthesis by rule. Using a very simple algorithm
based on rules supplied by theory (B), the temporal organization of speech
may be generated from a transcription indicating the incidence of accent
and boundaries between TOTAL RHYTHM UNITS plus a table of mean
phone durations.
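
One way such an algorithm might look is sketched below in Python. It is a hypothetical illustration, not the procedure used by the authors: it assumes that phones in the anacrusis simply receive their ANA class means (no isochrony), and that the phones of the narrow rhythm unit are rescaled so that their total matches the duration predicted by eq. (3) with the NRU constants of Table 8. The numerical constants are taken from Tables 3, 5 and 8; the function names and the example unit are ours.

```python
# Class means (ms) from Table 3; constants from Tables 5 and 8.
# The procedure itself is an assumption made for illustration only.
NRU_MEANS = {"S": 93.4, "K": 68.5, "A": 139.9, "N": 62.9, "E": 66.0, "D": 49.8}
ANA_MEANS = {"S": 65.5, "K": 55.4, "A": 100.2, "N": 60.5, "E": 50.1, "D": 45.2}
P_NRU = 80.90                 # grand mean phone duration in NRU (ms), Table 5
D_MEAN_NRU = 339.0            # mean NRU duration (ms), Table 5
A_NRU, B_NRU = -3.07, 0.724   # regression constants for NRU, Table 8

def nru_duration(classes):
    """Predict NRU duration (ms) from the summed class means via eq. (3)."""
    d_hat = sum(NRU_MEANS[c] for c in classes)
    return D_MEAN_NRU + P_NRU * (A_NRU + B_NRU * d_hat / P_NRU)

def tru_phone_durations(ana_classes, nru_classes):
    """Per-phone durations (ms) for one TRU: ANA uncompressed, NRU rescaled."""
    ana = [ANA_MEANS[c] for c in ana_classes]
    d_hat = sum(NRU_MEANS[c] for c in nru_classes)
    scale = nru_duration(nru_classes) / d_hat
    nru = [NRU_MEANS[c] * scale for c in nru_classes]
    return ana + nru

# Hypothetical TRU: a two-phone anacrusis followed by a four-phone NRU.
durations = tru_phone_durations(["D", "E"], ["S", "K", "A", "N"])
print([round(x, 1) for x in durations])
```

The input mirrors what the transcription provides: the space marks the TRU boundary, the accent mark separates ANA from NRU, and the duration table supplies the class means.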
Theory (B) relates the syntactic component of spoken text to its phono-
logical component much more simply than does theory (A).

Acknowledgements

The authors wish to express their appreciation of a grant from the Na-
tional Research Council of Canada to the Department of Computer
Science, University of Calgary, Alberta which enabled WJ to work there
during an extended visit to Canada, and to thank the British Council for
supporting a shorter working visit by WJ to the University of Essex,
Colchester and one by IHW to Poznań. The co-operation of Dr M. Krzysko
and Mr. P. Stolarski of the Computer Centre of Mickiewicz University,
Poznań, in the computing labour is also gratefully appreciated.

References

Abercrombie, D. (1964). Syllable quantity and enclitics in English. In D. Abercrombie & al.
(eds.), In Honour of Daniel Jones. Longmans: London. 216-222.
Abercrombie, D. (1967). Elements of General Phonetics. Edinburgh University Press: Edin-
burgh.
Abercrombie, D. (1973). A phonetician's view of verse structure. In Phonetics in Linguistics.
Longman: London. 6-13.
Adams, C. (1979). English Speech Rhythm and the Foreign Learner. Mouton: The Hague.
Allen, G. D. (1968). On testing for certain stress-timing effects. UCLA Working Papers in
Phonetics 10: 47-59.
Bolinger, D. L. (1965). Pitch accent and sentence rhythm. In D. L. Bolinger, Forms of Eng-
lish. Hokuou Publ. Co.: Tokyo. 139-180.
Gabriel, K. R. (1964). A procedure for testing the homogeneity of all sets of means in analy-
sis of variance. Biometrics 20 (3): 459—477.
Halliday, M. A. K. (1970). A Course in Spoken English: Intonation. Oxford University Press:
Oxford.
Hockett, C. F. (1955). A Manual of Phonology. Indiana University Publications in Anthropol-
ogy and Linguistics, Memoir 11.
Jassem, W. (1949). Indication of rhythm in the transcription of Educated Southern English.
Le Maitre phonetique 111/92: 22-24.
Isochrony in English Speech 225

Jassem, W. (1952). Stress in Modern English. Bulletin de la Societe Linguistique PolonaiseXll:


189-194.
Jassem, W. (1978). O n the distributional analysis of pitch phenomena. Language and Speech
21: 362-372.
Jassem, W. (1980). Fonetyka jfzyka angielskiego (English Phonetics) P W N : Warszawa, 7th ed.
Jassem, W. (1981). Podrfcznik wymowy angielskiej (A Handbook of English Pronunciation)
P W N : Warszawa, 7th ed.
Jassem, W. & Gibbon, D. (1980). Re-defining English accent and stress. Journal of the Inter-
national Phonetic Association 10: 2-16.
Jones, D. (1976). Outline of English Phonetics. Heffer: Cambridge. 9th ed. repr.
Ladefoged, P. (1975). A Course in Phonetics. Harcourt, Brace, Jovanovich: N e w York.
Lea, W. A. (1974). Prosodic aids to speech recognition: IV. A general strategy for phonologically-
guided speech understanding. Univac Rep. P X 10791.
Lehiste, I. (1973). Rhythmic units and syntactic units in production and perception. Journal
of the Acoustical Society of America 54: 1228-1234.
Lehiste, I. (1975). T h e role of temporal factors in the establishment of linguistic units and
boundaries. In W. U. Dressier & al. (eds.), Phonologica 1972. 115-122.
Lehiste, I. (1977). Isochrony reconsidered. Journal of Phonetics 5: 253-263.
O ' C o n n o r , I. D. (1965). T h e perception of time intervals. Progress Report, Sept. 1965. Phone-
tics Lab. UCL. 11-13.
O ' C o n n o r , I. D. (1967). Better English Pronunciation. Cambridge University Press: Cam-
bridge.
O ' C o n n o r , I. D. (1968). T h e duration of the foot in relation to the number of component
sound-segments. Progress Report, June 1968. Phonetics Lab., UCL. 1 - 6 .
Pike, K. L. (1945). The Intonation of American English. University of Michigan Press: Ann Ar-
bor.
Shen, Y. and Peterson G. G. (1962). Isochronism in English. University of Buffalo Studies in
Linguistics, Occasional Papers 9: 1-36.
Uldall, E. (1971). Isochronous stresses in R. P. In L. L. Hammerich et al. (eds.), Form and
Substance. Akademisk Forlag: Odense. 205-210.
Witten, I. H . (1977). Flexible scheme f o r assigning timing and pitch to synthetic speech. Lan-
guage and Speech 20: 240-260.
GERALD KNOWLES

Variable Strategies in Intonation


The best thing to do with a good idea is to knock it down and replace it
with a better one. Of the different models of intonation that have been
proposed, the 'British' model as put forward in the works of Kingdon
(1958) and O'Connor and Arnold (1973) has been remarkably successful
in getting to grips with the often elusive patterns of intonation. This
model has probably achieved all it can be expected to do, based as it is on
structuralist assumptions. Now that linguists have gone on from the study
of structures and systems to the study of language in communication, we
require rather more of an intonation theory, and need an improved model.
In order to move forward, we need new evidence, which will challenge
the old ideas and force us to examine them afresh. British studies have
dealt almost exclusively with RP, and there are just a few exceptions such
as Brown et al. (1980). In this paper I shall take a diasystemic approach to
intonation. The traditional model is unable to handle sociolinguistic var-
iables, and trivially different patterns in closely related varieties of English
are irreconcilably different: this I regard as the reductio ad absurdum of the
'tone' model. I shall be referring to several varieties of English, but mainly
RP and Scouse. Scouse is the dialect of Liverpool and Merseyside which
arose in the nineteenth century as the result of large scale immigration
from Ireland (Knowles, 1975), and which remains an interesting hybrid of
West Lancashire speech and Anglo-Irish. In order to analyse Scouse into-
nation, one has first to understand some of the ways in which Northern
English and Anglo-Irish intonation differ from RP. Although this is of
course a very partial diasystemic approach, it does avoid some of the false
generalisations that can arise from a study of RP alone.
What happens to good ideas in practice is that they are received somewhat
uncritically, and become orthodoxy and conventional wisdom, and
anyone who dares to challenge them is liable to be branded a crank or a
heretic. In 1775, Steele had the idea of transcribing English rhythm in a
musical notation, and today, most linguists and teachers of English 'know'
that English has an isochronous stress. For my own part, I cannot believe
that speakers of English speak in musical bars of the kind described by
Abercrombie (1965) or Halliday (1967): indeed, the rhythmical hypotheses I
regularly use to interpret fundamental frequency contours are incompati-
ble with isochrony. Again, the most obvious observations on intonation
concern pitch movements, and it is widely taken for granted that by
segmenting pitch patterns in some way we can arrive at the irreducible un-
its of intonation. On the contrary: we need to take several other factors
into account, including rhythm, gradient, voice quality, and the way vow-
els and consonants fit the pitch contour. In this paper I shall retain as
much as possible of the traditional approach, including terminology and
symbols; but the effect of going outside RP is that a quite different net-
work of relationships among familiar patterns emerges, and many of the
assumptions on which traditional analyses are based are shown to be un-
tenable.

1. Maps and Strategies

If we start with the lexico-grammatical or 'verbal' system, as linguists
naturally do, and look outwards from there to intonation, the hypothesis
that naturally suggests itself is that intonation is a function of something
in the verbal system. The aim of a theory of intonation is simply to find
rules to map something in the verbal system on to intonation. This ap-
proach is at first very successful: non-final sense groups are mapped on to
low rises, yes/no questions on to higher rises, statements and commands
on to falls, and so on. Exact equivalents for these mapping rules can be
found in other languages. However, when these rules are tested against
real data, they turn out to be stereotypes, with all sorts of hidden assump-
tions built into them.
The view taken here is that intonation is an autonomous semiotic sys-
tem, which plays a rather different role than the verbal system. The
speaker has not only to decide what to say, but how to convey it effectively
to the addressee. He has several channels at his disposal - verbal,
intonational and paralinguistic - and employs communicative strategies to
combine the signals sent on each channel so that the total effect will be
correctly interpreted by the hearer. Conventional linguistics concentrates
on the content of the message that is conveyed: intonation is part of rhe-
toric, or the strategies employed to get that message across.
In the analysis of a complex speech event, it is not always easy to decide
which channel does what. Perhaps few linguists would agree with the me-
trist who argued that the dactyl is a merry foot, citing as evidence the line
Merrily, merrily shall I live now: but the literature abounds with examples
in which the meaning of the words is ascribed to the intonation. Similarly
we must not confuse the role of intonation with the total strategy of which
it is a part. For instance, intonation is important in strategies for convey-
ing illocutionary force, but it is unlikely that intonation has illocutionary
force in itself.
The aim of a theory of intonation is to identify formal patterns and to
identify their role in speech. It is then a problem for a theory of rhetoric
to show how these patterns are used in communicative strategies. In practice,
it is impossible (and undesirable) to keep the two strictly apart, and
although we shall be concentrating on a theory of intonation, we shall
have to deal where appropriate with rhetoric more generally.

2. Accent

The simplest job that intonation has to do is to draw the hearer's attention
to a part of a locution, and the gesture involved is an upward obtrusion in
pitch. When this gesture is used to highlight a single syllable, it is conven-
tionally called an accent: the pitch rises to a peak on the accented syllable
and then returns to a lower pitch. The notion of accent is often associated
with Bolinger (1958) but it is in fact one of the oldest ideas in linguistics. It
can be traced from the Vedic tradition through several ancient traditions
to comparative philology, where it proved of central importance from the
time of Verner on. Bolinger's particular formulation of accent is not with-
out its problems, and his accents A, B, C are prosodically complex and
open to the same sort of objections as British tones. Although accents A,
B, C have been used as a universal basis of comparison (Bolinger, 1978),
their description does not seem to my mind to be sufficiently general to
apply even to English outside the standardised varieties.

2.1. The Accent Contour


Although an accent highlights a whole syllable in principle, it seems to fo-
cus on certain parts more than others: on the vowel (or 'syllabic') rather
than consonants, and on the first element of a falling diphthong (e.g.
/ai/) or the second element of a rising diphthong (e.g. /ju/). Ultimately
accent focuses on a single point which we can call the accent point. Several
kinds of pattern focus here, but we shall deal here with only two, namely
the pitch contour and the rhythm.
The contour begins with a transition to the peak at the accent point,
and then glides down again. The terminology for British tones labels only
the part of the contour following the accent point, so that this rising-fall-
ing contour is conventionally called a 'fall', and indicated with the grave
sign (`). The RP fall - at least as it is usually described - has a relatively
rapid fall from the accent point, and then the contour levels out at low
pitch; this can be described as a 'concave' fall. In other kinds of English,
notably in Northern English, the tone is correspondingly 'convex', begin-
ning relatively flat after the accent point, and falling steeply later. The dif-
ference here is trivial, but unless it is recognised it can lead to a systematic
failure to recognise the accented syllable in some kinds of English: North-
erners might seem to have a perverse tendency to delay the fall and to ac-
cent the wrong syllable. Thus in Northern High Street, the pitch
movement on street might give the impression to a Southern ear - or to an
analyst trained in RP intonation! - that the accent has shifted to street.
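The difference between the two kinds of fall can be made concrete with a small numerical sketch. The quadratic curves below are purely illustrative assumptions, not measurements or anything proposed in the text; the labels 'concave' and 'convex' follow the usage of this section (rapid fall then levelling out, versus flat at first and steep later), and the function name is hypothetical.

```python
def fall_contour(kind, steps=4):
    """Return relative pitch after the accent point (1.0 = peak, 0.0 = low).

    'concave' = rapid fall that levels out at low pitch (the RP type);
    'convex'  = relatively flat at first, falling steeply later (the Northern type).
    The quadratic shapes are an illustrative assumption only.
    """
    if kind not in ("concave", "convex"):
        raise ValueError("kind must be 'concave' or 'convex'")
    points = []
    for i in range(steps + 1):
        t = i / steps  # proportion of the post-accent stretch elapsed
        if kind == "concave":
            points.append((1 - t) ** 2)  # most of the drop happens early
        else:
            points.append(1 - t ** 2)    # most of the drop happens late
    return points


if __name__ == "__main__":
    print("concave:", fall_contour("concave"))
    print("convex: ", fall_contour("convex"))
```

Both curves start at the peak and end at low pitch; halfway through, the 'convex' fall is still high while the 'concave' fall is already near the bottom, which is why a listener expecting the latter may place the accent too late.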
The rhythm of the accent is governed by a very general principle
whereby tempo is reduced after a rhythmical peak, in this case the accent
point. The initial transition is very rapid, giving the auditory impression of
an instant jump from one pitch to another, so that the accent point is ap-
parently at the very peak; the downward glide is much slower. This tempo
change is particularly marked in RP, but in other varieties it may be less
so. In Scouse the initial transition may be so slow that it is perceived as an
upward glide. Again the difference is trivial, but unless it is recognised, a
'fall' with a slow transition can easily be mistaken for a 'rise-fall'.
Accent has the routine job of giving prosodic shape and rhythm to
words by highlighting some syllables and not others, so that e.g. beggar is
accented on the first syllable, and begin on the second. (This is of course
commonly called 'stress': but 'stress' is a notorious Humpty Dumpty term,
which means whatever the user intends it to mean on any given occasion,
and which we shall consequently avoid.) Not all the accents which a word
has when spoken in isolation appear when the word is in context in a locu-
tion, and this is the result of accent suppression rules.
Perhaps the best known suppression rule is the compounding rule. Accents
are suppressed except on the first element of a compound, as in textbook
examples `blackbird, `French teacher as opposed to black `bird or
French `teacher. By calling this a 'rule', we do not mean that it has to
operate whenever its conditions are met; it is a rhetorical device that the
speaker may or may not use. If we introspect about compounds, we
automatically carry out the rule in all cases where it can apply. It also applies
in informal conversation. But in any kind of public speaking or lecturing,
it is frequently observed to fail, e.g. the Labour 'Party and 'Mersey 'side
which are just two of many examples heard recently on the BBC. It is
likely that in certain formal styles the compounding rule is cancelled or
overridden by another rule which may be related to the delayed accent re-
ported by Bolinger (1972: 643).
Accent is also used for foregrounding; or more precisely, items which
are put into the background are deaccented. There are several cases of
foregrounding, one of them being the parallelism rule, by which variables
are accented and constants deaccented, e.g.:
Some of these unemployed teenagers are actually unemployable.
As the governor now is in effect the govern ment. . . (/mant/)
Real examples like this tend to look outlandish compared to con-
structed examples of 'contrastive stress', but they are regularly formed.
The constants, unemploy- and govern-, are deaccented and the accent falls
on the rest of the words. This constant/variable distinction is often con-
fused with given/new, as in: A: Who painted the Mona Lisa? B: Da "Vinci.
Here da Vinci is the variable in the frame X painted the Mona Lisa, and it
also happens to be new information. In many cases accentuation will parallel
other cohesive devices such as the use of pro-forms, ellipsis and so
on, but the fact that it is independently motivated is illustrated by such
familiar examples as John hit Fred, and then ' he hit 'him. The fact that he
and him are given triggers off pronominalization, but the fact that they are
variables in the frame X hit Y keeps them accented.
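The constant/variable split behind the parallelism rule can be illustrated with a minimal sketch. It works on spelling only, taking the shared initial string of two parallel words as the constant and the remainders as the variables; this is an assumption made for illustration (the rule itself concerns accentuation, not orthography), and the function name is hypothetical.

```python
import os


def split_constant_variable(word_a, word_b):
    """Split two parallel words into a shared constant and two variables.

    The constant is approximated by the longest common initial string;
    under the parallelism rule it would be deaccented, and the accent
    would fall on the remaining (variable) part of each word.
    """
    constant = os.path.commonprefix([word_a, word_b])
    return constant, word_a[len(constant):], word_b[len(constant):]


if __name__ == "__main__":
    print(split_constant_variable("unemployed", "unemployable"))
    print(split_constant_variable("governor", "government"))
```

For the two examples quoted above, the constants come out as unemploy- and govern-, with the accent left to fall on -ed / -able and -or / -ment respectively.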

2.2. Tails
The term tail is used by Kingdon (1958) to refer to any syllables following
the last accented syllable of a tone-unit. The term can more usefully be
used to refer to the stretch following the change of gradient in the accent
contour: that is, the part that levels out at low pitch in the RP type, or the
part that falls sharply and levels out in the Northern type.
The item that begins the tail is the highest in a hierarchy of the following
kind: (1) a suppressed accent, e.g. rocking ·horse (where (·) marks the
tail), (2) an unreduced trailing syllable, e.g. tele·phone, (3) a reduced
trailing syllable, e.g. secre·tary /sekrə·tri/, (4) a syllable coda, e.g. we·ll,
(5) the accented vowel itself, e.g. no /no·ʊ/, far /fɑ·ɑ/.
Types 1 and 2 are sometimes described as 'stressed' syllables, but this is
a different use of the term from its use for an accented syllable. Types 4
and 5 are not called stress, presumably because the whole syllable is already
described as stressed. Type 3 syllables are a positive embarrassment to stress
theories, as the syllable is simultaneously stressed and unstressed accord-
ing to the definition of stress being applied at the time. Vowel reduction
shows them to be unstressed, but the change of pitch gradient gives them
a prominence that cannot be disregarded. They are good candidates for
the isochronous type of stress, and can even count for ictus in verse and
form the rhyme, as in this piece of doggerel:
They came to have a picnic and to see
The campus of the university.
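The hierarchy of tail onsets described in this section amounts to a ranked choice, which can be sketched as follows. The category labels are taken from the five types listed above; the function name and the idea of passing in the set of candidates present after the accent point are assumptions made for illustration.

```python
# Ranked from highest (type 1) to lowest (type 5), as listed in section 2.2.
TAIL_HIERARCHY = [
    "suppressed accent",
    "unreduced trailing syllable",
    "reduced trailing syllable",
    "syllable coda",
    "accented vowel",
]


def tail_onset(candidates):
    """Return the highest-ranked item among those available to begin the tail."""
    for kind in TAIL_HIERARCHY:
        if kind in candidates:
            return kind
    raise ValueError("no recognised tail-onset candidate supplied")


if __name__ == "__main__":
    # A word with a reduced trailing syllable, a coda and the accented vowel:
    print(tail_onset({"reduced trailing syllable", "syllable coda", "accented vowel"}))
    # A monosyllable with a coda:
    print(tail_onset({"syllable coda", "accented vowel"}))
```

The accented vowel is always available as a last resort, which is why type 5 sits at the bottom of the list.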

3. The Tone Unit

If we have something complicated to say, we do not utter it in a single
continuous stream. Apart from the problems of production, it would probably
be incomprehensible to the hearer. We need to break it up into more
manageable fractions or quanta. These are independently motivated, and
do not correlate with words, accent contours, phrases or anything else,
and are the units commonly referred to as tone-units. The length of any
quantum depends on a number of factors including the familiarity of
speaker and addressee with the subject matter. They are likely to be short
on a poor telephone line, or if the addressee is taking the message down in
shorthand; they may be longer in the case of an actor reciting his lines, or
a phonetician illustrating constructed examples.

3.1. Sandhi
When several accented items are grouped together in a single tone unit,
non-final accent contours are modified by sandhi rules. These are of two
kinds, pitch and rhythm. The pitch rule, which operates in most dialects of
England, levels the tone of the non-final accent contour, and the rhythmi-
cal rule cancels the tempo change at the accent point.
Consider a word like telephonic which has two accents, which would be
grouped into one tone unit in all but the most extraordinary cases. The
pitch rises rapidly to a peak on te- (the 'head', marked (')) and then stays
fairly level until pho- (the 'nucleus', marked (`)), at which point it falls to
low. The effect of the tone sandhi rule is to spread the accent contour
over two accents. Since the non-final accent is no longer nuclear, it does
not have the rhythm of a nuclear contour, and the change of tempo at the
accent point is cancelled, with consequent shortening of the first two syl-
lables.
In the stock RP-type h n (i.e. head plus falling nucleus), it is assumed
that both these rules are carried out. In fact they are independent. For in-
stance, if I give my address as ' Farmdale Road I have to make the proper
noun clear as the hearer has to rely on phonological cues to interpret it; in
order to make it clear I may draw it out, even though it is in the head. The
resulting 'Färmdale - where the double macron (") indicates the lengthen-
ing — is in between the traditional 'head' and 'nucleus' and may be one of
the things called a level nucleus. In Scouse, the tone sandhi and rhythm
rules do not apply in sequence, but as alternatives. By far the commonest
type of head simply shortens pre-nuclear contours, so that after the ac-
cented syllable the pitch begins to return to low, e.g.:
a re connaissance 'aircraft over from "Germany
has what would be described as a sequence of 'falls' for RP. This kind of
head is very common outside England, and in order to recognise it, we
need to apply rhythmical as well as pitch hypotheses. Level heads with
long drawn out accent contours do occur in Scouse, e.g.:
I can't stand that "language any longer
which had the illocutionary force of removing a foul-mouthed customer
from a Liverpool pub.

3.2. Tone Unit Division


It is difficult to say exactly how we divide a complex locution into tone-
units, but there would seem to be at least two methods: linear grouping
which takes a group of words without regard to their internal organisa-
tion, and hierarchical grouping which is based on the accentuation of
groups within it.
Linear grouping is used for single words, short phrases, or simple sen-
tences which present no structural problems for the hearer. The sandhi
rules apply cyclically and in the same way for each pre-nuclear accent, and
to indicate that they form a single group, each accent is pitched a little
lower than the preceding one. This produces a 'stepping head' for RP, or
a sequence of peaks for the Scouse type, e.g. 'Proto-'Indo-'Euro`pean. If
the rhythm sandhi rule has applied, this can be followed by an accent
suppression rule, which removes all accents between the first and the last,
thus 'Proto-Indo-Euro`pean. (These suppressed accents are sometimes
marked with a raised dot or small circle in British notation.) This suppres-
sion rule, incidentally, explains why the 'stepping head' can appear so nor-
mal through introspection, and yet rather odd in real texts.
Hierarchical grouping preserves the accentuation of its parts, and as it
were conflates potentially separate tone-units into one. For instance, a list
containing a Ford Escort and a 'Morris "Minor can be conflated into a Ford
' Escort and a Morris Minor: in each potential tone-unit pre-nuclear accents
are suppressed, and the nucleus of the first becomes the head of the con-
flated unit. At the upper limit, this rule can conflate whole sentences, as is
sometimes done by comedians, e.g. She said "What would you like?" he
said "What are you "offering?", where the head begins on like and the nu-
cleus falls on offering which are the potential nuclei of the sentences spo-
ken as separate tone-units. At the lower limit, hierarchical grouping over-
laps with linear grouping. Thus if we take the university and of "Lancaster
as a single tone-unit, we obtain either the ' university of "Lancaster by the
linear rule, or the university of "Lancaster by the hierarchical rule.
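The two grouping methods of this section can be contrasted in a minimal sketch. It assumes that the accented items have already been identified and are given in order, that the last accent of a unit is its nucleus, and (for linear grouping) that the rhythm sandhi rule has applied so that the suppression rule is available; the function names and the 'head'/'nucleus' labels attached to the output are illustrative choices.

```python
def linear_grouping(accents):
    """Linear grouping followed by accent suppression.

    All accents between the first and the last are removed; the first
    becomes the head and the last the nucleus.
    """
    if not accents:
        return []
    if len(accents) == 1:
        return [(accents[0], "nucleus")]
    return [(accents[0], "head"), (accents[-1], "nucleus")]


def hierarchical_grouping(first_unit, second_unit):
    """Conflate two potential tone-units into one.

    Pre-nuclear accents are suppressed within each potential unit, and
    the nucleus of the first unit becomes the head of the conflated unit.
    """
    if not first_unit or not second_unit:
        raise ValueError("each potential tone-unit needs at least one accent")
    return [(first_unit[-1], "head"), (second_unit[-1], "nucleus")]


if __name__ == "__main__":
    print(linear_grouping(["Proto", "Indo", "Euro"]))
    print(hierarchical_grouping(["Ford", "Escort"], ["Morris", "Minor"]))
```

With the examples from the text, linear grouping keeps Proto- as head and Euro- as nucleus, while hierarchical grouping of the two car names puts the head on Escort and the nucleus on Minor.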

3.3. Tone Unit Boundaries


The properties of a tone-unit boundary are in some ways the reverse of
those of the accent point. Take an example like ' Go to "bed and 'go to
"sleep. There is a change of rhythm and a change of pitch direction, but
note that in this case they do not coincide: the rhythm changes between
bed and the leading syllable and, whereas the pitch changes at the begin-
ning of the head on go. The pitch falls into a trough before rising again.

3.3.1. Troughs and Rises


Our example could of course be conflated into one tone-unit; Go to bed
and go to "sleep, but the speaker on this occasion chose not to. Boundaries
come at the points beyond which the speaker has chosen not to group
items into one tone-unit. The reason for this may lie in the speaker's prob-
lems of production, or in predicted comprehension problems for the
hearer. In the latter case it would not be unreasonable to expect some kind
of signal to tell the hearer to process one chunk before going on to the
next. There is such a signal and it involves modifying the relations be-
tween rhythm and pitch at the boundary. In RP, the trough and rise is
brought forward on to the tail of the first tone-unit, and since this is an
area of slow tempo, the auditory effect is of an upward glide. This is the
'rising tone' of non-final sense groups of the text-books. However, the
rise as opposed to the final fall is not necessarily the diagnostic feature,
because for the corresponding pattern in Scouse, only the trough is
brought forward.
Now it might be argued that a fall + trough is much the same thing as a
fall alone, but this follows from a shortcoming of the tone analysis. The
effect of the trough is to change the gradient of the accent contour.
Whereas the plain accent has a 'convex' fall, this modified version has rel-
atively level pitch up to the tail, at which point it drops rapidly, thus pro-
ducing a 'convex-concave' fall, which we shall symbolise ( \ ) . In certain
cases, when the accented vowel is very short and followed immediately by
the tail, this 'convex-concave' fall is indistinguishable from the RP-type
concave fall, e.g. one fellow got killed, where the peak is reached on /i/
and immediately slumps to low on /l/.
At this point let us consider a Scouse example like The 'Roundie was the
Ro tunda 'Theatre. The first accent differs from the third in gradient, and
the second differs from the third in rhythm. All three would be called
'falls' in the 'tone' analysis. It follows that Scouse cannot be analysed in
terms of 'tones'. Listening to RP data suggests that rhythm and gradient
are relevant also in RP, although in slightly different ways, and that a
'tone' transcription consequently misses much relevant information.

3.3.2. Blurred Boundaries


In our last example, the phrase Rotunda Theatre needs to be highlighted as
a whole. In other cases our tone-unit rules could lead to bizarre results. If
we divide My Uncle Jim's bought a Morris Minor between subject and predi-
cate, the rules predict *My Uncle "Jim's bought a Morris "Minor. The first
tone-unit retains its head and nucleus, but the second may be modified by
rules very like hierarchical grouping. The first accent is suppressed, it is
then taken over by the first tone unit as the nuclear tail, which then at-
tracts the rise or trough, giving in RP the much more natural My Uncle
'Jim's 'bought a Morris "Minor. Although there are two tone-units here, it
is impossible to say where the boundary is. Consider now a Scouse equiva-
lent with a trough instead of the rise: a 1lady, stepped off the 'side in .front of
the "· van. The speaker has taken three chunks, a lady, stepped off the side
and in front of the van, and then run them together blurring the bounda-
ries. In terms of 'tones', this sounds like a sequence of falls: a 'lady stepped
/ off the side in front / of the "van. But as a sequence of tone-units, that
would not make sense.

4. Relative Pitch

The importance of relative pitch has been increasingly recognised over re-
cent years. What has not been so clearly perceived is that there are at least
three different problems entwined here. First there is the problem of
identifying the placing of pitch pattern in the total span of which an indi-
vidual is capable (see, e.g. Brown, 1977: 127-134). Secondly there is the
pitch relation between successive tone-units, and thirdly there is the height
of any individual accent contour relative to other accents. Brazil's notion
of key (e.g. Brazil et al., 1980: 26) is defined as a property of the tone-unit
but is in many of the examples a property of a single accent contour. The
analysis of relative pitch is further complicated by different interpretations
of the notion of pitch raising. In one case, pitch movements may simply be
amplified so that top pitch is raised, and the bottom line stays where it is.
In another case, the voice could be re-tuned like a musical instrument, so
that as top pitch is raised, bottom pitch is raised in some proportion. (The
term key suggests the latter case.)
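The two interpretations of pitch raising can be set side by side in a short sketch. The figures in Hz and the use of a single multiplicative factor are assumptions chosen for illustration; the text says only that in the second case the bottom pitch is raised 'in some proportion'.

```python
def amplify(bottom, top, factor):
    """Amplified movement: the top is raised, the bottom line stays where it is."""
    return bottom, bottom + (top - bottom) * factor


def retune(bottom, top, factor):
    """Re-tuned voice: top and bottom are raised together, here by the same ratio."""
    return bottom * factor, top * factor


if __name__ == "__main__":
    span = (100.0, 200.0)  # hypothetical bottom and top pitch in Hz
    print("amplified:", amplify(*span, 1.5))
    print("re-tuned: ", retune(*span, 1.5))
```

In the first case the span widens from a fixed floor; in the second the whole span is shifted upwards, which is the case the term key suggests.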
When adults talk to infants, and when people ask questions, the pitch is
often observed to rise within the total span. In either case the raising may
be accompanied by an appropriate change of voice quality, and by a facial
gesture involving an upwards movement of the eyebrows. Question signals
of this kind are observed in many languages other than English, and can
be very marked in some varieties of English, but they are rather neglected
in descriptions of RP.
The relative pitch of tone-units and of individual accent contours is governed
by the use of an upward obtrusion in pitch as an attention device. If
there is no reason to call attention to an item, it is pitched slightly lower than
the previous item, so that successive tone-units step down, and if there are
several accents in one tone-unit, they step down in pitch. There are all sorts
of possible reasons for drawing attention to an item: highlighting a new tech-
nical term or the name of a book, finding an equivalent for bold type or in-
verted commas when reading from a printed text, or even an away win in a
football score. But one of the standard uses of the attention device is to group
items: a relative rise in pitch marks the beginning of a new group, and step-
ping down indicates continuation of an old group. Hence, ceteris paribus, we
would expect tone-units in the same locution, and accents in the same tone-
units to step down. Stepping down rarely needs to be explained; the question
is usually why the speaker steps up.
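The default described in this section, stepping down unless there is a reason to step up, can be sketched as a simple running level. The unit step size and the integer scale are arbitrary assumptions; only the direction of each move is taken from the text, and the function name is hypothetical.

```python
def pitch_levels(attention_flags, step=1):
    """Assign a relative pitch level to each successive item.

    An item flagged True draws attention (for instance, it opens a new
    group) and is pitched higher than the previous item; any other item
    is pitched slightly lower. The first item sets the reference level 0,
    so its flag is ignored.
    """
    levels = []
    current = 0
    for index, draws_attention in enumerate(attention_flags):
        if index > 0:
            current += step if draws_attention else -step
        levels.append(current)
    return levels


if __name__ == "__main__":
    # Two items continuing a group, a third opening a new group, then continuation.
    print(pitch_levels([False, False, True, False]))
```

On this view only the True entries need explaining: the step down is what happens when nothing else does.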

5. Syncopated Contours

In the discussion of the accent contour above, we assumed that the
rhythm and pitch movement are perfectly synchronised at the accent
point. What we in fact described was the contour which is neutral with re-
spect to another variable. In fact, the tempo change can be anticipated or
delayed on the pitch contour, and the accent point typically follows.
If the accent point is brought forward, it is followed by the rise and the
fall, and this is what is known as the 'rise-fall tone'. The result is a synco-
pated contour, with the pitch peak just after the accent point. How soon af-
ter it comes is a sociolinguistic variable. For RP, it is usually shown on the
syllable following (e.g. O'Connor and Arnold, 1973: 17). But in Northern
English, and possibly outside RP generally, the peak is reached on the ac-
cented syllable itself, after the accent point; this pattern is in fact listed by
O'Connor and Arnold (pp. 9, 11) as a second possibility. In the case of the
RP type on a word like ^excellent, the peak is reached on the second
syllable; unless the pattern is correctly interpreted, the accent (or 'stress')
might appear to have shifted to the second syllable. (In this connection it
is interesting to note that the English pronunciation of Calcutta might
well be a misinterpretation of the Bengali ^Calcutta. When accent contours
get complicated, one really needs to know the language to interpret them
and recognise the accented syllable.) The effect of a delayed accent point
is to put it after the peak, and towards the end of the fall; this is probably
one of the things classed as a 'low fall' (.n).
The use of syncopation has to do with the relationship between the con-
tent of the tone-unit and the assumed content of the addressee's memory.
The neutral contour announces the new content irrespective of what is al-
ready there; the rise-fall indicates that the new content exceeds or extends
existing content, and the low fall that it does not (i. e. the low fall suggests
something like 'you already know this'). For a good illustration of a rise-
fall consider:
S1: You're fond of 'dahlias, aren't you?
S2: I'm ^very fond of dahlias.
The nucleus falls on very as the variable in the frame (S2) is fond of dahlias,
and because very exceeds this, it takes a rise-fall.

5.1. Syncopated Rises


The neutral contour followed by a rise on the tail produces the familiar
fall-rise (\n). If the accent point is brought forward, the result is the rise-
fall-rise ('\n), and if it is delayed, the low rise (,n). Perhaps the only one
of these that is not obvious is the low rise. Low rises are usually shown
(e.g. O'Connor and Arnold, 1973) as rising on the accented syllable from
low pitch. With practice I find I can produce contours of this kind, but
they do not strike me as RP or even English. The patterns I naturally pro-
duce have a very rapid slump at the beginning of the accented syllable
which gives the auditory impression that the accent point is low; in the RP
type the pitch may begin to rise immediately, but in the Northern greeting
such as hello, love /ɛ,loa ,lʊv/, the pitch is low on /loa/ and rising on
/lʊv/, and this could easily be mistaken by a sociolinguistically naive analyst
as having a low head on /loa/ and a rising nucleus on /lʊv/.
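The way the traditional tone labels fall out of two independent choices, the timing of the accent point and the presence of a rise on the tail, can be summarised as a lookup table. The labels are those used in sections 5 and 5.1; the key names ('early', 'neutral', 'delayed') and the function name are assumptions introduced for the sketch, and the trough found in Scouse in place of the rise is not modelled.

```python
# (timing of accent point, rise on tail?) -> traditional tone label
TONE_LABELS = {
    ("neutral", False): "fall",
    ("early", False): "rise-fall",
    ("delayed", False): "low fall",
    ("neutral", True): "fall-rise",
    ("early", True): "rise-fall-rise",
    ("delayed", True): "low rise",
}


def tone_label(accent_point, rise_on_tail):
    """Return the traditional label for a contour.

    accent_point: 'early' (brought forward), 'neutral' or 'delayed'.
    rise_on_tail: True if a rise is added on the tail.
    """
    try:
        return TONE_LABELS[(accent_point, rise_on_tail)]
    except KeyError:
        raise ValueError("unknown accent-point timing: %r" % (accent_point,))


if __name__ == "__main__":
    for timing in ("early", "neutral", "delayed"):
        print(timing, "->", tone_label(timing, False), "/", tone_label(timing, True))
```

Six 'tones' thus reduce to a three-way choice combined with a two-way choice, which is the sense in which syncopation and the rise are separate gestures.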
We suggested above that the basic use of the rise or trough is to signal a
non-final tone-unit. A perennial problem is to explain the uses of the dif-
ferent kinds of rise. However, if we think of the syncopation and the rise
as separate gestures which are then combined, at least the beginnings of an
explanation emerge. Consider these two extracts, the first describing a
cleaner's job, and the second recalling a car accident:
(i) You just sweep the ,floor, empty the ,bins, and sweep the ,floors,
and dust the ,tables, wash the ,floors, and come 'out.
(ii) Well, we were coming home from "Bootle, social " club, few '.years
back, and as we came round the corner of Commercial " Road, a
lady stepped off the side in front of the V.van, and my husband
swerved to a " void her, well, the car went into a '.skid,. . .
Both examples are from Scouse, and it can be seen that the low rise
corresponds to the RP rise, but the equivalent of a fall-rise is a fall plus
trough. The addressee could be expected to know what a cleaner does, so
the delayed accent point was appropriate, giving low rises; the details of
the accident were entirely unknown, so that the neutral contour was ap-
propriate. The rise-fall-rise (or trough) is used to extend the field of dis-
cussion. Thus in a discussion of bad language in a Liverpool pub:
And er, you know, when I " ,first came here, of course you know the av-
erage Scottie Roader, the language is, I mean it's out of this world . . .
In this case the speaker signalled her intention to extend the discussion,
but in the event failed to do so.

5.2. Strategic Rises


Not all rises are of course non-final. The explanation of final rises is even
more difficult. The order of items in a complex locution is motivated by
various strategies (see e.g. Quirk et al, 1972, ch 14), and consequently
rises are associated with these strategies. These associations lead to the
use of the rise with a strategy even when the tone-unit in question is in fi-
nal position. Since there is no common meaning underlying the strategies,
there is no common meaning underlying the uses of the rise. The fact that
these uses are derived by association explains the formal problem, namely
that - unlike the patterns we have discussed so far - final rises may not
have their equivalents in other languages. Some of them do, but some of
them are specifically English. For that matter, correspondences among
English dialects tend to break down or become irregular in this area.
Items which link a locution to the preceding context tend to come early;
but if they are candidates for the low rise, they may be postponed until after
the item with the fall, e.g. He's a good 'skin, ,Charl ("Charles is a good
chap") so that the subject Charles retains the rise it would have had in its
normal position before the verb. This fall-plus-rise can be very useful in
avoiding awkward or impossible syntax, e.g.:
GK: . . . for when we go camping. REK: I don't want a .tent.
Here a tent links the locution to camping, but a tent isn't wanted by me or
a tent I don't want would be rather odd, and the speaker (then aged 3.3)
avoided the problem by putting a tent in its syntactically preferred posi-
tion, marking it as the link with the rise. Many kinds of linking item - and
sentence adverbs, vocatives, subjects, time and place phrases, if-clauses etc.
- can be postponed and marked in this way.
Scouse normally uses the same pattern here, but the very inverse is
sometimes found, with a rise (") that stays up until the nucleus of the following
tone-unit is reached, e.g. He be' longs to Liverpool, Bamber, Gascoigne,
which I would intuitively translate as He be 'longs to Liverpool, Bamber
Gascoigne. This is a very familiar pattern in Irish intonation. Here is
an example from a Liverpool Irishman:
. . . when I liked it was going away was mostly on the ' fishing trawlers
.mostly. . .
If we disregard the problem of analysing the syntax here, the RP equiv-
alent would be . . . on the "fishing trawlers, mostly.
Links only work if they are stored in the hearer's memory. Locutions
which as a whole are directed at some item in the addressee's memory of
the preceding context will often be found to have a rise. An example of
this is the yes/no question. These are used for a variety of purposes in real
life, but if one sits and thinks about them, typical constructed examples di-
rect a proposition to the hearer's memory to check its truth. This is a
candidate for the rise, and add to this the pitch raising for a question, and
the result is the stereotype yes/no question intonation.
These derived rises can combine with other patterns to form difficult
puzzles for the analyst. For instance, it may appear superficially that the
rise is a contradiction contour. Real examples are easily found:
GK: I'll get some muesli and some cocoa from the Co-op.
RK: We don't get muesli from the ,Co-op.
To interpret this, we need to observe that in most cases, positive and ne-
gative operators do not count as variables for the purposes of the parallel-
ism rule, and consequently there may be nothing in a contradictory locu-
tion to take the falling nucleus. By its very nature, a contradiction is a
link, and qualifies for the rise. Note that rises of this kind can be very
economical in answering questions with unacceptable presuppositions, of
the kind Have you stopped beating your "wife? or Are you completely stu-
pid? Here is a real example:
RK: Have we got any per"oxide?
GK: Why, are you going to do your "hair again?
RK: ,No.
The answer no on its own would leave unchallenged the presupposition
you do your hair with peroxide. Rising ,no can be expanded to something
like I'm not going to do my hair with per,oxide. The delayed accent point,
resulting in the low rise rather than the fall-rise, carries the added implica-
tion 'and you know perfectly well I don't'.

5.3. The Fall-Rise


The rise shows something of the care that is needed in interpreting intona-
tion patterns, and the level of generalization at which it must be done. To
reinforce this point, let us consider some possible misinterpretations. Take
the fall-rise in the following constructed examples:
(1) A: How are your parents? B: My "mother's fine.
(2) A: How are the children getting on? B: \Peter's getting married.
(3) A: What's wrong with Mary? B: I don't know. "John's rather worried.
The most naive reaction to (1) is that mother has a 'contrastive stress'
that implies that B's father is ill or dead. In fact, B says nothing about her
father, and could add '. . . but I don't know about my father, I haven't
seen him for twenty years'. A less naive, but still too particular analysis
would be that the fall-rise has a narrowing effect, narrowing parents to
mother; just as in (2) it narrows children to Peter. However, surely in (2) we
are entitled by the co-operative principle to assume that Peter is one of the
children, and this applies equally to (1), where the narrowing effect is rein-
forced by the semantic relationship between the words parents and mother.
The narrowing hypothesis does not work for (3). By the co-operative
principle we can assume that John is closely connected with Mary, and pos-
sibly her husband, brother or doctor. It just so happens in (1) and (2) that
we can work out the relationship more precisely because the items with
the fall-rise are probably members of a previously mentioned set.
We clearly must not confuse the meanings of words, and the working of
the co-operative principle, with the meaning of intonation. In (1), (2) and
(3) the subject is marked off from the rest of the sentence with its own
tone-unit, and this is signalled by the rise. The intonation shows this to be
the link, but says nothing of how it relates to what has gone before: the
hearer has to work that out for himself.

6. Gradient

We have already encountered gradient as a sociolinguistic variable in the
accent contour. It is also affected by a following trough on the tail. If we
were to investigate these contours in more detail, we would find that the
gradient is sensitive to different numbers and types of segments on the
contour. Gradient is also related to the width of pitch movement (see e. g.
Crystal, 1969: 213-4). If the gradient of rises and falls in pitch were con-
stant, it would be an easy matter to decide which was primary and which
was dependent: if width was primary, the longer the contour the greater
the width. But since gradient is clearly not constant, its relationship to
width remains an open question. For the present, I shall assume gradient
to be primary.
The falling part of the accent contour can be steep or gentle; the details
of the slope depend on whether it is basically of the 'concave' or 'convex'
type. In Scouse, the gradient can be made gentle to the point of being vir-
tually level (~n), e.g. And we're getting 'left here, where the tail on here is
not significantly lower than the accent point of left. The fact that it is a
gentle fall emerges with a very long tail, as in:
Was it "Yorkshire Hilda went to?
where went is audibly lower than York.
We argued above that the fall plus trough is equivalent to the traditional
rising tone, a kind of Irish rise that goes down. If we take the rise-fall and
level out the fall, we get an Irish fall that goes up (,"). This contour, which
steps up from one level to another, is extremely common in Scouse, e.g.
I'm interested in 'bingo; What 'time is it please?; it's ,~all kinds of people go
to it. Again, long tails will fall audibly. Almost identical patterns are found
in Irish intonation, especially South Ulster, in Glasgow, and perhaps New-
castle and Birmingham.
Irish falls show up a major weakness in the British tone analysis.
(When I first transcribed them, my ear told me they were rises, and my
brain told me they were falls, and it was a long time before I could recon-
cile the two.) It is simply not good enough to trace the pitch movement; it
is essential to see how the segments fit the contour, and take account of
the rhythm and gradient. Now in segmental phonology it is a matter of
elementary knowledge that segments we class as 'voiced' are not always
produced with a full vibration of the vocal folds, and the parts played by
segment duration, voice onset time, etc. are well understood. What I am
suggesting here is that we cannot expect a one to one match between in-
tonation patterns and up and down pitch movements. The fact that a
Northern English final /d/ is more voiced than an RP one is a minor
phonetic detail, and is of no consequence when we come to the phonological
relationship between /d/ and /t/. Similarly the fact that a gentle rise-
fall comes down a bit more in RP than in the Irish type is a mere detail: if
the British tone analysis insists that one is a 'rise' and the other a 'rise-fall'
then that analysis is fundamentally misconceived. The Irish fall is called a
'rise' by Jarman and Cruttenden (1976), which may be true as far as pitch
movement alone is concerned, but leads to totally unacceptable conclu-
sions at the level of universals. The suggestion that the Irish fall has 'no
special value' or used to have a value which has now been 'lost' (Bolinger,
1978: 510) is simply wrong. True, RP speakers and perhaps Americans are
not sensitive to its value, but that is an entirely different matter. If glosso-
centric theories are unacceptable in linguistics, there is no place for
lectocentric theories in intonation.
Gradient would appear to convey attitude. Traditionally, of course, this is
supposed to be the job of intonation as a whole (e. g. O'Connor and Arnold,
1973), but in fact one has to get the gradient right in order to make sense of
the examples, and the wrong gradient can make nonsense of them. Gradient
seems to convey a variety of different but related attitudes in different cases,
and it is not easy to find a label sufficiently general to cover all cases. Perhaps
a steep gradient confronts the hearer with the content of the tone-unit, whereas
a gentle gradient appeases. Steep gradients are likely to be used to inferiors,
gentle ones to superiors; requests call for a gentle gradient, imperative com-
mands a steep gradient. (Although the standard statement and command are
both said to have a falling tone, they do not necessarily have the same kind of
fall. Whatever the statement has, an imperative issued by a speaker exercising
his authority is likely to have a steep fall.) Note, incidentally, that a tape-re-
corded interview carried out by a reporter or research student is of its very
nature likely to elicit gentle gradients, and therefore Irish falls in dialects
which have them. Data based on such interviews can give the false impression
that falls are rare or non-existent in certain dialects (see, e. g. Jarman and
Cruttenden, 1976:11).

6.1. Variable Fall-Rises


The attitudinal effect of the fall-rise differs dramatically according to the
gradient, especially if the steep type is accompanied by a harsh voice qual-
ity. For instance, I might call my son Robert, with a fairly level accented
syllable, followed by a slump and rise on the second syllable, and this
might be the prelude to something like would you like an ice-cream?. On
the other hand, \ Robert, with a steeply falling accented syllable, is more
likely to be a warning of paternal wrath to come. Compare also the
door's open (so you can walk straight in) and the \ door's open (so you'd
better go and shut it).
We suggested above (3.3.) that the fall plus trough is the Scouse equivalent
of the RP fall-rise. More precisely it corresponds to the gentle fall-rise.
Steeper contours do rise in Scouse, although not to the same extent
as in RP. For instance, contradictions - which confront the addressee with
incompatible information - have rises: I don't think she \ walked up the
stairs, or there's ' nothing in language. The semantically empty word like is
sometimes used in Scouse to carry the rise, thus I don't use it my, self, like,
or not every \ night, like. In other words, the gradient of the Scouse accent
contour is copied on the tail.

6.2. Calling Contours


The Scouse fall plus trough is remarkably similar to the calling contour
(Ladd, 1980: 169-179), except that the Scouse trough is at bottom pitch
whereas the calling contour ends rather higher. It is noteworthy that call-
ing contours are appropriate in the same situations as gentle gradients,
and would be inappropriate in situations where steep gradients are used
(cf Ladd, pp. 175-6); more particularly, calling contours are used in the
same situations as gentle fall-rises. Now Ladd, following a common-sense
taxonomy based on pitch direction, refers to the calling contour as a styl-
ized fall: I would suggest that it is in fact a stylized fall-rise.

6.3. Rising Gradients


In the case of the delayed accent point, there is little movement between
the accent point and the beginning of the tail, and consequently little
scope for differences of gradient. The gradient of the movement just be-
fore the accent point takes over, and in the case of rising tails, this can be
measured by the depth of the trough from which the pitch rises. Lady
Bracknell's a ,handbag? conventionally rises from a deep trough, whereas
an invitation 'coffee? is more likely to start from mid pitch. With a head,
Have you finished with the , paper? is a genuine question if the trough is
shallow enough, but more likely to be interpreted as an indirect command
if paper rises from very low.
Deep troughs are typically followed by steep rises, so that the gradient
is continued over the tail. The steepness of the rise may be accompanied
by the feature crescendo: there is a marked increase of initiator power and
consequent loudness on the trailing syllables, e.g. ,morNING, ,soRRY.
This crescendo is the source of the prototypical 'silent stress' on a word
like ,(tha)nkYOU.

7. Conclusion

The view of English intonation which has been sketched in outline above
began as a set of crude hypotheses which contained an element of truth:
statements have falls, contradictions have fall-rises, and so on. These hy-
potheses have been continuously refined over a number of years in the
light of detailed study of data from several varieties of English, and in-
deed other languages. This sketch is restricted to what I judge to be the
central patterns of intonation; it is inevitable that as further data is ex-
amined, and more peripheral patterns included, many of the present sug-
gestions will have to be revised. It goes without saying that nobody yet
knows enough about intonation to give a correct or definitive account of
any part of it.
However, unless the diasystemic approach can be shown to be funda-
mentally wrong in principle, it does render a number of orthodox assump-
tions out of date. When linguists set up systems of contrasting phonemes,
it is reasonable to look for a set of contrasting 'tones' (Halliday, 1967;
Brazil et al, 1980), which enter into 'tunes' or 'tone-units' which in turn
have an arbitrary relationship with their meaning. It is difficult to argue
for or against this view, because it is irrelevant, unless one makes it rele-
vant by forcing intonation patterns into prearranged categories in an en-
tirely Procrustean fashion. Tones are not a discovery but an invention. It
follows that there is no point in looking for the meaning of any tone, or
set of tones: it is a waste of time looking for a common meaning for the
set of rising tones.
In order to get at the meaning of intonation, we must investigate the
different patterns that modify the accent contour. Having identified syn-
copation, we can ask what the rise-fall and rise-fall-rise do as a class as
opposed to the low fall and low rise. The notion of gradient establishes
the most unlikely link between the Irish fall, varieties of the fall-rise, and
silent stress, and we can then ask how they are related in meaning. It is
difficult to see how the tone analysis can even lead to the right questions,
let alone find an answer. If the diasystemic approach does little more than
improve the quality of the questions, it will have been worth while.

References

Abercrombie, D. (1965). Studies in Phonetics and Linguistics. London: Oxford University
Press.
Bolinger, D. (1958). "A theory of pitch accent in English" Word 14: 109-49.
Bolinger, D. (1972). "Intonation is predictable (if you're a mind reader)", Lg 48: 633-44.
Bolinger, D. (1978). 'Intonation across language', in J. H. Greenberg, ed, Universals of Human
Language, vol 2, Phonology. Stanford University Press.
Brazil, D., M. Coulthard & C. Johns (1980). Discourse Intonation and Language Teaching.
London, Longman.
Brown, G. (1977). Listening to Spoken English. London, Longman.
Brown, G., K. L. Currie & J. Kenworthy (1980). Questions of Intonation. London: Croom Helm.
Crystal, D. (1969). Prosodic Systems and Intonation in English. London, Cambridge University
Press.
Halliday, M. A. K. (1967). Intonation and Grammar in British English. The Hague: Mouton.
Jarman, E. & A. Cruttenden (1976). 'Belfast intonation and the myth of the fall', Journal of
the International Phonetic Association 6: 4-12.
Kingdon, R. (1958). The Groundwork of English Intonation. London: Longman.
Knowles, G. (1975). Scouse: the Urban Dialect of Liverpool. Unpublished Ph.D. thesis, Uni-
versity of Leeds.
Ladd, R. (1980). The Structure of Intonational Meaning. Bloomington and London: Indiana
U.P.
O'Connor, J. D. & G. F. Arnold (1973). Intonation of Colloquial English. Second edition,
London: Longman.
Quirk, R., S. Greenbaum, G. Leech & J. Svartvik (1972). A Grammar of Contemporary English.
London: Longman.
Steele, J. (1775). The Melody and Measure of Speech. London: Bowyer and Nichols.
MANFRED KRAUSE

Recent Developments in Speech Signal Pitch Extraction

1. Introduction

Measurement of fundamental frequency or pitch in speech signals has al-
ways received special emphasis in electronic speech processing. Represen-
tation of the temporal development of the speech melody, with varying
standards of precision, is of great importance for the various disciplines of
language research and speech processing, such as
- linguistic research (intonation, prosody)
- language training (speech therapy, foreign language acquisition)
- vocoders (speech encoding)
- speaker recognition (verification, identification)
- speech recognition.
Pitch extraction from speech signals has turned out to be an especially
difficult technical problem, since the complex structure of speech signals
derives from the co-operation of glottal excitation and temporal variations
in the shape of the vocal tract, and is therefore subject to a variety of in-
terference types. At the same time, the hearing process is involved, with
properties which must, at least to some extent, be taken into consideration
in the technical methods of analysis.
The net result of psychoacoustic research so far has been that pitch perception
is not to be equated with the perception of the fundamental frequency
of a periodic or quasiperiodic acoustic signal. Sensations of pitch
occur with other acoustic signals, such as narrow bandwidth noise, beat
tones with non-harmonic overtones and, in particular, acoustic signals
with non-existent or suppressed fundamental frequency. Perception of a
pitch is not always tied to a physically present oscillation, which led to the
coining of the term 'virtual pitch'. A detailed treatment of this topic is to
be found in Terhardt (1979). According to Terhardt, virtual pitch can be
derived by dividing all frequency components of a complex sound into
subfrequencies by integer numbers. Virtual pitch is established as the fre-
quency at which a majority of subfrequencies is found.
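
The derivation of virtual pitch just described can be illustrated with a small sketch. It is only a toy reading of the verbal description: the function name, the number of integer divisors, the matching tolerance and the search range are all assumptions made for the example, and none of the weighting or masking of a full hearing model is included.

```python
def virtual_pitch(components_hz, max_divisor=8, tolerance=0.03,
                  f_min=50.0, f_max=500.0):
    """Return the subfrequency on which most components agree (toy sketch)."""

    def subfrequencies(f):
        # Divide one frequency component by the integers 1..max_divisor.
        return [f / n for n in range(1, max_divisor + 1)]

    # Every subfrequency inside the search range is a candidate pitch;
    # higher candidates are tried first so that they win ties.
    candidates = sorted(
        {s for f in components_hz for s in subfrequencies(f)
         if f_min <= s <= f_max},
        reverse=True,
    )

    best, best_votes = None, 0
    for c in candidates:
        # A component votes for c if one of its subfrequencies lies close to c.
        votes = sum(
            any(abs(s - c) <= tolerance * c for s in subfrequencies(f))
            for f in components_hz
        )
        if votes > best_votes:
            best, best_votes = c, votes
    return best
```

With the components 600, 800 and 1000 Hz, none of which is the fundamental, the subfrequencies coincide at 200 Hz, which is the pitch the sketch returns.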
The algorithms are simultaneously the expression of a functional model
of hearing. Essential components of the functional model which concern
the identification of (virtual) pitch, are the spectral analysis of the acoustic
signal into critical bands, i.e. frequency bands in which the faculty of
hearing extracts loudness and timbre impressions from the acoustic signal,
as well as the suppression or masking and the dominance of certain fre-
quency components.
The temporal structure of acoustic events plays an important role in the
evaluation of sound by human beings. Unlike the technical acoustic ana-
lyzers which have been developed so far and which do not change their
properties during signal input, i.e. are time invariant, hearing is a time
variable analytic system with a storage facility.
That is, the result of the analysis depends on the history of and internal
relationships between the acoustic events. Depending on the duration and
development of the events, a human being is apparently able to analyse
their temporal and spectral structure more precisely by concentrating at-
tention on particular aspects; in this way he can apparently overcome the
theoretical limits of technical systems, which are expressed in an indeterminacy
relation Δf · Δt = 1. By these means, pitch perception is already
fully developed after a few periods of the fundamental frequency; recog-
nition time is between 4 and 10 msec.
Technical pitch analysers require longer measuring intervals, between
about 20 and 50 msec. Because of the averaging process small pitch varia-
tions are lost. These small variations, however, are evaluated by the sense
of hearing as individual voice characteristics. Depending on the standards
required, differing degrees of technical effort are required in order to capture
the details.1
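
Taken at face value, the indeterminacy relation quoted above links the length of the measuring interval to the frequency resolution that a time-invariant analyser can attain. The following lines are only a back-of-envelope reading of that relation for the intervals mentioned in the text, not a statement about any particular device:

```latex
\Delta f \cdot \Delta t = 1
\quad\Longrightarrow\quad
\Delta f = \frac{1}{\Delta t},
\qquad
\Delta t = 20\ \text{ms} \;\Rightarrow\; \Delta f = 50\ \text{Hz},
\qquad
\Delta t = 50\ \text{ms} \;\Rightarrow\; \Delta f = 20\ \text{Hz}.
```

On the same reading, an interval of 4 to 10 msec would correspond to a resolution of only 250 to 100 Hz, which is the limit that hearing apparently overcomes.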
As analogue and digital signal processing techniques have been refined,
a variety of pitch measurement methods have been developed. These cover
both time domain methods which process the signal envelope shape, and
frequency domain methods, which process the signal spectrum.
It is evident that digital systems are attracting the greatest attention at
present. Digital signal processing offers considerable advantages over
analogue processing. The most important of these is that the acoustic data
can be stored for arbitrary lengths of time without degeneration and used
for subsequent numeric and comparison operations without random interference
from noise (e.g. from tapes or amplifiers). Since digitally represented
data are in strict temporal relationship by virtue of the sampling frequency,
a relationship which is lost in analogue-mechanical reproduction from tape,
for example, because of wow and flutter, methods in which signal values have
to be repeatedly operated on, as in correlation analyses, belong to the special
domain of digital technology. 2

1 The low precision of pitch identification has proved to be an obstacle
in using high quality speech synthesis for speech output systems (reading devices,
information systems).

2. Pitch Analysis Techniques

2.1. Autocorrelation
In autocorrelation analysis (Dubnowski, 1976), a single frame of the
speech signal of duration Δt, which lies between 10 and 50 msec, is compared
with itself, whereby in effect a copy of the frame is systematically
delayed with respect to the original by a time Δτ. For each value of Δτ, the
autocorrelation value is calculated from the sum of the products of the re-
spective sample values of the original and the copy.
The highest value obtains when original and copy are co-extensive, i. e.
when the temporal delay is Δτ = 0. The envelopes of voiced sounds differ
little from one period to the next, of course, so that a maximum re-occurs
when the delay factor Δτ corresponds to one fundamental frequency pe-
riod. This fact is utilised in pitch extraction.
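
The procedure just described can be sketched in a few lines. This is a minimal illustration rather than the implementation of any published system: the function names and the permitted pitch range (f_min, f_max) are assumptions, and no preprocessing or voicing decision is included.

```python
def autocorrelation(frame, lag):
    """Sum of products of the frame and a copy of itself delayed by `lag` samples."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))


def pitch_by_autocorrelation(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency in Hz from the largest
    autocorrelation value at a non-zero delay within the pitch range."""
    energy = autocorrelation(frame, 0)      # value at delay 0, used to normalise
    min_lag = int(sample_rate / f_max)      # shortest period considered
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag = max(
        range(min_lag, max_lag + 1),
        key=lambda lag: autocorrelation(frame, lag) / energy,
    )
    return sample_rate / best_lag
```

Because fewer products are summed at longer delays, the maximum at one period is larger than the maxima at two or more periods, which helps the sketch to avoid reporting a multiple of the true period.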
However, some signal envelopes produce side maxima in the autocorrelation
function, which lead to errors in pitch measurement. These are normally
the same errors as those which occur in older methods, namely octave
leaps during sounds with low frequency formants.
In more recent pitch extraction methods the attempt is therefore made
to suppress harmonics in the vicinity of the fundamental frequency by pre-
processing the signal envelope. Centre-clipping, in which low amplitude
values are suppressed, has proved to be a usable technique. If the signal is
simultaneously limited to the values +1 and −1, which amounts to peak-clipping,
calculations are considerably simplified. In this way, processing
speed can be increased to make real time analysis possible.
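
A sketch of the combined clipping step follows. The function name and the choice of threshold (a fixed fraction of the largest magnitude in the frame) are assumptions made for illustration; the text does not say how the threshold is set, and the sketch expects a non-empty frame.

```python
def centre_and_peak_clip(frame, threshold_ratio=0.3):
    """Reduce each sample to +1, 0 or -1.

    Samples whose magnitude lies below the clipping threshold are
    suppressed (centre-clipping); all remaining samples are limited
    to their sign (peak-clipping).
    """
    threshold = threshold_ratio * max(abs(s) for s in frame)
    return [0 if abs(s) < threshold else (1 if s > 0 else -1) for s in frame]
```

Since the products of such values can only be -1, 0 or +1, the sum of products in a subsequent autocorrelation reduces to counting, which is where the saving in calculation comes from.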

2.2. Cepstrum analysis


If the spectrum of periodic or quasiperiodic waves such as voiced speech
sounds is calculated, a series of more or less prominent spectral lines arranged
at equal intervals along a linear frequency scale is obtained. 3 If the
spectrum is treated as a new waveform, another spectrum can be calculated.
This spectrum is called the 'cepstrum' (Noll, 1967). For periodic
waveforms it shows a particularly clear fundamental frequency marking,
since the fundamental was already periodic along the frequency axis of the
first spectrum because of its integer multiples. Here, too, preprocessing of the
original waveform and additional preprocessing of the spectrum can yield a
further improvement in fundamental frequency marking. The disadvantage
is the twofold spectrum analysis, each stage taking up considerable processing
time, which makes real time processing with simple equipment impossible
at present.

2 In digital signal processing the continuous waveform is converted periodically into sample
values. The individual sample values are then converted into a numerical value, usually binary.
The sampling frequency must be at least twice as high as the highest required frequency
component in the signal. The number of binary places (bits) required for sample
value conversion depends on the dynamic range required. One binary place is required for
each 6 dB of dynamic range. In speech processing, it is often sufficient to use an 8 kHz
sampling frequency and 8 bits. For high-fidelity music transmission, for instance on digital
audio disks, about 40 kHz and 14 to 16 bits are necessary.

3 There are currently very efficient algorithms for spectrum calculation in the form of the
Fast Fourier Transform (FFT), which can be executed on large computers or on special
processors. There are now processors which, despite the enormous complexity of calculations,
which increases with frequency resolution requirements, perform the calculation at
approximately real time speeds.

Figure 1: Outline of autocorrelation
[Figure not reproduced. Worked example: at shift τ = 0 the products sum to 22, so
ACF (τ = 0) = 22/22 = 1; at shift τ = 2 the products sum to 6, so ACF (τ = 2) = 6/22.]

Figure 2: Outline of centre-clipping (a) and subsequent peak-clipping (b). The result is the
data sequence (c).
[Figure not reproduced. Labels: CC threshold, PC threshold.]
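
The 'spectrum of the spectrum' idea can be sketched as follows. Several choices here are assumptions rather than details taken from the text: the logarithm applied to the first spectrum (customary in cepstrum work, and one possible form of the additional preprocessing mentioned above), the small constant that avoids log(0), the search range, and the direct, slow form of the Fourier transform, which is used only to keep the example self-contained.

```python
import cmath
import math


def dft_magnitudes(values):
    """Magnitudes of the discrete Fourier transform (direct O(n^2) form)."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                for i, v in enumerate(values)))
        for k in range(n)
    ]


def pitch_by_cepstrum(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency in Hz from the cepstral peak."""
    # First spectrum; the logarithm flattens the formant envelope so that
    # the regular spacing of the harmonics stands out.
    log_spectrum = [math.log(m + 1e-9) for m in dft_magnitudes(frame)]
    # Second spectrum: the log spectrum is treated as a new waveform.
    cepstrum = dft_magnitudes(log_spectrum)
    lo = int(sample_rate / f_max)
    hi = min(int(sample_rate / f_min), len(frame) // 2)
    # The index of the peak is a delay in samples (a 'quefrency').
    peak = max(range(lo, hi + 1), key=lambda q: cepstrum[q])
    return sample_rate / peak
```

Peaks recur at multiples of the period, so the lower limit of the search range matters: if two periods fit inside it, the sketch may report half the true frequency.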

2.3. Linear Prediction


In the method of linear prediction, the next signal value is predicted from
a number of preceding signal values (Markel, 1976). This is done by
weighting the previous values with significance factors which are known
as prediction coefficients. Prediction coefficients are varied in such a way
as to minimise errors between predicted and measured signal values. The
structure used in calculating the prediction coefficients corresponds essen-
tially to the filter characteristics of the mouth and throat areas, the vocal
tract minus the nasal tract, so that a partial inverse filtering takes place. 4

4 Linear prediction is now one of the most important digital vocoder methods.
Figure 3: Principle of cepstrum analysis.
[Figure not reproduced. Panels: the signal against time t; the signal spectrum against
frequency, read as a fictive time t'; the cepstrum (spectrum of the spectrum) against
fictive frequency (quefrency), with the components from the formant structure and the
component of the harmonic series marked.]

For this reason, the error function contains periodicities due to glottal ex-
citation which can be evaluated by analogue or digital means. Uncertain-
ties can occur in pitch extraction, since omission of nasal tract properties
means that not only fundamental frequency components are present.
Centre-clipping and peak-clipping allow more favourable results to be
obtained with this method, too. By restricting calculations to a small num-
ber of prediction coefficients, usually 10-15, calculations can be kept rea-
sonably manageable, allowing real time operation.
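
The prediction step and the resulting error function can be sketched as below. The sketch uses the autocorrelation method with the Levinson-Durbin recursion, which is one common way of minimising the prediction error and is an assumption here, not necessarily the procedure of the work cited; the function names are likewise illustrative, and samples before the start of the frame are treated as zero.

```python
def lpc_coefficients(frame, order):
    """Prediction coefficients a[1..order] by the Levinson-Durbin recursion."""
    r = [sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
         for lag in range(order + 1)]
    coeffs = []        # coeffs[k] weights the sample k + 1 steps back
    error = r[0]       # prediction error energy before any prediction
    for m in range(1, order + 1):
        acc = r[m] - sum(coeffs[k] * r[m - 1 - k] for k in range(m - 1))
        reflection = acc / error
        coeffs = [coeffs[k] - reflection * coeffs[m - 2 - k]
                  for k in range(m - 1)] + [reflection]
        error *= 1.0 - reflection ** 2
    return coeffs


def prediction_error(frame, coeffs):
    """Difference between each sample and its prediction from earlier samples."""
    return [
        frame[n] - sum(a * frame[n - k]
                       for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        for n in range(len(frame))
    ]
```

For a voiced frame the error sequence is small except near the instants of excitation, so its periodicity can then be measured, for instance by autocorrelation.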

Figure 4: Structure of a linear predictor.
[Figure not reproduced. The output is labelled 'prediction errors'.]

2.4. Maximum likelihood analysis

In the maximum likelihood method, the speech waveform of a signal
frame of duration Δt is compared with a test waveform which is systematically
varied (Wise & al., 1976). Deviations within the frame are registered
and statistically evaluated. The test waveform with lowest deviation is
taken to be the most probable, and its fundamental frequency is taken to
be that of the tested speech waveform. The advantage of the method lies
in the relatively simple calculations which allow fast evaluation, although
on occasion a large number of test waveforms have to be processed.
Since the fundamental frequency of speech sounds only changes relatively
slowly, pitch variation can be tracked with a restricted number of
test waveforms once analysis has been "locked" at the beginning of a
voiced sequence, thereby increasing processing speed.
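
One simple way of realising this comparison is to take, for each candidate period, the best-fitting waveform that repeats with that period and to measure its squared deviation from the frame. The sketch below does this; it is a simplified reading of the method, and the function names, the search range and the rule of preferring the shortest period whose deviation is close to the minimum (needed because multiples of the true period fit equally well) are assumptions.

```python
def periodic_fit_error(frame, period):
    """Summed squared deviation between the frame and the best-fitting
    test waveform that repeats every `period` samples."""
    error = 0.0
    for phase in range(period):
        samples = frame[phase::period]          # samples one period apart
        mean = sum(samples) / len(samples)      # least-squares periodic value
        error += sum((s - mean) ** 2 for s in samples)
    return error


def pitch_by_periodic_fit(frame, sample_rate, f_min=50.0, f_max=500.0,
                          tolerance=0.01):
    """Estimate the fundamental frequency in Hz from the best periodic fit."""
    energy = sum(s * s for s in frame)
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) // 2)
    errors = {p: periodic_fit_error(frame, p)
              for p in range(min_lag, max_lag + 1)}
    # Accept every period whose deviation is within a small margin of the
    # lowest one, and take the shortest of these.
    limit = min(errors.values()) + tolerance * energy
    best_period = min(p for p, e in errors.items() if e <= limit)
    return sample_rate / best_period
```

Restricting the range of periods to the neighbourhood of the previous result would correspond to the "locked" tracking described above.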

2.5. Average Magnitude Difference Function (AMDF)


This method (Ross, 1974) is a variant of the autocorrelation method, with
the difference that instead of multiplying the temporally delayed signal values
of the original and the copy, the magnitude of the difference of each value
pair is formed and these magnitudes are summed over the time interval Δt. In
consequence, the sum reaches a minimum in periodic signals with delays
Δτ which correspond to one period. This can be utilised for marking the
fundamental frequency. The method has the advantage of replacing more
time-consuming multiplication operations by subtraction, allowing an approximation
to real time analysis.
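
A minimal sketch of the method, with the same caveats as before: the function names and the pitch range are assumptions, the sum is divided by the number of value pairs so that different delays are comparable, and no clipping is applied.

```python
def amdf(frame, lag):
    """Average magnitude of the difference between the frame and a copy
    of itself delayed by `lag` samples."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def pitch_by_amdf(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency in Hz from the deepest AMDF minimum."""
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    # min() keeps the first of several equal minima, i.e. the shortest delay.
    best_lag = min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))
    return sample_rate / best_lag
```

In real speech the minima at multiples of the period are nearly as deep as the first one, so a practical version would need some further rule for choosing among them.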
Recent variants of the method also use signal preprocessing, for exam-
ple by peak-clipping, whereby the clipping threshold is optimised by trial
and error (Hettwer & Fellbaum, 1981).

2.6. Integration techniques


The fact that the mean value of the speech signal as a function of time is
zero (the non-alternating pressure component is ignored) can be used to
determine the fundamental frequency, since the mean value of voiced
speech signals over a single period is zero. To be precise, this is only true
250 Μ. Krause

of static signal frames, and only approximate for transient processes. By


superimposition of overtones the temporal mean can also become zero
within one period, making it necessary to rid the signal of higher harmon-
ics. This is done by multiple digital low pass filtering.
Formation of the temporal mean, like low pass filtering, also represents
an integration of the signal in time. In digital processing, this corresponds
to the continuous summation of the sampled values, taking the arithmetic
sign into consideration. The method is therefore the digital version of a
well-known analogue integration method (Erb, 1974). Since rounding er-
rors accumulate over large numbers of sampled values the method cannot
remain error-free over arbitrarily long signals. Provision must be made for
zeroing the sum at intervals by appropriate means, preferably after each
correctly recognised fundamental frequency period.

2.7. Hybrid methods


The complex calculations needed for spectral analysis of the signal for
pitch extraction purposes have led to the idea of delegating this task to
analogue filters which perform it in real time. The evaluation and decision
processes which determine the correct pitch are executed on a digital
computer, which also, among other things, compensates numerically for
the frequency and amplitude sensitive error characteristics of the analogue
filters (Frøkjær-Jensen, n.d.).

2.8. Comparison of digital methods


The various methods of pitch extraction each introduce their own specific
types of error. A number of methods have been compared in a comprehensive
study (Rabiner & al., 1976). This confirms previous experience that
some are more suitable for certain voices than others, and in particular
show qualitative differences with respect to male and female speakers. The
cepstrum method, for instance, is more suited to voices with low funda-
mental frequency, while autocorrelation is better for higher pitched
voices.
In addition, a pitch extraction method must be able to determine voiced
vs. voiceless states. In this respect, too, the methods differ. Good selectiv-
ity is provided by linear prediction and AMDF, while the cepstrum
method turns out to be less suited to this task.
Because of the differences in performance which are characteristic of
the different methods, it has been suggested that different methods should
be used in parallel in order to exclude errors, which in general do not oc-
cur simultaneously, by majority decision (Rabiner & al., 1976). But the
comparative study also showed that the hybrid method generates errors
which in some respects exceed error counts for the pure methods.
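
The majority decision mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure evaluated by Rabiner & al.: the function name, the 5 % agreement tolerance and the convention that `None` stands for "voiceless" are all hypothetical.

```python
from statistics import median


def majority_f0(estimates, tolerance=0.05):
    """Combine the per-frame results of several pitch detectors.

    `estimates` holds one value per detector: a fundamental frequency in
    Hz, or None where that detector judged the frame voiceless.  The
    median of the largest group of mutually close values is returned if
    that group forms a strict majority of all detectors; otherwise None
    (voiceless or undecided).
    """
    voiced = [e for e in estimates if e is not None]
    best = []
    for ref in voiced:
        # Values within `tolerance` (relative) of `ref` count as agreeing.
        group = [e for e in voiced if abs(e - ref) <= tolerance * ref]
        if len(group) > len(best):
            best = group
    if 2 * len(best) > len(estimates):
        return median(best)
    return None


if __name__ == "__main__":
    # Two detectors agree; the third has made an octave error.
    print(majority_f0([118.0, 120.0, 240.0]))
    # Two of three detectors report a voiceless frame.
    print(majority_f0([None, None, 120.0]))
```

A vote of this kind removes an error only when the other detectors do not make the same error in the same frame, which is consistent with the caution expressed above about the error counts of the combined method.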

3. Trends in the Development of Methods of Pitch Extraction

It follows from the preceding discussion that the last word has not yet
been spoken in the area of research into fundamental frequency measure-
ment in speech. Most digital methods are not yet suited to real time opera-
tion, in which results are continuously displayed with minimal delay due
to the calculation process. Special devices using microcomputers will be
available for this purpose in the near future. Assuming that no new
method is discovered for error-free pitch extraction (and research is con-
tinuing on this), error correction procedures based on other properties of
the speech signal such as formant or amplitude configurations will cer-
tainly be developed. Partial results in this area have already been obtained
(Kretschmar, 1980).
Many methods of pitch extraction require the speech signal to be of
high quality, that is, the fundamental frequency must be present in the
signal. This is not the case, for example, with telephone transmission.
Environmental noise must not exceed a given level either, a condition which
cannot be met for many speech analysis purposes. This area requires
comprehensive research in order to overcome these obstacles.
In conclusion, it may be said that there are several techniques which are
well suited to pitch extraction for investigations in the area of intonation.
It is necessary for the researcher in this area to be able to judge the quality
of the methods as far as possible in order to put them to optimal use for
his own purposes. At the present state of technological development he
can assume that "the" method does not exist, so that it is desirable to use
more than one method and to make his own analysis of doubtful cases, if
necessary, by observing the oscilloscope trace of the waveform, particu-
larly in the area of the transients between voiceless and voiced sounds.

References

Dubnowski, J. J. et al. (1976). Real time digital hardware pitch detector.
IEEE Trans. Acoust. Speech and Signal Proc. ASSP-24: 2-8.
Erb, H. J. (1974). Ein Verfahren zur Bestimmung der Sprachgrundfrequenz in
Echtzeit. Frequenz 28: 23-28.
Frøkjær-Jensen (n.d.). F-J Electronics ApS. Prospectus for the Pitch
Computer Type PC 1400.
Hettwer, G. & K. R. Fellbaum (1981). Ein modifiziertes
Sprachgrundfrequenz-Analyseverfahren für lineare Prädiktionsvocoder.
Fortschritte der Akustik 1981: 633-636.
Kretschmar, J. (1980). Untersuchungen zur Natürlichkeit synthetisierter
Satzmelodien. Fortschritte der Akustik, DAGA 1980: 699-701.
Markel, J. D. & A. H. Gray (1976). Linear prediction of speech. Springer:
New York.
Noll, A. M. (1967). Cepstrum pitch determination. J. Acoust. Soc. Am. 41:
293-309.
Rabiner, L. R., et al. (1976). A comparative performance study of several
pitch detection algorithms. IEEE Trans. Acoust. Speech and Signal Proc.
ASSP-24: 399-418.
Ross, M. J. et al. (1974). Average magnitude difference function pitch
extractor. IEEE Trans. Acoust. Speech and Signal Proc. ASSP-22: 353-362.
Terhardt, E. (1979). Calculating virtual pitch. Hearing Research 1: 155-182.
Wise, J. D. et al. (1976). Maximum likelihood pitch estimation. IEEE Trans.
Acoust. Speech and Signal Proc. ASSP-24: 418-423.

D. ROBERT LADD

English Compound Stress

1. It has generally been assumed in descriptions of English at least
throughout this century that the stress patterns on expressions like those
in (1) are somehow a direct consequence of their syntactic structure (e.g.
Poutsma, 1914: 22; Trager & Smith, 1951: 67-77; Lees 1960: 120; Quirk et
al., 1972: 915, 1019; Chomsky & Halle, 1968: 91 ff.). In many cases (e.g.
greenhouse vs. green house), this can be attributed to the difference between
compound and phrase in surface structure; hence the common names
'phrasal stress' and 'compound stress'. This is the analysis formalized in
the Chomsky-Halle Compound Rule (shown in 2), which presupposes a
syntactic analysis such that 'compound' is defined as a branching structure
of the sort

[Diagram not reproduced: a node branching into two constituents, X X]

The treatment of cases like stéel warehouse vs. steel wárehouse under this
analysis is somewhat obscure, since both seem to be noun-noun compounds;
here, however, reference is often made to deep syntactic differences -
i.e. 'warehouse made of steel' vs. 'warehouse for storing steel' -
and, though details of such an analysis have never actually been worked
out, the assumption continues to be held that ultimately the whole pheno-
menon will be shown to depend on syntax at one level or another.
The tenacity of this assumption is quite remarkable in view of the
existence of large numbers of problems such as those shown in (3),
distinctions which, in the words of Chomsky & Halle, are 'widely maintained
but syntactically unmotivated' (1968: 156). In general, analysts seem
content to write off the exceptions to lexical arbitrariness - Chomsky and
Halle suggest the possibility of treating them in the 'readjustment
component' - or, in short, to take the syntax-based analysis as far as it
will go and then fix up the rest of the data ad hoc.

This paper was presented at the 11th NELS meeting at Cornell University in
November 1980. A number of colleagues at Cornell and Bucknell read an
earlier version and gave me much useful criticism, while both Ruta Noreika
and the Wednesday night intonation seminar at Penn (Fall 1980) gave me
their reactions as the present version took shape. To all, my thanks; blame
only me.

(1) Minimal Pairs

'Phrasal Stress' (weak-strong)               'Compound Stress' (strong-weak)
green house 'house that is green'            greenhouse 'glass building for growing plants'
French teacher 'teacher from France'         French teacher 'teacher of French'
steel warehouse 'warehouse made of steel'    steel warehouse 'warehouse for storing steel'
woman doctor 'female doctor'                 woman doctor 'gynecologist'

(2) Chomsky-Halle Compound Rule (1968: 92)

$V \rightarrow [1\ \text{stress}] \;/\; [\#\#\, X \begin{bmatrix} \underline{\quad} \\ 1\ \text{stress} \end{bmatrix} Y\, \#\#\, Z\, \#\#]_{NAV}$

(3) Typical Problem Cases for Compound Rule

'Phrasal Stress'                               'Compound Stress'
apple pie                                      (cf. apple cake)
chocolate cake
town meeting                                   (cf. faculty meeting)
Franklin Stove                                 (cf. Skinner Box)
Madison Avenue                                 (cf. Madison Street)
student union                                  (cf. trade union)
ballpoint pen                                  (cf. fountain pen)
French Toast
city hall
whisky sour
barefoot doctor 'Chinese paramedical person'
weekend warrior 'army reservist'

As long as the number of the leftovers is not overwhelming, the basic hy-
pothesis about the relation of syntax and prosody is effectively unfalsifi-
able.
My goal in this paper is not to try to patch up the syntactic analysis, but
simply to abandon it and present an explanation of a different kind. As I
will show, this explanation predicts the existence of the exceptions to the
syntactic treatment and accounts for the types of cases in which they
occur. The paper is divided into two parts: first it shows how compound
stress is not just a footnote to the normal stress rules, but part of the
larger phenomenon of deaccenting; then it goes on to discuss a large
amount of data to which the analysis applies.1

2.1 It is important to present as background the outlines of the general
view of stress that the analysis presupposes. This is the view developed in
Ladd (1980), which is a combination of a more or less Hallidayan concep-
tion of the function of stress with the Liberman-Prince theory of its pho-
nological form (Halliday, 1967; Liberman & Prince, 1977). Specifically,
we need an illustration of the way these two views work together to pro-
vide a clear account of the particular type of marked or non-'normal'
stress often known as deaccenting. This is seen in the following dialogues:

(4) a. A: Has John read Slaughterhouse-Five?
       B: No, John doesn't réad books.

    b. A: Have you talked to John recently?
       B: No, I can't stánd the man.

The stress patterns in Speaker B's replies in these exchanges would often
be called 'contrastive'. Yet it is obvious that the meaning of B's reply in
(4a) is not something like 'John doesn't read books, he burns them' - that
is, it is not contrastive in any explicit sense. Instead, the point of the stress
pattern is to move the stress off books, to deaccent it and refer it to the
context.
The Liberman-Prince theory makes it possible to represent deaccenting
very elegantly as the simple reversal of the s and w assigned to a given pair
of sister nodes in the rhythmic structure. Thus, the normal stress on B's
reply in (4a) (John doesn't read bóoks) would be represented as shown in
(5a). For the deaccented version (John doesn't réad books), we simply
reverse the circled nodes in order to put the w on books; if the contrastive
version were intended, (John doesn't réad books, he burns them) we would,
in effect, reverse the circled nodes in order to put the s on read. What the
Liberman-Prince representation makes plain is that there is only a single
phenomenon of marked stress, with contrastive and deaccenting as two
different functions (5b).2 (Notice the Hallidayan viewpoint at work in the
notion of 'functions' of a stress pattern.)
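
The reversal of s and w on a pair of sister nodes can be made concrete with a small Python sketch. It is an illustration only, not the Liberman-Prince formalism itself: the class and function names are hypothetical, and the uniformly right-branching tree assumed for John doesn't read books is a simplification chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str = "R"               # "s", "w", or "R" for the root
    word: Optional[str] = None     # filled in on terminal nodes only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def main_stress(node):
    """Follow the s-labelled daughter from the root down to a word."""
    while node.word is None:
        node = node.left if node.left.label == "s" else node.right
    return node.word


def reverse(node):
    """Swap the s/w labels on the two daughters of `node`."""
    node.left.label, node.right.label = node.right.label, node.left.label


def example_tree():
    """Assumed structure: [R John(w) [s doesn't(w) [s read(w) books(s)]]]."""
    read_books = Node("s", left=Node("w", "read"), right=Node("s", "books"))
    predicate = Node("s", left=Node("w", "doesn't"), right=read_books)
    sentence = Node("R", left=Node("w", "John"), right=predicate)
    return sentence, read_books


if __name__ == "__main__":
    sentence, read_books = example_tree()
    print(main_stress(sentence))   # normal stress
    reverse(read_books)            # marked: deaccenting or contrast
    print(main_stress(sentence))
```

Under these assumptions the strongest word is found by following the s daughters downwards from the root, and a single call to `reverse` on the lowest pair of sisters moves the main stress from books to read. The same operation serves for both the deaccenting and the contrastive reading, which is the point made above about there being a single phenomenon of marked stress.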

(5) [Tree diagrams not reproduced: rhythmic structures with root R and
    nodes labelled s and w; the circled pair of sister nodes is the pair
    whose labels are reversed between (a) and (b).]
    a. John doesn't read bóoks. ('normal')
    b. John doesn't réad books. ('marked': deaccented or contrastive)

The view of stress presented here also makes it possible to talk of
deaccenting applied within constituents smaller than the sentence. This is
seen in a pair of examples from Schmerling (1976: 55-6):

(6) a. This is the dóctor I was telling you about. ('normal')
    b. This is the doctor I was télling you about. ('normal' in medical
       context)

The problem that Schmerling points out is that both of these are in some
sense 'normal stress'; out of context (6a) seems 'normal', but (6b) seems
just as normal in the context of a hospital or a medical convention.
Confined as she is to the Trager-Smith-Chomsky-Halle view of normal stress
as a merely automatic consequence of the syntax of a sentence, Schmerling
is prepared to use examples such as these as the basis for abandoning
the notion of normal stress altogether.
But to do that would be to throw away a valuable concept. Indeed, the
first step toward treating this puzzle is to take 'normal stress' in the
Hallidayan sense of the stress pattern that signals an unmarked focus.3
This makes it possible to speak of both stress patterns as normal, in the
sense that both convey the focus this is NP. This focus is reflected in the
rhythmic structure by the fact that at a higher level in the tree both
versions, as expected by the normal stress rules, have the s assigned to
the rightmost NP:

(7) [Tree diagram not reproduced: the s is assigned to the rightmost NP.]
    This is the doctor I was telling you about.

1 A number of caveats must be entered into the record at this point. First:
I will be discussing only compounds whose head is a noun (complex nominals,
in Levi's term), but the approach, if not the specific analysis, can be
extended to cover other cases as well. Like Levi, I find no difference
relevant to my concerns here between compounds where the first member
(henceforth the 'attribute') is also a noun and those where it is an
adjective. Second: I am ignoring differences in the weaker or less-stressed
half of the compound, differences often analyzed as distinctions between
'secondary' and 'tertiary' stress, e.g. long island vs. Long Island (Trager
and Smith, 1951: 69) or butter cup 'cup for butter' vs. buttercup 'type of
flower' (Kingdon, 1958: 195). This decision is based in part on the
implicit claims of the Liberman-Prince stress analysis, but it also follows
most earlier studies of compounds. Ultimately, of course, an explanation
will have to be given for these distinctions as well. Third: It is well
known that there is a certain amount of individual and dialect difference
in assigning stress patterns to compounds. The data here reflect my own
speech, but I have checked with other informants to avoid basing my
statements on some idiosyncratic usage. In particular, I have checked not
only individual terms, but, in accordance with the analysis presented here,
pairs and groups of items as well (e.g. I have checked chocolate cake and
apple cake together and find that many speakers make the distinction noted
here). Question marks next to individual items in the data tables indicate
those items in which there seems to be considerable disagreement about the
stress pattern.
2 Any discrepancies between standard Liberman-Prince trees and these are
intentional, but cannot be justified here.
3 For stress and focus see Halliday, 1967; Chomsky, 1971; Jackendoff, 1972;
Wilson & Sperber, 1979; Ladd, 1980.

With the focus assigned, we can go on to assign either marked or normal
stress within the strong NP constituent; doctor is either weaker or stronger
than telling depending on whether it is or is not deaccented to refer to
some medical context. Thus:

(8) [Tree diagrams not reproduced: s/w labelling within the strong NP.]
    a. the dóctor I was telling you about
    b. the doctor I was télling you about

In short, the answer to Schmerling's puzzle is simple: sentences can exhibit
both normal and marked stress at different levels of structure
simultaneously.

2.2 This is the idea to be applied to the problem of compound stress.
Specifically, my thesis in this paper is that compound stress represents the
deaccenting of the head of the compound. Thus the normal or unmarked
stress for the type of structure in e. g. green house would be as follows:

(9)                  w     s
    They live in a green house. ('normal')

The reverse of this could be contrastive

(10)                 s     w
    They live in a green house, not a grey one. ('marked' - contrastive)

or, as in other cases of marked stress, it could also represent
deaccenting, as in:

(11)                   s    w
    I grew them in a greenhouse. ('marked' - deaccented)

As I just showed, the deaccenting can apply within the compound without
affecting the focus information conveyed at a higher level in the rhythmic
structure of the sentence; that is, compound stress can be treated as
marked or non-normal without in any way implying that it is thereby im-
possible for it to occur in a sentence with 'normal stress'.

2.3 At this point it is worth spending a paragraph or two to explain why it
is specifically deaccenting that I think is involved in compound stress. As I
showed in Ladd (1980), deaccenting cannot be seen simply as e.g. a syn-
tactic rule that interacts with the normal stress rules in cases of corefer-
ence. In fact, it occurs in a wide variety of situations, and must be treated
as making some independent semantic/pragmatic contribution to the in-
terpretation of the sentence, like Hallidayan 'normal stress'. Unfortu-
nately, space permits only a two-sentence summary of my earlier findings;
the interested reader is referred to Ladd (1980: Chs. 3 & 4) for more de-
tail. In brief, what deaccenting signals is that some specific reference to
the context is necessary for a full or exact interpretation of the deaccented
constituent. The actual details of the inference made in individual cases,
such as 'coreference' or 'this is a medical context', are left to pragmatic in-
terpretive strategies.

This meshes very well with recent work on the semantics of compounds
by Downing (1977),4 Kay and Zimmer (1976), and Dowty (1979). What
distinguishes these writers from earlier generative work on compounds
(notably that of Lees [1960, 1970], Levi [1978], and Motsch [1970]) is that
they do not seek to explain the specific relationships seen in compounds
by positing some sort of underlying predicate relation between the two
parts of the compound. (For instance, steel warehouse is not represented as
being underlyingly 'warehouse for steel', nor apple tree as derived from
'tree with apples'.) Instead, they posit a single general compounding
relationship that leaves the specific relation to be inferred on the basis
of the individual lexical items involved. To put it another way, the
compound construction does not convey an explicit meaning that fully
determines the interpretation of each compound, but only a rather
inexplicit set of guidelines, as it were, for pragmatically inferring an
interpretation.

4 While Downing's experimental study was primarily concerned with the
creation of novel compounds, she found little support for the
underlying-predicate approach to compound semantics; I do not feel that I
distort her findings by including them here.

Relevant quotes from Kay and Zimmer, Dowty, and Downing are the
following:

    The prototypic use of nominal compounds is to narrow the semantic
    coverage of the head noun to a smaller class. (Kay and Zimmer, 1976: 4)

    A novel compound αβ denotes some set (exactly which one we do not
    know) such that all members of this set are β's and are typically
    associated by some appropriately classificatory relation to an α.
    (Dowty, 1976: 319)

    The speaker tends to create the compound on the basis of a parameter
    significant for his categorization, rather than merely his description,
    of the entity in question. (Downing, 1977: 838)

The common thread running through these is something like the fol-
lowing: The compound construction signals that there is some relation be-
tween the attribute and the head which is relevant for classifying or cate-
gorizing the head, not merely describing it; a compound thus names some
entity or category distinct from the entity or category named by the head
alone.
This meshes very nicely with the function of deaccenting as described
above. In general, deaccenting signals that some specific reference to the
context is essential for a full or correct interpretation of the deaccented
constituent; specifically in the case of a compound, the deaccenting of the
head signals that in order to determine the category named by the com-
pound, the head must be understood in the light of what Dowty calls the
'appropriately classificatory relation' between it and the attribute. In green
house, for example, nothing special is signaled about the interpretation of
house in this context; house is more precisely described, but not newly sub-
categorized. In greenhouse, on the other hand, house is deaccented to sig-
nal that it contributes only part of what is necessary for identifying the
new category of things named by the compound as a whole.

3.1 The hypothesis just presented is a fundamentally different type of
analysis from the traditional description of compound stress. One of the
reasons that the traditional description cannot account for exceptions is
that, in effect, it cannot account for the regularity either. That is, it sug-
gests no particular explanation of why compounds should be stressed one
way or the other; it merely states an observed correlation between syntax
and prosody. The analysis proposed here, by contrast, suggests an actual
reason for this correlation, namely, a certain congruence between the in-
formation conveyed by the stress pattern and the information conveyed by
the compound relation itself, as just illustrated with the case of green house
and greenhouse.
One way to test this explanation, then, should be to see whether
exceptions to the traditional rule exhibit some kind of mismatch between what
the compound relation and the stress pattern convey. If my explanation is
correct, then compounds with phrasal stress ought to be cases where the
information conveyed by the deaccenting would be somehow inappropri-
ate - say, cases where any subcategorizing effect of the attribute is rela-
tively small. I will discuss three groups of cases which I think show this
quite clearly.

3.2 The first set involves place names like those shown in (12). We
might predict that these would take phrasal stress, since the head (Avenue,
Road, etc.) is in no sense subcategorized by the attribute: Madison Avenue
does not name a particular type of avenue, Olin Library does not denote a
special category of library, the Golden Gate Bridge is a bridge, etc. As the
data in (12) show, the prediction of phrasal stress on these is largely borne
out. There are, however, a few nouns that are deaccented in such com-
pounds: street, house, town, land, and perhaps a few others. Considering
these each in its own general semantic group, though, one can see that
they are always the least specific or least marked. In city thoroughfare
names, for example, we get at least vague expectations about the nature of
the thoroughfare being named from most of the possible head nouns - we
would expect an Avenue or Boulevard to be wide or important; a Road
probably leads out of town; a Place or a Crescent is probably residential;
and so on. Street, however, gives us no such information. It could be State
Street, in the heart of downtown, or it could be Dogwood Street in some
quiet suburb. There is, in other words, a real sense in which we do get less
information about the category of things being named from Street than
from any of the others, and hence more from the attribute; this is more
typical of ordinary compounds, and is exactly what is signalled by the
stress pattern.5
Comparable observations can also be made about the cases in (13), in
which the attribute is the proper name of the inventor or discoverer of the
entity or category named by the compound. The case of disease names is
typical here: the relatively vague Syndrome and Disease (like Street) are
deaccented but more specific words like Chorea and Palsy are not. While I
cannot go through each of these cases in detail, it is nonetheless important
to emphasize the nature of the prediction being made: the analysis does
not claim to be able to make predictions about individual cases, which is
what the traditional analysis purports to do, but only implicational
predictions about groups of cases. If Syndrome and Disease and Street
actually worked like all the others in their respective groups, the validity
of the analysis would not be affected. The analysis predicts only that if
one or two members of a particular semantically related group of head nouns
are deaccented, they will be the least marked or least specific. Thus it is
only if Palsy were deaccented and Syndrome were not that we would call the
analysis into question or look for some further factor.

5 Quite some time after presenting this paper, I discovered that both this
phenomenon and its explanation have been noted by non-linguist native
speakers, as can be seen from the following passage:
"'Why, in speaking of thoroughfares,' asked a correspondent of John o'
London's Weekly in 1936, 'is it the custom to accent the proper name only in
the case of a street? It is always Fleet street, Southampton Street, but
Shoe lane, Farrington road, Fetter lane.' The paper's lexicographer,
Jackdaw, answered: 'In a town the great majority of thoroughfares are
streets; street, therefore the expected word, needs no emphasis, and the
stress goes on the street's name. Lanes and roads, being much less common,
these words are naturally given at least equal stress with their distinctive
names; convenience begets habit.'"
('The Street and the Stress', John o' London's Weekly, April 18, 1936, cited
by Mencken, 1948.)

3.3 A second set of cases (shown in 14) involves the classification of cu-
linary terms. As can be seen from just three cases - chocolate cake, apple
cake, and apple pie - it is futile to try to explain the exceptions to the tradi-
tional Compound Rule in terms of individual lexical items, since apple can
be either stressed or unstressed in attribute position, and cake can be either
stressed or unstressed in head position, depending on the compound.
Moreover, since all three seem to represent an underlying relation B made
of A, the stress cannot be explained in Levi- or Lees-style syntactic terms
either. Instead, what seems to be involved here is classification in terms of
what one might call 'flavors' vs. 'categories'.
Things to eat often come in a variety of flavors - ice cream, milk
shakes, sandwiches, and souffles are all examples. For most purposes in
the culinary taxonomy, the different flavors all count as 'the same'; that is,
in the terms we have been using to discuss compounds and deaccenting,
naming the flavor further describes, but does not further categorize. This
is why many of these culinary compounds have phrasal stress. In chocolate
cake and apple pie, in other words, cake and pie are the categories, and
chocolate and apple are merely flavors. In apple cake, on the other hand, we
do have a different category: the deaccenting signals something like 'this
thing is cake only to the extent circumscribed by something else in the con-
text, namely, apple'. The effect of the deaccenting here is thus like what
we saw in greenhouse.

(12) Compound Place Names

'Phrasal stress'            'Compound stress'
Madison Avenue              State Street (downtown)
Trumansburg Road            Dogwood Street (suburban)
Maple Drive
Kingsford Crescent
Marvin Gardens
Park Place

Olin Library                Eastman House (Rochester museum)
Morrill Hall                Blair House (U.S. Govt. Official Guest House)
Gannett Clinic              Andrews House (Brown Univ. Infirmary)
Johnson Museum              Faunce House (Brown Univ. student union)
McGraw Tower                Dunster House (Harvard dorm)
Rockefeller Center

New York City               London Town (big)
Enfield Village             Middletown (little)
Tompkins County
New York State

Baffin Island               Baffin Land (old name for Baffin Island)
Cayuga Lake                 Marie Byrd Land (section of Antarctica)
(the) Charles River         Chicagoland (area around Chicago)
(the) Atlantic Ocean        Disneyland (California amusement park)
(the) Sahara Desert
Golden Gate Bridge
Walt Disney World
(the) Erie Canal
Shea Stadium
Fenway Park
Penn Station
Harvard Square
Schoellkopf Field

(13) Compounds with Proper Names in Attribute Position

'Phrasal Stress'              'Compound Stress'
Halley's Comet                (the) Van Allen Belts
Planck's Constant (?)         (the) Peter Principle
Grimm's Law (?)               (the) Sapir-Whorf Hypothesis (?)
(the) Monroe Doctrine (?)
Occam's Razor

Huntington's Chorea           Downs' Syndrome
Bell's Palsy                  Parkinson's Disease

Franklin Stove                Skinner Box
Coleman Stove                 Allen Wrench
Morse Code                    Plimsoll Line
Gutenberg Bible
Phillips (Head) Screwdriver

(14) Culinary Compounds

'Phrasal Stress'                  'Compound Stress'
apple pie                         mud pie (?)
blueberry pie                     apple cake
cherry pie                        carrot cake
chocolate cake                    coffee cake
vanilla ice cream                 peanut butter (?)
strawberry ice cream              apple butter
cheese souffle                    sweet roll
chocolate souffle                 egg roll
lemon souffle                     jelly roll
grilled cheese sandwich           ice cream sandwich (?)
peanut butter & jelly sandwich    tomato sauce
lemon sherbet                     hot sauce
raspberry sherbet                 Worcestershire Sauce
coffee milk shake                 white sauce
whole wheat bread                 date and nut bread
rye bread                         zucchini bread

NB: stress on ice cream varies - what is indicated above is stress on
the whole word ice cream without regard to which syllable.

If this seems too facile, there is a simple pragmatic test that seems to
suggest that the distinction between flavors and categories is a real one. If
the head of such a compound can be inserted into the frame 'Do you want
a ___?' or 'Do you want some ___?' without misleading the addressee
about what is being offered, then the attribute is a flavor. For instance,
'Do you want a sandwich?' is fine even if all the speaker really has
available is, say, a cheese sandwich. On the other hand, if both the attri-
bute and the head must be included in order not to mislead the addressee,
then a separate category is involved; Do you want some bread? is decidedly
infelicitous if what the speaker has in mind to offer the addressee is ba-
nana bread. The reader is invited to try this test on the data in (14); while
the results are not 100% consistent with the stress patterns, the correlation
is quite considerable.

3.4 The final group of cases is provided by expressions where the head
names an artifact of some sort, and the attribute names the material of
which it is made. In general, these also have phrasal stress, as shown in
(15). This suggests that in these cases, as in those involving culinary
flavors, the category named by the compound is essentially the category
named by the head alone. To put it another way: the material of which an
artifact is made generally is not relevant for classifying or categorizing
it.
There is independent evidence for this in Downing's study of the creation
of new compounds. She suggested that 'naturally existing entities
(plants, animals and natural objects) are typically classified . . . on the
basis of inherent characteristics; but synthetic objects are categorized in
terms of the uses to which they may be put. This would seem to correlate
with the fact that synthetic objects are typically created with some goal in
mind, while natural entities generally are not' (Downing, 1977: 831). In
those few cases of (15) which do have compound stress, it seems for the
most part - e. g. glassware, leather goods, gingerbread man - that the ma-
terial really is relevant for specifying the category being named.

(15) Material-plus-Artifact compounds

'Phrasal Stress'                   'Compound Stress'
paper bag                          glassware
cardboard box                      leather goods
silver candelabra                  gingerbread man
gold watch                         cedar chest (?)
tweed jacket                       aluminium foil (?)
wool suit
cotton shirt
steel warehouse (made of steel)
silk stockings
carbon steel
glass jaw
tin ear
silk purse
wooden nickel

3.5 At this point we are in a position to explain the minimal pair steel
wárehouse / stéel warehouse. Since, to repeat Downing's words, we are
more likely to categorize synthetic objects on the basis of the uses to
which they may be put rather than on the basis of inherent characteristics,
it follows that we categorize warehouses according to their intended contents,
not the material of which they are made. Thus we interpret steel
wárehouse as 'warehouse made of steel', because the stress pattern tells us
that no subcategory is being named, whereas we interpret stéel warehouse
as one for storing steel, first because the stress pattern tells us that warehouse
is indeed being classified into some subcategory by steel, and second
because B for storing A is a reasonable classificatory relation to infer between
those two nouns. No underlying syntactic difference or abstract
predicate need be posited to explain the interpretations here; they follow
quite simply from inferences based on what we as speakers know about
stress and about compounds.
Once again it is important to emphasize the relative or implicational na-
ture of the prediction made by the analysis presented here. I believe it is in
principle impossible to predict stress patterns in individual cases solely on
the basis of the two lexical items involved, or solely on some underlying
syntactic relation between the two. The relevant factor is whether the at-
tribute categorizes or merely describes the head; to determine that, we
may have to consider individual cases against the background of other
possible attributes or other possible heads. Both apple cake and steel warehouse
represent B made of A, but in the case of cake, the fact that it is made
of apple categorizes it, when compared to other possibilities, whereas for
warehouse, the fact that it is made of steel only describes, especially when
compared to other possible relations between the two lexical items ware-
house and steel.
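
The implicational rule just stated can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `main_stress` and the flag `attribute_categorizes` are hypothetical, 'compound stress' is modelled simply as main stress on the attribute (the head being deaccented), and the flag must be supplied by the analyst, since the argument above is precisely that it cannot be computed from the two lexical items alone.

```python
def main_stress(attribute: str, head: str, attribute_categorizes: bool) -> str:
    """Return the element that carries the main stress.

    Assumption for illustration: 'compound stress' is main stress on the
    attribute (head deaccented); 'phrasal stress' is main stress on the head.
    """
    return attribute if attribute_categorizes else head


# The same two lexical items under two different relations (Sec. 3.5).
made_of_steel = main_stress("steel", "warehouse", attribute_categorizes=False)
for_storing_steel = main_stress("steel", "warehouse", attribute_categorizes=True)
# 'B made of A' where the material does categorize the head.
apple_cake = main_stress("apple", "cake", attribute_categorizes=True)
```

The point of keeping the flag as an explicit argument is that the prediction is relative: the same attribute and head yield different stress patterns depending on whether a subcategory is being named.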
4. The foregoing analysis of stress patterns in compounds has several
points of interest. First, it explains rather than merely describes the rough
correlation between compound syntax and so-called compound stress.
Second, it makes the description of English simpler, by removing com-
pound stress from the cases to be covered under 'normal stress' and sub-
suming it under the independently needed rubric of deaccenting. Third, it
tends to provide independent confirmation of analyses of compounds like
Dowty's which have a relatively impoverished semantics and a richer prag-
matics, and gives no support to generative models like those of Levi and
Lees. Finally, it may be possible to turn the analysis around - as in the case
of 'flavors' vs. 'categories' - and use it as a tool for investigating taxono-
mies and markedness relations in the structure of the lexicon. For all these
reasons I think it provides some genuine new insight into an intractable
old problem.6

6 Limitations of space make it impossible for me to do more than mention the existence of
two complicating factors. First is the likelihood that any treatment of the semantics of
compounds must distinguish between the 'ordinary' semantic opacity in a compound like,
say greenhouse, and the semantic opacity involved in what may best be described as idioms,
such as white elephant, French letter 'condom' (so also a number of other expressions involv-
ing ethnic slurs), swan song, wallflower, etc. (Note that both stress patterns are found in
these.) Levi (1978: 11-12) argues for just such a distinction in connection with the semantic
opacity of compounds. The implications of this for the analysis presented here are not en-
tirely clear.
The second complication is that purely phonological factors are sometimes involved to at
least some extent in determining compound stress patterns. At least two types of cases
come to mind. First, there is a tendency to stress very long compounds farther to the right
than might otherwise be expected (e. g. travel expense reimbursement voucher, not travel ex-
pense reimbursement voucher, or maple syrup container distributor, not maple syrup container
distributor). Second, it is likely that the leftward shift in short, common compounds such as
óatmeal and íce cream (which are still pronounced oatméal and ice créam by conservative
speakers) is related to the leftward shift in nouns in general (e. g. cígarette, still pronounced
cigarétte by conservative speakers). One might say that such cases are being
treated in effect as non-compounds. This explanation is entirely consistent with the fact
that many monomorphemic words in present-day English are known to have arisen from
earlier compounds (e. g. daisy < day's + eye, hussy < house + wife, sheriff < shire + reeve).
References

Chomsky, N. (1971). Deep Structure, Surface Structure, and Semantic Interpretation. In
Steinberg & Jakobovits (eds.), Semantics: An Interdisciplinary Reader in Philosophy, Linguis-
tics, and Psychology. Cambridge: Cambridge University Press. 183-216.
Chomsky, N., and M. Halle (1968). The Sound Pattern of English. New York: Harper & Row.
Downing, P. (1977). On the Creation and Use of English Compound Nouns. Language 53:
810-842.
Dowty, D. (1979). Word Meaning and Montague Grammar. Dordrecht: D. Reidel.
Halliday, M.A.K. (1967). Notes on Transitivity and Theme in English (Part II). Journal of
Linguistics 3: 199-244.
Jackendoff, R. (1975). Morphological and Semantic Regularities in the Lexicon. Language
51: 639-671.
Kay, P., and K. Zimmer (1976). On the Semantics of Compounds and Genitives in English.
Unpublished paper. University of California, Berkeley.
Kingdon, R. (1958). The Groundwork of English Stress. London: Longmans.
Ladd, D. R. (1980). The Structure of Intonational Meaning: Evidence from English. Blooming-
ton: Indiana University Press.
Lees, R. B. (1960). The Grammar of English Nominalizations. IJAL 26: Publication 12.
Lees, R. B. (1970). Problems in the Grammatical Analysis of English Nominal Compounds.
In Bierwisch & Heidolph (eds.). Progress in Linguistics. The Hague: Mouton. 174-186.
Levi, J. N. (1978). The Syntax and Semantics of Complex Nominals. New York: Academic
Press.
Liberman, M., and A. Prince. (1977). On Stress and Linguistic Rhythm. Linguistic Inquiry 8:
249-336.
Mencken, H. L. (1948). American street names. American Speech 23: 81-88.
Motsch, W. (1970). Analyse von Komposita mit zwei nominalen Elementen. In Bierwisch &
Heidolph (eds.). Progress in Linguistics. The Hague: Mouton. 208-223.
Poutsma, H. (1914). A Grammar of Late Modern English, Part II. Groningen: Noordhoff.
Quirk, R., S. Greenbaum, G. Leech, and J. Svartvik (1972). A Grammar of Contemporary Eng-
lish. New York & London: Seminar Press.
Schmerling, S. F. (1976). Aspects of English Sentence Stress. Austin: University of Texas Press.
Trager, G., and H. L. Smith (1951). An Outline of English Structure. Norman, Oklahoma:
Battenburg Press.
Wilson, D., and D. Sperber (1979). Ordered Entailments: An Alternative to Presuppositional
Theories. In Oh & Dinneen (eds.). Syntax and Semantics, vol. 11 (Presupposition). New
York: Academic Press. 299-323.
HANS-HEINRICH LIEB

A Method for the Semantic Study of Syntactic Accents*

0 Introduction

It has always been a vexed question in the study of syntactic accents ('sen-
tence stress', 'contrastive accent' etc.) how to find a method by which the
semantic effects of accents can be established beyond vague generalities.
The adequacy of any method can be judged only after the following ques-
tion has been answered: What ARE the semantic effects of syntactic ac-
cents? In a recent study on 'Accent and Meaning' (forthcoming) I adopt
the following answer:

(1) For each occurrence of a syntactic accent in a sentence there is
    (i) a speaker belief associated with the occurrence whose content
        involves a propositional attitude of the hearer, and there may
        be
    (ii) a second doxastic attitude of the speaker (not necessarily belief)
        that is associated with the occurrence of the accent and whose
        content does not involve the hearer.

I have no space here to defend or further explain this view. Given this
conception, I was confronted with the problem of finding a method by
which the attitude/content pairs associated with accent occurrences could
be established in a systematic way. I developed the method characterized
in this paper, the METHOD OF DIALOGUE SCHEMATA, whose basic
features are as follows.
Expression (2) denotes a dialogue schema, or rather, set of dialogue
schemata:1

(2) A. Ich bezweifle, daß die Frau gekommen ist.
    B. (1) Der Mǎnn ist gekommen.
       (2) Der Mǎnn ist gekommen, nicht die Frau.2

* Adapted from Sections 1.3 and 1.4 of Lieb (forthcoming)


1 In Lieb (forthcoming) the language in which syntactic accents are studied is German. I
here keep the German examples.
2 A. I doubt that the woman has come.
B. (1) It is the man who has come. (2) It is the man who has come, not the woman.
Very roughly, a dialogue schema is a pair of sets of sentences. The second
set contains a sentence with AT MOST ONE accent occurrence and
may contain expansions of this sentence. Each sentence of the first set renders
explicit a certain propositional attitude. In an actual dialogue based
on the schema, an utterance of a sentence of the second set may or may
not be a 'proper response' to an utterance of a sentence of the first set. If
it is, we may attribute to the speaker of the sentence from the second set
the belief that the hearer has the attitude that is made explicit in the sen-
tence from the first set. Moreover, if utterances of the expanded sentence
and utterances of the expansions are proper responses to utterances of the
same sentences we conclude that the expansions only make explicit what is
part of the meaning of the expanded sentence. If certain additional re-
quirements are met, these meaning parts plus the previously established
speaker belief are identified as semantic effects of the accent occurrence in
the expanded sentence, or of lack of accent if there is NO accent occurrence.
The method of dialogue schemata consists, very roughly, in constructing
appropriate dialogue schemata and eliciting judgments from native
speakers on the 'response compatibility' of sentences of the second set
with sentences of the first set; these judgments are then used for hypo-
theses that eventually lead to hypotheses on the semantic effects of syntac-
tic accents in a language.

1 The Method of Dialogue Schemata: Basic Concepts

1.1 The concept of dialogue schema

A language or language variety D is taken simply as a set of 'idiolects' (in a
defensible sense), each with its own system or systems S.3
A pair (A,B) of sets is a DIALOGUE SCHEMA in an idiolect system S
if, and only if, A is not empty, S is a system of a 'spoken' idiolect, and the
following conditions (a) to (e) are satisfied.

Condition (a). A consists of STRUCTURED INTERPRETED SYNTACTIC
UNITS (SISU) of S, that is, of triples (f,s,u) such that f is a syntactic
unit or concatenation of units of S; s is a syntactic structure of f in
S; and u is a meaning that f has in S given structure s and some assignment
of word meanings to the words that occur in f.
More specifically, A consists of interpreted structured sentences of S.

3 I will use variables of various kinds, all of them italicized letters with or without numerical
subscripts. To keep the degree of formality as low as possible I will interpret the variables
only by informal hints in the text. For the same reason concepts will be introduced by de-
finitions that are formally as unassuming as possible.
Condition (b). B consists of STRUCTURED PARTLY INTERPRETED
SYNTACTIC UNITS (SPISU) of S, that is, of triples (f,s,v)
such that f and s are as before but v is not a meaning but a POSSIBLE
MINIMAL BASE for meanings of f. This concept is to be understood as
follows:
A MINIMAL BASE for a meaning determines the referential, the pro-
positional and some of the 'illocutionary force' aspects of the meaning. All
other aspects are left unspecified; in particular, the semantic effects of ac-
cent occurrences as characterized in (1) are not represented in meaning
bases. (For a definition, cf. Lieb forthcoming: [112c].)
A POSSIBLE minimal base is an entity 'of the same formal type' as a
minimal base but which, as a matter of fact, may not be a base for any
meaning of the sentence. (Cf. Lieb forthcoming: [112b].)
In contradistinction to the elements of set A, no element of B has a complete
meaning as one of its components; it is exactly the completion of
meaning bases by accent effects that we wish to study by means of dia-
logue schemata.

Condition (c). There is exactly one function e of the following kind.
(i) The arguments of e are parts f2 of syntactic units such that each f2 is
an argument of e if and only if for some (f,s,u) in A or some (f1,s1,v) in B, f2
is a 'primitive' constituent of f (of f1) relative to the syntactic structure s
(s1). (A primitive constituent is one that doesn't contain any other constituents.)
(ii) If f2 is an argument of e, e(f2) is a lexical meaning of f2 in S.
(iii) If f2 and f3 are arguments of e and f2 and f3 are 'occurrences' of the
same word or words, e(f2) = e(f3).
(iv) For each (f,s,u) in A, if e1 is the subfunction of e whose arguments
are parts of f, then u is a meaning of f relative to f, s, e1, and S.
(v) For each (f1,s1,v) in B, if e1 is the subfunction of e whose arguments
are parts of f1, then there is a meaning u1 of f1 relative to f1, s1, e1, and S.
Intuitively, function e assigns lexical meanings everywhere in the dia-
logue schema where lexical meanings can be assigned. (On our conception
there are no 'meaningless' primitive constituents but some constituents may
have an 'empty' meaning.) The same word in different places is assigned
the same lexical meaning (condition [iii]); this guarantees constancy of lexical
meanings throughout the entire dialogue schema. The meaning of a
sentence in A is based on the assignment of lexical meanings, and for the
partly interpreted sentences in Β it is possible to obtain sentence meanings
based on the assignment of lexical meanings. (Conditions [iv] and [v]; the
concepts of sentence meaning and meaning base are understood as in Lieb
forthcoming: [100] and [112].)

Condition (d). Every sentence in A has the syntactic-semantic form:
EGO ATTITUDE THAT p, where EGO is an expression of S for self-reference
by 'the speaker', such as ich in German or a morphological
marking of first person singular; ATTITUDE is a verb of S that denotes a
propositional attitude such as English believe, doubt, want etc.; THAT is
an expression of S like English that which introduces the formulation of a
content of the attitude (in case there are such expressions); and p is a formulation
of a content of the attitude. EGO and p are identical for all sentences
in A.

Condition (e). There is exactly one (f,s,v) in B such that: (i) there is at
most one accent occurrence in f given structure s, and (ii) every other element
(f1,s1,v1) of B is a PERMISSIBLE EXPANSION of (f,s,v) in S in the
following sense.
There is a SPISU (f2,s2,v2) of S such that (f1,s1) = (f,s) 'preceded by' or
'followed by' (f2,s2); v1 is v 'plus' v2, where "plus" denotes a certain semantic
operation on meaning bases; and for any meaning u, if v1 is a base for u
and (f1,s1,u) is uttered, the speaker expresses a propositional attitude
whose content is partly formulated by the (f2,s2,v2)-part of (f1,s1,v1) and
that involves the addressee of the utterance only if the addressee is explicitly
referred to. (The concept of permissible expansion is exemplified by
[2] and [1] in [2B].)

In this informal definition of "dialogue schema" two unique entities
have been assumed:
By the LEXICAL INTERPRETATION in S of a dialogue schema in S
we understand the unique function e postulated in (c) that assigns lexical
meanings to words of the sentences of the schema.
By THE CENTRE in S of a dialogue schema in S we understand the
unique element (f,s,v) of its second component that satisfies condition (e).
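
The formal side of condition (e) and the notion of centre can be made concrete in a small Python sketch. Everything in it is an assumption made for illustration: the class `Unit` flattens a triple (f,s,v) into a word sequence, a set of accented positions and a placeholder string for the base, and `is_expansion` checks only the 'preceded by' / 'followed by' requirement, not the semantic requirements on v 'plus' v2 or on the propositional attitude expressed.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Unit:
    """Toy stand-in for a triple (f,s,v)."""
    words: Tuple[str, ...]      # stands in for the syntactic unit f
    accents: FrozenSet[int]     # accented word positions; part of structure s
    base: str                   # placeholder for the possible minimal base v


def is_expansion(candidate: Unit, centre: Unit) -> bool:
    """Formal half of 'permissible expansion' only: the candidate is the
    centre preceded or followed by further material, with the accents on the
    centre part unchanged."""
    n, m = len(centre.words), len(candidate.words)
    if m <= n:
        return False
    offset = m - n
    followed = (candidate.words[:n] == centre.words and
                frozenset(i for i in candidate.accents if i < n) == centre.accents)
    preceded = (candidate.words[offset:] == centre.words and
                frozenset(i - offset for i in candidate.accents if i >= offset)
                == centre.accents)
    return followed or preceded


def find_centre(B: List[Unit]) -> Optional[Unit]:
    """Return the unique element with at most one accent of which every other
    element is an expansion; None if there is no such unique element."""
    centres = [c for c in B
               if len(c.accents) <= 1
               and all(is_expansion(o, c) for o in B if o != c)]
    return centres[0] if len(centres) == 1 else None


# The two lines of (2B), with the accent on the second word ('mann').
b1 = Unit(("der", "mann", "ist", "gekommen"), frozenset({1}), "v")
b2 = Unit(("der", "mann", "ist", "gekommen", "nicht", "die", "frau"),
          frozenset({1}), "v plus v2")
centre = find_centre([b1, b2])
```

On this toy representation the centre of the second component of (2) comes out as line (1), with line (2) as its expansion.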

1.2 Notation for sets of dialogue schemata

Expression (2) is not a dialogue schema but the orthographic name of a
schema. More correctly, (2) denotes a SET of dialogue schemata (A,B) in
some German idiolect system (here left unspecified); this set is defined as
follows. (The rest of this subsection, though important for theoretical rea-
sons, is not needed for an intuitive understanding of [2] and may be omit-
ted on a first reading.)
Each orthographic word in (2) uniquely denotes a phonological word of
the idiolect system. The left-to-right sequence of orthographic words in a
single line of (2) denotes the sequence of phonological words denoted by
the orthographic words.
Each line of (2) - orthographic words, punctuation signs, and accent
names ("ˇ") taken together - denotes the set of pairs (f,s) such that f is denoted
by the left-to-right sequence of orthographic words in the line and s
is a syntactic structure of f in the idiolect system that is compatible with a
fixed traditional interpretation of the punctuation signs, a presupposed
syntactic analysis of the idiolect system, and a fixed interpretation of the
accent names. Lack of accent names means lack of accents. ('Syntactic
stress', which is different from accent, is left unmarked.)
For each line of (2) there are semantic specifications in the context of
(2) (in particular, the translations in fn. 2). These specifications partly de-
termine the meanings or meaning bases allowed for each pair (f,s) that is in
the denotation of the line. Each line of (2) TOGETHER WITH its semantic
specifications denotes the set of triples (f,s,u), for a line in (2A), or
the set of triples (f,s,v), for a line in (2B), such that (f,s) is in the denotation
of the line and u (or v) is allowed for (f,s) by the semantic specifications of
the line.
It is assumed that all denotations are non-empty.
Expression (2) excluding "A", "B", "(1)", and "(2)" denotes the set of
dialogue schemata (A,B) in the idiolect system such that: (i) each element
of A is a triple (f,s,u) in the denotation of a line in (2A), and for each line
in (2A) there is only one triple in its denotation that is an element of A; (ii)
each element of B is a triple (f,s,v) in the denotation of a line in (2B), and
for each line in (2B) there is only one triple in its denotation that is an element
of B.
The expressions "(2)", "(2A)", "(2B)", "(2B[1])", "(2B[2])" (the last four
without "2" where this is supplied by context) are THREEFOLD
AMBIGUOUS. (i) They are used to refer to the name of the set of dialogue
schemata and to its parts. (ii) They are used to refer to the sets denoted
by the name or its parts. (In this sense, "(2A)" denotes the set of
first components of dialogue schemata that are in the denotation of ex-
pression (2); "(2B[1])" denotes the set of triples (f,s,v) that is denoted by
line [1] in (2B), etc.) (iii) They are used to refer to individual elements of
the sets denoted by the name or its part. Disambiguation is by context.
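
The way a name such as (2) picks out a whole set of schemata can be sketched as a Cartesian product over the denotations of its lines. The following is a minimal sketch under assumptions: the function name `candidate_schemata` and the string labels for triples are hypothetical, the denotations are taken to be non-empty and pairwise disjoint, and the pairs produced are only candidates, since a pair belongs to the denotation of (2) only if it also satisfies conditions (a) to (e) of Sec. 1.1.

```python
from itertools import product
from typing import FrozenSet, Iterator, List, Tuple

Triple = str  # placeholder label for a triple (f,s,u) or (f,s,v)


def candidate_schemata(
    a_lines: List[List[Triple]],
    b_lines: List[List[Triple]],
) -> Iterator[Tuple[FrozenSet[Triple], FrozenSet[Triple]]]:
    """Yield every pair (A, B) that takes exactly one triple from the
    denotation of each line in (2A) and in (2B)."""
    for a_choice in product(*a_lines):
        for b_choice in product(*b_lines):
            yield frozenset(a_choice), frozenset(b_choice)


# Hypothetical denotations; the labels only distinguish triples that differ,
# for instance, in syntactic structure.
a_lines = [["A/s1", "A/s2"]]                  # the single line of (2A)
b_lines = [["B1/s1", "B1/s2", "B1/s3"],       # line (1) of (2B)
           ["B2/s1", "B2/s2"]]                # line (2) of (2B)
pairs = list(candidate_schemata(a_lines, b_lines))
```

With two, three and two triples in the three line denotations, the sketch yields 2 x 3 x 2 candidate pairs, each with one element in A and two in B.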
A dialogue schema in an idiolect system S is of limited interest if there
are no corresponding schemata in the systems of all idiolects of a given
language or language variety.

1.3 Dialogue schemata for sets of idiolects

A dialogue schema in a single idiolect system may be representative of a
whole set of idiolects in the sense that there are 'corresponding' schemata
in the systems of all idiolects in the set. This idea will now be made more
precise.
Consider a structured interpreted unit (f1,s1,u1) in an idiolect system S1.
On our conception the syntactic structure s1 contains a 'constituent structure'
and a 'marking structure' that are complex constructs of syntactic
categories of S1. Each category is a set; it may be a set of syntactic units of
S1. For instance, Noun-Group-in-S1, the set of noun groups of S1, may be
such a category. In a different idiolect system S2 we may again have
Noun-Group-in-S2 as a category. The two categories are 'analogous' in an
obvious sense: if we assume a general relation 'f is a noun group in S',
then the two categories are the sets of first-place members of the relation
that belong to S1 and S2, respectively, as second-place members.
I shall assume concepts of analogy of the type '. . in S1 is ANALOGOUS
to . . . in S2', where S1 and S2 are any idiolect systems and . . is any
formal or semantic category of S1 and . . . any formal or semantic category
of S2. It is impossible to explicate these concepts in the present context.
Basically the explication is to proceed as indicated in our 'noun group' example.
For each category of S1 there is at most one category in S2 to which it
is analogous.
Analogous categories may still be different sets. This may well be the
rule. Therefore, structured sentences in different idiolect systems will
normally be non-identical even if the idiolects belong to the same language:
however similar a sentence (f1,s1,u1) of S1 may be to a sentence
(f2,s2,u2) of S2, if the structures s1 and s2 involve the categories
Noun-Group-in-S1 and Noun-Group-in-S2, respectively, and these are different
sets, the nonidentity carries over to the structures s1 and s2, hence, to the
two triples. On the other hand, the triples may be identical except for the
differences between analogous categories. We informally introduce the
following notion:
Let S1 and S2 be idiolect systems. Let (f,s,u) be an interpreted structured
syntactic unit of S1. THE S1/S2-VERSION of (f,s,u) = the triple (f1,s1,u1)
obtained from (f,s,u) by replacing in f, s, and u each category of S1 by the
analogous category in S2 if there is such a category; otherwise (f1,s1,u1) is
an appropriately chosen 'empty' entity. (The definition for SPISU's (f,s,v)
is analogous.) Obviously, the S1/S2-version of (f,s,u) need not be a SISU
of S2; similarly, for the S1/S2-version of (f,s,v). Even if the S1/S2-version
of (f,s,v) is a SPISU of S2, it may be an 'unsuccessful' one; there may be no
'completion' of the S1/S2-version, in the following sense:

(3) Let (f,s,v) be a structured partly interpreted syntactic unit of S.
    (f,s,u) is a COMPLETION of (f,s,v) in S iff
    a. (f,s,u) is a structured interpreted syntactic unit of S;
    b. for all e, if u is a meaning of f relative to f, s, e, and S, then v is a
       base for u relative to f, s, e, and S.

(e is an assignment of lexical meanings to words occurring in f. For the
concepts of meaning and meaning base, cf. again Lieb forthcoming: [100]
and [112].)
Using the concepts of version and completion, the notion of dialogue
schemata in an idiolect system may be extended to SETS of idiolects such
as languages and their varieties:

(4) (A,B) is a DIALOGUE SCHEMA IN S FOR D iff
    a. S is a system of an idiolect in D;
    b. (A,B) is a dialogue schema in S;
    c. for every system S1 of any idiolect in D,
       (i) the S/S1-version of every (f,s,u) ∈ A is a structured interpreted
           syntactic unit of S1;
       (ii) the S/S1-version of every (f,s,v) ∈ B is a structured partly
           interpreted syntactic unit of S1;
       (iii) there is a completion in S1 of the S/S1-version of every
           (f,s,v) ∈ B.

Intuitively, definition (4) singles out those dialogue schemata in a given
idiolect system that are REPRESENTATIVE of an entire set of idiolects,
such as a language: schemata for whose sentences there are exact correspondences
in the systems of any idiolect of the set. This holds not only for the (partly
interpreted) sentences in the second component of the schema but also for
the sentences in its first component; this is important in view of the use we
wish to make of dialogue schemata; every speaker of the language is to be
a competent judge for either part of the schema both as a hearer and as a
speaker: he must be able to answer the question whether the sentences in
the second part of the schema are 'response compatible' with the sentences
in the first.
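
Clause c of definition (4), together with the notion of version, can be illustrated by the following Python sketch. It rests on several simplifying assumptions that are not part of the definitions: a unit is reduced to the tuple of category labels it involves, an analogy is a dictionary from categories of S to categories of S1, `None` plays the role of the 'empty' entity, and the predicates `is_sisu`, `is_spisu` and `has_completion` are hypothetical oracles that would have to be supplied by a description of each idiolect system.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Unit = Tuple[str, ...]      # a unit reduced to the category labels it involves
Analogy = Dict[str, str]    # category of S -> analogous category of S1


def version(unit: Unit, analogy: Analogy) -> Optional[Unit]:
    """S/S1-version: replace each category by its analogue; None stands for
    the 'empty' entity when some category has no analogue."""
    if any(cat not in analogy for cat in unit):
        return None
    return tuple(analogy[cat] for cat in unit)


def satisfies_4c(
    A: Iterable[Unit],
    B: Iterable[Unit],
    analogies: Dict[str, Analogy],
    is_sisu: Callable[[Unit, str], bool],
    is_spisu: Callable[[Unit, str], bool],
    has_completion: Callable[[Unit, str], bool],
) -> bool:
    """Check clause c of (4) for every system named in `analogies`."""
    A, B = list(A), list(B)
    for system, analogy in analogies.items():
        for a in A:
            va = version(a, analogy)
            if va is None or not is_sisu(va, system):
                return False                  # clause c(i) fails
        for b in B:
            vb = version(b, analogy)
            if vb is None or not is_spisu(vb, system):
                return False                  # clause c(ii) fails
            if not has_completion(vb, system):
                return False                  # clause c(iii) fails
    return True


# Toy data: S2 has no analogue for the verb category of S.
A = [("NounGroup-in-S", "Verb-in-S")]
B = [("NounGroup-in-S",)]
analogies = {
    "S1": {"NounGroup-in-S": "NounGroup-in-S1", "Verb-in-S": "Verb-in-S1"},
    "S2": {"NounGroup-in-S": "NounGroup-in-S2"},
}
accept = lambda unit, system: True            # oracle that accepts everything
ok_for_s1 = satisfies_4c(A, B, {"S1": analogies["S1"]}, accept, accept, accept)
ok_for_both = satisfies_4c(A, B, analogies, accept, accept, accept)
```

In the toy data the schema is representative of the set containing only the S1 idiolect, but not of the larger set, because the version of the (A)-sentence in S2 is the 'empty' entity.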

1.4 Proper dialogue schemata

The notion of 'response compatibility' is based on the notion of 'proper
response'; this notion is taken as basic.
The proper response relation holds between triples (V,u,V1), where V is
a speech event produced by speaker V1 and u is a meaning of V intended
by V1: 'V with u as produced by V1 is a proper response to V2 with u1 as
produced by V3'.
Proper responses will not be explicitly characterized except for the fol-
lowing informal assumption:

(5) Postulate for the proper response relation. If V with u as produced by
    V1 is a proper response to V2 with u1 as produced by V3, then
    a. V1 correctly understands u1 with respect to V3; and
    b. u is motivated by V1's understanding of u1 and by V1's beliefs
       about V3 that are based on V1's understanding of u1.
The notion of response compatibility is construed as follows:

(6) Let (f1,s1,u1) be a structured interpreted syntactic unit of S1 and
    (f2,s2,u2) a SISU of S2.
    (f1,s1,u1) in S1 is RESPONSE COMPATIBLE with (f2,s2,u2) in S2 iff
    for every V, V1, V2, and V3, if
    a. V is a normal utterance by V1 of (f1,s1,u1) in S1,
    b. V2 is a normal utterance by V3 of (f2,s2,u2) in S2,
    c. V is a reaction by V1 to V2,
    d. V1 addresses V to V3,
    then
    V with u1 as produced by V1 is a proper response to V2 with u2 as
    produced by V3.

(For the concept of normal utterance, cf. Lieb 1979: Sec. 2).
'Proper' dialogue schemata are, very roughly, those schemata in which
every completion of each element of the second component is response
compatible with each element of the first component, regardless of idio-
lect system:

(7) (A,B) is a PROPER DIALOGUE SCHEMA in S for D iff
    a. (A,B) is a dialogue schema in S for D;
    b. for every (f1,s1,u1), (f2,s2,u2), S1, S2, if
       (i) S1 is a system of an idiolect in D,
       (ii) S2 is a system of an idiolect in D,
       (iii) (f1,s1,u1) is a completion in S1 of the S/S1-version of some
           element of B,
       (iv) (f2,s2,u2) is the S/S2-version of some element of A,
       then
       (f1,s1,u1) in S1 is response compatible with (f2,s2,u2) in S2.

Three different idiolect systems S, S1, and S2 may be involved in a proper
dialogue schema. System S is the one to which the sentences in the two components
actually belong. In a practical application, this may be a system of
a researcher who is also a native speaker. When (7) and (6) are seen in
combination, system S1 is associated with a speaker of (completions of
S/S1-versions of) sentences in the second component, i. e. of the partly interpreted
sentences that contain accent occurrences. S2 is associated with a
speaker of (S/S2-versions of) sentences in the first component that make
explicit propositional attitudes and to whose utterances the S1-speaker
reacts.
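
The quantifier structure of clause b of (7) can be spelled out in a short Python sketch. It is a sketch under assumptions only: response compatibility is not something a program can compute, so it is represented by a hypothetical oracle `compatible`, which in practice would stand for judgments elicited from speakers; the labels for systems and sentences are likewise invented for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple


def is_proper(
    completions_of_B: Dict[str, List[str]],
    versions_of_A: Dict[str, List[str]],
    compatible: Callable[[str, str, str, str], bool],
) -> bool:
    """Clause b of (7): every completion, in any system S1, of a version of
    an element of B must be response compatible with every version, in any
    system S2, of an element of A."""
    return all(
        compatible(b, s1, a, s2)
        for s1, s2 in product(completions_of_B, versions_of_A)
        for b in completions_of_B[s1]
        for a in versions_of_A[s2]
    )


def oracle_from(rejected: Set[Tuple[str, str, str, str]]):
    """Build a toy oracle that rejects exactly the listed combinations."""
    return lambda b, s1, a, s2: (b, s1, a, s2) not in rejected


# Hypothetical labels: two idiolect systems, two (B)-sentences, one (A)-sentence.
completions = {"S1": ["B1", "B2"], "S2": ["B1", "B2"]}
versions = {"S1": ["A"], "S2": ["A"]}
all_accepted = is_proper(completions, versions, oracle_from(set()))
one_rejected = is_proper(completions, versions,
                         oracle_from({("B2", "S2", "A", "S1")}))
```

A single rejected combination of speaker system and hearer system is enough to deprive the schema of its status as proper, which is why S1 and S2 range independently over all systems of idiolects in D.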
2 Exemplification and Comments

2.1 Steps 1 to 5

Every method is characterized by a series of procedural steps. I will not
describe these steps in abstracto but give a schematic example from which
the general nature of the steps may be inferred.
The starting-point will not be a single sentence (structured and partly interpreted)
but a set of sentences. This is due to the fact that in studying accents
we concentrate on the pitch properties of intonation. Intonation
structures of accented sentences will therefore be specified only with re-
spect to pitch. Since on our conception each syntactic structure contains
an intonation structure component, individual syntactic structures of ac-
cented sentences cannot be completely specified. We therefore consider
not an individual sentence but a set of structured (partly) interpreted sen-
tences that differ in non-pitch properties of their intonation structures.
Correspondingly, we must consider a set of 'equivalent' dialogue schem-
ata, not a single schema.

Starting point: The German idiolect system S* and B* = the set of
structured partly interpreted sentences (f,s,v) of S* such that:
(i) f = der mann ist gekommen;
(ii) s is such that 'downward contrastive accent' ˇ occurs on mann and
no other accent occurs anywhere else;
(iii) f is a declarative sentence of S* relative to structure s;
(iv) v is a possible minimal meaning base that involves, very roughly,
'adult human male' as a lexical meaning of mann; 'come' in a literal sense
as a lexical meaning of gekommen; and 'empty' meanings for der and ist;
(v) the meaning of der mann is such that the speaker refers to exactly
one person in any normal utterance of (f,s,u) if v is a base for u. (This is
only a rough outline of a definition of "B*".)

Step 1. A set of dialogue schemata (A,B) in S* is defined such that
(i) the centre of each (A,B) in S* (cf. end of Sec. 1.1 for "centre") is an
element of B*;
(ii) each element of B* is the centre of some (A,B) in the set;
(iii) certain other conditions are satisfied which would make the set of
schemata 'permissible' in the sense of Lieb (forthcoming: [129]).
Let us accept (2) as a name of this set. Line (B1) in (2), "Der Mǎnn ist
gekommen.", denotes the centre of each dialogue schema in the set.

Step 2. (2) is adopted as a set of dialogue schemata in S* for German,
either after appropriate testing or on the basis of a more general assumption.
Step 3. The linguist selects native speakers of German (informants) and
makes sure that they correctly understand the sentences of the schemata
(i. e. correctly identify the S*/S-versions of the sentences in their own
German idiolect systems S, including the semantic component of the ver-
sions).

Step 4. The linguist presents the informants with the pairs of sentences
((B1), (A)), ((B2), (A)) of each schema (cf. [2]) and elicits for each pair
judgments on the response compatibility of the first sentence with the sec-
ond. This step requires great care.
The linguist may have taped normal utterances of each sentence by a
(the) speaker of idiolect system S*, produced in succession for the sen-
tences of each pair. This serves only for identification of the sentences
(taping may already be required by Step 3); it must not be mistaken as
production of a dialogue, which it is clearly not.
If orthographic representation is chosen, it is CLASSES of sentence
pairs that are presented (cf. Sec. 1.2); in this way 'irrelevant detail' can be
abstracted from right at the beginning. (On the other hand, the linguist
cannot be certain that no relevant detail is lost in the abstraction process.)
Elicitation of judgments. There are two different kinds of judgments that
may be elicited. In the first case, each informant is made to pass judg-
ments on the following question: Is every completion of the version of
the/a (B)-sentence in the German idiolect systems of the informant re-
sponse compatible with the version of the/an (A)-sentence in every Ger-
man idiolect system he knows of or can imagine? In the second case, the
question is generalized: Is every completion of the version of the/a (B)-
sentence IN EVERY German idiolect system he knows of or can imagine
compatible with the version of the/an (A)-sentence in every such system?
Naturally, the judgments cannot be elicited by asking such questions. In
the first case, an appropriate question might have a form such as: 'Imagine
you are having a conversation with another German. He tells you at one
point when this is appropriate: [replaying of recording of (A)-sentence, or
joint characterization of all (A)-sentences]. Would it be just normal for
you to continue by [replaying of recording of selected (B)-sentence, or
joint characterization of all selected (B)-sentences], provided that this is
what you believe and what you see fit to tell the other person, and tell him
in this tone of voice?'4
In the second case, an appropriate question might be of the following
form: 'Imagine two Germans are having a conversation. One tells the
other: [as before, for (A)-sentence]. Would it be just normal for the other

4
The final qualification is intended to make the informant abstract from intonational differ-
ences between the (A)-sentences and (B)-sentences that are unrelated to accent manifesta-
tions.

to continue by [as before, for (B)-sentence] provided that this is what he
believes and what he sees fit to tell the first person, and tell him in this
tone of voice?' After presenting one of the questions the two recordings or
sentence characterizations may be repeated in succession.
What the native speakers judge is the existence or non-existence of a
proper response relation between utterances of the sentences taken either
as sentences of their own German idiolect systems or as sentences of arbi-
trary German idiolect systems. The "just normal" part of the questions
must be a formulation that agrees as closely as possible with the postu-
lated properties of the proper response relation.
The judgments elicited from the speakers are recorded; they exemplify
the EMPIRICAL DATA for the semantic method of dialogue schemata.

Step 5. The judgments are evaluated and a hypothesis is formed concern-
ing the status of (2) as a set of PROPER dialogue schemata in S* for Ger-
man.

Suppose that (2) is accepted as such. We then proceed as follows.

2.2 Steps 6 and 7

As pointed out in Sec. 0, the relevance of the method of dialogue schem-
ata depends on the theoretical assumption that the semantic effects of syn-
tactic accents are as postulated in (1), i. e. consist in the contribution of at-
titude/content pairs to sentence meanings. The next step exemplifies how
such pairs may be identified.

Step 6. The status of (2) as a set of proper dialogue schemata in S* for
German is used to identify attitude/content pairs of the following kind:
a. they are as required in (1i) or (1ii);
b. they may be posited only on the hypothesis that (2) is a set of proper
schemata;
c. they do not involve any lexical meanings not present in B* (accents
do not introduce lexical meanings).
There are at least two pairs that satisfy these conditions:

(8) a. Belief/there is a non-man whose coming is doubted by the
hearer.
b. Belief/there is a non-man who has not come.

These pairs are established as follows. (The following formulations are
informal. A more precise formulation would heed the distinctions made in
definitions [7] of "proper dialogue schema" and [6] of "response compati-
ble". All utterances are to be 'normal' ones, cf. [6].)

Consider any dialogue schema in (2). In any utterance of (A) the
speaker explicitly expresses the propositional attitude of doubt towards
the coming of the person to whom he is referring by die frau in his utter-
ance. 5 Assume an utterance of (B2) that is a reaction to the utterance of
(A) and is addressed to its speaker. As (B2) is response compatible with
(A), the speaker of (B2) correctly understands the meaning of the utter-
ance of (A) (cf. [5]). The speaker of (B2) is therefore entitled to the belief
that the speaker of (A) doubts the coming of the person he is referring to
by die frau in his utterance, and the speaker of (B2) will normally have this
belief. On a proper construal of the proper response relation it should fol-
low from the compatibility of (B2) and (A) that the speaker of (B2) refers
by die frau in his utterance to the same person that the speaker of (A) re-
fers to by die frau in his. By uttering (B2) the speaker of (B2) expresses his
belief that this person is a woman. He will therefore normally have the be-
lief that there is a woman whose coming is doubted by the speaker of (A),
who is addressed by the speaker of (B2). It is obvious from the utterance
of (B2) that the speaker believes that no woman is a man. Thus, in his ut-
terance of (B2) the speaker will normally have the belief that there is some
non-man whose coming is doubted by the hearer, which takes us to the at-
titude/content pair (8a) with respect to (B2). Now (B1) is also response
compatible with (A). We conclude that the speaker of (B1) will normally
have the belief that there is some non-man whose coming is doubted by
the hearer also in an utterance of (B1) that is a reaction to an utterance of
(A). We thus associate the attitude/content pair (8a) with (B1). (8b) is con-
nected with (B2) in an obvious way without involving (A), and is then as-
sociated with (B1) by the argument used for (8a).
The attitude/content pairs in (8) are all we can get on the basis of dia-
logue schemata (2), and (8) is obviously not yet satisfactory. As a matter
of fact, an improved version of (1) would have ruled (2) out to begin with.
First, (2) is too specific with respect to the propositional attitude of the
hearer. We should use only dialogue schemata that allow the checking of
arbitrary propositional attitudes (Step 4). This may seem hard to achieve,
given an indefinitely large number of propositional attitudes; a solution to
this problem is indicated in Lieb (forthcoming: Sec. 5.3). Second, the
schemata in (2) may be incomplete with respect to relevant expansions of
their centres; there may be expansions that justify a belief that is stronger
than (8b): whose content has (8b) as a logical consequence.
The dialogue schemata in (2) are only an abbreviation of the schemata
in Lieb (forthcoming: [135]), and these are indeed satisfactory. They
establish two different attitude/content pairs which it is more reasonable
to associate with (B1) in (2).

5
We assume a meaning of die frau such that the speaker refers to exactly one object by die
frau in any normal utterance of the sentence.

We have not yet shown that the attitude/content pairs (8), if accepted,
should be taken as semantic effects of the accent occurrence on mann. For
this, an additional step is required.

Step 7. The pairs identified in Step 6 are compared with the pairs obtained
by applying Steps 1 to 6 to some set Bi of triples (f,s,v) such that: f = der
mann ist gekommen; s is a structure identical with a structure that f has in
B* except that given s, either mann has an accent other than downward
contrastive accent or a word other than mann has downward contrastive
accent; and v is as in B*.
Put in a nutshell, we change either the accent or the accent place and
see what happens. If different attitude/content pairs are obtained, both
the old and the new pairs are tentatively taken as semantic effects of the
relevant accent occurrences. Otherwise, Step 7 is repeated.
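
The variation described in Step 7 can be illustrated by a small Python sketch. It is a simplification under stated assumptions, not part of the method's formal apparatus: each structure is reduced to a single (word, accent) pair, and the accent inventory ACCENTS is hypothetical except for "downward contrastive", which is the accent named in the text.

```python
# Hypothetical sketch of the Step 7 variation (a simplification, see lead-in).
WORDS = ("der", "mann", "ist", "gekommen")

# Only "downward contrastive" comes from the text; the other labels are
# placeholders standing in for whatever accents the idiolect system provides.
ACCENTS = ("downward contrastive", "upward contrastive", "non-contrastive")


def step7_variants(words, accents,
                   base_word="mann", base_accent="downward contrastive"):
    """Return (word, accent) pairs that change either the accent or its place."""
    # Same word, different accent.
    other_accent = [(base_word, a) for a in accents if a != base_accent]
    # Same accent, different word.
    other_place = [(w, base_accent) for w in words if w != base_word]
    return other_accent + other_place


variants = step7_variants(WORDS, ACCENTS)
for word, accent in variants:
    print(f"{accent} accent on {word}")
```

Each variant would then be run through Steps 1 to 6 again, and the resulting attitude/content pairs compared with those obtained for the original accent occurrence.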

This concludes our exemplification of the method. Generally, the
method is applied in seven steps, referred to as 'Step 1', 'Step 2' etc., which
are exemplified by the seven steps in the above example. Various aspects
of the method are further developed in Lieb (forthcoming: Sec. 5.3).

2.3 Comments

Comment 1. The method is dependent both on theory and on empirical
fact; it is sound if the semantic effects of syntactic accents are correctly
viewed as in (1).

Comment 2. The method is not a discovery procedure, for a number of
reasons.
(i) The initial choice of dialogue schemata is crucial. No rigorous rules
were formulated to govern this choice. Further restrictions are put on the
initial set of schemata in Lieb (forthcoming: Sec. 5.3.4), but even they do
not guarantee unique choices.
(ii) The basic relation of proper response requires further study and ex-
plication. The less it is restricted the greater are the uncertainties in actual
application of the method.
(iii) The results of Steps 1 to 6 must be taken as tentative until all syn-
tactic accents have been studied for all types of accent occurrences. Again,
there is no discovery procedure for establishing the overall set of atti-
tude/content pairs.

Comment 3. The attitude/content pairs associated by Step 6 with an EX-
PANSION of a dialogue centre may not automatically be associated with
the CENTRE: I have not been able to completely exclude the possibility
of expansions that introduce 'extraneous' attitude/content pairs which as
a matter of fact should not be included in meanings of the centre.
(i) An attitude/content pair is NOT associated with the set of centres if
it is ruled out by the following test: For any completion of a centre, any
utterance of the completion by the speaker, and any response by the
HEARER to the utterance, if the hearer assumes in his utterance that the
speaker has the attitude towards the content, then the HEARER's utter-
ance is not a proper response to the speaker's utterance.
(ii) An attitude/content pair IS associated with the set of centres if it
passes the following test: For any completion of the centre, and any utter-
ance of the completion by the speaker, there is a proper response by the
HEARER to the utterance in which the hearer assumes that the speaker
has the attitude towards the content. (Both conditions [i] and [ii] are suffi-
cient ones. The two conditions extend, so to speak, dialogue schemata by
allowing not just for a simple exchange A-B but for an exchange A-B-A.)

Comment 4. The method does not entail the view that the semantic effects
of syntactic accents are exclusively 'discourse phenomena'. On the con-
trary, it accepts view (1) by which the semantic effects of an accent occur-
rence essentially consist in contributing attitude/content pairs to sentence
meanings. The method exploits the possibility of discourses of a special
type. A speaker uses a sentence with a meaning by which the speaker must
have a certain belief concerning a hearer attitude, and he uses this sen-
tence AFTER an explicit statement by the hearer that he (the hearer) actu-
ally has this attitude. We may thus get at necessary speaker assumptions
by means of a discourse. However, the assumptions are required by the
meaning of the speaker's sentence; they are not required by the hearer's
previous utterance. Put differently, we study phenomena in the domain
of sentence meaning by means of discourses; these phenomena do not be-
come discourse phenomena by the fact that they can be so studied. True
enough, they are extremely important for the structuring of discourse but
this does not force us to exclude them from sentence meaning. I take the
position that discourse structure should be partly explained by sentence
meaning; my conception of sentence meaning is construed accordingly.

Comment 5. The method of dialogue schemata would not be suited for
isolating attitude/content pairs of the speaker that are not expressed in
proper responses to utterances of the addressee. However, I have been un-
able to discover such pairs. The most likely candidates would be pairs as-
sociated with 'all-new' utterances (where the speaker assumes that he is
saying something completely new to the hearer); it turns out that this situ-
ation is covered by the method (cf. Lieb forthcoming: [175], Comment).

Comment 6. The method combines elements of two different informal ap-
proaches to the study of accents that have been frequently used in the lit-
erature. One is the investigation of non-contrastive accents by question-
answer pairs, the other the study of contrastive accents via expansions of
the sentences in which they occur. It can now be seen why these methods
can be partly but not completely successful.
A question of the usual kind is a request for additional information on
what is already partly known or believed to be true. In asking a question,
the speaker expresses propositional attitudes and their contents which are
connected either with the request for information or with the speaker's
doxastic background. The person who answers the question may assume
that the addressee of his answer has the attitudes expressed by the ques-
tion and may in turn express this assumption by using a certain accent.
Thus, a question-answer pair may be used to establish attitude/content
pairs as assumed in (1i). On the other hand, only a small number of pro-
positional attitudes are associated with questions, which disqualifies the
question-answer method as the only method for accent studies.
Similarly, a speaker may express explicitly by an expansion of a sentence
an attitude/content pair that would remain implicit in an utterance of the
sentence. But such expansions cannot make explicit the attitudes with ad-
dressee-oriented content that are studied by the question-answer method;
thus, the expansion method is also inadequate as the only method used.

Comment 7. The method of dialogue schemata requires systematic varia-
tion of accents and accent places over given syntactic units or concatena-
tions of units (Step 7). This limits the usefulness of studying recordings of
spontaneous speech: even with a huge data collection, a study based on
such material can only approximate the systematic variation required by
Step 7. On the other hand, recordings of elicited material may be used in
Steps 3 and 4 (presentation of sentence pairs), and recordings of actual
discourse are useful for evaluating hypotheses on the overall semantic ef-
fect of a given accent.

Comment 8. The method allows for the following situation AS A LIMIT-
ING CASE.
(i) In Step 1 the linguist, who is a native speaker of D (the language or
language variety investigated), defines a set of dialogue schemata whose
sentences belong to a system S of an idiolect in D that is an idiolect of the
linguist.
(ii) In Step 2 the set is adopted as a set of dialogue schemata in S for D.
(iii) In Step 3 the linguist is the only native speaker of D to be selected.
Steps 4 and 5 are then applied as before, and Steps 6 and 7 remain unaf-
fected; it is simply the case that the empirical data (speaker judgments) are

obtained in a particular situation in which the linguist may be tempted to
make the data fit the theory.
To minimize this danger, presentation of sentences, elicitation of judg-
ments, and recording of judgments must be carefully kept apart, and intui-
tions concerning the semantic effects of accent occurrences that the lin-
guist may have as a native speaker of the language must be recognized
AND DISCARDED. It is a grave mistake to allow linguistic intuitions,
which may have heuristic value in the very first stages of theory forma-
tion, to interfere with linguistic judgments in a situation of controlled data
collection.

References

Lieb, H. (forthcoming). Accent and meaning. A study of syntactic accents, stress, and rhythm,
with special reference to German.
Lieb, H. (1979). The universal speech function. A functional account of the relation between
language and speech. In: Ezawa, K., Rensch, K. H.: Sprache und Sprechen. Festschrift für
Eberhard Zwirner zum 80. Geburtstag. Tübingen: Niemeyer. 185-194.
HELMUT RICHTER

An Observation Concerning Intensity as a Predictable Feature
of Intonation

The present pilot study has its main interest in the applicability to intensity
of methods such as the trend technique, and in their descriptive power.
This technique consists in finding a straight line which is optimal accord-
ing to Legendre's criterion of least squares 1 . It is most closely related to
the theory of Bravais-Pearson's correlation coefficient 2 and therefore in-
volves all the problems of adequacy in cases of a curvilinear relation be-
tween two variables, well known in behavioural statistics. The focus of the
study will be on the empirical data rather than on reflexions on statistical
methodology. Here, however, it must be said that
(1) the incomplete fitting of a mathematical function is less importantly
suspicious on account of its suggestion of a structure where there is none,
than on account of its liability to fail to grasp a structure;
(2) the relevant acoustic information about intonation may be "coded" in
features of the speech signal which lie beyond uni-directional processes
(involving either an increase or a decrease of one parameter).
I can now add that fitting straight trend-lines to intonation curves really
does seem to do a good job for purposes of automatic phonetic analysis
(Rietveld and Boves, 1979, Takefuta, 1979). This illustrates the first point
above, for there is no original straightness in, for example, the F0-curve.
Moreover, it can be said that intensity is (re-)gaining considerable interest
in intonation research. As to the above point (2), inspection of the mate-
rial available to me soon led me to the question whether the relation be-
tween the slopes of the trend-lines preceding and succeeding an intensity
peak might be indicative of the dynamics of speech.
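
The trend technique referred to above can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used for the registrations; the function names fit_trend_line and pearson_r and the sample values are assumptions made for the example. The second data set shows the adequacy problem mentioned above: a perfectly regular but curvilinear relation yields a flat trend line and a correlation of zero.

```python
import math


def fit_trend_line(times, values):
    """Least-squares straight line through (times, values): (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    if sxx == 0:
        raise ValueError("all time points are identical")
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def pearson_r(times, values):
    """Bravais-Pearson correlation coefficient of the two sequences."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((v - mean_v) ** 2 for v in values)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / math.sqrt(sxx * syy)


# A strictly linear rise: the trend line recovers it exactly.
linear = fit_trend_line([0, 1, 2, 3], [1, 3, 5, 7])

# A symmetrical parabola: the straight line fails to grasp the structure.
curved = fit_trend_line([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
curved_r = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```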
First branch is introduced to refer to the part of the intensity curve
which precedes an absolute maximum (peak), the curve being the graph of
a grosso modo monotonically rising function, or, on the other hand, to the
straight line adapted to that curve section, and second branch to refer to

1
The present applications of the technique have a classical forerunner in 'phonometry', a
statistically oriented phonetics of the thirties (Zwirner and Zwirner, 196611, p. 186-188).
2
For a detailed discussion of this theory with respect to phonometrical applications see
Richter, 1974.

the graph of the grosso modo monotonically falling function or straight
line following the peak.
Then it can be stated that
- configurations with a first branch slope greater in absolute value than that
of the second branch will show the intensity peak after less than half of
the total time involved or, in short, will be left-asymmetrical, while
- configurations with a second branch slope greater in absolute value than
that of the first branch will show the intensity peak after more than half of
the total time involved or, in short, will be right-asymmetrical (see Fig. 1).

Figure 1: Left-asymmetry and right-asymmetry (schematic intensity curves over time, with the half-time point marked)
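
The comparison of branch slopes can be sketched computationally. The following Python fragment is a hypothetical analogue of the distinction drawn above, not the procedure used in this study (where judges classified the curves visually); the function names, the tolerance parameter, and the sample curves are assumptions.

```python
def trend_slope(times, values):
    """Slope of the least-squares straight line through the given points."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def classify_asymmetry(times, intensities, tolerance=1e-9):
    """Return 'L', 'R', or 'O' by comparing the absolute branch slopes."""
    peak = max(range(len(intensities)), key=lambda i: intensities[i])
    if peak == 0 or peak == len(intensities) - 1:
        raise ValueError("peak at the edge: one branch is missing")
    # Both branches include the peak sample itself.
    first = abs(trend_slope(times[:peak + 1], intensities[:peak + 1]))
    second = abs(trend_slope(times[peak:], intensities[peak:]))
    if first - second > tolerance:
        return "L"   # steep rise, slow fall: peak before half-time
    if second - first > tolerance:
        return "R"   # slow rise, steep fall: peak after half-time
    return "O"       # branches equally steep


times = list(range(9))
early_peak = [0, 10, 20, 17, 14, 11, 8, 5, 2]   # peak at t = 2
late_peak = list(reversed(early_peak))          # peak at t = 6
print(classify_asymmetry(times, early_peak), classify_asymmetry(times, late_peak))
```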

It is obvious that these "polar" types of configuration are allied to visual
perception in terms of gestalt. Being immunized (as a disciple of Eberhard
Zwirner) against any idolatry of curves, I abbreviated the procedure by
letting two subjects classify the events in the material 3 according to
whether they were
l visually left-asymmetrical,
r visually right-asymmetrical, or
n neutral (undecidable or visually symmetrical).
The instruction explicitly referred to the technique of least-square straight
lines; nonetheless other stimulus properties, such as, for example, the area
under the curve, may have influenced the judges. This open situation
seems by no means invidious. Given that it is useful to examine the rela-
tion between pre-peak and post-peak intensity in terms of types of
asymmetry, it is quite unsettled whether a comparison of slopes or a glob-
al visual response corresponds best to the relevant device in auditory per-
ception. In the light of the initial point (1), one could even argue that
judgements based on gestalt perception do not involve the fitting of a pe-

3
In fact my wife (a mathematician who segmented and measured registrations and wrote
transcriptions for Zwirner's institutes for years) and myself were these judges.

culiar function, or that if fitting a straight line to a curve does not obscure
a structure, reacting to the whole configuration will not have this effect
either.

Table 1: Two subjects' judgement of intensity-type

              Judge 2
Judge 1     l     r     n   Total
l         201     2    25     228
r           3    77    22     102
n          22     9    79     110
Total     226    88   126     440 (number of curves to be judged)

The degree of agreement between the two judges is shown in Table 1.
Only very few judgements came out as contradictory (pairs (l,r), (r,l)):
5 pairs (about 1 percent) out of 440. There was little doubt that the neutral
category was mainly applied to visually symmetrical configurations rather
than in order to express the judges' being at a loss. Accordingly, the visual
typology might be a triple one. The convention used in this pilot study in
order to have one and only one specification for every curve somewhat ar-
bitrarily gives priority to the non-neutral judgements. So
(l,l), (l,n), and (n,l) were set equivalent to yield the left-asymmetrical or
L-curves (248),
(r,r), (r,n), and (n,r) were set equivalent to yield the right-asymmetrical
or R-curves (108);
similarly
(n,n), (l,r), and (r,l) were set equivalent to yield the symmetrical or
O-curves (84)4.
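
The convention can be checked against the cell counts of Table 1 with a short Python sketch. The dictionary layout and the function name curve_type are assumptions made for the illustration; the counts are those of Table 1, keyed as (judge 1, judge 2).

```python
# Cell counts of Table 1, keyed as (judgement of judge 1, judgement of judge 2).
TABLE_1 = {
    ("l", "l"): 201, ("l", "r"): 2,  ("l", "n"): 25,
    ("r", "l"): 3,   ("r", "r"): 77, ("r", "n"): 22,
    ("n", "l"): 22,  ("n", "r"): 9,  ("n", "n"): 79,
}


def curve_type(judge1, judge2):
    """Pool a judgement pair, giving priority to non-neutral judgements."""
    votes = {judge1, judge2} - {"n"}
    if votes == {"l"}:
        return "L"
    if votes == {"r"}:
        return "R"
    return "O"  # both neutral, or contradictory (l against r)


pooled = {"L": 0, "O": 0, "R": 0}
for (j1, j2), count in TABLE_1.items():
    pooled[curve_type(j1, j2)] += count

total = sum(TABLE_1.values())
contradictory = TABLE_1[("l", "r")] + TABLE_1[("r", "l")]
identical = sum(n for (j1, j2), n in TABLE_1.items() if j1 == j2)

print(pooled, total)
print(f"contradictory: {contradictory / total:.1%}, identical: {identical / total:.1%}")
```

The pooled counts reproduce the figures given above (248 L-curves, 108 R-curves, 84 O-curves).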
The material used to test the supposition that there is some interest in
the relation between pre-peak and post-peak intensity in terms of L, O,
and R consisted in the intensity registrations of 256 utterances of the Ger-
man ja and 184 utterances of the German nein by 30 subjects (19 male and

4
In a more detailed investigation special attention would have to be paid to judge-specific
tendencies. Obviously judge 1 was polarizing more sharply than judge 2. Another critical
point is the obvious violation, as a result of our convention, of a consistent order
#(l) > #(n) > #(r) in the diagonal and in both the marginal column and marginal line of Table 1
notwithstanding the difference in individual tendencies. At present, however, the only
technical alternative to giving the non-neutral judgements priority would have consisted in
haphazardly splitting the mixed judgement pairs such as (l,n).

11 female students at the University of Bonn) under the following experi-
mental conditions 5:
In experiment I 10 subjects were given a set of 12 standardized yes-no
questions, with the instruction to invent for each one a dialogue consisting
of
A's question (standardized) → B's answer (ja or nein) → A's comment.
The performance of the dialogue (as a monologue, but 'as in radio drama')
was tape-recorded in a studio, which resulted in 10 × 12 = 120 utter-
ances of ja and nein.
In experiment II 2 × 10 subjects were given one of two sets of 16 stand-
ardized comments, with the instruction to use each one in a fictitious dia-
logue consisting of
A's question ← B's answer (ja or nein) ← A's comment (standardized).
The remaining 2 × 10 × 16 = 320 elements of the present corpus were
obtained by tape-recording these dialogues. 6
Standardization of the experimental stimuli (questions or comments, re-
spectively) was as follows:
It was assumed that questions can be interest-changing, that is tend to alter
the selective concern of the person asked (B) about aspects of his "world".
As to experiment I, we reduced these aspects to two — the person asking
(A) and one section or event of the environment - reducing the presumed
continuum of degrees of interest to the polar values o (offen, open) and a
(abgeschlossen, closed), and suppressed merely stabilizing questions (with
zero-change). This results, combinatorially, in the 12 characteristics of
questions, listed in Table 2 in the order in which they were given to the
subjects. Here the first pair represents what we shall term B's inner situa-
tion presupposed by A, while the second pair represents what A intends to
be B's inner situation. Within these pairs a value o or a in first position re-
fers to B's interest in the environmental event, a value o or a in second po-
sition to B's interest in A. (So (ao, aa) is to mean that A presupposes that B
is uninterested in the relevant section of the environment but interested in
A himself, A intending B to lose his interest in A as well.) The alteration

5
I did the experimental work at the Institut für Kommunikationsforschung und Phonetik
(IKP), University of Bonn, together with Rainer Seidel and Dirk Wegner. Another col-
league of the IKP, Bruno Fritsche, provided us with pitch and intensity registrations of the
yes-no answers. The experiment, not primarily undertaken for intonation purposes, was
sponsored by the Deutsche Forschungsgemeinschaft, Bonn-Bad Godesberg.
Intensity was analyzed with an integration time of 10 ms and an input resistance of
500 Ohm, and registered with a speed of 5 cm/s on a logarithmic scale ranging from 0 to
40 dB.
6
In both experiments there was a second run prescribing nein where the subject had origi-
nally chosen ja, and ja where he or she had originally chosen nein. The data in the present
paper are from the first run only. Experiments I and II in a sense complete the design of
Richter, 1967 and Seidel, this design being
A's question ← B's answer (ja and nein intonationally specified) → A's comment

Table 2: Standardized questions, given to subjects 1-10

Item    Charac-    Altera-   Formulation
number  teristic   tion
                   formula
1       (ao, aa)   1K        WÜRDEST DU DEN LAHMEN FILM AUSLASSEN, AUCH WENN WIR DANN NICHT ZUSAMMEN SEIN KÖNNTEN?
2       (oa, ao)   2         WÜRDEST DU AUF DEN FILM VERZICHTEN, AUCH WENN WIR DANN ZUSAMMEN SEIN MÜSSTEN?
3       (oo, oa)   1K        WÜRDEST DU IN DEN SPANNENDEN FILM GEHEN, AUCH WENN WIR DANN NICHT ZUSAMMEN SEIN KÖNNTEN?
4       (aa, oo)   2         WÜRDEST DU DICH ZU DEM FILM AUFRAFFEN, AUCH WENN WIR DANN ZUSAMMEN SEIN MÜSSTEN?
5       (oo, ao)   1U        WÜRDEST DU AUF DEN FILM VERZICHTEN, WENN WIR DANN ZUSAMMEN SEIN KÖNNTEN?
6       (oa, aa)   1U        WÜRDEST DU AUF DEN FILM VERZICHTEN, WENN WIR DANN NICHT ZUSAMMEN SEIN MÜSSTEN?
7       (ao, oo)   1U        WÜRDEST DU DICH ZU DEM FILM AUFRAFFEN, WENN WIR DANN ZUSAMMEN SEIN KÖNNTEN?
8       (aa, oa)   1U        WÜRDEST DU DICH ZU DEM FILM AUFRAFFEN, WENN WIR DANN NICHT ZUSAMMEN SEIN MÜSSTEN?
9       (oo, aa)   2         WÜRDEST DU AUF DEN FILM VERZICHTEN, AUCH WENN WIR DANN NICHT ZUSAMMEN SEIN KÖNNTEN?
10      (aa, ao)   1K        WÜRDEST DU DEN LAHMEN FILM AUSLASSEN, AUCH WENN WIR DANN ZUSAMMEN SEIN MÜSSTEN?
11      (ao, oa)   2         WÜRDEST DU DICH ZU DEM FILM AUFRAFFEN, AUCH WENN WIR NICHT ZUSAMMEN SEIN KÖNNTEN?
12      (oa, oo)   1K        WÜRDEST DU IN DEN SPANNENDEN FILM GEHEN, AUCH WENN WIR DANN ZUSAMMEN SEIN MÜSSTEN?

formula indicates how many components and which ones are intended to
change:
1U intended alteration concerning the environmental event (Umge-
bungsausschnitt, first components),
1K intended alteration concerning the communicator (Kommunikator,
second components),
2 intended alteration of both components.
The formulations were generated by means of an explicit ad-hoc grammar;
the event is going to the cinema (or not), the personal concern being to-
gether (or not)7 (cf. Table 2).
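
The combinatorics behind the 12 question characteristics can be made explicit with a short Python sketch. It is an illustration only: the function names are assumptions, and the labels 1U, 1K and 2 are written without subscripts.

```python
from itertools import product

# The four inner situations: interest in the event (first letter) and in
# A (second letter), each either o (offen, open) or a (abgeschlossen, closed).
STATES = ["".join(pair) for pair in product("oa", repeat=2)]


def alteration_formula(before, after):
    """Which components differ between two inner situations."""
    event_changes = before[0] != after[0]
    partner_changes = before[1] != after[1]
    if event_changes and partner_changes:
        return "2"
    if event_changes:
        return "1U"
    if partner_changes:
        return "1K"
    return "0"


def question_characteristics():
    """All (presupposed, intended, formula) triples with a non-zero change."""
    return [
        (presupposed, intended, alteration_formula(presupposed, intended))
        for presupposed in STATES
        for intended in STATES
        if presupposed != intended  # merely stabilizing questions are suppressed
    ]


characteristics = question_characteristics()
print(len(characteristics))
```

Four presupposed situations combined with three possible non-zero alterations give the 12 characteristics of Table 2.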
As to experiment II, the comments were construed so as to paraphrase
explicitly B's ja or nein in A's perspective, i. e. in terms of the presupposed
and intended situations plus B's inner situation resulting after A's question
(third pair in the characteristics of Table 3; (ao, aa, ao) is to mean that B
has remained attached to A though A intended to change the presupposed
o into a). Omitting comments on paradoxical results (with unintended al-
terations),8 we obtain the 31 comments listed in Table 3 in the order of
their occurrence in the experiment, plus the comment
ALSO DU WILLST DIR TROTZDEM DEN FILM ANSEHEN UND
IMMER NOCH MIT MIR ZUSAMMEN SEIN (characteristic (oo, aa,
oo), alteration formula 2(0)).
This should have been number 1 of the corpus in the first half of the com-
ments, but was erroneously made equal to number 17, second half, by in-
serting NICHT between NOCH and MIT9. So number 17 was applied
with all 20 subjects. The alteration formula is now
number of changes intended (number of changes resulting);
(ao, aa, ao) thus has the alteration formula 1K(0). The formulations are as
with experiment I mutatis mutandis.
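
The number of comment characteristics follows from the same combinatorics. The Python sketch below is an illustration under one assumption: a result is read as paradoxical when a component changes although no change of that component was intended. The function names are hypothetical.

```python
from itertools import product

STATES = ["".join(pair) for pair in product("oa", repeat=2)]


def changes(before, after):
    """Which components differ between two inner situations."""
    event_changes = before[0] != after[0]
    partner_changes = before[1] != after[1]
    if event_changes and partner_changes:
        return "2"
    if event_changes:
        return "1U"
    if partner_changes:
        return "1K"
    return "0"


def comment_characteristics():
    """(presupposed, intended, resulting, formula) without paradoxical results."""
    rows = []
    for presupposed, intended, resulting in product(STATES, repeat=3):
        if presupposed == intended:
            continue  # no stabilizing questions
        paradoxical = any(
            resulting[i] != presupposed[i] and intended[i] == presupposed[i]
            for i in range(2)
        )
        if not paradoxical:
            formula = f"{changes(presupposed, intended)}({changes(presupposed, resulting)})"
            rows.append((presupposed, intended, resulting, formula))
    return rows


comments = comment_characteristics()
print(len(comments))
```

Under this reading there are 32 characteristics: the 31 comments of Table 3 plus the comment with characteristic (oo, aa, oo) quoted above.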
In terms of L, O, and R the overall results are shown in Table 4. It can
easily be seen that the right-asymmetrical intensity curves are concen-
trated in occurrences of ja. But though the sound patterns [fricative or
semi-vowel + vowel] and [nasal + diphthong + nasal] are sufficiently
different to seem to account for a typological difference such as that be-
tween R, O, or L, the asymmetry-type of the intensity curve cannot be

7 For the present purposes we can leave open taxonomical questions of whether semantic,
pragmatic, communicative and/or psycholinguistic properties are involved in our stimuli.
Another question open for discussion is, whether the question is the means of changing B's
interest or rather the final attempt to make sure that previous interest-changing activities
had been successful.
8 In one of the experiments in the project use was made of the paradoxical comments (see
note 7).
9 Similarly (though by far not so incisively) the formulation of question 11 deviates in that
according to our stimulus-grammar it should have been . . . AUCH WENN WIR
DANN NICHT ZUSAMMEN SEIN KÖNNTEN?.

Table 3: Standardized comments

Item Charac- Altera- Formulation


num- teristic tion
ber formula
First half of items, given to subjects 11-20
2 (oa, aa, ad) l u ( l u ) DU WILLST ALSO JETZT AUF DEN
FILM VERZICHTEN. NATÜRLICH
BRAUCHTEST DU NICHT MIT MIR
ZUSAMMEN ZU SEIN.
3 (oo, aa, od) 2(1κ) DU WILLST DIR ALSO TROTZDEM
DEN FILM ANSEHEN. IMMERHIN
WILLST DU NICHT MEHR UNBE-
DINGT MIT MIR ZUSAMMEN SEIN.
4 (αα,αο,αο) 1 κ (1κ) DU MAGST ALSO AUCH MAL MIT
MIR ZUSAMMEN SEIN. DEN FILM
KÖNNTEST DU NATÜRLICH AUS-
LASSEN.
5 (ao, oa, oo) 2(lu) DU WILLST ALSO TROTZDEM MIT
MIR ZUSAMMEN SEIN. IMMERHIN
WILLST DU DICH JETZT ZU DEM
FILM AUFRAFFEN.
6 (ao, aa, ao) 1κ(0) ALSO WILLST DU TROTZDEM MIT
MIR ZUSAMMEN SEIN. DEN FILM
KÖNNTEST DU DOCH SO ODER SO
AUSLASSEN.
7 (ao, oa, od) 2(2) DU WILLST DICH ALSO JETZT ZU
DEM FILM AUFRAFFEN UND NICHT
MEHR UNBEDINGT MIT MIR ZUSAM-
MEN SEIN.
8 (aa, oa, ad) lu(0) ALSO DU WILLST TROTZDEM DEN
FILM AUSLASSEN. DU BRAUCHTEST
DOCH SO ODER SO NICHT MIT MIR
ZUSAMMEN ZU SEIN.
9 (oo, oa, oo) 1 K (0) ALSO WILLST DU TROTZDEM MIT
MIR ZUSAMMEN SEIN. DEN FILM
KÖNNTEST DU DIR DOCH SO ODER
SO ANSEHEN.
10 (oo, aa, αό) 2 ( l u ) DU WILLST ALSO TROTZDEM MIT
MIR ZUSAMMEN SEIN. IMMERHIN
WILLST DU JETZT AUF DEN FILM
VERZICHTEN.
290 Η. Richter

Item Charac Altera- Formulation


num- teristic tion
ber formula
11 (oa, aa, od) lu(0)ALSO DU WILLST DIR TROTZDEM
DEN FILM ANSEHEN. DU BRAUCH-
TEST DOCH SO ODER SO NICHT MIT
MIR ZUSAMMEN ZU SEIN.
12 (oo aa, ad) 2(2) DU WILLST ALSO JETZT AUF DEN
FILM VERZICHTEN UND NICHT
MEHR UNBEDINGT MIT MIR ZUSAM-
MEN SEIN.
13 (oa, 00, oo) ΐκΟκ) DU MAGST ALSO AUCH MAL MIT
MIR ZUSAMMEN SEIN. DEN FILM
KÖNNTEST DU DIR NATÜRLICH
ANSEHEN.
14 (ao, oa, ad) 2(1K) DU WILLST ALSO TROTZDEM DEN
FILM AUSLASSEN. IMMERHIN
WILLST DU NICHT MEHR UNBE-
DINGT MIT MIR ZUSAMMEN SEIN.
15 (aa, oa, od) iu(iu) DU WILLST DICH ALSO JETZT ZU
DEM FILM AUFRAFFEN. NATÜRLICH
BRAUCHTEST DU NICHT MIT MIR
ZUSAMMEN ZU SEIN.
16 (ao, oa, ao) 2(0) ALSO DU WILLST TROTZDEM DEN
FILM AUSLASSEN UND IMMER
NOCH MIT MIR ZUSAMMEN SEIN.

Second half of items, given to subjects 21-30


17 (oa, ao, od) 2(0) ALSO DU WILLST DIR TROTZDEM
DEN FILM ANSEHEN UND IMMER
NOCH NICHT MIT MIR ZUSAMMEN
SEIN.
18 (oo ao, ao) luOu) DU WILLST ALSO JETZT AUF DEN
FILM VERZICHTEN. NATÜRLICH
KÖNNTEST DU MIT MIR ZUSAMMEN
SEIN.
19 (oa, ao, oo) 2(1K) DU WILLST DIR ALSO TROTZDEM
DEN FILM ANSEHEN. IMMERHIN
MAGST DU AUCH MAL MIT MIR ZU-
SAMMEN SEIN.
20 (ao, aa, ad) 1K(1K) DU WILLST ALSO NICHT MEHR UN-
BEDINGT MIT MIR ZUSAMMEN SEIN.
Intensity as a Predictable Feature 291

Item Charac Altera- Formulation


num- teristic tion
ber formula
DEN FILM KÖNNTEST DU NATÜR-
LICH AUSLASSEN.
21 {aa, 00, od) 2(lu) DU MAGST ALSO IMMER NOCH
NICHT MIT MIR ZUSAMMEN SEIN.
IMMERHIN WILLST DU DICH JETZT
ZU DEM FILM AUFRAFFEN.
22 {aa, ao, ad) ΐκ(0) ALSO MAGST DU IMMER NOCH
NICHT MIT MIR ZUSAMMEN SEIN.
DEN FILM KÖNNTEST DU DOCH SO
ODER SO AUSLASSEN.
23 {aa, 00, oo) 2(2) DU WILLST DICH ALSO JETZT ZU
DEM FILM AUFRAFFEN UND MAGST
AUCH MAL MIT MIR ZUSAMMEN
SEIN.
24 (ao, 00, ao) iu(0) ALSO DU WILLST TROTZDEM DEN
FILM AUSLASSEN. DU KÖNNTEST
DOCH SO ODER SO MIT MIR ZUSAM-
MEN SEIN.
25 (oa, 00, od) 1K(0) ALSO MAGST DU IMMER NOCH
NICHT MIT MIR ZUSAMMEN SEIN.
DEN FILM KÖNNTEST DU DIR DOCH
SO ODER SO ANSEHEN.
26 (oa, ao ad) 2(lu) DU MAGST ALSO IMMER NOCH
NICHT MIT MIR ZUSAMMEN SEIN.
IMMERHIN WILLST DU JETZT AUF
DEN FILM VERZICHTEN.
27 {oo ao, oo) iu(0) ALSO DU WILLST DIR TROTZDEM
DEN FILM ANSEHEN. DU KÖNNTEST
DOCH SO ODER SO MIT MIR ZUSAM-
MEN SEIN.
28 {oa, ao, ao) 2(2) DU WILLST ALSO JETZT AUF DEN
FILM VERZICHTEN UND MAGST
AUCH MAL MIT MIR ZUSAMMEN
SEIN.
29 {oo, oa, od) ΐ κ ( ΐ κ ) DU WILLST ALSO NICHT MEHR UN-
BEDINGT MIT MIR ZUSAMMEN SEIN.
DEN FILM KÖNNTEST DU DIR NA-
TÜRLICH ANSEHEN.
30 {aa, 00, ao) 2(1K) DU WILLST ALSO TROTZDEM DEN
FILM AUSLASSEN. IMMERHIN
292 Η. Richter

MAGST DU AUCH MAL MIT MIR ZU-
SAMMEN SEIN.
31 (ao, oo, oo) l u ( l u ) DU WILLST DICH ALSO JETZT ZU
DEM FILM AUFRAFFEN. NATÜRLICH
KÖNNTEST DU MIT MIR ZUSAMMEN
SEIN.
32 (aa, oo, aa) 2(0) ALSO DU WILLST TROTZDEM DEN
FILM AUSLASSEN UND IMMER
NOCH NICHT MIT MIR ZUSAMMEN
SEIN.

Table 4: Number of L, Ο, and R among occurrences of ja and nein

Experiment I     ja          nein
Item number      L  O  R     L  O  R
1 1 2 3 4 _ _
2 2 3 — 5 — —

3 3 4 1 2 — —

4 3 1 — 6 - -

5 3 2 3 2 — —

6 2 1 2 4 1 —

7 1 1 4 3 1 —

8 2 2 1 4 — 1
9 — 1 — 6 2 1
10 4 2 2 2 - —

11 — 2 2 6 — —

12 1 2 3 4 - -

22 23 21 48 4 2 (Total: 120)

Experiment II ja nein
Item number L Ο R L Ο R
2 _ 2 1 3 3 1
3 3 1 4 1 — 1
4 4 3 3 — — —

5 2 — 5 2 — 1
6 2 — 2 3 2 1
7 1 2 2 4 — 1
8 - 1 3 4 2 -
9 5 1 1 3 _ —

10 5 1 1 2 1 —

11 4 1 1 4 — —

12 3 1 5 — — 1
13 3 2 2 3 — —

14 4 - 1 4 1 —

15 4 1 1 3 — 1
16 4 3 — 2 1 —

17 4 1 5 7 1 2 (Total: 20)
18 1 2 2 2 2 1
19 3 1 1 3 1 1
20 2 1 2 4 — 1
21 2 1 2 4 1 —

22 1 1 — 7 — 1
23 2 1 6 — 1 —

24 2 — 2 4 1 1
25 3 1 1 4 - 1
26 2 2 2 3 1 —

27 3 2 1 4 — —

28 4 1 3 2 — -

29 2 1 2 3 1 1
30 4 — 2 3 1 —

31 4 3 3 — — —

32 1 - 3 6 - -

84 37 69 94 20 16 (Total: 320)

predicted on the sole basis of one's knowledge as to whether ja or nein has
been uttered. So we assume intonation to play a role; it will be shown in
what follows that making use of information about assumed interest-
changing properties of stimuli tends to increase the predictability of the
asymmetry type of the intensity curve.
As a measure of predictability we will use estimations |r̂| of the
coefficient of correlation between stimulus properties and asymmetry-type,
calculated out of χ² for the pertinent contingency tables according to the
formula:

$$|\hat{r}| = \frac{C}{C_{\max}} \qquad \text{(see Hofstätter, 1953)} \tag{1}$$

where

$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}, \qquad C_{\max} = \frac{1}{2}\left[\sqrt{\frac{h-1}{h}} + \sqrt{\frac{v-1}{v}}\right],$$

n being the number of cases (120 or 320, respectively) and h and v being
the number of lines and columns¹⁰.
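
The estimate of formula (1) can be sketched in a few lines of Python. This is a minimal sketch and not the author's own computation: the function names are hypothetical, n is taken to be the total of the table that is actually evaluated, and the form of C_max for non-square tables (the mean of the two square-table maxima) is an assumption that happens to reproduce the printed estimates to two decimals. The example data are the ja responses of experiment I from Table 5.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def corrected_contingency(table):
    """Estimate |r| as the contingency coefficient C divided by C_max."""
    chi2, n = chi_square(table)
    h, v = len(table), len(table[0])
    c = sqrt(chi2 / (chi2 + n))
    # Assumed form of C_max for an h x v table (not confirmed by the source).
    c_max = 0.5 * (sqrt((h - 1) / h) + sqrt((v - 1) / v))
    return c / c_max

# ja, experiment I (Table 5): rows R, O, L; columns weak, medium, strong
ja_exp1 = [[2, 13, 6], [4, 11, 8], [3, 9, 10]]
r_hat = corrected_contingency(ja_exp1)
print(round(r_hat, 2))  # 0.23
```

The chi-square is computed by hand here so that the sketch needs no statistics library; cells with an expected value of zero are not handled.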
A question to be answered before direct reference is made to the a/o-
characteristics and alteration formulae seems to be: can the asymmetry-type
be predicted better when one knows to what degree the item in question
tends to make the subjects choose ja rather than nein in their simulated
dialogues?

a) Asymmetry-type and ja-affinity
The items were trichotomized for each of the experiments (m: number
of subjects responding with ja) into
ja-affinity weak        m ≤ 4
ja-affinity medium¹¹    4 < m < 8
ja-affinity strong      m ≥ 8
These criteria have the result that 'weak' and 'strong' comprise the
extreme quartiles in experiment I and the extreme 6 items (= 19.4 percent
each) in experiment II; thus 'medium' comprises the middle half of ranked
items in experiment I and 61.3 percent in the 'middle' of the ranking order
of items of experiment II.
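
As an illustration, the trichotomy can be written as a small classifier. This is a sketch under stated assumptions: the function name and the `scale` parameter are hypothetical, and the class limits are read as inclusive (weak up to 4, strong from 8), because only that reading reproduces the column totals 9, 33 and 24 of Table 5 for the ja responses of experiment I. The counts per item are the sums L + O + R from Table 4.

```python
def ja_affinity(m, scale=1):
    """Classify an item by the number m of subjects responding with ja.

    scale=2 doubles the limits for an item answered by twice as many
    subjects (the 'double criterion value' mentioned in note 11).
    """
    if m <= 4 * scale:
        return "weak"
    if m >= 8 * scale:
        return "strong"
    return "medium"

# Number of ja responses per item in experiment I (L + O + R from Table 4)
ja_counts_exp1 = {1: 6, 2: 5, 3: 8, 4: 4, 5: 8, 6: 5,
                  7: 6, 8: 5, 9: 1, 10: 8, 11: 4, 12: 6}

totals = {"weak": 0, "medium": 0, "strong": 0}
for item, m in ja_counts_exp1.items():
    totals[ja_affinity(m)] += m
print(totals)  # {'weak': 9, 'medium': 33, 'strong': 24}
```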
Table 5 combines the four contingency tables to be distinguished (two
experiments, ja and nein). Each contingency table gives the distribution of
ja- or nein-occurrences upon pairs of asymmetry-type and level of
ja-affinity. Percentages adding up to 100 per column are given in
parentheses.

Table 5: Asymmetry-type vs. ja-affinity

Experiment I
ja nein
weak medium strong weak medium strong
R 2 13 6 21 R 1 1 _ 2
(22.2) (39.4) (25.0) (4.8) (3.7) (0)
Ο 4 11 8 23 Ο 2 2 - 4
(44.4) (33.3) (33.3) (9.5) (7.4) (0)
L 3 9 10 22 L 18 24 6 48
(33.3) (27.3) (41.7) (85.7) (88.9) (100.0)
9 33 24 66 21 27 6 54
(99.9) (100.0) (100.0) (100.0) (100.0) (100.0)

10 The reader is assumed to be familiar with the χ²-procedure applied to
contingency tables. Concerning the correlation coefficient as a measure of
the goodness of estimating or predicting values of one variable on the
basis of given values of another variable, see e.g. Richter, 1967,
p. 334/5.
11 The double criterion value was applied to item 17 of the comments
(20 subjects).

Experiment II
ja nein
weak medium strong weak medium strong
R 11 34 24 69 R 4 10 2 16
(52.4) (29.6) (44.4) (10.3) (11.8) (33.3)
Ο 4 23 10 37 Ο 8 11 1 20
(19.0) (20.0) (18.5) (20.5) (12.9) (16.7)
L 6 58 20 84 L 27 64 3 94
(28.6) (50.4) (37.0) (69.2) (75.3) (50.0)
21 115 54 190 39 85 6 130
(100.0) (100.0) (99.9) (100.0) (100.0) (100.0)

Figure 2

The correlation estimates are:

ja, experiment I       .23  (χ² = 2.3771, df = 4; n.s.)
nein, experiment I     .15  (χ² = .8687, df = 4; n.s.)
ja, experiment II      .23  (χ² = 6.8345, df = 4; n.s.)
nein, experiment II    .22  (χ² = 4.1316, df = 4; n.s.)
We shall use these values in order to assess gains (or losses) in
predictability on the basis of different variables¹². Figure 2, where use
has been made of the percentages in Table 5, reveals systematic variation
of the intensity

12 None of the chi-squares is significant, even though the tables for nein
give rise to rather small 'expected values' in some cells which might lead
to overestimations of the respective deviation from the 'observed values'.
According to Lorenz's rule of thumb for evaluating single chi-square
summands (comparing the summand with a criterion obtained by dividing the
threshold value for significance, here for P = .05: 9.49, by the number of
(inner) cells of the table, here: 9), in the table for ja, experiment II,
(R, weak) can be said to be overrepresented more than accidentally,
(R, medium) and (L, weak) can be said to be underrepresented more than
accidentally;
(R, strong) and (L, medium) (overrepresentation) and (L, strong)
(underrepresentation) come close to fulfilling the criterion.

An important technical detail is that (non-)significance of a contingency
table cannot simply be transferred to the derived correlation estimates.
For the tests a, b, and c it should be kept in mind that a "factual"
correlation would differ significantly from 0
in experiment I (df = 64):
  at the 5-percent level if it was ≈ .24 or greater
  at the 1-percent level if it was ≈ .35 or greater
in experiment II (df = 188):
  at the 5-percent level if it was ≈ .14 or greater
  at the 1-percent level if it was ≈ .18 or greater
As a factor to be checked because of its possible role in the
determination of asymmetry-type (having at the same time nothing in common
with a/o-characteristics) I chose sex. The test is negative:

Responses of    male subjects    female subjects
R                    32                58             90
O                    26                34             60
L                    38                68            106
                     96               160            256

|r̂| = .09 (χ² = 1.1385, df = 2; n.s.). A "factual" correlation of .09
would not significantly differ from 0 for df = 254.

pattern as depending on ja/nein and the ja-affinity of items (the
categories L, O, and R have provisionally been treated as forming part of
a metrics):
- nein is almost exclusively L (hockey-stick graphs for nein in all
instances, the only exception being strong ja-affinity in experiment II
with, however, not more than 6 occurrences);
- with ja there is a virtual variation involving both L and R;
- given a proportion of L greater than that of R for one experiment, the
proportion of R will be greater than that of L for the other experiment
under the same ja-affinity (see the dotted auxiliary lines where there is
no linear progression in the percentages)¹³.
In accordance with the first two findings we will restrict the following
tests to occurrences of ja.

Table 6

Experiment I

        1u    1k     2
R       10     9     2     21
O        6    10     7     23
L        8     9     5     22
        24    28    14     66

Experiment II

      1u(1u)  1k(1k)   2(2)  2(1u)  2(1k)  1u(0)  1k(0)   2(0)
R        7       9      16     10      8      7      4      8      69
O        8       7       5      4      2      4      3      4      37
L        9      11      10     11     14      9     11      9      84
        24      27      31     25     24     20     18     21     190

b) Asymmetry-type and alteration formula (ja)

For the two parts of Table 6 we get a certain amount of increase in the
correlation estimates:
experiment I     .28  (χ² = 3.7665, df = 4; n.s.)
experiment II    .28  (χ² = 11.8539, df = 14; n.s.)

13 This is in nice agreement with the reciprocal reconstruction tasks
('simulation a fronte' vs. 'simulation a tergo', see Richter, 1982),
though no simple explanation seems feasible.

c) Asymmetry-type and a/o-characteristic (ja)

Basing the classification in the contingency table on one single
component in the double or, as the case may be, triple characteristic
keeps the estimated correlation for questions equal, or even raises it,
but lowers it for comments, if compared with b:
'presupposed', experiment I     .42  (χ² = 8.8472, df = 6; n.s.)
'intended', experiment I        .29  (χ² = 4.0319, df = 6; n.s.)
'presupposed', experiment II    .11  (χ² = 1.6518, df = 6; n.s.)
'intended', experiment II       .13  (χ² = 2.3823, df = 6; n.s.)
'resulting', experiment II      .15  (χ² = 2.9728, df = 6; n.s.)
The low values for experiment II are not surprising when one realizes
that the classification, levelling the differentiation of the stimuli in
two components, results in an underspecification when compared with
experiment I, where only the differentiation of the stimuli in one
component is neglected. Indeed when leaving only one component of the
comments unspecified, we arrive at correlation estimates of a size
comparable to those of Table 7, experiment I (cf. Table 8).
'presupposed', 'intended'     |r̂| = .05  (χ² = 8.9449, df = 14; n.s.)
'presupposed', 'resulting'    |r̂| = .29  (χ² = 11.3795, df = 12; n.s.)
'intended', 'resulting'       |r̂| = .27  (χ² = 11.1086, df = 16; n.s.)
The unions of classes were formed in order to have enough cases in the
cells of the contingency tables (compare note 12). That it was possible to
do this on non-arbitrary grounds is due to our combinatorics, which had

Table 7

Experiment I
asymmetry-type asymmetry-type
vs. 'presupposed' vs. 'intended'
oo oa ao aa oo oa ao aa
R 4 5 9 3 21 R 7 4 5 5 21
Ο 7 6 5 5 23 Ο 4 8 7 4 23
L 6 5 2 9 22 L 5 5 9 3 22
17 16 16 17 66 16 17 21 12 66

Experiment II
asymmetry-type
vs. 'presupposed' vs. 'intended' vs. 'resulting'
oo oa ao aa oo oa ao aa oo oa ao aa
R 16 16 17 20 69 R 21 15 17 16 69 R 19 18 15 17 69
Ο 9 11 9 8 37 Ο 8 9 13 7 37 Ο 10 9 10 8 37
L 22 23 21 18 84 L 21 22 22 19 84 L 22 23 26 13 84
47 50 47 46 190 50 46 52 42 190 51 50 51 38 190

Table 8

Experiment II
asymmetry-type vs.

'presupposed'   oo    oo      oa    oa      ao    ao      aa    aa
'intended'      aa    oa/ao   ao    oo/aa   oa    oo/aa   oo    oa/ao

R               10     6      11     5       8     9      13     7      69
O                3     6       5     6       5     4       2     6      37
L               11    11      13    10      11    10       9     9      84
                24    23      29    21      24    23      24    22     190

'presupposed'   oo  oo  oo  oa  oa  oa  ao  ao  ao  aa  aa  aa  others*
'resulting'     oo  oa  ao  oo  oa  aa  oo  ao  aa  oa  ao  aa

R/O              5   8   6   6  10   7  11   7   4   5   8   8    21    106
L                8   5   6   6  11   2   6   8   6   6   8   2    10     84
                13  13  12  12  21   9  17  15  10  11  16  10    31    190

'intended'      oo    oo      oa    oa      ao    ao      aa    aa      others**
'resulting'     oo    oa/ao   oa    oo/aa   ao    oo/aa   aa    oa/ao

R               11     7       5    10       8     4       8     8        8      69
O                6     1       4     2       6     6       4     3        4      37
L                9    11       7    11       9     9       5    14        9      84
                26    20      16    23      23    19      17    25       21     190

*  oo,aa; oa,ao; ao,oa; aa,oo
** oo,aa; oa,ao; ao,oa (aa,oo is item number 1)

the following effects¹⁴. As to the first of the matrices in Table 8, there
are just as many complete inversions ((oo, aa, ?), (oa, ao, ?), etc.) as
partial inversions taken jointly. Similarly in the third matrix there is a
relative preponderance of complete agreement of the type (?, oo, oo),
(?, ao, ao). The second matrix shares with the third a weak representation
of complete inversions (types (?, oo, aa) and (oo, ?, aa)), but shows no
marked preponderance of total vs. partial agreement (so the union had
mainly to be made here in the asymmetry-type dimension).

14 The combinatorially expected column sums are (18.4, 24.5, 24.5, 24.5,
24.5, 24.5, 24.5, 24.5) for the first matrix (χ² [test vs. observed
numbers] = 3.4901, df = 7; n.s.); (12.3, 12.3, 12.3, 12.3, 18.4, 12.3,
12.3, 18.4, 12.3, 12.3, 12.3, 18.4, 24.5) for the second matrix (χ² [test
vs. observed numbers] = 11.0110, df = 12; n.s.); and (18.4, 24.5, 18.4,
24.5, 18.4, 24.5, 18.4, 24.5, 18.4) for the third matrix (χ² [test vs.
observed numbers] = 7.2394, df = 8; n.s.).

d) Asymmetry measure and combined characteristic (ja)

As long as one component in the situation characteristics remains
unspecified, predictability will presumably not reach its maximum. There
is an important difference in the socio-psychological impacts of the two
comments 10 and 18, which were both classified as (oo, ?, ao) in forming
the second matrix of Table 8. The invariants of interest-changing
properties that can be stated in terms of alteration formulae and of
characteristics which are equal mutatis mutandis do seem to play a role in
determining the respective intensity pattern. But even if this were not
the case one would have to be aware of the possibility that, given the
stimuli in their entire experimental specification, a close connection
becomes feasible between the interest-changing content of a question or
comment and the intensity pattern of the one-word affirmations which
succeed or precede the former in the subjects' scenes.
Having been forced, for statistical reasons, strongly to compress the
original matrices in order to obtain those of Table 8, it will be
necessary to change the technique for estimating the possible increase in
predictability if the whole characteristic of a stimulus is taken into
account. It can be shown that any situation characteristic of our stimuli
is wholly determined by the alteration formula combined with one of the
two or three components (pairs) of the characteristic. (From alteration
formula 1u and presupposed situation oo, we infer the characteristic
(oo, ao) for a question; from 2(1k) plus resulting situation aa we infer
(ao, oa, aa) for a comment.) Let us use combined characteristic to mean
the representation of the stimuli as an ordered pair, element of the
Cartesian product of the set F of alteration formulae and the set C of
pairs in one (identical) component of the characteristic. And let us
introduce as asymmetry measure of an item the expression

$$\mu = \frac{n_R - n_L}{m} \tag{2}$$

where n_R and n_L are the numbers of R-curves and L-curves among the m
ja-responses to the item. Obviously we have
−1 ≤ μ ≤ +1
μ > 0 if right-asymmetry prevails
μ < 0 if left-asymmetry prevails
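
The asymmetry measure is simple enough to check by hand; the following sketch (function and variable names are hypothetical) evaluates it for two items of experiment I, with the counts taken from Table 4. The resulting values, +.33 and −.75, are among those entered in Table 9.

```python
def asymmetry_measure(n_right, n_left, m):
    """mu = (n_R - n_L) / m for an item with m ja responses."""
    if m == 0:
        raise ValueError("item has no ja responses")
    return (n_right - n_left) / m

# Experiment I, item 1 (Table 4): L = 1, O = 2, R = 3, hence m = 6
mu_item1 = asymmetry_measure(n_right=3, n_left=1, m=6)
# Experiment I, item 4 (Table 4): L = 3, O = 1, R = 0, hence m = 4
mu_item4 = asymmetry_measure(n_right=0, n_left=3, m=4)
print(round(mu_item1, 2), round(mu_item4, 2))  # 0.33 -0.75
```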
The extreme difficulty of applying the χ²-technique could be avoided after
a study of the mapping relation

$$F \times C \to \{\mu\}, \qquad (\text{alteration formula},\ \text{component pair}) \mapsto \mu$$

Provided that F × C can be linearized so that it can be mapped into the
set of real numbers in a way which is sensitive to our problem:

$$F \times C \to \mathbb{R}, \qquad \text{combined characteristic} \mapsto \xi \in \mathbb{R}$$

we could simply calculate the correlations

$$r_{\mu\xi} = \frac{\frac{1}{N}\sum \mu\xi - m_\mu m_\xi}{s_\mu s_\xi} \tag{3}$$

(N being the number of items, m_μ, m_ξ the arithmetical means, s_μ, s_ξ the
standard deviations of μ and ξ), in order to approach the relevant degree
of predictability.

Table 9: Combined characteristic vs. asymmetry measure

Experiment I
              'presupposed'                 'intended'            sum of
           aa     oo     oa     ao       ao     oa     oo     aa    line    ξh

1u        -.20    0      0     +.50      0     -.20   +.50    0     +.30    3
1k        -.25   -.25   +.33   +.33     -.25   -.25   +.33   +.33   +.16    2
2         -.75    0     -.40   +.50     -.40   +.50   -.75    0     -.65    1

sum of
column   -1.20   -.25   -.07  +1.33     -.65   +.05   +.08   +.33
ξv         1      2      3      4        1      2      3      4

Experiment II
              'presupposed'                 'intended'                  'resulting'           sum of
           oa     oo     ao     aa       ao     oa     aa     oo       ao     oa     oo     aa    line    ξh

2(2)      -.13   +.22   +.20   +.44     -.13   +.20   +.22   +.44     -.13   +.20   +.44   +.22   +.73    8
2(0)      +.10    ·     -.57   +.50     +.10   -.57    ·     +.50     -.57   +.10    ·     +.50   -.05    7
1u(1u)    +.33   +.20   -.10   -.50     +.20   -.50   +.33   -.10     +.20   -.50   -.10   +.33   -.07    6
1u(0)     -.50   -.33    0     +.75     -.33   +.75   -.50    0        0     -.50   -.33   +.75   -.08    5
2(1u)      0     -.57   +.43    0        0     +.43   -.57    0       -.57    0     +.43    0     -.14    4
1k(1k)    -.14    0      0     -.10     -.10    0      0     -.14     -.10    0     -.14    0     -.24    3
2(1k)     -.40   +.13   -.60   -.33     -.40   -.60   +.13   -.33     -.33   +.13   -.40   -.60  -1.20    2
1k(0)     -.40   -.57    0     -.50     -.50   -.57    0     -.40      0     -.40   -.57   -.50  -1.47    1

sum of
column   -1.14  -1.00   -.64   +.26    -1.16   -.86   -.47   -.03    -1.50   -.97   -.75   +.70
ξv         1      2      3      4        1      2      3      4        1      2      3      4

In Table 9 μ is given (inner cells) as a function of (alteration formula,
'presupposed') and of (alteration formula, 'intended') for the questions,
and of (alteration formula, 'presupposed'), (alteration formula,
'intended'), and (alteration formula, 'resulting') for the comments.¹⁵ The
lines and columns were ordered with increasing sums of μ. They were
accordingly given
15 Item 1 was evaluated by m_μ = −.08.

rank numbers ξ_h and ξ_v, and ξ was defined as the sum of these numbers
for a cell:

$$\xi_{hv} := \xi_h + \xi_v \tag{4}$$

Some correlation between μ and ξ must lead to a concentration of negative
μ in the lower left area (low ξ_h, ξ_v combining to low ξ) and a
concentration of positive μ in the upper right area (high ξ_h, ξ_v
combining to high ξ) of the matrices. Cells with negative μ are marked by
small triangles; so it can be seen empirically to what degree the
concentration takes place¹⁶.
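
The rank-sum scaling and the product-moment correlation can be retraced with a short Python sketch. It is an illustration under stated assumptions, not the author's procedure: the helper names are hypothetical, ties in the sums are not handled, the standard deviations are taken as population values (division by N), and the row and column labels are given as read from Table 9. The data are the μ values of the questions by (alteration formula, 'presupposed').

```python
from math import sqrt

def rank_by_sum(sums):
    """Rank positions 1..k in order of increasing sum (ties not handled)."""
    order = sorted(range(len(sums)), key=lambda k: sums[k])
    ranks = [0] * len(sums)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return ranks

def correlation(xs, ys):
    """Product-moment correlation with population standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
    sx = sqrt(sum(x * x for x in xs) / n - mx * mx)
    sy = sqrt(sum(y * y for y in ys) / n - my * my)
    return cov / (sx * sy)

# Rows: 1u, 1k, 2; columns: aa, oo, oa, ao (experiment I, 'presupposed')
mu = [
    [-0.20, 0.00, 0.00, 0.50],
    [-0.25, -0.25, 0.33, 0.33],
    [-0.75, 0.00, -0.40, 0.50],
]
xi_h = rank_by_sum([sum(row) for row in mu])
xi_v = rank_by_sum([sum(col) for col in zip(*mu)])

mus, xis = [], []
for i, row in enumerate(mu):
    for j, value in enumerate(row):
        mus.append(value)
        xis.append(xi_h[i] + xi_v[j])  # xi for a cell = row rank + column rank

r = correlation(mus, xis)
print(xi_h, xi_v, round(r, 2))  # [3, 2, 1] [1, 2, 3, 4] 0.8
```

Because the ranks are derived from the very sums of μ that they are then correlated with, some positive correlation is built into the technique; the sketch only shows the arithmetic.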
As a result, we have a considerable further increase of coefficients:
'presupposed', experiment I     r_μξ = +.80  (s. at 1-percent level, df = 10)
'intended', experiment I        r_μξ = +.40  (n.s.)
(N = 12; m_ξ = 4.5, s_ξ = 1.3844; m_μ = −.016, s_μ = .3654)
'presupposed', experiment II    r_μξ = +.46  (s. at 1-percent level, df = 29)
(N = 31; m_ξ = 6.936, s_ξ = 2.5645; m_μ = −.079, s_μ = .3566)
'intended', experiment II       r_μξ = +.46  (s. at 1-percent level, df = 29)
'resulting', experiment II      r_μξ = +.51  (s. at 1-percent level, df = 29)
(N = 31; m_ξ = 6.903, s_ξ = 2.5318 [different location of the empty cell
for item number 1], m_μ and s_μ as above)¹⁷
What has been observed then, concerning the predictability of intensity
patterns of ja on the basis of our interest-changing questions and
comments on answers to such questions, is a general tendency for
quantitative expressions of predictability (correlation coefficients |r̂|
and r_μξ) to increase when the number of non-specified components of the
vectorially
16 The correlation indeed depends on an order in the data, which can be
shown with the following possible matrix where the technique would
obviously fail:

-1  +1   0
+1  -1   0
 0   0   0

17 Since the numbers of responses per item are small, μ might have come
out pseudo-exact. Therefore also correlations r_ξμ′ were calculated, μ′
being round values attributed to the elements of the equivalence classes
of positive μ (μ′ = +1), negative μ (μ′ = −1); for μ = 0, μ′ = 0. The
outcome is very similar:
'presupposed', experiment I     r_ξμ′ = +.73
'intended', experiment I        r_ξμ′ = +.45
'presupposed', experiment II    r_ξμ′ = +.42
'intended', experiment II       r_ξμ′ = +.43
'resulting', experiment II      r_ξμ′ = +.50
(significances like those for r_ξμ, with the exception of .42 and .43
which are significant at the 5-percent level for df = 29).
Figure 3: Maximal coefficients obtained (vertical axis, 0 to 1.0) plotted
against the number of non-specified components (horizontal axis), for
questions and comments. Points: ja-affinity; a alteration formula;
b a/o-characteristic; c combined.

characterized questions and comments decreases. Figure 3 clearly brings
out this general rule.
In Figure 3 the maximal correlations obtained are plotted against the
number of components not specified. The average values would have given
rise to a very similar picture, with the exception of a/o-characteristics
for comments where the obtained minimal value of .05 forces the average
(.17) under the value for mere ja-affinity (.23). This, however, is
misleading rather than simply undesirable. As is suggested by the
considerable span between minimal r_μξ (.40) and maximal r_μξ (.80) for
questions, there can be more or less effective 'stimulus scaling' (in
terms of ξ; 'response scaling', also tentative, is in terms of μ). More
generally, there can be a grouping of data more or less effective for the
purpose of revealing a covariation; so levelling the differentiation in
the 'resulting'-component seems less appropriate (|r̂| = .05) than
levelling the differentiation in 'intended' (|r̂| = .29) and in
'presupposed' (|r̂| = .27)¹⁸.

18
In this sense also a grouping according to 'type of transition' (sc. of change in equal posi-
tions of a pair in different components) is less effective than the groupings according to
alteration formula and characteristic (questions):

For similar reasons I tend to be cautious about explaining the lesser de-
gree of predictability for the comments in terms of phenomena such as
difficulties on the subjects' side in coping with a more complex organiza-
tion compared with the questions. In experiment II the verbal reactions to
the comments were highly specific, so that missing a correlation as high as
.80 may well be due to the research worker's difficulties in finding an ef-
fective or appropriate stimulus metric.
This leads to another ground for caution. When at the beginning of this
text auditory perception was mentioned, any formulation was avoided
which might suggest a direct perception of right-asymmetry vs. left-
asymmetry. I shrink from interpretations in terms of, say, hesitating vs.
precipitating sound gestures. These must be possible with nein as concom-
itant too; nein, however, turned out almost exclusively to be left-asymmet-
rical. Strictly speaking, the predictable entities are, as has been seen in the
case of both the contingency tables and μ, proportions of responses rather
than shades of the individual's reaction.

e) Asymmetry-type and F0
Intonation is carried by more parameters than by intensity, and the
present text was not intended as an argument "against F0 (or pitch)". In a
forthcoming article¹⁹ I will, instead, study the covariation of
situational characteristics, stated in terms of 'open' vs. 'closed', with
F0 in greater detail than in the present observations regarding the
asymmetries of intensity. There is one question left, however, to be
answered immediately: to what degree is there a correlation between
intensity and F0? Were this degree a high one, the observed regularities
might eventually have to be attributed to F0 rather than to intensity.

18 (continued)
<J* a - er** ο —• a***
a ο
R 9 10 2 21
Ο 10 8 5 23
L 9 11 2 22

28 29 9 66

* items 1, 3, 5, 6, 9
** items 4, 7, 8, 10, 12
*** items 2, 11
|r̂| = .23 (χ² = 2.4725, df = 4; n.s.), the amount of the coefficient
coming close to the 5-percent threshold for factual correlations with
df = 64 but not providing any gain relative to the estimated correlation
for ja-affinity.
19
See Richter, in Winkler (ed.), 1983.

One can, by inspecting the graphic output from a pitchmeter, distinguish
as configurations of F0 occurring with ja and nein in our sample²⁰:
fall (f)          level-fall (lf)    rise-fall (rf)
fall-level (fl)   level (l)          rise-level (rl)
fall-rise (fr)    level-rise (lr)    rise (r)
These configurations are associated with the asymmetry-types as shown in
Table 10. Figures on the upper left are for ja, figures on the lower right
are for nein.

Table 10: Asymmetry-type vs. F0-configuration

Experiment I
f lf rf fl l rl fr lr r

1 5 2 2 - 5 6 - 21
R
1 1 2
- 2 9 - - 4 8 23
Ο
1 —
3 4
2 2 11 1 3 2 22
L
2 14 18 1 9 1 1 1 1 48

3 9 22 4 12 16 66
2 14 20 1 10 1 4 1 1 54

20 Two judges as above were in accordance with 94.8 percent of the
occurrences (417 out of 120 + 320 cases). The 23 cases in which the
judgements diverged were decided in open discussion. Three ternary
configurations, having been discovered by the two judges independently,
were subsumed under binary types by omitting the non-initial/non-terminal
segment: flr: fr, rfr: r, frf: f (compare, in this connection, Table 11
and the relevant computations).
Perceptually, most of the configurations will, as a consequence of low
stimulus duration, result in pitch gestalts in which no "falling",
"level", or "rising" states or movements are experienced. Following
Richter, 1967: 360, these auditory results could tentatively be labelled
as follows:
hang (Hang) ← f      jump (Sprung) ← lf     lift (Hub) ← rf
bend (Knick) ← fl    jerk (Ruck) ← l        fly (Schwung) ← rl
turn (Dreh) ← fr     push (Schub) ← lr      stand (Stand) ← r
It is a different question how untrained subjects will label the
phenomena that are given them in their auditory experience. Nomenclatoric
traditions from music may function here in a way similar to the
functioning of orthographic traditions in segmental phonetics, and the
gestalt of 'stand', to take one example, was assumed to be the response to
the signal property of short rise not only on behalf of our own impression
labelled with relatively little innocence, but also as an accounting for
many innocent people's labelling their impression evoked by short rise in
the signal as "level". So when we take holism seriously, this does not
lead us to a phenomenalist methodology but makes us favour the
discrimination experiment. Note, by the way, that jumps or jerks in
perception are not by necessity evoked on the (sole) basis of F0.

Experiment II
f lf rf fl l rl fr lr r
10 4 23 2 2 - 13 12 3 69
R
2 4 6 1 2 1 16
3 6 11 - 1 1 9 5 1 37
Ο
5 4 4 1 3 2 1 20
20 9 18 4 9 15 5 4 84
L
16 21 34 1 10 5 4 3 94
33 19 52 6 12 1 37 22 8 190
23 29 44 1 12 10 6 5 130

Correlation estimates |r̂| according to formula (1), i.e. immediately
based on Table 10 as a (set of) contingency table(s), amount to about .35.
We need not discuss the numerical details here²¹, for the differences of
|r̂| for various sections of Table 10 are not great enough to justify the
analytic piece-work necessary. If we used the estimated correlation
coefficient as a measure of uncertainty reduction

$$R = 100 - 100\sqrt{1 - r^2} \tag{5}$$

in percent, when predicting unknown values of one variable on the basis
of one's knowledge of the other variable's values, then for |r̂| = .35 we
would find the uncertainty in predicting asymmetry-type on the basis of
F0-configuration (and vice versa) reduced by 6.3 percent only²².
Correlations between asymmetry measure and combined characteristic are not
that feeble.
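
The size of this effect is easy to retrace numerically. The sketch below (the function name is hypothetical) evaluates the uncertainty reduction R = 100 − 100·√(1 − r²) for the estimate of .35 quoted here and, for comparison, for the highest correlation of .80 obtained for the combined characteristic.

```python
from math import sqrt

def uncertainty_reduction(r):
    """Percentage reduction of uncertainty, R = 100 - 100 * sqrt(1 - r^2)."""
    return 100.0 - 100.0 * sqrt(1.0 - r * r)

print(round(uncertainty_reduction(0.35), 1))  # 6.3
print(round(uncertainty_reduction(0.80), 1))  # 40.0
```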
Table 10 can be rearranged so as to make its information more transparent
for our problem. Let us assign numbers to the 'nominal scale' values as
follows:
f  -1      L  -1
l   0      O   0
r  +1      R  +1

21
Contingency table                 χ²        df   significance of χ²     |r̂|

Experiments I & II, ja            29.1489   16   s. (.01 < p < .025)    .38
Experiment I, ja                  15.9450   12   n.s. (.1 < p < .3)     .27
Experiment II, ja                 24.6284   16   n.s. (.05 < p < .1)    .41
Experiments I & II, ja & nein     42.2079   16   s. (p < .001)          .35

As can be seen from the df, χ² was allowed to exceed the conventional
tolerances (see note 12) in order not to enforce too small correlation
estimates.
22
For a more detailed discussion of r as a parameter of R see e.g. Richter,
1967: 334/5.

It is evident that at least the order of these numbers can be given
consistent meaning in the two series (sign of F0-slope and of the
difference between intensity peak abscissa and interval centre,
respectively). After technically extending f to ff, l to ll, and r to rr,
we accordingly get the correlation diagrams of Table 11 where figures on
the upper left are again for ja, figures on the lower right for nein;
i meaning 1st F0-component, t meaning 2nd F0-component, A meaning
asymmetry-type.
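
The coding step can be made concrete as follows. This is a sketch with hypothetical names: each configuration is reduced to its initial and terminal segment, which covers both the extension of single segments (f to ff and so on) and the reduction of ternary configurations described in note 20, and the segments are then replaced by the numbers assigned above.

```python
SEGMENT_CODE = {"f": -1, "l": 0, "r": +1}    # fall, level, rise
ASYMMETRY_CODE = {"L": -1, "O": 0, "R": +1}  # asymmetry-types

def f0_components(configuration):
    """Return (i, t): codes of the first and the last segment.

    A single segment counts twice ('f' is treated as 'ff'); of a ternary
    configuration only the initial and terminal segment are kept.
    """
    segments = list(configuration)
    return SEGMENT_CODE[segments[0]], SEGMENT_CODE[segments[-1]]

print(f0_components("rf"))   # (1, -1)
print(f0_components("f"))    # (-1, -1)
print(f0_components("flr"))  # (-1, 1)
```

Each response then contributes one triple (i, t, A), from which the three correlations between i, t and A can be computed.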
These combinations of data allow for the calculation of product-moment
correlations (compare formula (3)) the underlying structure of which is
clearly brought out by extracting the one common factor for which loadings
g of variables can approximately be determined with the 'centroid method'
of factor analysis (see e.g. Faverge, 1980)²³. As can be seen in Table 12,
some variation as to the two experiments and as to the two sentence
adverbs occurs, but the pattern is a general one:
- Correlations r_it between the two F0 variables are negative; their
amounts are greater than those of r_tA and of r_iA; the negative sign is
due to the high proportion of signals resulting in 'lifts' and 'turns'.
- Correlations r_tA between the second F0-component and asymmetry-type are
positive; as a rule, their amount comes second. Correlations r_iA
(asymmetry-type vs. first component) centre around zero.
- Factor loadings g are negative for i, positive for t and for A; since
they express the correlation between the empirical variables and the
factor, the factor itself appears to "push" terminal F0 upward, the
intensity peak to

Table 11: Covariation of F0-components and asymmetry-type

Experiment I
W t -1 0 + 1 \ A -1 0 + 1
i n. (1) (r) (L) (O)
(f) i (R)
22 11 9 2 22
+ l(r)
20 1 1 22 22 20 1 1 22
9 4 16 6 10 13 29
0(1)
14 10 1 29 25 24 1 25
3 12 5 4 6 15
-1(f) 15 7 "1(f)
2 1 4 4 3 7
34 4 28 66 22 23 21 66
36 12 6 54 48 4 2 54

23 That |r_iA| and |r_tA| are smaller than |r̂| estimated on the basis of
Table 10 might be due to some non-linear relation in Table 11; our basing
the following argument on a factor analysis will to a certain extent
compensate for not having tried to increase the correlation by
transforming i, t, or A.

\ A -1 0 +1
t (L) (O) (R)
5 12 11 28
+ l(r)
3 3 6

2 - 2 4
0(1)
11 1 12

15 11 8 34
-1(f)
34 1 1 36

22 23 21 66
48 4 2 54

Experiment II

i
t -1
<f)
0
(1)
+1
(r)
\i A -1
(L)
0
(O)
+1
(R)
52 1 8 61 22 13 26 61
+ 1(0 44 + l(r) 7 49
5 49 37 5
19 12 22 53 23 12 18 53
0(1) 0(1)
29 12 6 47 35 7 5 47
33 6 37 76 39 12 25 76
-1(f) -1(f)
23 1 10 34 22 8 4 34
104 19 67 190 84 37 69 190
96 13 21 130 94 20 16 130

\ A -1 0 +1
t (L) (O) (R)
24 15 28 67
+ l(r) 12 3 21
6
13 2 4 19
0(1)
11 1 1 13
47 20 37 104
-1(f)
71 13 12 96
84 37 69 190
94 20 16 130

Table 12: Factor analysis of r_it, r_iA, r_tA

                                                                      100 g² for:
                                r_it   r_iA   r_tA    g_i    g_t    g_A      i      t      A
Experiments I & II, ja          -.41   +.04   +.12   -.56   +.68   +.14   31.7   46.0    2.1
                    nein        -.29   -.08   +.11   -.51   +.54   +.23   26.4   28.9    5.5
Experiment I,       ja          -.64   -.25   +.25   -.78   +.78   +.38   61.4   61.4   14.8
                    nein        -.51   -.14   +.27   -.66   +.73   +.38   43.0   53.2   14.8
                    ja & nein   -.60   -.24   +.35   -.73   +.78   +.47   52.8   61.1   22.5
Experiment II,      ja          -.34   +.12   +.07   -.48   +.64   +.06   22.7   40.8     .4
                    nein        -.22   -.04   +.07   -.44   +.47   +.17   19.7   22.2    2.8
                    ja & nein   -.32   +.03   +.13   -.48   +.61   +.18   23.1   36.8    3.3
Experiments I & II,
                    ja & nein   -.38   -.04   +.18   -.55   +.64   +.27   29.9   41.3    7.5

the right, and initial F0 downward, when increasing, and to "push"
terminal F0 downward, initial F0 upward, and the intensity peak to the
left, when decreasing²⁴.
- What counts most with regard to the covariation of F0 with intensity:
the 'explanatory power', measured as the square (multiplied by 100) of its
loading for a variable, is by far greater for t and for i than for A. In a
majority of the cases distinguished, more than 90 percent of the variance
in A cannot be attributed to a factor which, on the other hand, accounts
for about one third of the variance in each F0-component.
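
The loadings of Table 12 can be retraced with a small sketch of a one-factor centroid extraction. The source names only the 'centroid method'; the details used here are assumptions, chosen because they reproduce the printed loadings: variable i is reflected before the extraction, and the communality of each variable is estimated by its largest absolute correlation. The function name and the `reflect` parameter are hypothetical.

```python
from math import sqrt

def centroid_loadings(r_it, r_iA, r_tA, reflect=("i",)):
    """First-factor loadings (g_i, g_t, g_A) by the centroid method."""
    names = ["i", "t", "A"]
    r = [[0.0, r_it, r_iA],
         [r_it, 0.0, r_tA],
         [r_iA, r_tA, 0.0]]
    sign = [-1.0 if name in reflect else 1.0 for name in names]
    # Reflect the chosen variables (assumption: i is reflected).
    r = [[sign[a] * sign[b] * r[a][b] for b in range(3)] for a in range(3)]
    # Communality estimate (assumption): largest absolute correlation.
    for a in range(3):
        r[a][a] = max(abs(r[a][b]) for b in range(3) if b != a)
    column_sums = [sum(r[a][b] for a in range(3)) for b in range(3)]
    total = sum(column_sums)
    # Loadings are column sums over the root of the grand total;
    # the reflection is undone on the result.
    return [sign[b] * column_sums[b] / sqrt(total) for b in range(3)]

g = centroid_loadings(-0.64, -0.25, 0.25)   # experiment I, ja
print([round(x, 2) for x in g])             # [-0.78, 0.78, 0.38]
print([round(100 * x * x, 1) for x in g])   # [61.4, 61.4, 14.8]
```

The second line of output is the 'explanatory power' 100 g² per variable; for A it stays below 15 percent in this case, against more than 60 percent for either F0-component.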
So determinants other than those relevant for F0 will occasion the
relative location of the intensity peak. The sophisticated instrumental
phonetician will not have overlooked an early hint at segmental influences
likely to play a role; it is the bias of the occurrences of nein to
left-asymmetry that caused us to confine the subsequent tests to ja, but
it is the asymmetry-variation in ja that had rendered the status of bias
to a uniformity which would otherwise not have surprised us. To sum up:
the shaping of intensity is not sufficiently explained by reference to
F0-configurations
24 It should be noted that for experiment II, ja, the solution in terms of
something like an "open cadence & intensity deferment factor" just fails
to be the optimal one: a solution in terms of a "closed cadence &
intensity deferment factor" (g_i = +.64, g_t = −.49, g_A = +.14) is
slightly better than the one given in Table 12 for reasons of conformity.
(Cumulating the squares of loadings, cf. the last three columns of
Table 12, we get the ratio .659 : .639 = 1.03 [= 3 percent gain by the
alternative solution] as a rough criterion.) Since for experiment I, ja,
the solution given is the one with roughly the highest overall explanatory
power, and as the sums of 100 g² for experiment II singled out (lines 6-8)
irrespective of their optimality (with the faint exception of line 6)
amount to only one half of the sums of 100 g² for experiment I (lines
3-5), one is tempted to think of a mapping, inexplicable yet, of 'a
fronte' vs. 'a tergo' into the numerical data. (Compare the reflected
image traits of Figure 2, commented in note 13.)

and to the basic segmental make-up of the types in our sample, and its sta-
tistical linkage with situational characteristics can not be reduced accord-
ingly. It is an open question what mechanisms of phonetic substance medi-
ate the observed influence.

References

Faverge, J.-M. (1980). Mathematisch-statistische Methoden in der Psychologie. Huber, Bern etc.
Hofstätter, P. R. (1953). Einführung in die quantitativen Methoden der Psychologie. Barth,
München.
Richter, H. (1967). Zur Intonation der Bejahung und Verneinung im Hochdeutschen; in: Eg-
gers, H., et al. (eds.), Satz und Wort im heutigen Deutsch. Schwann, Düsseldorf.
Richter, H. (1974). Eine anschauliche Interpretation des Korrelationskoeffizienten nach Bra-
vais-Pearson. In Richter, H., K. H. Rensch, M. Sperlbaum and E. Knetschke (eds.), Spe-
zielle methodische Untersuchungen sprachlicher Phänomene. Niemeyer, Tübingen 1974 [Pho-
nai, suppl. 2].
Richter, H. (1980). Commenting on question-answer pairs. Grazer Linguistische Studien 17.
Richter, H. (1983). F0 variation dimensionally reconstructed; in: Winkler, P. (ed.), Investiga-
tions of the speech process. Brockmeyer, Bochum 1983.
Rietveld, A.C.M., and L. Boves (1979). Automatic detection of prominence in the Dutch lan-
guage. Abstract in: Proceedings of the Ninth International Congress of Phonetic Sciences,
vol. I. Institute of Phonetics, University of Copenhagen, Copenhagen.
Seidel, R. (n.d.). Zur sozialen Variabilität bei Frage-Antwort-Interaktionen. M. A. thesis,
University of Bonn.
Takefuta, Y. (1979). Some experiments in the digital extraction of American intonation pat-
terns. Abstract in: Proceedings of the Ninth International Congress of Phonetic Sciences, vol. I,
Institute of Phonetics, University of Copenhagen, Copenhagen 1979.
Zwirner, E., and K. Zwirner (1966). Grundfragen der Phonometrie. Karger, Basel/New York,
2nd ed.
MITSOU RONAT

Logical Form and Prosodic Islands

Various recent studies have integrated intonational or prosodic parameters
into the general framework of generative grammar:1 Liberman (1975),
Liberman and Prince (1977), Selkirk (1978), Bing (1979). Studies on in-
tonation introduced theoretical innovations such as "metrical trees".
These in turn led to substantial improvements of the theory of the phono-
logical component: see Halle and Vergnaud (1979), and the papers edited
by Safir (1979).
The present paper suggests that another aspect of intonation might be
important for the study of such central syntactic questions as logical
form(x) and empty categories(x), particularly the distinction between empty
categories (or between occurrences of the empty category) suggested by
Chomsky (1981).
Intriguing new facts, involving the Default Accent isolated by Ladd
(1980), give new support to Chomsky's hypotheses, and raise interesting
theoretical questions.
Since intonational evidence is not usual in generative studies, I will first
describe the type of data in question. In particular, a clear distinction will
be made between default accent and contrastive stress. I will then show
the effect of the prosodic binding created by default accent on the
antecedent/empty category relation. I will propose that well known anal-
yses can be extended, at no great theoretical cost, to account for these
data, although the present analysis will slightly modify the picture of the
prosodic component.

First I would like to thank my colleague and friend Jacqueline Guéron for her very important
criticisms and comments. Thanks also to Barry Schein and Thelma Sowley for having read
and criticized the present version.
1 For those readers who may not be familiar with the current framework in generative gram-
mar, I will summarize some definitions of terms (x) in a terminological appendix to this
article.

I The Data

Default accent is a discourse phenomenon, in the sense that we need at
least two sentences to make it appear. Ladd (1980) proposes the following
example:
(1) A: Has John read Slaughterhouse-Five?
B: No. John doesn't read books.
In example (1), books has been weakened and is understood as a repeti-
tion, a reference to Slaughterhouse-Five. Ladd's work follows Bolinger
(1977); as such, it mostly develops hypotheses bearing on the relation be-
tween meaning and intonational patterns. Nevertheless, Ladd adopts Li-
berman's metrical trees for stress patterns, which he says are crucial: "be-
cause of the relational nature of rhythmic structure, this operation must
also involve the concomitant strengthening of read". Ladd uses binary
trees to represent (1): "The deaccenting involves switching a single pair of
node labels from weak-strong to strong-weak or vice-versa":

(2) [Figure: two metrical trees for John doesn't read books, showing the node labels before and after the switch.]
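The label switch that Ladd describes can be made concrete with a small sketch. It is an illustration only: the right-branching tree shape, the Node class and the helper names are assumptions of the sketch, not Ladd's or Liberman's own notation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str                      # "s" (strong), "w" (weak), or "root"
    word: Optional[str] = None      # set on terminal nodes only
    children: List["Node"] = field(default_factory=list)

def main_accent(node):
    """Follow the strong branch from the root down to a terminal word."""
    while node.children:
        node = next(child for child in node.children if child.label == "s")
    return node.word

def switch_labels(parent):
    """Swap the w/s labels on the two daughters of a binary node."""
    left, right = parent.children
    left.label, right.label = right.label, left.label

# Assumed right-branching tree: [John [doesn't [read books]]]
vp = Node("s", children=[Node("w", "read"), Node("s", "books")])
aux = Node("s", children=[Node("w", "doesn't"), vp])
root = Node("root", children=[Node("w", "John"), aux])

before = main_accent(root)   # accent on the terminal reached through strong nodes
switch_labels(vp)            # deaccent "books": weak-strong becomes strong-weak
after = main_accent(root)
print(before, "->", after)
```

Because the labels are relational, weakening books necessarily strengthens its sister read; no separate rule for the verb is needed in the sketch.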

Now the semantic interpretation of the phenomenon is crucial for the
present purpose. Ladd insists that default accent must not be confused
with contrastive stress, as had so often been the case previously (1980: 81):

the meaning of B's reply in (1) is not the explicit contrast: John doesn't READ books, he
WRITES (REVIEWS, COLLECTS, BURNS) them. Rather, the point of the accentual pat-
tern is that books is deaccented; the focus is broad, but the accent falls on read by default.

So, although stressed, read is not interpreted as contrasting with some
other verb, not even as the focus of the sentence. Notice that in Ladd's
other example:
(3) . . . B: No. John doesn't réad trash.
trash, although unstressed, adds new information, namely a predication on
the book mentioned by speaker A.
Thus the deaccented element in the second sentence must be perceived
by the speakers as related in some way to one element of the first sen-
tence;2 directly, of course, if the element is simply repeated, or deriva-
tively, through various rhetorical operations such as metonymy, synec-
doche, allusion, etc.

2
Dafydd Gibbon communicated his own criticisms on Ladd's hypothesis to me in this con-
text.

In the following examples, the italicized constituents between brackets
are to be read as the deaccented elements. In (4), we have a genre/species
type of relation:
(4) Paul regarde les informations tous les soirs; Marie est jalouse [de la
télévision].
(Paul watches the news every evening; Mary is jealous of television)
In (5), we find an allusion to the fact that the French academicians wear a
green uniform - and the discourse can be interpreted as a joke:
(5) A: Le professeur Dupont veut être élu à l'Académie Française.
B: Oui, il aime énormément [les habits verts].
(A: Prof. Dupont wants to be elected to the French Academy
B: Yes. He is extremely fond of green suits.)
Thus the semantic interpretation of default accent seems to cover a scale
from zero, as in (6), to contradiction, as in (7):
(6) A: J'ai proposé cette chambre à Sophie.
B: Bonne idée; on dort très bien [dans cette chambre].
(A: I suggested this room to Sophie.
B: Good idea; one sleeps very well in this room.)
(7) A: Paul a lu un livre extraordinaire sur le langage des singes.
B: Moi j'ai vu un film [stupide].
(A: Paul read an extraordinary book on the language of monkeys.
B: I saw a stupid film, myself.)
In (7), the deaccenting of stupide changes the nature of the dialogue.
Without deaccenting, it would have presented a neutral exchange of infor-
mation; by deaccenting, speaker B has found a way of criticizing speaker
A's appreciation concerning the book on monkeys.
In brief, contrastive stress and default accent can be distinguished in the
following way: contrastive stress opposes the elements which bear the
highest pitch (and those elements can be very close, semantically):
(8) Je n'ai pas dit qu'il AIMAIT ça: il ADORE ça.
(I didn't say that he likes it: he adores it.)
Default accent brings together elements which can be semantically very
different, and nothing special indicates which element in the first sentence
is the target of the deaccented element in the second sentence.
Consequently, contrastive stress has a specific logical function in se-
mantic interpretation. Default accent has a discourse function: the major
function, Ladd says, of default accent, is "to keep discourse on track"; he
proposes the general rule "Deaccent something in order to signal its relation
to the context."
One must be aware of the difficulties inherent in the intuitive distinc-
tion between contrastive stress and default accent: in English, according to
Ladd, the phonetic realisation (and the corresponding abstract patterns)

may be practically identical. Linguistic tradition would lead the reader to
construct contrastive interpretations when asked for judgments of accept-
ability on discourses involving default accent. These recommendations
must be kept in mind in the second part of the paper.
In French, phonetically speaking, the situation seems slightly different.
I will not enter here into the details of the prosodic patterns (cf. Ronat, in
preparation), 3 but I will simply give some observations, in order to help
the French native speaker to 'hear' the examples given below.
A plain statement presents the general form (9a), if the focus is broad
(in Ladd's sense), or (9b), if the focus is narrow:

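(9) a., b. [Figure: schematic intonation contours, (a) for broad focus and (b) for narrow focus, with the FOCUS constituent marked.]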

A sentence containing a default accent presents a curve broken into two
prosodic constituents, the first showing a curve identical to the first consti-
tuent in (9.b), and the second, a flat contour, realised either on low pitch
(general case) or on high pitch (in certain questions). It is important to
note that the last high pitch of the first prosodic constituent in (10.a) is
not higher than the corresponding one in (9.b):

(10) a., b. [Figure: two intonation contours, each bracketed into two prosodic constituents, [[ ... ][ ... ]], with a flat second constituent.]

French contrastive stress obligatorily requires extra-high pitch. The differ-
ence between high and extra-high is sometimes subtle, but still audible, I
think. Default accent excludes extra-high pitch.
Again, although default accent is realised with a flat contour, not all occur-
rences of flat contours correspond to default accent. In fact, flat
contours were observed in previous studies mainly in three cases: paren-
theticals, after contrastive stress, and after an internal focus.4 Examples
(11), (12), (13) illustrate these three cases:
(11) Marie viendra, repondit Jean, samedi soir.
(Mary will come, answered John, on Saturday night.)
(12) A: Je crois que Paul a détesté ce livre.
B: Non, Paul a ADORÉ ce livre.

3
In Ronat (in preparation), I propose a general analysis for the various phonological, syn-
tactic and semantic properties of prosodic binding.
4
See Adjemian (1978).

(A: I think that Paul detested that book.
B: No, Paul adored that book.)
(13) A: Qu'est-ce que Pierre a mis dans sa poche?
B: Pierre a mis ton livre dans sa poche.
(A: What did Peter put in his pocket?
B: Peter put your book in his pocket.)
To get a default accent with speaker B's answer in (13), we should insert it
in a context similar to the following (implying causality here):
(14) A: La poche de Pierre était entièrement déchirée.
B: Pierre a mis ton livre [dans sa poche].
(A: Peter's pocket was all torn.
B: Peter put your book in his pocket.)
Examples (13) and (14) show that the distinction between default accent
and focus does not rely on phonetic grounds but on semantic grounds (for
differences in syntax and pragmatics, see Ronat [in preparation]). In (13),
the constituent ton livre is FOCUS; in (14) the FOCUS, that is new infor-
mation, covers the whole sentence: the new information is Pierre a mis ton
livre dans sa poche, and not Pierre a mis ton livre. But, since poche has al-
ready been mentioned, the constituent dans sa poche can be deaccented.
In fact, although this article essentially presents some properties of de-
fault accent, it may be understood as providing additional criteria for dis-
tinguishing it from contrastive stress and Focus.
To summarize, default accent exists inside sentences, but only under
certain discourse conditions: I will refer to these phenomena and specific
conditions as PROSODIC BINDING, the prosodic binding which just
helps 'to keep discourse on track'.
After that brief description of the data and those caveats concerning in-
tuitions, I will center my analysis on what seems to me to be the core prob-
lem.

II The Problem

To my knowledge at least, the following problem has never been noticed
before: once a linguistic constituent has been deaccented by prosodic bind-
ing (given the definition above), it becomes an island; prosodic binding
creates prosodic islands(x). Two examples will immediately show the nature
of the evidence ('[e]' is an empty category):
(15) a. (Marie devait annoncer Pierre)
A-t-elle dit [qu'il est arrivé]
b. (Marie devait commenter les repas de Pierre)
*Qu'a-t-elle dit [que Pierre a mangé (e)]
(a. Mary was supposed to announce Peter/Did she say that he arrived?
b. Mary was supposed to comment on Peter's meals/What did she
say that Peter ate?)
(16) a. (Paul avait comme ami uniquement des gens qu'il croyait capa-
bles de tuer le roi.)
*Mais tout le monde en était [capable (e)]
b. ( = 16.a.) Mais tout le monde était [capable de ça]
(a. Paul had as friends only people whom he believed capable of
killing the king/But everyone was capable of it.
b. But everyone was capable of that.)
In (15) and (16), and in the foregoing examples, the first sentences are
presented as possible or 'natural' contexts.5 In (15.b) and (16.a), the
'prosodic islands' contain an empty category whose antecedent (que or en)
happens to be outside of it, in the first prosodic constituent. If the
antecedent is also in the prosodic island, the sentence is good. Compare
(15.b) and (17):
(17) (Paul se demande si Pierre est arrivé à l'heure). Marie sait [quand
Pierre est arrivé (e)]
(Paul wonders whether Peter arrived on time. Mary knows when
Peter arrived.)
This observation suggests that something like the following constraint is
involved (first approximation):
(18) An empty category must be bound within its intonation contour.
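As a first approximation, constraint (18) can be read as a simple containment check. The sketch below is a hedged illustration: word positions stand in for syntactic structure, the function name violates_constraint_18 is hypothetical, and the sketch ignores the later refinements (it covers only prosodic binding cases, and only the relation between an empty category and an antecedent outside the deaccented span).

```python
def violates_constraint_18(island, antecedent, empty_category):
    """Return True when the empty category lies inside the deaccented span
    (the prosodic island) but its antecedent does not."""
    return empty_category in island and antecedent not in island

# Example (15.b): Qu' a-t-elle dit [que Pierre a mangé (e)]
words_15b = ["Qu'", "a-t-elle", "dit", "que", "Pierre", "a", "mangé", "(e)"]
island_15b = range(3, 8)          # word positions of the bracketed constituent
bad = violates_constraint_18(island_15b, antecedent=0, empty_category=7)

# Example (17): Marie sait [quand Pierre est arrivé (e)]
words_17 = ["Marie", "sait", "quand", "Pierre", "est", "arrivé", "(e)"]
island_17 = range(2, 7)
good = not violates_constraint_18(island_17, antecedent=2, empty_category=6)

print(bad, good)
```

In (15.b) the antecedent Qu' sits outside the island that contains (e), so the check reports a violation; in (17) quand and (e) share the island, and the check passes.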
Constraint (18) holds only in prosodic binding cases:6 the flat contour
which follows contrastive stress does not correspond to a prosodic island,
and (19) is perfectly good:
(19) Marie n'en est pas capable mais elle en SERA capable. (Mary is
not capable of it but she will be capable of it.)
Constraint (18) is true for a number of syntactic operations. WH-Move-
ment in matrix questions is illustrated in (15); in (20) and (21), relative
clause and indirect question behave similarly:
(20) (Les professeurs doivent rencontrer le père de tous ces enfants)
*Le garçon dont Jean va voir [le père (e)], c'est Toto. (The teachers must
meet the father of all these children. The boy whose father John
will see is Toto.)
(21) (. . . = 20) Il ne sait pas de qui il va voir [le père (e)] (He doesn't
know whose father he will see.)
Infinitival complements also provide relevant paradigms involving WH-
Movement and prosodic binding; in French, voir (= see) can take an in-
finitival complement with subject PRO(x):
5
Of course, the notion of 'natural context', used here intuitively, should be defined more pre-
cisely.
6
Sentence grammar does not allow the test in the presupposition/focus case, since there is
no WH-rule for verbs; moreover, flat contour is impossible on subjects (if it is in the first
prosodic constituent).

(22) (Aux Beaux-Arts, les étudiants font le portrait les uns des autres)
a. Moi le garçon que j'ai vu (e) [peindre un portrait], c'est lui.
b. Moi le garçon que j'ai vu [PRO_arb peindre (e) aux Beaux-Arts],
c'est lui.
([At the School of Fine Arts, students paint each other's portraits]
a. In my opinion, the boy who I saw paint a portrait, it's him.
b. In my opinion, the boy who I saw painted at the School of Fine
Arts, it's him.)
(22.a) is good, contrary to (15.b), because the empty element is not in-
side the prosodic island; (22.b) is bad because the empty category is inside
the prosodic island. Consequently, (23) is not interpretable at all:
(23) (. . . = 22) Moi le tableau que j'ai vu [peindre aux Beaux Arts],
c'est celui-ci.
(In my opinion, the painting which I saw painted at the School of
Fine Arts is this one)
(23) is out, either because of the prosodic island constraint if tableau is
understood as the underlying object of peindre, or else because tableau is
not, semantically, a correct subject for this verb, which requires an ani-
mate one.
Quantifier-Movement is subject to the same constraint: quantifiers can-
not be separated from their sources by a prosodic island boundary:
(24) (Dans ce magasin, les objets soldés coûtent dix francs).
a. Paul aurait bien tout voulu acheter (e) [pour dix francs]
b. Paul aurait bien voulu [tout acheter (e) pour dix francs]
c. *Paul aurait bien tout voulu [acheter (e) pour dix francs]
([In this store, objects on sale cost ten francs] Paul would indeed
have liked to buy everything for ten francs.)
(25) (. . . = 22)
a. Combien as-tu vu peindre (e) [de tableaux]?
b. *Combien as-tu vu [peindre (e) de tableaux]?
(How many paintings have you seen painted?)
Interestingly, the prosodic island constraint functions only for the ele-
ments embedded in the deaccented constituent. That is, the empty-cate-
gory/antecedent relation is not blocked if the deaccented element precedes
or follows, but does not contain, the empty category. In (26.a) the relation
is blocked, in (26.b) it is not:
(26) a. * [ X (e)]
b. [(X) (e)]
Since the deaccented element always coincides with the last constituent of
the sentence, the difference between (26a) and (26b) is difficult to test. Yet
the intuitive judgements are clear. Consider example (27): it is ambiguous;
it can mean either that Mary appreciates the beauty of Paris or that she
proved to be sensitive when she was in Paris.
(27) Marie s'est montrée sensible à Paris
(a. Mary has shown herself sensitive to Paris.
b. Mary has shown herself sensitive in Paris.)
After Clitic-Placement (à Paris/y) (27.a) corresponds to structure (26.a),
and (27.b), to (26.b). In appropriate contexts, we can see that only (26.b)
is acceptable:
(28) A: Crois-tu que Paris peut émouvoir ses visiteurs?
B: *Marie s'y est montrée [sensible (e)]
(A: Do you think that Paris can move its visitors?
B: Mary has shown herself sensitive to it.)
(29) A: Crois-tu que Paris peut favoriser les sentiments?
B: Marie s'y est montrée [sensible] (e)
(A: Do you think that Paris favours feelings?
B: Mary has shown herself sensitive there.)
(One might wonder whether Heavy-NP-Shift(x) can apply in (29B), mov-
ing sensible to the end of the sentence, and giving Marie s'y est montrée (e)
sensible.) Constraint (18) is also relevant for movement in logical form
(LF). As is well-known, a sentence like (30) is ambiguous:
(30) Tout le monde ici parle couramment deux langues.
(Everyone here speaks two languages fluently.)
Prosodic binding disambiguates the sentence. Only narrow scope is possi-
ble, that is, wide scope with deux langues is excluded:
(31) A: Pour avoir ce job, il faut connaître deux langues.
B: Tout le monde ici parle couramment [deux langues]
(A: To get that job, one must know two languages.
B: Everyone here speaks two languages fluently.)
The fact that wide scope is blocked by prosodic binding supports May's
(1977) hypothesis concerning a Quantifier-Raising(x) rule in Logical
Form. But the fact that the sentence is good under one interpretation,
when deux langues is totally unspecified, suggests that Guéron (to appear)
is right in claiming that in those cases the quantified NP does not move in
Logical Form.
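The disambiguating effect in (30)/(31) can be mimicked by enumerating scope orders and discarding those that would require a quantifier to take scope out of a prosodic island. This is a deliberately crude sketch: treating a reading as a linear order of quantifiers, and the island as a set of quantified phrases, are assumptions made for illustration, and scope_readings is a hypothetical helper rather than an implementation of Quantifier-Raising.

```python
from itertools import permutations

def scope_readings(quantifiers, island=frozenset()):
    """List scope orders (widest first), dropping any order in which a
    quantifier inside the prosodic island outscopes one outside it."""
    readings = []
    for order in permutations(quantifiers):
        blocked = any(
            wide in island and narrow not in island
            for i, wide in enumerate(order)
            for narrow in order[i + 1:]
        )
        if not blocked:
            readings.append(order)
    return readings

quantifiers = ["tout le monde", "deux langues"]
ambiguous = scope_readings(quantifiers)                        # example (30)
bound = scope_readings(quantifiers, island={"deux langues"})   # example (31)

print(len(ambiguous), "readings without prosodic binding")
print(bound, "with deux langues deaccented")
```

Without an island both orders survive; once deux langues is deaccented, only the order in which tout le monde takes wide scope remains.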
These data contrast with those involving WH-Raising(x), as discussed
recently in Aoun-Hornstein-Sportiche (1981). They propose that the
WH-constituent in situ at S-Structure is moved in LF by a one step rule to
the top-most complementizer. If constraint (18) is true, it supports this hy-
pothesis, since (32) is totally impossible:
(32) A: Il nous faut des gens pour nouer des contacts
B: *Qui serait capable [de rencontrer qui]
(A: We need people to interact with.
B: Who would be capable of meeting whom?)
For further confirmation, see Sportiche and Koopman (1981).
Thus, at S-structure as well as in LF, an empty category cannot be iso-
lated in a constituent 'frozen' by prosodic binding.
Chomsky (1981) defines two other occurrences of the empty category,

namely PRO and NP-trace. One may ask whether or not their relation to
their antecedent is affected by prosodic binding. The answer is no:
(33) (A: Veux-tu venir au bord de la mer?)
B: Avec plaisir; j'aime beaucoup [PRO nager], c'est Jean qui m'a
appris [à PRO nager]
(A: Would you like to come to the beach?
B: With pleasure, I like swimming very much; it's John who
taught me how to swim.)
(34) (L'enseignement actuel ne favorise pas les vocations d'écrivains.)
Le fils de Jacques [voulait PRO écrire]
(Modern teaching does not favour the writer's vocation. James'
son wanted to write.)
(35) (A: Tu sais la nouvelle? Paul a battu Marie)
B: Jean a été [battu (e) par Paul]. Rien d'étonnant à ça.
(A: Have you heard the news? Paul beat Mary.
B: John was beaten by Paul. Nothing surprising about that.)
(36) (A: Robert ne s'intéresse pas aux gens bien portants)
B: Le fils de Jean-Jacques [semble (e) être malade] Présente-le-lui!
(A: Robert is not interested in healthy people.
B: Jean-Jacques' son seems to be sick. Introduce him to him!)
Consequently, examples (33)-(36) illustrate a new distinction between
NP-traces and PRO on the one hand, and Ā-bound traces on the other.
(For the importance of the opposition A-bound/Ā-bound(x), see
Chomsky, 1981). This is not surprising, since other distinctions estab-
lished on independent grounds are now well-known, particularly the "in-
visibility" of PRO and NP-traces to phonological rules, e.g. the wanna
contraction(x) (see Chomsky, 1981: Ch. 3.2.2., Jaeggli, 1980).
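The distinction just drawn can be summarised as a lookup on the type of empty category. The following sketch is an assumption-laden summary of the judgements reported for (15.b), (34) and (35): the enum, the set SENSITIVE_TO_ISLANDS and the function relation_blocked are hypothetical names, and WH-trace stands in here for the whole class of Ā-bound categories.

```python
from enum import Enum

class EmptyCategory(Enum):
    PRO = "PRO"
    NP_TRACE = "NP-trace"
    WH_TRACE = "WH-trace"

# Assumption of the sketch: WH-trace stands in for all A-bar-bound categories.
SENSITIVE_TO_ISLANDS = {EmptyCategory.WH_TRACE}

def relation_blocked(kind, inside_island, antecedent_inside):
    """True if prosodic binding cuts the empty category off from its antecedent."""
    if kind not in SENSITIVE_TO_ISLANDS:
        return False
    return inside_island and not antecedent_inside

judgements = {
    "(15.b) WH-trace": relation_blocked(EmptyCategory.WH_TRACE, True, False),
    "(34) PRO": relation_blocked(EmptyCategory.PRO, True, False),
    "(35) NP-trace": relation_blocked(EmptyCategory.NP_TRACE, True, False),
}
for example, blocked in judgements.items():
    print(example, "blocked" if blocked else "not blocked")
```

All three configurations place the empty category inside the island and the antecedent outside it; only the type of the empty category decides whether the relation is blocked.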
After presenting the basic data concerning prosodic binding and empty
categories, I will turn to a more interesting question: how is it possible to
explain such phenomena in the current framework?

III Alternative Hypotheses

It is true that evidence involving discourse data is rarely included in stud-
ies concerning core grammar: generative grammar is sentence grammar,
by definition. However, if prosodic binding requires a reference to con-
text, the blocking of the antecedent/empty category relation as stated for
instance in (18) is defined within sentence boundaries. And in fact, the
data presented in this paper do not seem unfamiliar; rather, they closely
resemble the data which occupy the center of investigation of most recent
studies in the field, namely the data which are taken into account by bind-
ing theory. One can suppose that well-known analyses can easily be ex-
tended, at no great theoretical cost.

We immediately have two candidates, namely the ECP, the empty cate-
gory principle proposed by Chomsky (1981) or, in another version, by
Kayne (1981a), and the CCC, the complete constituent constraint, pro-
posed by Guéron (1980). Roughly, the former is a "syntactic" hypothesis
(if LF is to be considered as syntax), and the latter is an interpretive one.
These conditions are each given in (37), (38), and (39), respectively:
(37) (Chomsky (1981)):
If α is an empty category, then
(i) α is PRO if and only if it is ungoverned(x)
(ii) α is trace if and only if it is properly governed(x)
(iii) α is a variable only if it is Case-marked
(38) (Kayne (1982)):
An empty category β must have an antecedent α such that
(1) α governs β or
(2) α c-commands β and there exists a lexical category X such that
X governs β and α is contained in some percolation projection of
X.
(39) (Guéron (1980)):
A complete constituent $X^i$ may not contain a variable not bound in
$X^i$.
(A complete constituent is an $X^i$ in which $X^{i-1}$ is governed by a
logical operator; examples: NP as names, S as propositions, VP as
properties).
(The relation ($X^i$, $X^{i-1}$) is that of 'immediate domination' in a constituent
structure tree, where both $X^i$ and $X^{i-1}$ are heads of constructions in the
sense of X-bar theory. X usually ranges over {S, NP}, and i over {1, 2}, e.g.
$S^0$ = S; $S^1$ or $\bar{S}$ immediately dominates S, $S^2$ or $\bar{\bar{S}}$ immediately dominates
$\bar{S}$, etc. Ed.)
Chomsky's hypothesis, ECP, essentially requires that the empty element
be properly governed; Kayne's insists on the relation between the anteced-
ent and the empty element: the chain of 'superscripts' must not be broken;
Guéron's, CCC, is relevant in cases where a variable is free in the 'presup-
posed' part of the sentence.
Within the ECP framework, the natural claim is that prosodic binding
creates an abstract governing category(x) (an abstract S?) which functions
as an absolute barrier for government and prevents empty categories from
being linked to their antecedents:
(40) Prosodic binding creates an abstract governing category.7
Thus, (40), if true, might explain observation (18). In this case, one must
note that although structure building rules(x) for LF may not be desirable

7
One can explain why the relation between an A-bound category and its antecedent is not
affected by the 'governing category' created by prosodic binding, by suggesting that in this
case the coindexing takes place before the matching of syntax and prosody.

if they are extensions of syntax - because of the projection principle(x) -
they may be desirable if they are extensions of the prosodic component,
where the projection principle is irrelevant. The prosodic restructuring of
the domain of the empty category may be compared, to a certain extent, to
the restructuring resulting from readjustment rules in phonosyntax.
This hypothesis accounts for the data presented so far, involving WH-
Movement, WH-Raising, Q-Movement, Q-Raising, etc. However, it does
not explain why it is prosodic binding and not other comparable pheno-
mena, which create prosodic islands. In (19) we saw that the flat contour
following contrastive stress does not correspond to such an island. And
there exist other rules which break the intonational curve into several
prosodic constituents, e.g. parentheticals, without creating prosodic is-
lands; (41) is perfect:
(41) Et qu'a-t-elle cru, dit Jean, que Paul a vu (e)?
(And what did she think, said John, that Paul has seen?)
Guéron's CCC, given in (39), predicts that a constituent included in the
presupposed part of a sentence cannot contain a 'hole', a free variable.
One may reasonably compare the notion of presupposition, and the no-
tion of 'relation to context'; they look similar. They are very close, but in
fact, as noted above, not exactly identical: in (3), (4), (7), etc., we have
seen that the deaccented element may carry new information. So it could
be that the term presupposition is not really appropriate here. Because it is
necessary, anyway, to explain why prosodic binding, and not parentheti-
cals, creates prosodic islands, I will suggest slightly extending Guéron's
CCC, replacing the term 'presupposition' by a more general one, say 'dis-
course-bound'. Constraint (39) could be rewritten as (42):
(42) A complete constituent $X^i$ may not contain a variable not bound in
$X^i$.
(A complete constituent (. . . = 27); a CC is discourse-bound.)
Guéron's hypothesis would have the advantage of including the present
analysis in a general 'functional' analysis. However, her constraint is cru-
cially concerned with variables. Therefore it cannot account for cases where empty
categories, although Ā-bound, are not bound by quantifiers.8 Clitic-Place-
ment(x) is one such case.9

8
Anaphors must receive special consideration. They are problematic in the context of this
paper, for two reasons. First, it is difficult to find agreement among native speakers on
such data, since it is difficult to construct natural discourse involving prosodic binding and
anaphors; informants generally tend to reconstruct contrastive situations. Secondly, the ac-
ceptability (or not) of such data may be important for the model itself. See Appendix 2 to
this article for further discussion.
9 Riny Huybregts (unpublished work) explores the hypothesis that the clitic is an Ā-binder.
Similarly, Aoun, in his GLOW paper delivered at Göttingen (1981), explores this hypothe-
sis.

(43) (A: Paul est diabétique)
B: *Et Marie lui a laissé [manger (e) des gâteaux]
C: Et Marie l'a laissé (e) [manger des gâteaux]
(A: Paul is diabetic.
B/C: And Mary let him eat cakes.)
(44) a. (A: On dit que Jean est étudiant aux Beaux-Arts)
B: Oui, je l'ai vu (e) [peindre aux Beaux-Arts]
(A: They say that John is a student at the School of Fine Arts.
B: Yes, I saw him paint at the School of Fine Arts.)
b. (A: On dit que cette voiture a été peinte aux Beaux-Arts)
B: *Oui, je l'ai vu [PRO_arb peindre (e) aux Beaux-Arts]
(A: They say that this car was painted at the School of Fine
Arts.
B: Yes, I saw it being painted at the School of Fine Arts.)

IV Theoretical Considerations

On the basis of the data available so far, it seems very difficult to choose
between the two hypotheses, extended ECP or CCC, the former having
the advantage of empirical adequacy, and the latter the advantage of ex-
plaining the contrast (15.b)/(41). One can speculate that a solution may be
found in a third hypothesis, which would subsume the advantages of the
preceding ones.
This solution will depend crucially on the place attributed to intonation
in core grammar. Previous studies either rejected intonation as being out-
side of the domain of competence (and it is true that emotions, for in-
stance, are also expressed by means of intonation), or they included it in-
side the grammar, but as a part of the phonological component, very close
to stress phenomena. See Liberman's work (op. cit.). Selkirk's (1978)
proposals, however, constitute a first departure from the standard posi-
tion: they establish that prosody must be represented by a set of metrical
trees, independent of syntax, which expand autonomous prosodic catego-
ries. These prosodic categories are supposed to be matched with syntactic
trees, at the level of surface structure, on the left side of the model (cf. Ap-
pendix 1 to this article. Ed.)
But the data presented in this paper show that intonation must look at
S-Structure, at Logical Form and perhaps at deep structure, too: obvi-
ously, it is not sufficient to propose a matching between prosodic trees
and surface structure. The idea which seems natural, then, is that intona-
tion may function as a component which is autonomous from but related
to syntax, filtering sequences generated by the syntactic component from a
'third' dimension, to refer to a concept used by Vergnaud in another con-
text.

The third solution might adopt a more abstract point of view than syn-
tax and prosody, and say that Universal Grammar may define what counts
as a governing category in each component. The rhythmic notion of 'repe-
tition', for instance, could subsume the recursive nature of syntactic gov-
erning categories (NP and S),10 and the nature of discourse binding. Then
(40) and (42) could be tentatively restated as (45):
(45) An Ā-bound category must be bound within all the governing catego-
ries in which it is embedded.
Needless to say, much work remains to be done in this area. In any case
the evidence presented in this paper strongly supports a theory which pos-
tulates the existence of empty categories over one which does not, and
strongly supports the linguistic reality of the antecedent/empty category
relation, since it can be 'heard', indirectly. Moreover, the same evidence
indicates that an important part of intonation must be treated as part of
linguistic competence; it would be strange to postulate that rules of per-
formance could be based on the PRO/WH-trace distinction.

Appendix 1: Terminological Explanations

Generative grammar is considered as a set of autonomous but interrelated
subcomponents which establish correspondences between sound and
meaning through syntax according to the following schema:

                      BASE
       Lexicon + Deep syntactic structure
                        |
            SYNTACTIC TRANSFORMATIONS
                        |
        S (= Surface syntactic)-structure
              /                     \
      PHONOLOGICAL                SEMANTIC
     INTERPRETATION            INTERPRETATION
            |                        |
     Surface structure          Logical Form

10 Notice that for the purpose of our discussion, the syntactic definition of governing cate-
gory does not include the notion of 'accessible subject'. In fact, the examples presented in
this paper show that the constituent under prosodic binding must contain the governor of
the empty category. This must be taken into account in case (29).
'Logical Form' is the name of the component in which linguistic theory
describes the syntactic aspects of semantic interpretation: either because
the input of semantic interpretation crucially depends on syntactic
structure, or because semantic rules present a syntactic form. The level of
Logical Form essentially describes rules of quantification, negation, etc., and
must be distinguished from 'Semantic Interpretation II' which takes into
account meanings. Its formalism includes standard logic (predicate calcu-
lus, variables bound by quantifiers etc.). It supposes the existence of se-
mantic movement rules (invisible in surface structure) like 'Quantifier-
Raising' which raises the Quantifier to the beginning of the sentence. So
the surface-structure:
Pierre est heureux avec une femme
may have a semantic representation similar to:
(∃x) x is a woman and Peter is happy with x
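As an illustration, the representation just given can be written out in predicate-calculus notation roughly as follows. The predicate names woman and happy_with are labels assumed for this sketch; they are not notation used in the article.

```latex
\[
\exists x \, \bigl[\, \mathrm{woman}(x) \;\wedge\; \mathrm{happy\_with}(\mathrm{Peter}, x) \,\bigr]
\]
```

The quantifier stands at the front of the formula and binds the variable x in both conjuncts, which is the effect attributed to Quantifier-Raising.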
Other definitions:
- A-bound/Ā-bound: Bound by an argument/bound by a non-argument.
An argument is a subject or a subcategorised (direct) complement of the
verb; a non-argument is not in these positions (for instance, the
WH-element in questions is in 'Comp', that is, attached to 'S' (the sentence)).
- Binding Theory (see Chomsky, 1981): The binding theory summarises in
two principles the properties (complementary distribution) of pronouns
and anaphors. Moreover, the relations of full and empty elements
to their antecedents are explained by these principles.
- Case-marking: Case is assigned to NPs by their governor (see below),
even in languages which do not show overt Case paradigms. Case is ob-
ligatory for full elements. Verbs and prepositions assign (in English)
Objective and Oblique Case. Tense assigns Nominative Case; Genitive
Case is found in [NPX] Structures.
- Clitic-Placement is a syntactic transformation which moves the pronoun
from the argument position to preverbal position.
- Core grammar: the grammar described by the schema above. It is sup-
posed compatible with universal grammar conditions. Outside of core
grammar are exceptions to general rules.
- Govern: Verbs and prepositions (properly) govern their subcategorised
complements. Tense governs the subject of the sentence.
- Governing categories: constituents in which a governor can govern a
complement (NP and S or S̄).
- Heavy-NP-Shift: (Stylistic?) transformation which inverts two comple-
ments of the verb, the longest going to the right of the shortest.
- Island: 'frozen' constituent, i.e. a constituent from which nothing can be
extracted or referred to by a semantic rule.
- Projection principle: this principle says that once a verb is described as
having some subcategorisation properties in the lexicon, it must keep
them during the whole derivation, including logical form.
- Structure building rules: create syntactic structures (consequently, change
the subcategorisation of verbs) during the syntactic derivation of the
sentences.
- Wanna contraction: Want+ to can become wanna when the empty cate-
gory is PRO (who do you want PRO to see (e)) but not when the empty
category is a trace (who do you want (e) to go there).
- WH-Raising: Movement rule in Logical form, 'moving' the WH-ele-
ment in multiple questions (who said what, etc.).
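The wanna-contraction entry in the list above can be sketched as a small token-level rule. This is only an illustration under assumed conventions: the token "PRO" stands for the empty subject, "(e)" for a trace, and the function name contract_wanna is hypothetical. Treating an adjacent "want to" with no annotated empty category as contractible is likewise an assumption of the sketch.

```python
def contract_wanna(tokens):
    """Contract 'want (PRO) to' into 'wanna'; a trace '(e)' in between blocks it.

    The token conventions 'PRO' and '(e)' are assumptions made for this sketch.
    """
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "want":
            j = i + 1
            # PRO has no phonetic content, so contraction may skip over it.
            if j < len(tokens) and tokens[j] == "PRO":
                j += 1
            if j < len(tokens) and tokens[j] == "to":
                out.append("wanna")
                i = j + 1
                continue
        # Any other token (including a trace after 'want') is copied unchanged.
        out.append(tokens[i])
        i += 1
    return out


pro_case = contract_wanna("who do you want PRO to see (e)".split())
trace_case = contract_wanna("who do you want (e) to go there".split())
```

In the first call the PRO between want and to is skipped and the pair contracts; in the second the intervening trace leaves the sentence unchanged.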
The concept of 'empty category' is one of the most important in recent
work in generative grammar. It refers to syntactic categories whose pres-
ence is indirectly attested by some grammatical rule, although they have
no phonetic content. Empty categories are:
1) 'PRO', which represents the empty subject of infinitivals or gerunds
(John thinks that [PRO to feed himself]/[PRO feeding himself] will be dif-
ficult);
2) 'traces' left by a syntactic constituent moved by a syntactic movement
rule; for instance, 'WH-Movement' moves the WH-element from its
deep structure position you gave what to Paul, to what did you give (e)
to Paul, where (e) is the 'empty trace' of what;
3) variables: empty element at the level of logical form, bound by the an-
tecedent Quantifier (see example above). Note the analogy between
WH-(e) and Q-x.

Appendix 2: Data for Anaphora and Default Accent

Personally I reject sentences containing a default accent on lexical
anaphors. For instance, for me (46) is a well-formed discourse, whereas (47)
is not:
(46) A: Paul a parlé pendant des heures au téléphone.
B: A propos, les Dupont ont cassé [le téléphone]
(Paul talked for hours on the phone/ By the way, the Duponts
broke the phone)
(47) A: Paul et Marie ont parlé l'un de l'autre pendant des heures.
B: *A propos, les Dupont se souviennent [l'un de l'autre]
A: Paul and Mary talked about each other for hours.
B: By the way, the Duponts remember each other.
The problem with (47b) is that one can get a derivative interpretation
making it acceptable (Gibbon, personal communication). If the discourse
tends to mean that there is an opposition, a contrast, between to talk and
to remember, implying for instance that Paul and Mary are good friends
and the Duponts are not good friends anymore, then the discourse is
acceptable. Moreover, one must avoid the interpretation: 'it is not true that
the Duponts do not remember each other'.
If the anaphor/antecedent relation is affected by prosodic binding, then
the binding theory can handle this case, since it says that an anaphor
cannot be free in its governing category - while the CCC is inoperative, since
no variable is involved. But in this case one can see a discrepancy between
NP-traces and anaphors (see (35)-(36)), which must be taken into account
in some way. Lack of space prevents us from discussing this discrepancy
and other phenomena such as inalienables, idioms, discontinuous
constituents, etc. These questions will be dealt with in a subsequent study.

References

Adjemian, J. C. (1978). A Functional Generative Theory of the Structure of French: Intonation
and the Problem of Syntax. Unpublished dissertation: University of Washington.
Aoun, J., N. Hornstein & D. Sportiche (1981). On wide scope quantification. Journal of
Linguistic Research 1 (3).
Bing, Janet M. (1979). Aspects of English Prosody. Ph.D.: University of Massachusetts,
Amherst.
Bolinger, D. L. (1977). Meaning and Form. London: Longmans.
Chomsky, N. (1981). Lectures on Government and Binding: the Pisa Lectures. Dordrecht:
Foris.
Guéron, J. (1980). Logical operators, complete constituents, and extraction transformations.
In May & Koster (eds.), Levels of Syntactic Representation. Dordrecht: Foris.
Guéron, J. (1981). Remarques sur la représentation de la quantification. In Attal, P. (ed.),
Actes du Colloque "Syntaxe et Sémantique", U. de Haute-Bretagne.
Halle, M. & Vergnaud, J. R. (1979). Metrical structure in Phonology: a fragment. MIT:
unpublished paper.
Jaeggli, O. A. (1980). Remarks on To contraction. Linguistic Inquiry 11: 239-245.
Kayne, R. S. (1981a). ECP extensions. Linguistic Inquiry 12: 93-133.
Ladd, R. (1980). The Structure of Intonational Meaning. Bloomington: Indiana University
Press.
Liberman, M. Y. (1975). The Intonational System of English. Ph.D.: Cambridge, Mass.: MIT.
Liberman, M. Y. & Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry 8: 249.
May, R. R. (1977). The Grammar of Quantification. Ph.D.: Cambridge, Mass.: MIT.
Safir, K., ed. (1979). Papers on Syllable Structure, Metrical Structure and Harmony Processes.
MIT Working Papers in Linguistics I.
Selkirk, L. (1978). On prosodic structure and its relation to syntactic structure. Unpublished
paper, Indiana University Linguistic Club.
Sportiche, D. & H. Koopmann (1981). Pronouns and the bisection principle. Unpublished
paper, Montreal (UQAM).
PETER WINKLER

Interrelations Between Fundamental Frequency and Other
Acoustic Parameters of Emphatic Segments

1. Aims

Following the classification of intonation and pitch contours into certain
levels described by 't Hart & Collier (1975), f0 movements and
perturbations (jitter) exist at the "atomistic" level of pitch or intonation patterns. It
is evident that f0 is sometimes absent in small parts of utterances, and
sometimes the listener does not hear a pitch, but the gestalt of an
intonation contour and the general pattern of intonation result from other
properties than f0 alone. Pitch contours that have rather different 'fine
structures' with respect to their f0 variations do not appear to have any
linguistic relevance (Lieberman 1974: 2434). It is possible that f0
movements and interruptions are perceived not as the pitch contour or as a
fundamental sound of the voice, but as a paraphonetic marker - as sound
which is not specific to language units like phonemes or intonemes. The
minimal pitch movements may be a supplementary criterion for speech
perception; these phenomena are investigated exhaustively in reports
concerning speech perception and the function of voice onset time (VOT).
For example, Summerfield, Bailey, Seton & Dorman (1981) showed that
the perceptual difference between /slit/ and /split/ is connected with the
typical movement of intensity and f0 within the segments. In real, natural
utterances there is a wide range of f0 changes which are necessary neither
for the pitch contour nor for the phoneme-specific VOT nor for the
coarticulated phoneme boundary. These f0 changes are, in articulatory terms, to a
high degree 'uneconomical'. If a voiced [z] in fluent speech is spoken as a
devoiced but lenis [s] (or [z]) with the vocal cord vibrations starting within
the segment, it is the normal coarticulatory effect and the usual realization
of a phoneme boundary. But if the whole segment [z] in connected speech
is voiced, this part of the utterance sounds 'unusual'; the missing intrinsic
voice onset 'means' something in a paraphonetic sense (something
affective, personally typical, interactionally meaningful, sociolectally
important, etc.). Surely there is no overt reflection about what has caused this
sound. It manifests a specific character of the utterance which will be
classified by the listener as 'emphatic' or 'conspicuous'. In such cases
". . . listeners are unable to differentiate the contours at any linguistic level
though they may ascribe different emotional contexts to the contours."
(Lieberman 1974: 2434). F0 changes within a segment are, with respect to
paraphonetic information, not atomistic but essential features (if we
interpret the term 'feature' not as a certain paraphonetic 'register' with a fixed
meaning or a fixed acoustical structure).

Figure 1: Acoustic parameters of an emphatic utterance (female speaker).
AKM = maxima of autocorrelation function; PGL = intensity; WGL = speed;
RH0 = zero crossings; XTR = minimax approximation.

In Fig. 1 an example is given where the speaker inserts a coarticulated
superfluous interruption of f0 at the boundary between [m] and [a].
Simultaneously the noise portion shifts (see parameters AKM = mean of
the autocorrelation function, and RH0 = zero crossings). This sequence
is a part of the utterance ". . . da haben die mir lauter R's gezeigt, und mal
war's verkehrt 'rum und mal war's richtig 'rum . . ." (". . . they showed me
nothing but R's, and sometimes it was the wrong way round and sometimes it
was the right way round . . ."). The speaker has seen
some laterally inverted occurrences of "R". The sequence is spoken in a
colloquial style, very emphatically (ironically; perceived as odd) and with
many deviations from the usual articulation. A paraphrase might be "This
was a crazy test; I had to identify letters in normal and inverted position -
I have no idea why this test was conducted". The German semantic unit
"und mal war's" (sometimes it was) has completely switched over to the
meaning "whatever it was, it was crazy". The meaning of this sequence is
independent of the semantic units; the actual meaning is constituted by
phonetic means at the moment of pronouncing the three words.
If we look for the acoustic peculiarities which caused these effects, we can
exclude the pitch contour as a whole. The paraphonetic information lies
well below the level of intonation, sometimes even below that of phonemic
segments. Additional features like voice timbre are present only to a small
degree (the spectral information of the speech material is limited to
below 10 kHz). We can also neglect the influence of accent (in the sense of
Lieberman, 1974: 2434) or stress (in the sense of Jakobson, Fant & Halle
1965: 15). Because acoustic parameters are used both as linguistic and as
paralinguistic markers, the paraphonetic information cannot be an
attribute of some particular 'substance' (like the on-off character of f0 or the
absolute signal-to-noise ratio). We chose a "dynamic" hypothesis: the
paraphonetic information marks the manner of realizing the linguistically
necessary configuration.
Let us continue with the same example. In observing the developmental
characteristics of all parameters, some typical and some atypical curve
types appear. The prototypical sound pattern of [t] shows a lack of f0 and,
simultaneously, increasing-decreasing movements of intensity (PGL), speed
(WGL), and zero crossings (RH0). Atypical is the strong increase
in speed within the segment [s], although the form of f0, RH0 and intensity
(PGL) seems to be 'normal'. We can check this configuration by listening
to the tape: this segment indeed sounds striking. The paraphonetic
information is marked by dynamically changing the phonematically necessary
combination of acoustic parameters. These changes consist of short-term
replacements of the standard curve types by other curve types. The new
configurations are not totally 'new' patterns, but they start from the
phonematically predetermined constellation. The change has to produce a
contrast effect in order to be audible as non-linguistic marking. The re-
dundancy will be used to interpolate further information; this can be done
by intensification, interruptions, supplements etc., in respect of one or
more parameters. This results in changing the combination of curve types
within a segment or segment-boundaries. It also follows from this that
neither particular curve types nor particular parameters can carry the in-
formation alone, but rather the configuration as a whole. The resulting
new combination may perhaps be prototypical for some types of affect, si-
tuation and so on. This will be ignored in the following analysis; the pur-
pose is only to list and calculate the combination patterns and to look for
prototypical differences between neutral and emphatic sequences.
To this end, neutral and emphatic utterances of two speakers were se-
lected from a dyadic live conversation. The sequences were classified into
seven curve types of four acoustic parameters at the segmental level for
computing the probabilistic combinations and comparing the two sorts of
utterances. Only the dynamic characteristics of parameters will be consid-
ered (such as curve direction: increasing or decreasing, etc.), not absolute
frequency (Hz) or intensity (dB). The results describe the combinatory
possibilities of f0 with other parameters in a comparison of neutral and
emphatic segments.

2. Material and Methods

Material: Tape recordings of a non-prestructured dyadic conversation; no
instructions were given to the speakers and there was no phonetic
pre-evaluation or eliciting of particular conversational articulatory styles. (Material
from the TAKE D of the Konstanz project "Analyse unmittelbarer Kom-
munikation und Interaktion als Zugang zum Problem der Entstehung so-
zialwissenschaftlicher Daten", financed by the Fritz-Thyssen-Foundation,
headed by Professor Th. Luckmann and P. Gross.) Recording of the se-
quences took place in a normal echoic room (film studio), with separate
microphones (Beyer Dynamic) and recorders (NAGRA, 19 cm/sec) for
each speaker.
Acoustic analysis: Digital signal analysis (f0: autocorrelation method;
intensity: smoothed absolute values; speed: differences $x_{n+1} - x_n$; RH0 =
zero crossings); computer-aided segmentation and transcription (PDP-11/50
processor and software by the Institut für Phonetik und sprachliche
Kommunikation der Universität München, headed by Prof. H. G.
Tillmann).
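The four measures named above can be sketched in a few lines of plain Python. This is an illustrative reading, not the Munich software: the smoothing window, the f0 search range (f_min, f_max) and all function names are assumptions introduced here.

```python
import math


def intensity(frame, window=5):
    """Smoothed absolute values: moving average of |x_n| (window length assumed)."""
    mags = [abs(x) for x in frame]
    half = window // 2
    out = []
    for n in range(len(mags)):
        lo, hi = max(0, n - half), min(len(mags), n + half + 1)
        out.append(sum(mags[lo:hi]) / (hi - lo))
    return out


def speed(frame):
    """First differences x_{n+1} - x_n."""
    return [frame[n + 1] - frame[n] for n in range(len(frame) - 1)]


def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def f0_autocorrelation(frame, sample_rate, f_min=60.0, f_max=400.0):
    """f0 estimate from the lag of the autocorrelation maximum (search range assumed)."""
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    # None signals that no positive autocorrelation peak was found (no f0).
    return sample_rate / best_lag if best_lag else None


# Synthetic check: 0.1 s of a 100 Hz sine sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
estimated_f0 = f0_autocorrelation(tone, 8000)
```

Returning None when no peak is found corresponds to the stretches in which f0 is absent.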
Selection and classification: 245 segments of 25 emphatic utterances; 244
segments of neutral utterances. 'Emphatic' is the category for all
sequences which seem to be non-neutral, non-normal, non-standard or
non-factual/detached.
The curve plots were arranged into two categories with the support of
listening to the tapes; after this the curves within the segments were
stylized as:
/  increasing
→  stable or = 0
\  decreasing
Λ  in-/decreasing
V  de-/increasing
↑  abruptly increasing
↓  abruptly decreasing or finishing within the segment (likewise without
   considering the placement within the segment).
The curve characteristics were assessed for general tendencies; jitter and
small deviations were ignored. The type of curve is related to the whole
segment which is constituted by auditive evaluation ('phonemic segment'),
not by acoustic criteria. More than these seven curve types were
anticipated, but did not appear. In the example of Fig. 1 the segment [t] has
been classified as "→ Λ Λ Λ"; the segment [o] as "\ Λ / →"; [m] as
"\ / Λ Λ". The frequency of the curve types was listed in a matrix; the
combinations were calculated following the proposals of Altmann &
Lehfeldt (1980: 295f.).
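The stylization in the study was done by eye and ear. The sketch below only illustrates how the seven curve types could be assigned automatically, under several assumptions that are not in the source: None marks frames where the parameter is absent, an onset inside the segment counts as ↑, an offset inside the segment counts as ↓, and the tolerance tol for ignoring small deviations is arbitrary. Abrupt jumps within a continuously present contour are not modelled.

```python
def stylize(contour, tol=0.1):
    """Assign one of the seven curve types to a per-segment contour.

    contour: list of values; None marks frames where the parameter is absent.
    tol: relative tolerance below which deviations are ignored (assumed value).
    """
    present = [v is not None for v in contour]
    if not any(present):
        return "→"   # absent throughout: "stable or = 0"
    if not present[0] and present[-1]:
        return "↑"   # parameter sets in within the segment (assumption)
    if present[0] and not present[-1]:
        return "↓"   # parameter finishes within the segment
    vals = [v for v in contour if v is not None]
    scale = max(abs(v) for v in vals) or 1.0
    if max(vals) - min(vals) <= tol * scale:
        return "→"   # jitter and small deviations are ignored
    i_max, i_min = vals.index(max(vals)), vals.index(min(vals))
    last = len(vals) - 1
    # Interior peak clearly above both ends: rise-fall.
    if 0 < i_max < last and vals[i_max] - max(vals[0], vals[-1]) > tol * scale:
        return "Λ"
    # Interior trough clearly below both ends: fall-rise.
    if 0 < i_min < last and min(vals[0], vals[-1]) - vals[i_min] > tol * scale:
        return "V"
    return "/" if vals[-1] > vals[0] else "\\"


# A segment is then described by one symbol per parameter.
pattern = " ".join(stylize(c) for c in ([None, None], [1, 3, 1], [2, 5, 2], [0, 4, 0]))
```

With these made-up contours, pattern comes out as "→ Λ Λ Λ", the same shape as the four-symbol strings quoted above.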

3. Results

The 244 neutral segments are, in detail: 48 open vowels, 43 closed vowels,
25 voiced plosives, 9 unvoiced plosives, 20 voiced and 52 unvoiced
fricatives, 32 nasals and 15 liquids. The usual phonetic classification of the
emphatic segments is somewhat problematic, because untypical
phenomena occurred (exactly these are the subject of analysis), e.g.
voiced elements within unvoiced consonants, vowels with low signal-to-noise
ratio, etc.

Table 1: Combinations of parameters and curve types, emphatic vs. neutral speech
(* = underlined in the original; ? = not legible)

parameter      /     →     \     Λ     V     ↓     ↑       Σ     x̄      s      KS     r     χ²
F0   emph.     ?     ?     ?     ?     ?     ?     ?      245   35     24.4   .34
     neut.     ?     ?     ?     ?     ?     ?     ?      244   34.9   32.2   .48   .87   17.89
PGL  emph.     P     M     P*    P     P     A*    M      245   35     32.5   .32
     neut.     P     M     A*    P     P     M*    M      244   34.9   56.3   .39   .83   54.66
WGL  emph.     P*    A*    A     P     A     M     M      245   35     28.1   .15
     neut.     A*    M*    A     P     A     M     M      244   34.9   53.4   .30   .89   41.74
RH0  emph.     A*    P     M     A*    A*    M*    P*     245   35     26.5   .19
     neut.     M*    P     M     M*    ?     P*    A*     244   34.9   41.9   .30   .98   23.49

Σ    emph.     87    221   191   259   57    94    71
     neut.     52    247   110   409   53    41    62
x̄    emph.     21.8  55.3  47.8  64.8  14.3  23.5  17.8
     neut.     13    61.8  27.5  102.3 13.3  10.3  15.5
s    emph.     9.5   31.8  20.2  32.1  3.4   13.5  14.9
     neut.     5.5   54.2  9.5   68.1  5.6   8.5   14.5
KS   emph.     .17   .28   .24   .24   .09   .23   .37
     neut.     .18   .50   .13   .47   .19   .33   .37
r              .58   .93   .7    .98   .22   .66   .85
χ²             5.27  18.61 4.01  177.61 4.30 10.84 6.59

Table 1 shows the statistical parameters and the classification of the
neutral and emphatic curve types. The letters mean:
If n_ij > 0:
P = preferred    P ≤ 0.05    z > 1.645
A = actual       P > 0.05    -1.645 < z < 1.645
M = marginal     P ≤ 0.05    z < -1.645
If n_ij = 0:
V = virtual (see above)
I = irrelevant
Measure of concentration, that is: specialization (KS):
$$KS = \frac{n \sum_{j=1}^{K} |n_{ij} - E_{ij}|}{2\, n_{i.}\,(n - n_{i.})}$$
(for the columns, $n_{i.}$ is replaced by $n_{.j}$)
Significant differences are underlined.
($\chi^2_{.01;3} = 9.21$; $\chi^2_{.01;6} = 16.8$; in the case of KS, differences of
about .1 are relevant).
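The two classification devices described above can be sketched as follows. The sketch assumes that the concentration measure is the summed absolute deviation from the counts expected under independence, scaled by n / (2 n_i (n - n_i)), where n_i is the row total and n the grand total. With the emphatic curve-type counts per parameter (the Σ entries of Tables 2-4) this reproduces the KS values .34, .32, .15 and .19 printed in Table 1. How z is computed is not stated in the text, so classify takes it as an input; both function names are hypothetical.

```python
# Emphatic counts per parameter for the curve types / → \ Λ V ↓ ↑
# (the Σ entries of Tables 2-4; 245 segments per parameter).
EMPHATIC = [
    [8, 76, 52, 20, 11, 43, 35],   # F0
    [28, 12, 71, 90, 19, 22, 3],   # PGL (intensity)
    [28, 51, 46, 86, 13, 13, 8],   # WGL (speed)
    [23, 82, 22, 63, 14, 16, 25],  # RH0 (zero crossings)
]


def classify(n_ij, z, z_crit=1.645):
    """Letter for one cell, given its count and a standardized deviation z."""
    if n_ij == 0:
        # Virtual or irrelevant; the criterion separating the two is not modelled.
        return "V/I"
    if z > z_crit:
        return "P"   # preferred
    if z < -z_crit:
        return "M"   # marginal
    return "A"       # actual


def concentration(table, i):
    """KS for row i: n * sum_j |n_ij - E_ij| / (2 * n_i * (n - n_i))."""
    n = sum(sum(row) for row in table)
    n_i = sum(table[i])
    col_totals = [sum(col) for col in zip(*table)]
    # E_ij = n_i * n_j / n is the count expected under independence.
    deviation = sum(abs(n_ij - n_i * n_j / n)
                    for n_ij, n_j in zip(table[i], col_totals))
    return n * deviation / (2 * n_i * (n - n_i))


ks_values = [round(concentration(EMPHATIC, i), 2) for i in range(4)]
```

The list ks_values holds the row concentrations for F0, PGL, WGL and RH0 in that order.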
Tables 2-4 list the combinations of f0 with the rest of the parameters.

Table 2: Combinations of F0-types and PGL-types, emphatic vs. neutral speech
(e = emphatic, n = neutral; * = underlined in the original; ? = not legible)

PGL-type   F0-type
           /     →     \     Λ     V     ↓     ↑       Σ     x̄      s      KS     r     χ²
/    e     ?     ?     ?     ?     ?     ?     ?       28    4      3.3    .17
     n     ?     ?     ?     ?     ?     ?     ?       19    2.7    3.8    .35   .76   8.25
→    e     A     A     A*    ?*    V     A*    A       12    1.7    1.9    .25
     n     A     A     V*    V*    V     V*    A       5     0.7    1.0    .42   .57   8.97
\    e     A*    A     P*    A     A     P*    M*      71    10.1   9.1    .25
     n     V*    A     A*    A     A     A*    A*      29    4.1    3.8    .23   .82   4.74
Λ    e     A     P*    A     A     A     A     A*      90    12.9   10.2   .14
     n     A     A*    A     A     A     A     M*      161   23     22.5   .20   .90   11.66
V    e     A     A     A     A     ?*    A     A       19    2.7    1.1    .19
     n     A     A     A     A     V*    A     A       18    2.6    2.4    .19   .51   4.58
↓    e     ?     ?     ?     ?     ?     ?     ?       22    3.1    2.7    .36
     n     ?     ?     ?     ?     ?     ?     ?       5     0.7    1.3    .51   .40   18.95
↑    e     V     A     A*    V     V     V*    A       3     0.4    0.5    .34
     n     V     A     V*    V     V     A*    A       7     1      1.3    .50   .48   17.54

F0 Σ  e    8     76    52    20    11    43    35
      n    10    104   38    18    18    19    37
x̄     e    1.1   10.9  7.4   2.9   1.4   6.1   5
      n    1.4   14.9  5.4   2.6   2.6   2.7   5.3
s     e    1.1   11.54 8.4   2.8   1.5   6.4   4.9
      n    2.9   25.4  9.4   5.2   5.6   2.8   6.3
KS    e    .15   .17   .20   .13   .28   .19   .30
      n    .26   .07   .11   .18   .24   .37   .34
r          .77   .90   .62   .90   .51   .72   .82
χ²         13.21 17.29 15.34 11.4  24.8  9.35  10.03
Table 3: Combinations of F0-types and speed-types (WGL), emphatic vs. neutral speech
(e = emphatic, n = neutral; * = underlined in the original; ? = not legible)

WGL-type   F0-type
           /     →     \     Λ     V     ↓     ↑       Σ     x̄      s      KS     r     χ²
/    e     ?     ?     ?     ?     ?     ?     ?       28    4      2.2    .28
     n     ?     ?     ?     ?     ?     ?     ?       16    2.3    2.5    .33   .57   7.27
→    e     A     A     A     ?*    A     A     A       51    7.3    6.1    .10
     n     A     A     A     V*    A     A     A       26    3.7    4.7    .21   .85   6.97
\    e     A*    A     A     A     A     A     A*      46    6.6    5.5    .12
     n     V*    A     A     A     A     A     P       28    4      3.3    .25   .72   6.5
Λ    e     A     A     A     A     I*    A     A       86    12.3   9.3    .09
     n     A     A     A     A     A*    A     A       154   22     21.3   .13   .73   21.25
V    e     V*    A     A     A     A     A     A       13    1.9    1.2    .28
     n     A*    A     A     A     A     A     A       10    1.4    0.8    .19   .42   2.65
↓    e     ?     ?     ?     ?     ?     ?     ?       13    1.9    1.4    .23
     n     ?     ?     ?     ?     ?     ?     ?       3     0.4    1.1    .58   .37   35.69
↑    e     ?     ?     ?     ?     ?     ?     ?       8     1.1    1.9    .36
     n     V     A     V*    V     V     P*    A*      7     1      1.3    .50   .35   18.3

F0 s  e    1.2   9.2   6.2   2.5   2.2   6.5   3.9
      n    2.6   24.2  7.7   5.1   4.3   3.9   6.9
KS    e    .14   .12   .06   .26   .56   .16   .11
      n    .23   .14   .17   .22   .12   .20   .17
r          .72   .83   .86   .92   .28   .87   .85
χ²         13.55 19.76 9.97  11.9  26.17 8.57  9.04

Table 4: Combinations of F0-types and RH0-types, emphatic vs. neutral speech
(e = emphatic, n = neutral; * = underlined in the original; ? = not legible)

RH0-type   F0-type
           /     →     \     Λ     V     ↓     ↑       Σ     x̄      s      KS     r     χ²
/    e     ?     ?     ?     ?     ?     ?     ?       23    3.3    2.9    .18
     n     ?     ?     ?     ?     ?     ?     ?       7     1      1.3    .27   .69   9.6
→    e     A*    A     ?*    A     A     A*    M       82    11.7   9.2    .11
     n     P*    A     P*    A     A     M*    M       112   16     15.6   .34   .86   16.94
\    e     A*    A     A     A     V*    A     A*      22    3.1    2.2    .22
     n     V*    A     A     A     A*    A     P*      15    2.1    2.5    .34   .67   5.54
Λ    e     ?     ?     ?     ?     ?     ?     ?       63    9      7.4    .16
     n     I*    P*    M*    A*    A     A     A       76    10.9   14.6   .26   .57   26.17
V    e     ?     ?     ?     ?     ?     ?     ?       14    2      2.3    .24
     n     A*    A     V*    V     A*    V*    A       7     1.1    1.2    .46   .14   15.52
↓    e     V     A     A     A     V     A     P*      16    2.3    2.5    .36
     n     V     A     A     A     V     A     A*      16    2.3    2.4    .23   .89   8.73
↑    e     ?     ?     ?     ?     ?     ?     ?       25    3.6    1.6    .23
     n     A     I*    A     V*    V*    P*    P*      11    1.6    2.1    .67   .48   8.09

F0 s  e    1.4   9.4   7.6   2.7   1.6   4.7   2.1
      n    2.9   20.4  10.1  4.1   4.4   3.2   3.1
KS    e    .22   .14   .15   .34   .22   .14   .23
      n    .49   .18   .36   .18   .28   .41   .40
r          .57   .93   .69   .70   .71   .63   .70
χ²         15.24 20.03 15.97 13.92 14.01 8.57  2.15

4. Discussion

Methodological aspects: The grouping of segments into emphatic vs. neutral
categories is legitimized by the phenomenological principle of
'conspicuousness'; i.e. one of the elements of pre-predicative reasoning. Unusual
sound configurations, which are not specific to a phoneme, result in a
conspicuous auditory impression. This impression will be confronted with
the measurements, then confirmed or falsified with the aid of listening to
the tapes. The stylization of the curve plots again is impressionistic and
takes the usual, plausible (or implausible) development of parameters into
consideration. Stylization of the curves could also be done with mathe-
matical criteria (smoothing, averaging, calculating the slope etc.). Me-
thodologically speaking, mathematical stylization depends on phenomen-
ological idealization. The arithmetic classification duplicates the natural
phenomenal constitution of types; it is validated with observational or, in
the philosophical sense, empirical tools. All methods of classification used
in the present paper are based on the scientific application of empirical
perception (Anschauung). This is also true in respect of statistical classifica-
tion, which uses mathematical algorithms for transforming empirical and
common sense knowledge about the normal distribution of probabilities
into scientific tools. To investigate such transformational steps is one of
the general aims of the project at Constance (see conclusion of the present
paper).
Differences between emphatic and neutral segments: In 12 combinations,
emphatic speech differs from neutral speech (see Table 1).
In emphatic sequences the following are more frequent than in neutral
ones:
/  speed, RH0
→  speed
\  intensity
V  RH0
Λ  RH0
↓  intensity
↑  RH0
In neutral segments f0 more frequently increases, decreases or in- and
decreases; RH0 decreases abruptly more often than in emphatic segments.
The curve types of all four parameters correlate closely in both sorts of
utterance. This will be interpreted as support for the argument that no
specifically paraphonetic parameters or curve types exist. Because the
Chi-square test indicates significant differences in respect of all four
parameters, the types of utterance are different in their internal structure.
This can be measured with the KS parameter (concentration;
specialization), although the measure KS is limited to one of the possible structure
attributes, i.e. the concentration of a curve type onto one or more acoustic
parameters. KS may be different in these cases, too, if the Chi-square test
indicates no significant difference, and vice versa. For example, f0 neutral is
attached more strongly to a few curve types than is f0 emphatic (.48 vs. .34),
curve types which are distinguished statistically (these are: / \ Λ). Also
speed (/ →) and RH0 (/ Λ V) are attached to fewer parameters than in
emphatic segments (intensity shows no differences). Emphatic speech
simplifies, compensates and sometimes removes the specializations typical of
phonemes.
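The correlation and Chi-square figures discussed here can be retraced for the f0 row of Table 1. The sketch below assumes that r is the ordinary product-moment correlation between the emphatic and neutral curve-type counts and that the Chi-square value is the usual homogeneity statistic for the 2 x 7 table of those counts. With the f0 counts given as column sums in Table 2, both assumptions reproduce the printed values (.87 and 17.89). The function names are hypothetical.

```python
import math

# f0 curve-type counts for / → \ Λ V ↓ ↑ (column sums of Table 2).
F0_EMPH = [8, 76, 52, 20, 11, 43, 35]
F0_NEUT = [10, 104, 38, 18, 18, 19, 37]


def chi_square(row_a, row_b):
    """Homogeneity Chi-square for two rows of counts (2 x K table)."""
    n_a, n_b = sum(row_a), sum(row_b)
    n = n_a + n_b
    chi2 = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        if col == 0:
            continue  # an empty column contributes nothing
        e_a, e_b = col * n_a / n, col * n_b / n
        chi2 += (a - e_a) ** 2 / e_a + (b - e_b) ** 2 / e_b
    return chi2


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


f0_chi2 = round(chi_square(F0_EMPH, F0_NEUT), 2)
f0_r = round(pearson_r(F0_EMPH, F0_NEUT), 2)
```

With seven curve types the statistic has 6 degrees of freedom, so f0_chi2 is to be compared with the critical value 16.8 quoted with Table 1.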
With the distribution measure (curve types distributed over parameters;
see statistical parameters under the columns of the matrix) the two sorts of
utterances again correlate to a high degree (exception: type V; r = .22).
This is to be interpreted as follows. Acoustic parameters can be used both as
linguistic and non-linguistic means. Only three curve types show
differences: → \ and ↓. The concentration KS is lower in emphatic sequences;
or, conversely, the curve form typical of phonemes is not associated with a
single or a small number of parameters. The same curve type appears - in
neutral segments - in 3-4 parameters (cf. the "normal" constellation of [t]
in Fig. 1).
Differences between combinations of f0: The virtual (V) classification will
not be interpreted (i.e. statistically possible but not in the samples).
The f0 neutral and f0 emphatic combinations show a medium to high degree of
correlation (exception: f0 V and speed). There is no essential difference in
parameter and curve type between phoneme-related and emphatic
combinations. But the paraphonetic prototypic distinctions - combinations which
help the listener to perceive emphasis - appear in the relation of f0 and
parameter type.
F0 V + speed Λ are statistically irrelevant in emphatic segments, f0 / +
RH0 Λ or f0 → + RH0 \ in neutral segments.
In emphatic utterances the following pairs are more frequent:
f0 intensity RH0
\ \ i A
I t
A /
ί i i
The decrease of loudness accompanied by a decrease of fundamental
frequency is, for example, in German a signal of emphasis; in Danish it may
be the standard combination 'decreasing loudness' - the so-called stød.
More probable in neutral segments are:
f0 speed RH0 intensity
- / \ A
\ / t
Λ A
V A
4 ί i
ί \ 4 / \
Most of the differences between neutral and emphatic segments result
from the manner of combining the RH0-parameter. The structural
attributes of f0 and RH0, however, have only two significant differences (RH0 →
and RH0 Λ), but nearly all concentrations KS are different. The f0
types are concentrated differently in five cases (Chi-square indicates only
a single difference = f0 →); the correlation of RH0 with all f0 types is the
lowest one. F0 Λ in emphatic sequences is attached more strongly to RH0
types than in neutral segments; this is a deviation from the general
tendency that emphatic combinations are more varied.
F0 and intensity are combined distinctively only in the abrupt initial/
terminal feature. The concentration KS of the seven f0-types in relation to
intensity or intensity in neutral segments is fixed onto few types. F0 → and
f0 V in emphatic segments are distributed differently over the 7 intensity
types than in the neutral segments. In this case, this is not caused by the
concentration KS; there must be another structural attribute which is not
discernible with the methods used. The concentration of f0 ↓ for the intensity
types 1-7 is different, but Chi-square indicates no difference. It is clear
that Chi-square is a global, equalizing parameter which obscures the
internal structural differences.
The combination of f0 + speed is different only with the curve types Λ
↑ ↓. The difference with speed Λ is not caused by the concentration KS.
F0 → shows a difference in the Chi-square parameter between emphatic
and neutral segments, but it is not clear for what reason (neither
concentration nor statistical differences occurred). The case of f0 V in relation to
all speed types showed the only negative correlation; in addition, the
concentration KS is very different.

5. Conclusion

The curve types of the fundamental frequencies were selected in the
present study as a reference point because this parameter can be set as the
independent variable. Certain other parameters are related to aspects of f0:
lower periodicity and greater loudness or speed at voiceless consonants.
The hypothesis was that these combinations can be destroyed in order to
mark paraphonetic information. The analysis of the combination suggests
that paraphonetic information is not a function of a special parameter, a
special curve type or - in our case - of additional elements (timbre
excluded). It is a function of dynamically changing combinations of
parameter-curve types. The structural displacement which occurs in
phoneme-related sound structure is fluent and brief. In a statistical sense, in emphatic
sequences the combinations will be used freely, the neutral structures will
be infringed, the concentration of curve types and parameters diminished
and the special links between curve type/parameter or parameter/curve
type changed or removed. Whether particular combinations are related to
certain sorts of meaning, affect, situation or personal characteristic was
not treated in the present paper.
The purpose of the analysis was to describe structural attributes with
which in everyday communication 'conspicuous' or emphatic utterances
could be identified. The results also describe the criteria for scientific-
phonetic procedures of the impressionistic type for distinguishing neutral
and 'other' material. The phonetician judges on the basis of his experience
of 'plausible' and 'implausible' curve types whether the phonetic material
(a) is speech, (b) contains emphatic sequences or noise, and (c) is analyzed
'correctly' by machines and computers, and (d) how speech synthesis rules
must be formulated.
The statistical classification reported in this paper is also a description
of parts of intersubjective (phonetic) knowledge: observational
assessment, checking and selecting of certain types of phonetic material are based
on the intuitively known probabilistic and combinatory structure of
speech stimuli. With criteria such as these, the linguistic phonetician elimi-
nates material which does not satisfy the neutral condition typical of pho-
nemes. Linguistically oriented phonetic research has developed technolo-
gies for smoothing emphatic structures, which is rather difficult because
pure elimination of parameters is senseless (for this reason the often
criticized phonetic 'laboratory pronunciation' is used). Researchers in speech
synthesis and speech recognition often apply the plausibility criterion as a
heuristic step in constructing a 'natural' synthesis procedure, or in
implementing statistical decision rules for automatic speech recognition
systems. Haggard (1979) describes his algorithm as a tool for discarding
'. . . large families of implausible shapes including those in which
constriction changes sharply from point to point . . .' (p. 264). As a further
example, the same underlying heuristic decision algorithm is used by Jassem
(1979) for classifying short-time spectra (which results in logicomathematical
formulae). To find criteria for statistical attributes in automatic speech
recognition, Jassem chose (a) the shape and position of the spectrum along
the frequency axis, (b) the area contained by the spectral envelope, and
(c) the envelope regarded as a material object of uniform thickness with top
and bottom surfaces (p. 82).
These classification features were further transformed into binary codes
for the classical distinctive features of fricative phonemes.
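
The plausibility criterion quoted from Haggard, the discarding of shapes that change sharply from point to point, can be sketched as a simple filter. This is a hypothetical illustration and not Haggard's algorithm: the function names, the threshold `max_step` and the sample contours are all assumptions made for the example.

```python
def is_plausible(shape, max_step):
    """Return True if no two neighbouring points differ by more than max_step.

    `shape` is a sequence of numbers (for instance constriction values or
    curve samples); `max_step` is an assumed, freely chosen threshold.
    """
    return all(abs(b - a) <= max_step for a, b in zip(shape, shape[1:]))


def discard_implausible(shapes, max_step):
    """Keep only those shapes that pass the point-to-point criterion."""
    return [shape for shape in shapes if is_plausible(shape, max_step)]


# Invented sample contours: one changes gradually, one jumps sharply.
smooth = [1.0, 1.2, 1.5, 1.4, 1.1]
jagged = [1.0, 3.0, 0.5, 2.8, 1.0]

kept = discard_implausible([smooth, jagged], max_step=0.5)
print(len(kept), "of 2 shapes kept")
```

A single threshold on neighbouring points is the crudest form such a heuristic can take; it is meant only to make the notion of an 'implausible' curve type concrete.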
Evident knowledge - based on a common sense understanding of what
is 'conspicuous' - is, from a methodological point of view, the more or
less hidden resource of scientific activity. To discover some of these struc-
tures of knowledge was one of the implicit aims of the present paper.

References

Altmann, G. and Lehfeldt, W. (1980). Quantitative Phonologie. Bochum: Brockmeyer.


Antoniades, Z. and Strube, H. W. (1981). Untersuchungen zum "intrinsic pitch" deutscher
Vokale. Phonetica 38, 277-290.
Barry, J. W. (1981). Prosodic function revisited again! Phonetica 38, 320-340.
Brokx, J. P. L. and Nooteboom, S. G. (1982). Intonation and the perceptual separation of si-
multaneous voices. Journal of Phonetics 10, 23-26.
Fónagy, I. and Bérard, E. (1972). "Il est huit heures": Contribution à l'analyse sémantique de
la vive voix. Phonetica 26, 157-192.
Haggard, M. (1979). Experience and perspectives in articulatory synthesis. In Lindblom, B.
and Öhman, S. (Eds.): Frontiers of speech communication research. London, New York, San
Francisco, pp. 259-274.
't Hart, J. and Collier, R. (1975). Integrating different levels of intonation analysis. Journal of
Phonetics 3, 235-255.
Jakobson, R., Fant, G. and Halle, M. (1965). Preliminaries to speech analysis. Cambridge,
Mass., 6. printing.
Jassem, W. (1979). Classification of fricative spectra using statistical discriminant functions.
In Lindblom, B. and Öhman, S. (Eds.): Frontiers of speech communication research. London,
New York, San Francisco, pp. 77-91.
Lieberman, Ph. (1974). A study of prosodic features. In Sebeok, Th. A. (Ed.): Current trends
in linguistics, Vol. 12. The Hague, Paris, pp. 2419-2449.
Lieberman, Ph. and Michaels, S. B. (1962). Some Aspects of Fundamental Frequency and
Envelope Amplitude as Related to the Emotional Content of Speech. Journal of the Acousti-
cal Society of America 34, 922-927.
Nakatani, L. H. and Dukes, K. D. (1973). A sensitive test of speech communication. Journal
of the Acoustical Society of America 53, 1083-1092.
Martin, H. (1981). The prosodic components of speech melody. The Quarterly Journal of
Speech 67, 81-99.
Summerfield, O., Bailey, P. J., Seton, J. and Dorman, F. (1981). Fricative envelope
parameters and silent intervals in 'slit' and 'split'. Phonetica 38, 181-192.