A Theory of
Language and Information
A Mathematical Approach
ZELLIG HARRIS
derivational relations of the language. That they are not actually said may be
due to their complexity or length; in many cases it would be impossible to
formulate a regular grammar of the language that would exclude them as
ungrammatical. Therefore the unsaid sources can be counted upon as
derivational sources for the given word combinations.
Since some of the chapters may be read independently of the others, data
relevant to several different issues have been repeated in the respective
different chapters. A reader who seeks only the more general formulations
can safely omit various sections, chiefly 1.1.1, 3.2.2-3.2.5, 3.3.2.1-3, 3.4.2-
3.4.3, 5.3.2-5.3.4, and Chapters 7 and 8.
It may be in order to state what language-acquaintance underlies the
results presented here. The statements in this book have been based on
detailed grammatical formulations of several European and Semitic languages,
and on outlinings of the structures in a variety of American Indian languages
(especially Algonquian, Athapaskan, Siouan, Iroquoian) and Eskimo, and
also in some African languages (especially Swahili) and some Dravidian.
The statements have been checked against existing grammatical descriptions of
a sampling of languages: Basque; languages of ancient Asia Minor (where I
had the advantage of information from Anna Morpurgo Davies); Sumerian;
some languages of the following groups: Australian, Austronesian, Papuan
in New Guinea, Micronesian; and finally, Chinese and Korean. The variety
of structures in the languages surveyed is of course vast; but it was possible to
see that the dependence relation of operator on argument, on which rests
the syntactic structure presented here, is a possible underlying description
for them. There was no possibility of determining whether some of the other
constraints, defined here upon the basic dependence, hold for all the
languages.
The work here starts off from the distributional (combinatorial) methods
of Edward Sapir and of Leonard Bloomfield, to both of whom I am glad to
restate my scientific and personal debt. I am also indebted as always to
many enlightening and felicitous conversations with Henry Hiz, Henry
Hoenigswald, M. P. Schutzenberger, and Maurice Gross, and to very
valuable critiques and suggestions by Andre Lentin and Danuta Hiz.
Contents

I. INTRODUCTION

1. Overview
1.0. Introduction
1.1. A theory in terms of constraints
1.1.0. The major constraints
1.1.1. Example: The set-vs.-member connectives
1.1.2. Example: The passive construction
1.2. Mathematical properties
1.3. Science sublanguages
1.4. Information
1.5. The nature of language

2. Method
2.0. Introduction
2.1. The issue to be resolved: Departures from equiprobability
2.2. The relevant method: Least redundancy in description
2.3. Simplicity
2.4. Can meaning be utilized?
2.5. Properties of the data
2.5.0. Introduction
2.5.1. Unbounded language, finite speakers
2.5.2. Discrete elements
2.5.3. Linearity
2.5.4. Contiguity
2.5.5. Variety within a class
2.5.6. Synchronic single-language coexistence of forms
2.5.7. Diachronic change
2.6. From method to theory

3. A Theory of Sentences

V. INTERPRETATION

Index

List of Figures
Introduction
1
Overview
1.0. Introduction
This book presents in the first place a formal theory of syntax (chs. 3, 4), in
which a single relation on word-sequence gives to the set of these sequences
the structure of a mathematical object, and produces a base set of sentences
which carry all the information that is carried in the language. Within this set
of sentences there acts a set of reductions which change the shape of words
but not the information in the sentences, thereby producing the remaining
sentences of the language. Prior material to this theory is an analysis of how
words are constructed from sounds (7.1-4), and preliminary considerations
of how words combine into sentences (7.5-8, 8.1). Following upon the
establishment of sentences, we can go on to an analysis of how sentence sets,
especially sentence-sequences, are constrained in longer utterances and in
discourses (ch. 9), and finally in sublanguages (ch. 10). The relation of
syntactic structure to information and to the development of language is
then discussed in Chapters 11, 12.
The general structure of language consists then of: (1) the construction of
words from sounds, in a way that does not affect the structure of syntax;
(2) the construction of sentences from words, by a single partial-ordering
relation; (3) the transformation of sentence shapes, primarily by a process of
reduction. The independence of word-sounds from sentence construction
leaves room for various languages to have sound-syntax correlations:
particular sound similarities among words having the same sentential status.
This is the chief basis for additional grammatical constructions (beyond
those listed above), such as conjugations and declensions.
In the chain of operations, from individual sounds to complete discourses,
each operation is defined on the resultant of prior operations, and each word
that enters into the making of a sentence enters contiguously to the construction created thus far. Because of this, every construction (aside from
statable exceptions) can be analyzed in an orderly way in terms of these
operations as first principles.
It is thus not the case that language has no precise description and no
structural relation to meaning, but rather that these properties are reached
only via particular methods. These methods arise from a crucial property,
namely that language has no external metalanguage in which one could
define its elements and their relations (2.1). Hence, the elements can be
established only by their redundancies in the set of utterances, i.e. by the
departures from equiprobability in the observable parts of speech and
writing. From this it follows that no matter what else a theory of language
does, it must account for the departures from equiprobability in language.
Furthermore, this accounting must not confuse the issue by adding to these
any additional redundancies arising from its own formulation. That is, we
need the least grammar sufficient to produce precisely the utterances of a
language. It then turns out—necessarily so, on information-theoretic grounds
(1.4)—that the elements and relations which are reached in a least-redundant
description have fixed informational values which together yield the information of the sentences in which they occur. In this approach, finding the
structure simultaneously exhibits the information (11.4).
To find the departures from equiprobability, we first show that not all
combinations of elements occur (3.0), or are equally probable, in the
utterances of a language (e.g. Go a the is not an English sentence), and then
show what characterizes the occurring combinations as against the non-occurring. Given the great number and the changeability of the sentences of
a language, the word combinations cannot be listed. However, they can be
characterized by listing constraints on combination, such as can be understood to preclude all the non-occurring combinations, leaving those which
are indeed found (3.0). It has been possible to find constraints that are
relatively stable, and are such that each contributes a fixed meaning to the
sentences formed by it. The initial and basic constraint shows that sentences
are produced by a partial order on words in respect to their ability to occur
with other words in the set of sentences. When the constraints can be
formulated in such a way that each one acts on the resultant of another, we
obtain not only an orderly description of the structure of the utterances, but
even a way of quantifying the effect of each constraint without counting
anything twice (12.4.5). These constraints are presented in 1.1 and in
Chapter 3.
This least set of constraints, which characterizes language, is found to
have mathematical properties that are relevant to their informational effect.
In particular, the initial constraint, in producing the set of possible sentences,
necessarily creates a mathematical object (1.2, ch. 6).
The same least-constraint method shows that sentence-sequences within
longer utterances can best be described as satisfying additional constraints,
of word repetition (ch. 9). This produces finally a double array of the
sentence-making partial order among the words (9.3).
It is further found that when the constraint-seeking methods used to
obtain the structure of a whole language are applied to the notes and articles
of a single sub-science, they yield not the grammar of the language, but a
somewhat different kind of grammar which has some of the properties of
mathematical notation, and which reflects the structure of information in the science in question (1.3).

1.1. A theory in terms of constraints

1.1.0. The major constraints

The theory presented in Chapter 3 states that all occurrences of language are
word-sequences which satisfy certain combinatory constraints; furthermore,
for reasons related to mathematical Information Theory, these constraints
express and transmit information (1.4). The first constraint (3.1) is what
gives a word-sequence the capacity to express fixed semantic relations
among its words: it consists in a partial ordering on words (or morphemes,
7.4), whereby a word (or morpheme) in a sentence serves as ‘operator’ on
other words, called its ‘argument’, in that sentence. (In Sheep eat grass, we
take eat as operator on the argument pair sheep, grass.) The essential
generality of this partial-order relation is considered in 1.2. All the other
constraints act on the resultants of the first.
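To make the partial order concrete, here is a minimal sketch in Python. It is my illustration, not the book's formalism: the Word class, the crude SVO linearization, and the examples are assumptions for display only.

```python
# Minimal sketch of the operator-argument partial order (illustrative only).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Word:
    text: str
    arguments: List["Word"] = field(default_factory=list)  # empty: a primitive argument

    def linearize(self) -> str:
        """One possible flattening of the partial order into a word-sequence."""
        if not self.arguments:
            return self.text
        if len(self.arguments) == 2:                 # crude SVO placement for a pair
            a, b = self.arguments
            return f"{a.linearize()} {self.text} {b.linearize()}"
        return f"{self.arguments[0].linearize()} {self.text}"

sheep, grass = Word("sheep"), Word("grass")
eat = Word("eat", [sheep, grass])                    # eat operates on the pair (sheep, grass)
occur = Word("occurs", [eat])                        # a further operator on the sentence itself
print(eat.linearize())      # sheep eat grass
print(occur.linearize())    # sheep eat grass occurs
```

The point of the sketch is only that the sentence is the tree of operator-argument entries, while any particular word order is a secondary linearization of it.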
The second constraint (3.2) reflects the specific meaning in a sentence, by
recognizing that for each argument word certain words are more likely than
others to appear as operator on it. (Eat is more likely than drink and than
breathe as operator on the pair sheep, grass.) The meanings of words are
distinguished, and in part determined, by what words are their more likely
operators or arguments; and the meaning of a particular occurrence of a
word is determined by the ‘selection’ of what words are its operator or
arguments in that sentence (11.1).
The likelihood constraint does not exclude any operator-argument combination outright: beside Sheep eat grass we have also Sheep eat sheep, Grass eats sheep, etc.; but all of these are syntactically possible sentences. This constraint suffices to create all possible unreduced sentences.
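A sketch of how such a gradation of likelihoods might be tabulated follows; the co-occurrence counts are invented for illustration and are in no way the book's data.

```python
# Invented co-occurrence counts: grading operators by likelihood on an argument pair.
cooccurrence_counts = {
    ("sheep", "grass"): {"eat": 40, "drink": 2, "breathe": 1},
    ("sheep", "water"): {"drink": 30, "eat": 1},
}

def selection(arg_pair):
    """Rank candidate operators on an argument pair by relative likelihood."""
    counts = cooccurrence_counts.get(arg_pair, {})
    total = sum(counts.values()) or 1
    return sorted(((op, n / total) for op, n in counts.items()), key=lambda p: -p[1])

print(selection(("sheep", "grass")))
# [('eat', 0.930...), ('drink', 0.046...), ('breathe', 0.023...)]
```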
The operator-argument constraint imposes a partial order on the words
of a language with respect to their defined ability to co-occur in sentences or
utterances: in this order eat is higher than sheep or grass, while sheep is
neither higher nor lower than grass; know, probable, occur are higher than
eat (in (a) above). Each sentence is a set of word-occurrences satisfying this
partial order (i.e. of word-occurrences within which the operator-argument
relation holds), possibly with (statable) reductions. The partial ordering
itself has a meaning: the operator is said about (i.e. predicated of) the
argument (3.14). Hence, given the meanings of the words (by real-world
referents, or by selection of operators and arguments), finding the operator-
argument relations among the words of a sentence yields its meaning
directly: that meaning is the hierarchy of predicatings among the meanings
of the words of the sentence. The operator hierarchy, i.e. the partial
ordering, is overt in the unreduced sentences; it and its meaning—though
not its overtness—is preserved under reduction.
The grammar of a language is then the specification of these word classes
in their partial ordering, and a specification of the reductions and linearizations. This seems quite far from the traditional contents of grammar. But
the usual grammatical classes, such as verb and adjective, are obtainable
from the partial order. Some (e.g. adjectives) are by-products of reductions
on operators; others (e.g. verbal nouns) are secondary statuses such as that
of an operator (eat) appearing as argument (eating) of a further operator
(e.g. under occurs all the time, in Sheep’s eating grass occurs all the time).
And such syntactically local constructions as plural, tenses, conjugations,
and declensions are merely patternings of the form, likelihood, and syntactic
status of certain reductions. Thus the usual complex grammatical entities
and constructions, important as they may be for practical descriptions of a
language, are only incidental to (and derivable from) the universal and
simple form-and-meaning constraints of language structure. The traditional
constructions mentioned above can be brought out by seeking the similarities
and interrelations among the reductions in a language, and correlations
between the shape of words and their syntactic status.
In general, the constraints—including the vocabulary and the co-occurrence
likelihoods of the words—are discovered from the regularities seen in
sentences. Since each constraint acts on the resultant of previous constraints,
the successive constraints have monotonically descending domains; and
more importantly, at each stage of analysis the remaining constraints do not
alter what has been obtained thus far. Furthermore, each sentence is a
partially ordered structure of words and reductions. In its unreduced form,
the words are ordered by their operator-argument relations there—in
effect, by their ordered entry into the making of the sentence. The reductions are then carried out on this partially ordered structure.

1.1.1. Example: The set-vs.-member connectives

Consider first only. We find an only which, as in (1) The meeting was fine, only John didn't come, conjoins any arbitrary two sentences; and the apparently independent (2)
Only John didn't come (with stressed only, similar to saying John alone
didn't come). However, the only of (2) turns out to be a case of the only of
(1), if we derive (2) from (3) Everyone (else) came, only John didn't come
(which is structurally of type (1) above). In (3) the whole first component
sentence is zeroable (reducible to zero) on grounds of low information in
respect to the set-vs.-member selection of the bi-sentential operator (conjunction) only. Note that the first component of (3) is identical with the
second component, except that everyone (else) with optional else replaces
the argument (John), and that the two verbs differ in negation alone (didn’t
come). When the two sentences which are arguments of only differ in just
this way the whole of the first argument (here, Everyone (else) came) is
zeroable, because of the set-vs.-member selection of only. This reduces (3)
to (2). In other cases, e.g. (1) above, no zeroing occurs and only remains
visibly bi-sentential.
Although the meaning of only in (2) seems to differ from that in (1) and in
(3), that difference is seen on closer inspection to be merely a trace of the
zeroing of the first sentence under only. Indeed, (3), with the meaning of
everyone else not, except John, and (2), with the meaning of John alone,
carry the same information. It is instructive to see here how meaning-preserving reductions (here, zeroings) in the first argument of only in (3)
yield such a difference in the meaning of post-reduction only as we see in (2).
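The mechanics of this zeroing can be suggested in code. The sketch below is my illustration, not Harris's formulation: the regular expressions and the tiny irregular-past table stand in for the actual set-vs.-member conditions on zeroability.

```python
import re

# Rough sketch of the zeroing that takes (3) to (2): the first component under
# 'only' is zeroable when it is the matched 'everyone'-sentence differing from
# the second component in negation alone.
PAST = {"come": "came", "eat": "ate"}   # minimal irregular-past table (illustrative)

def reduce_only(first: str, second: str) -> str:
    m1 = re.fullmatch(r"[Ee]veryone (?:else )?(\w+)", first)    # e.g. 'Everyone else came'
    m2 = re.fullmatch(r"\w+ didn't (\w+)", second)              # e.g. "John didn't come"
    if m1 and m2:
        stem = m2.group(1)
        if m1.group(1) in (PAST.get(stem), stem + "d", stem + "ed"):
            return f"Only {second}"                             # whole first component zeroed
    return f"{first}, only {second}"                            # no zeroing: only stays bi-sentential

print(reduce_only("Everyone else came", "John didn't come"))
# -> Only John didn't come
print(reduce_only("The meeting was fine", "John didn't come"))
# -> The meeting was fine, only John didn't come
```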
Next, except. Here too we can find a juxtaposition of structurally arbitrary
component sentences, as in The meeting was fine, except (that) John didn’t
come with optional that, and also of structurally contrasted sentences as in
(4) Everyone came, except that John didn’t come. What is zeroable in the
contrasted sentences around except is just the not (optionally) plus verb plus
that (here, that … didn't come) of the second component sentence, more
rarely the verb alone (here, came). Then (4) reduces to (5) Everyone came
except John or (5a) Everyone except John came or the rarer (5b) Everyone
came, except not John. (5b) seems to negate (5), but is actually a paraphrase
of it, differing only in not zeroing the expected not. (The second came must
be zeroed if not is.) Again, the meaning of (4) is preserved in (5).
Then, but. We have any arbitrary component sentences in e.g. The
meeting was fine but John didn’t come, and contrasted ones in e.g. (6)
Everyone (else) came, but John didn't come. Under but, in structurally
contrasted sentences such as (6) one can zero the second verb, and if so also
the high-expectability not. Then (6) reduces to (7a) Everyone came, but not
John or Everyone, but not John, came (separated by commas) and to (7b)
Everyone came but John or Everyone but John came (without commas), all
preserving the meaning of (6). The not and the zeroability of the repeated
verb may be in the first component sentence. Here the optionality of zeroing
the not (compare 7a and 7b) yields the peculiar case of having identical
meanings for a sentence and for its apparent negation: in There weren't but
two people left and There were but two people left. (The zeroings that
produce the last form are not available under except.)
This analysis shows only, etc., to be a subset of the sentence-introducing
‘particles’ (therefore, however, nevertheless, too, so, even, etc.), which do
not normally appear on the first sentence of a discourse—this last because
they are all derived from conjunctions under which various zeroings can
eliminate parts of the component sentences, including (as under only) the
whole first sentence (G393). In addition to this general derivation, the only,
etc., discussed above have a high likelihood for every matched against a
subset of whatever is referred to in the every, and only, except, but, also have
high likelihood for not on one sentence but not both.
This illustration does more than show that explanatory analyses can be
obtained by the syntactic theory presented here. It shows that regularities
among suitable words can be discovered by comparing their environments.
It also shows that the regularities apply to processes, such as zeroings here,
on those environments. Furthermore, it shows that the analysis of a given
environment has to be made in respect to all the other environments of the
word in question, so that the given one is obtained from the others by the
regular processes. This means taking into account all the environments of a
word rather than saying—unhelpfully—that in different situations it is a
different word or a different use or meaning of that word. It also means
trying to take into account the etymology of a word, i.e. its earlier uses: only
is from one + -ly; except is from Latin ex ‘out of’ + capere ‘take’; but is from
early forms of by + out + adverbial suffix. The etymology is relevant
because: either the current word in its environment is a descendant (i.e. a
repetition or a continuation) of its earlier source, appearing in the source of
its current environment; or else at some time the old word was extended into
the current environment by some regular or analogical relation to the earlier
environments of the source word. Such etymological comparison is therefore
not a checking of the modern word against its earliest known cognate, but a
tracking of the changing use of the word in various environments, to the
extent possible. And this evidential ‘use’ is a matter of specific environing
words in various sentences, not a matter of meaning, because we have little
knowledge of past meanings except via past word-environments. Furthermore, appeal to meaning would not suffice to predict the increasing preference
for certain environments, which underlies the zeroabilities within them: why
did the above sources of only, except, but (but not, for example, unless)
develop the likelihoods and zeroings of every and not which they now have
(while unless, not far in meaning, did not)? Why did even not move into a
more contrastive use, with special likelihood for not, as it does slightly in
even so (compare French quand même)?
1.1.2. Example: The passive construction
The other illustration is the case of the passive construction in English (G16,
362): The book was taken by John, by the side of the ‘active’ John took the
book. It is first of all clear that the passive is not an independent combination
of words, but rather a transform of the active (i.e. derivable from it), since
for every passive sentence (except for those in subsets explained below)
there corresponds an active sentence, as above (cf. 2.3). It can further be
seen that the passive is not an independent transformation, but rather a
resultant of three elementary transformations. (1) One is the by which arises
in English on the subject (first argument) of a verb when a further operator
acts on that verb, e.g. above in taken by John as argument under was. Note
that this use of by differs syntactically from the ‘prepositional’ operator by
which appears equally on active and passive sentences, as in John read the
book by candlelight and The book was read by John by candlelight. (2) A
second component of the passive which is known elsewhere in English
grammar is the stative -en (-ed) suffix. This suffix appears also in the ‘perfect’
tense (John has taken the book), in the ‘passive adjective’ (a crooked smile, a
learned man), and in a suffix on nouns (a monied family, a two-wheeled
vehicle). (3) The third component is the moving (‘permutation’) of the
object (second argument) of a verb to subject position, more precisely to
being the subject of a further verb (here, is) that is operating on the original
verb. This is seen in The Armenians suffered repeated attacks by (or: from)
Azerbaijanis, where the grammatical (and physical) object of the attack was the Armenians; the form is comparable to the passive The Armenians were
repeatedly attacked by Azerbaijanis, from Azerbaijanis repeatedly attacked
the Armenians.
As to the elementary transformations: the stative -en of (2), for example, restricts tense in a characteristic way. One does not say Goethe has visited Lago di Garda, since Goethe, no longer living,3 cannot be said to have such a state in the present time. But the present
perfect can be used on the passive of the above, in Lago di Garda has been
visited by Goethe and others of his generation, because the lake still exists
and can be said to have that state. (If the passive were merely a direct
reshuffling of the active, it should have the same range of tenses.) In the
active, the present-tense state (on visit) would have been asserted of
Goethe, but in the passive it is asserted (on be) of the lake. There is also a
limited perfect tense with is, seen for example in The Sun is risen. We are
agreed. We are resolved. She is grown quite tall. I am finished (or: done) with it. He is possessed of great wealth; this is a stative or completive of We agree,
etc. Further, there is the passive adjective as in the curved horizon, a learned
scholar, a professed (as well as professing) Christian. Finally, there is the
stative -ed on nouns, as in a monied family (which means not a family which
happens at the moment to have money, but a family in the ongoing state of
having money).
(3) Having the object of a lower verb (e.g. eat, praise below) appear as
subject of a higher verb (e.g. is ready, deserve, is below) occurs independently
of the -en. Among the many forms, consider the following examples: It is
ready to eat. It is ready for eating (where it is the zeroed object of eat); He is to
blame (the missing object of blame is him); He deserves our praise (where of him can be added, as object of praise); It improves with (the) telling. It is in
process of dismantling (where it is object of tell, dismantle, and can be said,
e.g. … of dismantling it); This is past understanding, Much money is owing
to them, obsolescent The house is building (in all these the subject of is is the
same as the zeroed object of understand, owe, build); This is understandable,
which could be derived from This is capable of one’s understanding it (as
though from This is able for one to understand it). In all these cases the lower
verb can also be said in the passive: It is ready to be eaten. Much money is
owed to them. The house is being built (the last, only in recent English).
To say that the passive includes these three constructions means that it is
the resultant of the application of all three to the source of the passive, i.e. to
the active. Hence the passive (The book was taken by John) results from
taking the simple active (John took the book) and placing it under a higher
operator, is -en, which belongs to the set of (3) above whose subject is
usually the same as the object of the verb under them. We then have is -en
operating on the pair book and John took the book (which under -en is
nominalized to taking of the book by John). Here of the book is zeroed as
being a necessary or usual repetition of the subject of is -en; then the subject
book of is -en looks like a permuting of the object book of take, whereas in
the derivation we see that the former is just a repetition of the latter that then
caused the latter (the lower occurrence of the object book) to be zeroed. The
-ing of taking is not an independent accretion (an operator or argument) to
the sentence, but merely a marker, an indicator (G41) that the operator to which it is affixed is appearing as the argument of a further operator.
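The derivation just traced can be summarized step by step. The strings below simply restate the text's stages for the toy sentence; the bracket notation is mine.

```python
# The stages of the passive derivation described above, restated as strings.
steps = [
    "John took the book",                      # the active source
    "is -en [taking of the book by John]",     # higher operator is -en; (1) 'by' on the lower subject
    "the book is taken of the book by John",   # (2) stative -en; (3) object as subject of the higher verb
    "the book was taken by John",              # repeated 'of the book' zeroed; past tense on 'is'
]
for stage in steps:
    print(stage)
```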
3 But one can say Goethe says this in many places or Goethe is one of the greatest German
writers, because these are continuing properties of Goethe’s work. (He is still saying things in
his books.)
1.2. Mathematical properties
There are reasons to expect that the structure of language has in some
respects a mathematical character. The grammatical classification of words
is defined in general by relations among the words, rather than by particular
properties (phonetic, morphological, semantic, or logical categories) that
each word has by itself. An indication of this is seen in the simple fact that
4 It is for this reason that the examples in the present book are almost entirely from English,
although a single language is far from sufficing for a survey of language structure. In the case of
English, there are available above all the multi-volume Oxford English Dictionary, and Otto
Jespersen's A Modern English Grammar on Historical Principles; in addition, the present writer's A
Grammar of English on Mathematical Principles (referred to here as G) presents English
grammar in terms of the syntactic theory of chs. 3-4 below. The latter book not only provides
examples for the discussion in the present volume, but also shows how the methods and theory
stated here suffice to yield an orderly and detailed grammar of a language. The English
grammar of G thus serves for systemic evidence, without which a theory in a field such as
linguistics would not be adequately established. As noted at the end of 12.1, the interrelations
and regularities of the data are essential to any theoretical formulation (on grounds given in
2.2), so that the episodic presentation of examples in support of one analysis or another is of
little relevance.
every language can accommodate new words. For a language at any one
time, we can think of its word combinations as being determined by specified
lists of words, e.g. the list of words including sheep, child, grass, each of
which can ‘make a sentence’ with a word of a list that includes fall, sleep, eat,
but not with any word of its own list: Sheep eat grass, but not Sheep child
grass. However, when a new word enters the language (i.e. comes to be
used), the question is: on what basis can we say what list it belongs to? If the
new word combined with other new words and not with the previously
existing words, the language would develop separate encapsulated gram¬
matical systems for different lists of words; this is not generally the case,
although something approaching it is seen in the morphological structure of
the English perceive, conceive, persist, consist, etc. If the new word combined
with old words, but differently than the old words did among themselves, we
might in time lose any systematicity over the whole language; this too is not
the case. What we find in each language is that large subsets of words act in
uniform ways in respect to other large subsets.
This means that the late-comers (whatever words they were) combined
with other words in the same ways that existing words did. Such fitting into
existing word sets means that the new words satisfied the condition of being
usable in the existing types of word-set combination. Thus the overall
criterion of classification is the condition that words enter into the same
combinatory relations. Hence the general characterization of word classes is
in terms of their common combining-relations to other classes. It will be seen
that this opens the door to characterizing the entities of language by their
relations rather than by their inherent properties—a necessary condition for
relevant mathematical characterization of language structure.
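One might picture the classification criterion as follows. This is a sketch with invented data; the signature notation is mine, not the book's.

```python
# Invented data: each word is recorded with the argument signatures it has been seen with.
observed = {
    "sleep": {("N",)},        # seen with a single primitive argument
    "fall":  {("N",)},
    "eat":   {("N", "N")},    # seen with a pair of primitive arguments
}

def same_class(new_word_usages: set) -> list:
    """A new word joins the class of words whose combining-relations it shares."""
    return sorted(w for w, sig in observed.items() if sig == new_word_usages)

# A coined verb seen only with a single first argument falls in with sleep, fall:
print(same_class({("N",)}))   # ['fall', 'sleep']
```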
Beyond the considerations above, language has to have a constructive and
relevantly mathematical structure if it is an unbounded set of different
activities (different utterances) carried out by finite beings (2.5.1), and if it
was not built by anyone on some template or model (the existence of which
prior to language would be inexplicable, 12.4.3-4). Also, if language is
structured as an information carrier (11.5), we can expect it to have
necessary properties related to those known from mathematical Information
Theory (11.7).
This last is of central importance. If we begin analyzing language without
any preconceived guides, we find that the lack of an external metalanguage
in which to conduct the analysis leads us to describe the data in terms of
departures from equiprobability of combination (2.1-2); these are the
constraints of 1.1 and Chapter 3. It is then immediately apparent that these
constraints create the kind of redundancies studied in Information Theory,
so that the structure of syntax is a system of contributions to the total
redundancy—both of language and of its information.
If we now consider the data of language, it indeed seems overtly to have
possibilities of mathematical treatment, as in the transitional probabilities
among the successive letters that make a word in alphabetic writing, or the
successive words that make a sentence. However, the transitional probabilities do not yield useful characterizations of words and sentences (although
particular stochastic processes can be so formulated as to yield word and
sentence boundaries, 7.3, 7.7). Also, free semigroups are formed by the
sequences of letters and by the sequences of words. But these semigroups do
not represent natural language, and their structure is not relevant to the
structure of language. For example, no simple relation stated on these sets
can characterize which different sequences have which different frequencies
of occurrence. Rather, these sequences constitute sets, of which the words
and the sentences of natural language are undefined subsets. Indeed, to
define the particular subsets which constitute language is the first and
essential task of a theory of language.
The present theory will be seen to have various mathematical properties
(ch. 6). The crucial mathematical property lies in the fact that the essential
determination of operators and arguments is made by a single relation on
their combinations. That is, in each language, the classes of operators and
arguments are not each a collection of words which have been selected on
syntactic or morphological or semantic grounds (such as having a tense, or a
particular suffix, or a particular kind of meaning). Rather, words are
assigned to classes entirely on the grounds of a single co-occurrence relation:
We say that A is operator on ordered classes, say B, C (or: A depends
immediately on B, C), to mean that A is in the set of all words which appear
in an unreduced sentence only if some word of B and some word of C, in that
order, also appear in that sentence; here B or C may be stated to be the same
class as the other, or as A, or different classes, or the null set (3.1.1). In turn,
membership in B or C is not given by a list of words or by some property
other than this dependence. Rather, B is itself defined as the set of all words
which depend, in the above sense, on classes D, E; and so for C; and so on
(3.1.2).5 The word classes are thus defined by their dependence on word
classes which are in turn defined by the same dependence relation. An
infinite regress is avoided only due to the existence of a class of words whose
dependence is on the null set of words. The single relation among words is,
then, dependence on dependence; and a sentence is any sequence of
words—or other objects—in which this relation is satisfied. In accord with
this, there are just a few dependence-on-dependence statuses that a word
can have (i.e. types of argument that a word requires). Hence, languages
have a set of words whose status is that they can be defined (for reasons given
in 3.1) as depending on null (these are ‘primitive arguments’, 3.1.2, e.g.
John, sheep), words depending on (one or more) words that depend on null
(e.g. sleep, eat), and words depending on words that depend on something
5 The reasons for this situation, going back to the lack of an external metalanguage, are
reviewed in ch. 6 and 12.4. Sublanguages, which have an external metalanguage, do not have
the mathematical-object property (10.5.1).
not null (e.g. occur, is probable); this last type includes words whose status is
that one of their arguments depends on null while another argument
depends on something not null (e.g. know, whose first argument may be
John while the second may be eat ((a) in 1.1).
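These statuses can be sketched in code. The encoding below is my own; the class labels are mnemonic stand-ins, not the book's notation, and the word list is a toy.

```python
# Sketch of dependence on dependence: a word's status is the sequence of
# argument statuses it requires, bottoming out in words that depend on null.
STATUS = {
    "John": "N", "sheep": "N", "grass": "N",   # depend on null: primitive arguments
    "sleep": "On", "eat": "Onn",               # depend on null-dependers
    "occur": "Oo", "is probable": "Oo",        # depend on a non-null depender
    "know": "Ono",                             # first argument an N, second an operator
}

def satisfied(operator: str, args: list) -> bool:
    """Do the arguments' statuses match what the operator's status requires?"""
    required = STATUS[operator][1:]            # e.g. 'Ono' requires an N then an O
    return len(args) == len(required) and all(
        STATUS[a].startswith(r.upper()) for a, r in zip(args, required))

print(satisfied("eat", ["sheep", "grass"]))    # True: an unreduced sentence
print(satisfied("know", ["John", "eat"]))      # True: the second argument is itself an operator
print(satisfied("eat", ["sheep", "eat"]))      # False: 'eat' is not a null-depender
```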
It may seem surprising that a practical grammar, able to produce the
actual sentences of a language, can be built out of word classes defined so
generally and indirectly, rather than out of linguistically determined word
lists. This development was perhaps necessary, because if word combinations
were in respect to specified word lists, it would have been difficult to reach
public understanding on how new words are to combine. Greater stability is
attained if once a new word is used in a particular dependence-on-dependence
status, it is publicly clear what types of argument it can occur with (5.4,
12.4). In any case, the fact that we can characterize in this way the word
classes that are sufficient for grammar means that words, whatever phonetic
and semantic properties they have, have also as their defining syntactic
property their position in a network of occurrence-dependences in respect to
each other. Thus the set of co-occurrences (sequences) of words in unreduced sentences, as created by the first constraint, is a mathematical object: a system
of entities defined only by their relations and closed in respect to them, with
operations defined on these entities.
Showing that a given system in the real world is describable as a mathematical object means that any other real system which is also an interpretation
of that mathematical object shares the relevant properties of the first. In
particular it means that any other set of objects or events that satisfied the
first (dependence) constraint would also have sentence-like properties. And
indeed, mathematical notation will be seen below to satisfy the first con¬
straint but not the second (10.6), with the similarities and differences
between language and mathematics being indicated thereby.
In addition, the effect of various constraints has been to produce various
mathematical structures in language—orderings, semigroups, mappings,
and decompositions involving sets of words and of sentences. These are
noted in 6.3, 7.3, 7.6-8, 8.1.5-6, 8.3. What is important is that these
structures are not merely describable, but are relevant to the informational
function of language (11.5). The dependence-on-dependence relation underlies virtually everything in language structure; and beyond that, the mathematical structures noted here suffice to define almost all the semantically
(and grammatically) important classifications and relations; and even the
kinds of information capacity that language has (11.3, 11.6).
1.3. Science sublanguages

Such a grammar reflects the structure of information in that science. Utilizing both the discourse and the sublanguage properties
of scientific articles makes possible (in principle) computer processing of
their information.
It will be seen in 10.3 that while the sentences of the science are a subset of
the sentences of the whole language, the grammar of the science is not a
subgrammar of the grammar of the whole language. Rather, it is a kindred
system, different not only in detail but in some basic properties which go
beyond whole languages. In their purest form, the word classes and sentence
formulas of a science (independently of their original language), and in a
different way the double array of a single discourse, constitute not just
sublanguages but new language-like systems of notation. These systems,
intermediate in specifiable respects between natural language and mathematics, show something of the nature of scientific knowledge on the one
hand, and of discussion in general on the other (10.5).
1.4. Information
That the basic operator-argument relation has the meaning ‘to say about’ supports other evidence that language developed as a
carrier of information (11.3, 12.3). It will be seen that other structural
properties of language are explicable on this ground, and that, in contrast,
little in grammar is structured to carry subjective expression, emotion and
the like: indeed, such expressive features of speech as the intonations of
anger, hesitancy, etc., are not included in grammar (hence, in effect, in
language), because they do not combine with anything else in a way that
forms further structures. That the content of language is seen here to be the
communicating of information, rather than the expression of meaning in
general, leaves room for the expression of other meanings, such as feelings,
to be carried by other vehicles and behaviors such as music and the other arts
(10.7).
There is another important result concerning information. Those transformations which do not bring further operators into a sentence ((b) in 1.1)
have been found to be mostly either alternative linearizations of the original
partial ordering of words, or else reductions in the phonemic (phonetic)
composition of words in the sentence, under conditions in which no relevant
information is lost or added in the sentence. The reduced form may lose
some words which seem to have contributed information but did not really,
as in I expect him any minute from I expect him to be here any minute. Or it
may put things in a different grammatical relation than the source did, as in
The conductor, who was in a great rush, did not look back from The
conductor did not look back; the conductor was in a great rush (where only in
the reduced form is the secondary sentence a modifier on conductor). There
is something important which is preserved under transformation: it may be
called the substantive or objective information, which is the same in the
reduced sentence and in its source. The meaning difference between an
unreduced sentence and its reduced form consists largely in the speaker’s
attitude toward the information (e.g. what is the topical part), or in the
hearer’s access to the information, and of course in the compactness of the
information.
When we include among the constraints the choosing of particular words
for a given sentence—in effect this is the second constraint (3.2)—the
methods of the present theory isolate the first two constraints, which are
universal and essential, as bearing all the substantive information and as
creating a base sublanguage from which the whole language is derived.
Within each sentence of the base, each application of these two constraints
contributes accretionally to the information in the sentence. The other
sentences are derived from the base by constraints which do not change the
substantive information, though they may lose distinctions by degenerately
deriving from two different base sentences the same word-sequence (which
is thereupon ambiguous).
In the theory presented here every grammatical constraint (above all, the
operator dependence) has a stated meaning which is the same in all its
occurrences. And every word has stated meanings; its meaning is the same
whenever it occurs with the same operators and arguments. That is, the
association of words and grammar with phenomena of the world is made in a
regular way. Hence it is possible to work toward orderly processing and
retrieval of the specific information in sentences and discourses, on the basis
of the syntax and words of sentences (cf. ch. 4 n. 1).
A major gain in describing language in terms of constraints—which are a
construct of theory—rather than in terms of word combinations—which are
the observables—is that each constraint can be associated with a meaning,
in a way that each type of combination cannot always be. The relation of
constraints to meaning is not surprising. On grounds of Information Theory
we would expect the total departures from equiprobability, created by all
the constraints together, to determine the total informational capacity of
language. In the present work, it is seen that each constraint, acting on its
predecessors, adds thereto its own departures from equiprobability and
therewith its additional informational capacity (11.3.2). The constraints
determine even what kind of information a system can carry: that math¬
ematics can carry truth but not fact; that natural language cannot distinguish
nonsense, but sublanguages can (10.5-6). A detailed stepwise correlation is
thus created, in the base sublanguages, between form and content (11.4).
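In the spirit of this stepwise accounting, one might write the information of a base sentence as a sum over successive constraint choices. The formulation below is mine, offered only as a gloss on the text, not as the book's formula:

```latex
\[
  I(S) \;=\; \sum_{i=1}^{n} -\log_{2} p\!\left(c_i \mid c_1,\dots,c_{i-1}\right),
  \qquad
  2^{I(S)} \;=\; \prod_{i=1}^{n} \frac{1}{p\!\left(c_i \mid c_1,\dots,c_{i-1}\right)},
\]
```

where each constraint choice \(c_i\) is made with probability conditioned on the resultant of its predecessors, so that each choice adds its own departure from equiprobability, and the total capacity is the ordered product of the per-constraint factors, with nothing counted twice.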
The constraints—on sounds, on what constitutes words, on syntax—arise
from efficiencies of communicational behavior (12.3.5) and from the speakers’
experience of the world (11.6); but they appear physically as the structure of
language events, as classifications of components of sound-waves. Hence,
the relations within information can be processed on computers by carrying
out the structural operations of language, although otherwise the attempts
to mimic the brain and cognition by computer programs are not at present
based on adequate or relevant methods. The relation of language structure
to information thus makes possible the complex processing of information.
In addition, it provides a stepping-stone toward understanding the structure
of information, by showing that total informational capacity can be reached
as the ordered product of specifiable contributions to information capacity,
and by suggesting that all information involves a departure from equiprobability acting on departures from equiprobability (11.7).
We see here that there is a particular dependence of co-occurrence which
creates the events which are called language, and that this dependence
necessarily gives these events the power to carry information. More specific¬
ally, the underlying structure of language is the structure of information
(11.5), and in 11.6.1 it is proposed that the co-occurrence constraints which
create information are a reflection of co-occurrence constraints in the
perceived world.
1.5. The nature of language
The theory of syntax presented here does not constitute a theory of language
in all its manifestations; it is not clear that there can be a single such theory.
However, what is shown here about syntax creates a framework which has to
be taken into account in many other questions concerning language, and
which casts new light upon these: for example, on the relation of language
change to language structure, on the differences among languages, and on
the difference between 'natural' language and language-like systems such as
discourse structures, science sublanguages, mathematics, and computer
languages. More centrally, certain considerations are established here with
respect to the structure of information, and with respect to the nature—even
the early development—of language. (Unavoidably, the sketch below is a
very inadequate summary of the discussion in Chapter 12.)
It must first be noted that the primitive elements are determined by
objective means, not by meaning or even by meaning-differences. The
phonemes, which carry no meaning, are obtained by a behavioral test of
what constitutes repetition in the speech community (7.1), and the words by
a stochastic process on phoneme-sequences in utterances (7.3). Given this,
it has been seen that the syntactic relations, which carry such fundamental
grammatical meanings as being predicate, or being subject, (a) do not have
to be invented or discovered as underivable sui generis properties of
grammar, but (b) can be derived from constraints on word co-occurrence.
In (a) such relations as being subject of a verb are necessarily unique to
language and cannot be compared with anything else. In contrast, for (b),
the dependence constraint is a kind of dependence that we can seek in other
systems and in other types of activity.
The syntactic and informational structure of language contributes to
understanding its development (12.3). The operator constraint gives the
sentential occurrence of each operator in terms of the occurrence of its
arguments, but not vice versa. Thus in the descriptive order of sentence-making, the operator classes are 'later' than their argument classes. This
descriptive order has some bearing on the development of language, in
particular on the early development of sentences and syntax. It says nothing
about the existence of words independently of sentences. Many words which
later became arguments, and many which later became operators, must
have been used in pre-sentence times, said separately or in various combinations. But the coming into existence of sentences as explicit kinds of
word combination, with explicit sentential (predicational) meaning for the
combination, arose out of certain words coming to require, for their entry
into a combination, the presence of words from stated sets or else of null.
The order in which the definitions of syntactic classes refer to other syntactic
classes, which is the elementary order of sentence-making, may have a
relation to the historical order of the syntactic classes: words may not have
become established at a particular operator level before words of their
argument levels have become established as being such.
In this developmental picture, the dependence may well have arisen not as
a structure, but as a preference: simply a matter of certain words A being
more likely to be used in association with words of some particular set B than
alone (or than with other words A). However, once this likelihood of
occurrence became conventionalized, with some B present or implicit (via
zeroing) whenever an A appears, the requirement became a (sentential)
structure and gave rise to predication as the meaning of that word relation
(12.3.5). Such conventionalization of a use-likelihood into a requirement,
with an attendant grammatical meaning, is common in grammar, for example
in the distinction between verbs and adjectives (G189).
All other syntactic relations can be obtained from the operator constraint.
And since the meaning of that constraint is ‘saying about’—the operator is
said about its argument (1.4)—one can see that the constraint contributed
a successful form of juxtaposing words so as to give information. This is
of importance for understanding the development of language. For it is
obvious that language has to have developed from a state in which there was
no language. The problem is not resolved by attributing language to a
specifically linguistic mode of neural processing, since it is even harder to
understand how such a mode of processing arose before language existed.
Furthermore there would remain the question of why the neural processing,
and language itself, has the particular grammatical dependence-constraint
that it universally has. In contrast, once that constraint exists, its learning
requires no unique preconditioning: one need only hear a great number of
sentences constructed by this dependence and thereupon construct further
juxtapositions of words that satisfy the same dependence.
The same operation which creates the elementary sentences is then
recursively applied to make extended sentences. Other sentences are
further derived from existing ones by other conventionalizations of language
use.
Sentences are obtained, and defined, as segments of speech within which
the operator-argument relations, and most reductions, hold. Within a
sentence there may be transformations. Phenomenologically, a sentence
transformation appears as a relation (a constraint) among observed sentences.
However, as a process it is merely a change of shape (primarily a reduction)
within a sentence (8.1.7, 9.0), yielding not another sentence but another
shape of the given sentence. Beyond this, there are also constraints among
sentences. These are necessarily quite different from transformations: they
are preferences or requirements for word repetition in certain operator-
argument relations to each other within successive sentences (e.g. discourses)
and within other special aggregates of sentences (sublanguages). Hence
each sentence, discourse, and sublanguage is a structure, the locus of its own constraints.
There are also many areas of investigation which are related to language
structure as presented here, but remain to be undertaken. For example:
Ongoing cases of institutionalization of language use;
Specifics on the motive forces and directions for the evolving of language
structures;
The relation of the mechanisms of historical change to evolving structures;
Does the dependence-on-dependence relation indeed apply to all known
natural languages, as it seems to?
How does the operator-argument structure compare with the old deaf
sign-language which is not based on knowledge of spoken language?
Detailed descriptions, in terms of the dependence presented here, of
major language families, and of little-known languages such as those of
Australia and Borneo;
Detailed descriptions of reductions in various language families; many
details remain to be worked out in English too;
Special problems in deriving morphemes from operator-arguments and
reductions: affixes, pronouns, prepositions, particles;
Relatively recent development of such problem morphemes: e.g. articles;
The regularization of the double array in discourses (9.3);
Argumentation and consequence as constraints on sentence-sequences in
science (10.5.4).
In sum, the most general theoretical finding reported here is that the word
combinations which constitute sentences can be characterized not by a
sequential relation but by a partial ordering relation on words, with all
grammatical relations being defined in terms of this partial ordering. The
most general grammatical finding is that if we take, as primitive events on
the sentences of a language, a set of statable reductions in the shapes of
words which have high expectancy in their sentences, then the residual form
of each sentence, after undoing any reductions in it, is an overt fixed
operator-argument relation among its words (the above partial ordering).
The most general informational finding is that every grammatical relation
has a fixed meaning, and that the constraints which create the partial
ordering also create its information.6
6 At various points, the conclusions which are reached here turn out to be similar to well-
known views of language. For example, the operator-argument relation has similarities to the
predicate structure in Aristotelian logic. The way it functions in the formation of sentences has
similarities to functors (and more generally to the deep understanding of language theory) in
the categorial grammar of S. Lesniewski, followed by K. Ajdukiewicz and others in the Polish
School of logic; cf. particularly W. Buszkowski, W. Marciszewski, J. van Benthem, eds.,
Categorial Grammar, Benjamins, Amsterdam, 1988; H. Hiz, ‘The Intuitions of Grammatical
Categories’, Methodos (1960) 311-19 and 'Grammar Logicism’, The Monist 51 (1967) 110-27;
J. Lehrberger, Functor Analysis of Natural Language, Mouton, The Hague, 1974. Various
conclusions about structure and meaning here appear similar to, or inverses of (this too being a
similarity), arguments made by L. Wittgenstein. And the analysis of science sublanguages
2
Method
2.0. Introduction
Linguistics has not in general been one of the sciences in which the relevance
and correctness of statements are determined by controlled methods of
observation and argumentation. It is therefore desirable to consider what
methods are relevant in linguistics, in the hope of establishing criteria for
investigation and analysis. Choice of method is not less important than
responsibility in data, and the choice should be determined not by personal
preference or current custom but by the nature and the problems of the data.
Furthermore, it will be seen in 2.2 that it is not enough to go by the separate
observable features of the data, such as their linearity or their carrying
meaning, but rather that one has to consider the regularities of all the data,
and their interdependence.
2.1. The issue to be resolved: Departures from equiprobability

In the case of systems such as logic and mathematics, the statements and formulas which are comprised in them
have such explicit forms—of symbols and their combinations—that it is
clear that statements able to describe logical and mathematical formulas do
not themselves have the structure of those formulas. The statements that
describe a system are in a different system, called the metalanguage—in this
case the metalanguage of logic or mathematics—which is richer in certain
respects than the system it is describing. In the case of natural language, we
have no different system in which the elements and combinations of language
can be identified and described. The identifications for a corpus of language
utterances can indeed be made, but only in the sentences of a natural
language (the same or another)—that is, in sentences which have already
been built out of the same kind of elements and combinations which they are
describing (5.1). We cannot describe the structure of sentences except in a
system which itself has sentences, with predications and word-likelihood
differences. Furthermore, no external metalanguage is available to the child
learning its first language, nor to early man at the time language was coming
to be.
In the absence of an external metalanguage, one could seek to identify the
entities of language by extra-linguistic means, such as the occurrence of
words in life circumstances that exhibit their meaning. Many words may
indeed be identified on such grounds. But many other words, and the
ordering which makes out of them sentences as against ungrammatical word
collections, cannot be thus identified. In contrast, when only a small
percentage of all possible sound-sequences actually occurs in utterances,
one can identify the boundaries of words, and their relative likelihoods,
from their sentential environment; this, even if one was not told (in words)
that there exist such things as words (7.3). And when identified words
combine with each other in relatively few regular ways which are used
throughout the course of an utterance, one can recognize utterances, long
and short, in distinction to the non-occurring sequences of words; and one
can recognize within long utterances segments (sentences) identical to
otherwise-observed short utterances. This holds even if one is not told that
there exist such things as sentences (7.7). Given, then, the absence of an
external metalanguage, it is an essential property of language that the
combinations of words in utterances are not all equiprobable, and in point of
fact that many combinations do not appear at all.
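The word-boundary procedure alluded to here (and developed in 7.3) can be suggested in miniature. The corpus and the crude peak test below are invented simplifications of the idea, not the book's procedure:

```python
# Toy sketch of the stochastic boundary idea: within an utterance, positions
# where many different letters can follow the prefix so far are likely word boundaries.
corpus = ["thesheepeatgrass", "thegrassisgreen", "sheepeatgrass",
          "sheepsleep", "eatgrass", "eatsheep"]

def successor_variety(prefix: str) -> int:
    """Number of distinct letters that follow this prefix in the corpus."""
    return len({u[len(prefix)] for u in corpus
                if u.startswith(prefix) and len(u) > len(prefix)})

def segment(utterance: str) -> list:
    words, start = [], 0
    for k in range(1, len(utterance)):
        v_here = successor_variety(utterance[start:k])
        v_next = successor_variety(utterance[start:k + 1])
        if v_here > 1 and v_here >= v_next:   # crude local-peak test for a boundary
            words.append(utterance[start:k])
            start = k
    return words + [utterance[start:]]

print(segment("thesheepeatgrass"))   # ['the', 'sheep', 'eat', 'grass']
```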
It follows that whatever else there is to be said about the form of language,
a fundamental task is to state the departures from equiprobability in sound-
and word-sequences. Certain problems arise. No one has succeeded in
characterizing the departures from equiprobability by fixed transitional
probabilities between successive words of an utterance; the reason turns out
to be that the overall characterization of word-configurations is a partial
ordering of words (3.1). Furthermore, since the set of all possible word-
sequences in a language is in principle denumerably infinite while the resources of its speakers are finite, the characterization cannot be by exhaustive listing.

2.3. Simplicity
1 In meaning, growth may seem to have also other sources, as when it means ‘amount of
growing’. However, the other meanings can be derived from the same growing with zeroed or
implicit amount (of growing), product (of growing), etc.
A simpler description can be reached in many cases by considering not just a form but also its ‘behavior’, i.e. the
environment and other factors which characterize its occurrences. It has
been noted here that the general ‘behavior’ of words is such that it is easier to
state the constraints which preclude certain types of word combination than
to state all the types of word combinations that can appear in a language.
However, simplifications in any one part of the grammar have to be
weighed against their effect in other parts of the grammar. To see this in
practice requires a detailed example such as the following: The description
of sounds is simplified when we collect sounds into phonemes, roughly on
the condition that these sounds have complementary environing sounds
(7.2.1). On this basis, English /h/, which never occurs at the end of a syllable, and the ng sound (as in sing), which occurs only at the end of a syllable, could be considered members of one phoneme, ng/h. (Syllables are phonetic, non-syntactic, restrictions on phoneme-sequences.) But this would disturb
the phonemic similarity (7.3) of certain words when these occur in different
combinations. For example, long ends in the ng sound, but longer can be
pronounced either with ng alone or with the ng sound followed by g; and the
related word longitude is pronounced only with n followed by j. If we take ng
as a member of an ng/h phoneme, then long and longing and longer contain
the ng/h phoneme, while longer (with pronounced g) contains different
phonemes, n and g, with longitude containing n and j phonemes. If,
however, we take the ng as a sound restricted to syllable end (with no relation to h), it can then be analyzed as a syllable-end composite of the n phoneme followed by the g phoneme. Then the two-phoneme sequence ng at word-end usually has the single sound ng (long, longing, longer), while the
other pronunciation of longer, and longitude, can be related to the presence
of the same n and g (or j replacing g) phonemes. The second analysis fails to
maximize the collecting of complementary sounds into phonemes, but gives
a simpler phonetic characterization of phonemes and a simpler phonemic
composition (‘morphophonemics’) to words such as long in their various
combinations. (Note that the reason against ng/h was not the lack of
phonetic similarity between the sounds but the combinatorial simplicity at
the next level, that of words.)
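The complementarity condition invoked above can be put schematically as a test on recorded environments. This is only a sketch over invented data; environments are reduced here to syllable position, whereas in practice (7.2.1) they are much richer.

    # Two sounds are candidates for grouping into one phoneme if their
    # recorded environments (here, syllable positions only) do not overlap.
    environments = {
        "h":  {"syllable-initial"},            # never syllable-final
        "ng": {"syllable-final"},              # only syllable-final
        "t":  {"syllable-initial", "syllable-final"},
    }

    def complementary(a, b):
        return not (environments[a] & environments[b])

    print(complementary("h", "ng"))   # True: groupable on this criterion
    print(complementary("t", "ng"))   # False: both occur syllable-finally

As the text notes, passing this test is not decisive: the grouping of ng with h is rejected because it complicates the phonemic composition of words such as long at the next level.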
There are many other situations in which consideration of systemic
simplicity affects the analysis of a given form. In many cases, it is possible to
find an analysis in which both elements and the constraints on their combinations are simplified. Thus the classification of phonemic distinctions into
phonemes, of word-size segments into words, of words into operator classes,
are all done not only for reasons of taxonomy, with the general simplicity
advantages that taxonomy provides, but much more, explicitly, because
these classifications plus the constraints stated on the classes (phonemes,
words, operators) are together far simpler and fewer than the constraints
that would have to be stated on the individual phonemic distinctions, word-
size segments, and individual operator words. The sets that are defined in
the methods discussed here are thus characterized by relations among their
members (cf. 2.5.5).
Nevertheless, there are also forms for which any simplification in one
respect involves complicating some other part of the description. This means
that the relevant simplicity has to be measured not in respect to the analysis
of each form separately, but in respect to the whole system that describes or
predicts the utterances of the language. It includes the simplicity of the
primitive elements and their properties, and that of their operations or
relations, and of the types of domains on which these latter are defined—
and also the simplicity of the relation between the forms and their meanings,
which is an essential concomitant of them.2
The simplicity of the system is greater if any property or analysis that is
stated at one point in the description of a construction is preserved under
later developments in it. Every preserving of a property saves having to state
it separately for the different occurrences of the words. We try to find
elements and relations such that our analyzing a construction in terms of
them is not affected by further processes acting upon it.
In particular, no analysis of a construction should have to be corrected by
the analysis of a later element operating on that construction. For example,
(1) A whale is not a fish should not have to be derived from an assertion
(under sentence intonation) (2) A whale is a fish to which is added (3) not,
denying that assertion. Also, given the true (4) Lead paint is dangerous, we
would not want this assertion about paint to be derived from the false (5)
Paint is dangerous which is an assertion about paint in general, plus the true
(6) The paint has lead (or the equivalent).3 This does not mean that we forego relating (1) to (2) or (4) to (5). Rather, we obtain the sentence (1) A whale is not a fish from (3') One denies (or I say the denial of) operating on
the unasserted intonation-less sentential construction (2') A whale is a fish
(G321). And, for other reasons, we obtain Lead paint is dangerous from
Something, which is paint which has lead, is dangerous from (5') Something
is dangerous to which is adjoined (6') This something is lead paint from This
something is paint; this paint has lead (G124). It is a contribution to
simplicity to have each step in the making and analysis and interpretation of
a language construction be accretional, as in the primed derivations above,
and not require a reconsideration of what has been done up to that point.
A more complicated case of this parsimony of analysis is the avoidance, in
analyzing a form, of going beyond what the language itself has distinguished.
For example, in such pronouns as he, it is not possible, except in special
2 Since each relation has a specifiable domain (in some cases fuzzy) and each acts only on the
resultants of prior relations (down to the sounds), we can in principle measure the effect of each
relation, or each proposed analysis, in the whole grammar (10.4.5)—and even which proposed
grammar is ‘least’.
3 The truth of a sentence is a special case of its meaning, relating the information of a
sentence (11.6) to axioms or statements believed to be indeed the case (10.5.4). Unlike
meaning in general, it is not a known property for every sentence.
the situation in which it is said, are undoubtedly adequate, and essential, for
grasping the meanings of many words and short sentences. But they cannot
suffice for every word, nor for its different meanings in different combinations.
The meaning of such words as time, consider, the, of, and the secondary-
sentence meaning of the relative clause structure, cannot be adequately
gathered from the real-world situations in which they are used. Also, any
metalanguage in which their meaning could be given would have to have
already for its own words the kinds of meanings which are being explained.
Hence, these less obvious meanings can be learned—by speakers of the
language and by analysts of language—only by much experience with the
neighboring words and sentences with which they occur. Indeed, when one
learns the meaning of a word from its dictionary definitions, one is learning
from its sentential environment—the words of the definition and the examples
of usage—and not from an extra-linguistic experience with the word.
Furthermore, many words vary the combinations into which they enter, and
with that their meaning. We cannot in general learn the change in co-
occurrents from a prior change in meaning, but we can learn the change in
meaning from seeing the change in co-occurrents: e.g. meat from earlier
‘food’, fond from earlier ‘foolish’, provided and providing in their new
conjunctional meaning 'if' (11.1).
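That a change in meaning is learned from a change in co-occurrents suggests a simple schematic measure: compare a word's co-occurrence profile in an earlier and a later body of text. The word lists and counts below are invented for illustration, using the example of fond.

    from collections import Counter

    # Co-occurrents of 'fond' in an earlier and a later (invented) corpus.
    earlier = Counter({"foolish": 5, "silly": 4, "doting": 2})
    later = Counter({"affectionate": 6, "attached": 3, "doting": 2})

    def shared_proportion(p, q):
        """Weight of the co-occurrents common to both profiles."""
        shared = sum(min(p[w], q[w]) for w in p.keys() & q.keys())
        return shared / max(sum(p.values()), sum(q.values()))

    # A low value signals that the combinations, hence the meaning, shifted.
    print(round(shared_proportion(earlier, later), 2))

The change in meaning is thus read off the change in co-occurrents, not the reverse.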
It follows that for an adequate knowledge of meanings in language, we
need to know the constructions, in addition to an unspecified amount of
semantic information garnered from the situations in which earlier-heard
words and sentences had been said. Linguists have discussed when to bring
meanings into the analysis of language—before, after, or simultaneously
with the grammar. However, such a choice does not really arise. This is so
because the meanings that are expressed via language (and even more so
those which are inexpressible in language) are not in general discrete and
definite items which can be mapped onto the words of linguistic constructions.
There are a few sets of meanings which we can know precisely independently
of language: for example, numbers, biological kinship relations, the spatial
coordinates as we experience them. In these cases, we could start with the
meanings and ask how a language expresses them. But this method cannot
be extended to the vast bulk of meanings that correspond to words and
constructions of a language. There, we cannot list a priori a set of meanings
covering the relevant experiences of a given community. We have to see first
what the vocabulary of the language is and then how it is used, in what
combinations and in what constructions. We cannot say a priori what objects
or actions or relations are included in one meaning as against another: we
have to see what words combine with chair as against bench, or with slip as
against slide, or with from as against out of.
In addition to being not sufficiently knowable independently of the word
combinations in a language, the meanings are not precise enough to specify
all combinations of a word, and are not sufficient to predict how words will
combine. For example, from any purely semantic definition of full, empty,
equal, one would not expect that the language would contain fuller, fullest,
and emptier, emptiest, while more equal does not occur in ordinary English
(though it is used in George Orwell’s Animal Farm, without taking us
outside of the English language). A person, e.g. a foreigner, who selects
words for his sentences purely on the basis of even the fullest dictionary
definitions will produce many combinations that everyone would consider
wrong, and would miss many normal combinations.
Not only is it untenable to use meaning as the general framework for
language analysis, we also cannot use it as an occasional criterion in deciding
how to analyze linguistic sentence structure, for example, at points where
finding a simple combinatorial constraint is difficult. For one thing, the
decision where to use it would be arbitrary and would be different for
various investigators. For another, as Leonard Bloomfield pointed out,
appeal to meaning when formal analysis is difficult cuts off further attempts
to find a formal ‘explanation’ (in the sense of 2.2). When one perseveres in
purely combinatorial investigation, satisfactory explanations are reachable
in various cases where meaning seemed an easier way out—for example, in
the analysis of pronouns (5.3.2, 2.3) and of metaphor (4.2). More importantly,
when the whole language is analyzed consistently in terms of constraints on
combination, with no freedom for the investigator to beg off from that
method, language is found to be built out of a few virtually exceptionless
rules whose force does not appear when we look only at partial analyses of
language, or when constraints on combination are referred to only sporadically. Also, the grammatical constructions that are obtained in a purely
formal analysis turn out to have sharper correlation with meanings than do
those recognized in ordinary partially semantic grammars (cf. the last
paragraph of 2.4).
Meanings in language, then, are meanings of words and meanings of
constructions—those words and constructions which are in the language,
and which have been known before we ask their meanings. (And indeed, the
boundaries and identities of words and sentences can be established by
statistical means, with no knowledge of the meanings.) Thereafter, we find
that the meanings are not additional properties unrelated to the syntactic
forms, but are a close concomitant of the constraints on word-choice in the
operator-argument relation and on the participation of words in various
reductions and constructions (11.1, 3).
All this is not to deny the usefulness of considering meaning in formal
investigation, let alone in studies of language use, language arts, and the
like. Meaning, and especially the perception of difference in meaning—
whether between two forms or between two occurrences of a form—may be
used as a preliminary guide where to look for possible constraints. It is clear
from all the above that in no way is meaning being excluded here from the
study of language, but that it is not being used as a criterion in determining
2.5.0. Introduction
2.5.1. Unbounded language, finite speakers

First is the fact that the set of utterances in a language (as describable in a
grammar) is unbounded. There is no reasonable way of specifying what
would be the longest sentence, and almost every sentence can have something
added to it, whereas the speakers and understanders of the language are
finite beings. Many things in language—what sounds are distinguished
phonetically, what phoneme-sequences are pre-set as words or morphemes,
what kinds of word combination are used—depend upon a finite body of
public experiences on the part of speakers and hearers. This means that
every language must be describable by a constructive (recursively enumerable)
system, one in which there is a finite and reasonably learnable core upon
which repetition or recursion without limit—and also change—can be
stated (6.0), with possibly a finite learnable stock of ‘idiomatic’ material
which is not subject to these unbounded operations.
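A constructive system of this kind is easy to exhibit in miniature. The sketch below enumerates unboundedly many sentences from a finite core; the two-word lexicon and the single recursive device (conjunction by and) are invented stand-ins.

    import itertools

    nouns = ["John", "Mary"]
    verbs = ["left", "slept"]

    def sentences():
        """Enumerate a recursively enumerable, unbounded set of sentences
        from a finite core: N V, then N V and N V, and so on without limit."""
        core = [f"{n} {v}" for n in nouns for v in verbs]
        k = 1
        while True:
            for parts in itertools.product(core, repeat=k):
                yield " and ".join(parts)
            k += 1

    gen = sentences()
    for _ in range(6):
        print(next(gen))

The core is finite and learnable; the repetition is what is unbounded.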
2.5.2. Discrete elements

The success of writing systems, all of which use discrete marks, as carriers of
language suggests that language can be represented with discrete elements.
In addition, it will be seen in 7.1 that in the continuous events and changes in
speech one can find ways of making cuts and distinctions which create the
discrete elements of spoken language and of structural language change. It
has also been found that the events which remain as continuous elements do
not combine complexly with other elements, and so in effect remain outside
the grammar. Once discrete elements are reached, the choice of relevant
methods for further investigation is greatly narrowed.
2.5.3. Linearity
2.5.4. Contiguity
Talk or writing is not carried out with respect to some measured space. The
only distance between any two words of a sentence is the sequence of other
words between them. There is nothing in language corresponding to rhyme,
meter, or beat, which defines a space for poetry and music, or to the bars in
music notation which make it possible, for example, to distinguish rests of
different time-lengths. Hence, the only directly experienced elementary
relation between two words or entities in a word-sequence is that of being
next neighbors, or of being simultaneous (especially in the case of intonations).
Any well-formedness for sentence structures must therefore require in the
base a contiguous sequence of (discrete) objects, although later permutations
and operators may intervene. The only property that makes this sequence a
construction of the grammar is that the objects are not arbitrary words but
words of particular classes. But the sequence has to be contiguous, it cannot
be spread out with spaces in between, because there is no way of identifying
or measuring the spaces.
By the same token, the effect of any operation that is defined in language
structure, i.e. the change or addition which it brings to its operand, must be
in its operand or contiguous to that operand. Of course, later operators
acting on the resultant may intervene between the earlier operator and its
It will be seen in Chapter 7 that the search for a least redundant apparatus of
description leads to many definings of classes, carried out in such a way that a
property can be more simply stated about the class than about the individual
members. This may happen because all the members have the same property,
which we do not want to repeat for each member. Or some members may
have only some of the class properties, so that they are only in some respects
members of the class (e.g. the English auxiliary verbs, G300). And some
members may have some properties while others have others, and it may be
possible to say that two such members between them constitute a whole
(composite, or disjunctional) member of the class (e.g. go and wen- above).
In the last case, we can think of the composite member as being the true
member, with the members that are included in the composite being
regarded just as variant forms that the true (composite) member takes on in
various situations. In all these cases it is enlightening to state a class
property.
The forming of classes is thus not a ‘similarity’-based grouping of the data
but a complex way of finding the most regular way of attributing properties
to entities, that is, of defining properties and their domains (2.3). Therefore,
a consideration in forming classes is the variety, degeneracy, and other
difficulties which are met with. For example, there is the question of how
many and how unusual are the differences among the various sounds
(allophones) grouped into one phoneme: this is equivalent to asking how
similar the sounds of a phoneme are in its various environments. A comparable question is how similar are the phoneme-sequences that constitute a
given word, in the various environments in which that word appears. We can
say that part is a phoneme-sequence constituting a word; and hart is a
The fact that there are various languages in the world and dialects within
them, and that languages change through time, means that there are
different possible objects for linguistic description and analysis. One could
try to describe all languages together—either their intersection, namely
what is universal and common to all, or their join or envelope, namely
anything that is found in at least one language. One could also try to describe
a single language or family through time, giving the successive forms of
particular elements and constructions. However, there is special importance
Natural languages are open: new sentences and discourses can be said in
them. Furthermore, natural languages change in time. As far as we can see,
they do so without at any point in time failing to have a largely coherent
grammatical structure.
At any moment in the history of a language, it is possible to make as
complete a grammar of the language as we wish. No item of the language
need be left out as undescribable; any item which is not a case of existing
rules of the grammar can be listed separately or fit in (as a special case under
special conditions) to some existing rule in respect to which it can be
described. That is, we can describe a language at any time t₁, giving as
complete a description as we wish and necessarily including a large number
of individual facts (operations whose domains are individual words). At any
sufficiently distant time t₂ we can do this again, and in all likelihood there will
be some difference in the two descriptions, in respect to some items X, due
to changes in the language in the intervening period.
Since t₁ and t₂ are arbitrary, and since there are few if any discontinuous points in a language history (although there may be such in the description of language history), the description used for X at t₁ must be valid up to some period tᵢ, t₁<tᵢ<t₂, and the description used for X at t₂ must be valid from tᵢ and on, without there being a discontinuity at tᵢ. We conclude that during the period tᵢ the grammatical items X must have been describable in two different ways. Before tᵢ, the item might be described in one way, to fit it into certain features of the grammar. After tᵢ, it will have changed sufficiently so as to require a different description, fitting it into some other features of the grammar. At tᵢ, both descriptions must have been possible, i.e. at tᵢ the amount of change in X must have been sufficient to make X fit the t₂ features, but not so great as to make X no longer fit the t₁ features.4 To this
4 It does not matter whether an individual changes his speech during tᵢ, or whether the items X are used differently by people who began to speak after tᵢ as against those who began to speak before tᵢ.
extent at least, some forms in language may have non-unique analyses. The
situation at tᵢ is indeed often observable in detailed grammars, e.g. in the
case of transformations which are in progress. Thus in He identified it by
the method of paper chromatography, we have two sentences connected by
semicolon (secondary stress): He identified it by a method and The method
was of paper chromatography. In He identified it by the means of paper
chromatography, we can attempt a similar analysis into He identified it by
some means and The means were of paper chromatography. This is, however,
not very acceptable, and a more acceptable analysis would be to take by the
means of as a new preposition similar to by itself. The latter analysis is
already inescapable in He identified it by means of paper chromatography,
since Means were of paper chromatography does not exist.
The possibility of a structural static grammar at time-slices of a changing
system is due to the fact that there are similarities and other relations among
the various domains stated in a grammar and among the various operations
stated in the grammar. In terms of these higher-level systemic relations, the
exceptional domains of an operation can be shown to be extensions or
modifications of one of its regular domains.
This means that a description of language has to provide for the existence
of items which don’t quite fit into the rules for the rest of the language, but
can nevertheless be related to those rules as extensions of their domain or
small modifications of their operation.
considerations may not have great weight, and may not even be the creators
of particular forms and relations; but for purposes of approaching a theory
we must first see to what extent a single relevant method can explain the
forms, and to what (if any) extent forms due to other influences can be
domesticated—reinterpreted—in terms of the central method.
It is especially possible to consider the analysis as laying the groundwork
for a theory, when every set of elements and operations or relations that is
finally reached is structurally related to a single language-wide property,
rather than some of the sets being merely residues created by setting up the
other sets. An example of the latter case would be if the set of inter-sentence
derivations were not characterizable as reductions but were simply the set of
whatever changes one needed to obtain one sentence from another, rather
than a motivated, structured set of reductions and transformations.
The reason that the methods help to suggest a theory is that they are not
simply empirical or descriptive. They necessarily organize the constraints
into a maximally simple system, so that the description of language comes to
consist of maximally unconstrained elements and rules (2.3).5 Such a system
may not be identical with the way the speaker produces his sentences or the
way the hearer figures them out. But it has a unique least status in respect to
the data of the language (2.2). And it has a crucial status in respect to the
information carried by language, in that the information is certainly related
to the departures from equiprobability in combination; and a most parsi¬
monious grammar reveals most closely just the individual departures from
equiprobability in a language (11.4).
It should be noted that the method leading to a least system is not
reductive in a simplistic sense. It organizes the constraints, but it does not
lose them. As an example: when constraints on certain entities (say, sounds)
are used to define a ‘higher’ and more powerful class of entities (in this case,
phonemes) which are freed of these constraints, the old constraints still
apply to the members themselves within the new classificatory entities, but
these old constraints do not interfere in the simpler relations that can be
stated among the new entities.
5 This is not to say that we have here a discovery or decision procedure for grammar. However, given a particular metatheory of language—the need to map the departures from randomness (2.1), which arises from the essential lack of an external metalanguage—we can propose procedures that lead to a least grammar, which constitutes a best fit of the data in the given direction, though not uniquely the best fit. It is nevertheless the case that the departures from randomness in the various languages are so massive and so similar that all grammars exhibit a strong family resemblance, no matter in what style they are stated.
II
Theory of Syntax
3
A Theory of Sentences
It has been noted (1.0,2.1-2) that a central problem for a theory of language
is determining the departures from equiprobability in the successive parts of
utterances. It has been found that in natural languages it is not merely that
not all combinations of parts are equiprobable, but that not all combinations
occur. That is, given even very large arbitrary samples of the utterances of a
language, some word combinations have zero probability of being found.
For the given language, then, we make a distinction between the combinations
that have zero probability and those that have any positive probability. It
will be seen below that the zero probability characterizes what is outside of
syntax (3.1), while the differences in positive probability characterize the
differences in word meaning within syntax (3.2).
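The distinction can be caricatured in a few lines: absence from a sample is taken as structural zero only when the dependence relation excludes the whole class of the combination, not when a particular pair merely happens to be unattested. All word lists below are invented.

    # Observed operator-argument pairs in a (tiny, invented) sample.
    attested = {("child", "wore"), ("child", "ate"), ("tree", "wore")}
    N = {"child", "tree"}          # words requiring nothing
    O = {"wore", "ate"}            # operators requiring N arguments

    def status(arg, op):
        if (arg, op) in attested:
            return "positive likelihood (observed)"
        if arg in N and op in O:
            return "positive likelihood (unobserved, but the class admits it)"
        return "zero probability: excluded by the dependence relation"

    print(status("tree", "ate"))    # rare, but within syntax
    print(status("wore", "ate"))    # an operator as N-argument: excluded

The first unattested pair is a matter of likelihood (meaning); the second is outside syntax altogether.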
Hitherto, we have been considering only the directly observable data of
word combinations in a language. However, simply listing the combinations
is not only impossible because of their number, but also fruitless because it
leads to no relevant principled classification. Furthermore, any such listing
would be imprecise since, for a great many word combinations, speakers of
the language are uncertain, or do not agree, as to whether they are in the
language: given that take a walk and take a run are in, are take a climb or take
a jog or take a crawl also in? Finally, since the details of word combination
change faster than anything else in language, no exact list would be correct
over any long period.
In contrast with list-making, it is possible to characterize the word
combinations by stating a set of constraints each of which precludes particular
classes of combination from occurring in utterances of a given language
(1.1). These constraints will be found to hold over long time periods. They
can be so formulated as to act each on the product of another constraint
(beginning with a null constraint, i.e. with random word-sequences), so that
a few types of constraint will suffice to preclude very many types of word
combinations and produce the others. It will be seen that the constraints
which produce the combinations found in a language can allow for many
exceptional forms and changes, by minor fuzziness in the domains over
which they are defined: e.g. why America was left by me on July 6 is odd, but
America was left by a whole generation of young writers in the twenties is
natural (1.1.2, G365). The regularity of the constraints, and the simple
relations among them, fit in with a mathematical formulation of syntax; and
their direct specification of departures from equiprobability gives them a
direct informational value.
Characterizing sentences in terms of constraints on combinations rather
than of word combinations directly is a matter of method. But when the
sentences are described as resultants of constraints, one can consider the
constraints to be constructs in a theory of sentence construction.
Since sentences have not been defined at this point, these constraints are
not, in the first place, stated in respect to the set of sentences. They can be
said to hold over the set of utterances provided that, first, a small number of
frozen expressions (such as goodbye and ouch), each with unique constraints,
are excluded, and second, the period punctuation between sentences (or, in
speech, between successive sentence intonations) is considered a conjunction
roughly like and. In 7.7 a procedure for finding sentence boundaries will be
presented, permitting us to define sentences. The constraints of Chapter 3
will then suffice for sentences. Further constraints, among sentences of an
utterance, will then be presented in Chapters 9 and 10.
3.1.0. Introduction
3.1.1. Dependence
wear, but pairs: child, coat (The child wore a coat), and night, hue (The night wore a dark hue), even tree, bark (This tree wears a heavy bark), or very rarely child, bark (The child wore bark), but not child, eat (no The child wore eat, or . . . wore eating), or go, bark (no Go wears bark, or To go wears bark), or eat, because (no Eat wears because). Much the same exclusion
holds for eat (child, egg or acid, copper, etc. occur, but not the pairs go, child or go, because: no Go eats because), although more pairs can be readily
found with eat than with wear. One might think that pairs like child, egg or
acid, copper that are not found with wear should be excluded there. But
although we do not find The child wears eggs, we find She wears a set smile, She wears perfume, She wore roses in her hair and fruit on her hat, and we may
find The child wore egg on his face. In any case, the set of words definitely
excluded under sleep, fall (namely because, go, sleep, fall, etc.) is also
excluded under wear, eat. And the words that are not excluded under sleep,
fall, either because they are found there or because they could conceivably
be found there, are similarly not excluded in both positions under wear, eat.
We thus find support for a class of words postulated to have zero
likelihood of occurring with the sleep and wear set in the shortest sentences
of the language—shortest either observably, or after correcting for the
reductions presented in 3.3. This class will henceforth be marked O (operator),
while the residue class, whose members have some positive likelihood no
matter how small under the sleep and wear classes, will be marked N because
most of them are simple (unimorphemic) nouns. The sleep set, a subset of O,
will be marked On, and the wear subset Onn, to indicate that they occur with
one or two members of N respectively.
We next note that in any shortest sentence in which is probable or is
unlikely occur, they are found with a member of the O class (including the
On and Onn) and not with a member of the N class alone (except, rarely, due
to reductions). We have wear under is probable in (1) That John will wear a
tie is probable, but we don’t have John is probable. More precisely, the only
members of N that are found in those sentences (e.g. John with probable in
(1)) are the ones already accounted for as occurring with the On, Onn
members in question, as in the case of John, tie occurring with wear when
wear occurs with is probable in (1). The cases of is unlikely, is probable appearing only with N (e.g. A boy is probable in guessing about a birth) are
accounted for by the reductions of 3.3 (in this case from e.g. For the baby to
be a boy is probable), as are the cases of an O without its N (e.g. To win is
unlikely reduced from For us to win is unlikely or other such sentences). The
class of is probable is then marked Oo, since its co-occurrent (argument) is
an O (which then brings its own co-occurrents into the sentence). Note that
the O argument under Oo may be not only On (like wear above) but also Oo,
as in That John's coming is unlikely is quite probable.
Words like assert, declare, deplore are found to have two co-occurrents
with them: first an N and second an O, as in John deplores your arrival,
We now consider the word classes on which a word depends, i.e. what
classes appear in argument position. The crucial finding here is that each of
these argument classes is characterizable not by a list of members (which
would be difficult, since words can be added to a class or lost from it), but by
what that argument class in turn depends on. For English, each argument
class is characterized by whether it depends on something at all (i.e. on the
presence of yet other words in the sentence) or on nothing: that is, by
whether it in turn has an argument or not. In John asserts Mary came, we say
that asserts (Ono) depends on the pair John (N), came (On); and came
depends on Mary (N). In turn, John, Mary can be said to depend on nothing;
they can also be said to depend on nothing except the operator that depends
on them (reasons for the first formulation: see below and 3.2.1 end). Thus
the presence of asserts in a sentence depends on the ordered presence of a
word (e.g. John) that depends on nothing further, and of a word (came)
which does have something further on whose presence it depends (in this
case, Mary). No property (e.g. meaning or subclass) of the pair John, came
other than their dependence status is essential for the presence of asserts in
the sentence. The presence of a word, then, depends not on particular other
words but on the dependence properties of the other words.
The status of N is clearly different from that of all the O classes. The
question is what should be the systemic status of this difference. Here it may
be relevant that the ability of an N word to occur alone as an utterance differs
from that of all the O words. When a word occurs alone as an answer, that is
only because its environing words were zeroed as repetition of the question:
Who was there? John from John was there; What did John do all day? Study
from John studied all day. When an O word appears alone, in imperative
intonation, it can be shown to have been reduced from a particular sentence
structure (G351): Stop! from I order you that you stop. But when an N word appears alone, with or without exclamation intonation, it is far less clear
what are the specific environing words if any that have been zeroed: e.g. the
various uses of John!, or A hawk!, or the numbers (in counting). True, these
grounds do not determine any single way of characterizing N. But they
support the preference for the first formulation above, that an N word be
defined as depending on the null class rather than on the operator over it.
Then by the side of the primitive arguments N, which require nothing, all
O are operators, with non-null requirement. All On..n, e.g. On and Onn, require only words that themselves require null. All O..o.., e.g. Oo, Ono, Ooo, require in at least one argument position a word that requires something, which in turn may require something or null. Such are, in English, Oo (e.g. probable), Ono (assert), Onoo (attribute, in John attributes his success to my helping him), Ooo (entail, because). There is also a class Oon (astonish, in
John's winning astonished me); and within Oon, the prepositions (e.g.
without in John left without me) as operators almost all of whose members
can also appear, due to various reductions, as if they belong to other O
classes.
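These classes can be summarized as argument-requirement signatures, with N as the empty requirement. The following sketch (with a lexicon invented from the examples above) simply reads a word's class off its signature.

    # Argument requirements: '' for N; otherwise the ordered argument types.
    lexicon = {
        "John": "", "Mary": "",       # N: depend on null
        "sleep": "N",                 # On
        "wear": "NN",                 # Onn
        "probable": "O",              # Oo
        "assert": "NO",               # Ono
        "attribute": "NOO",           # Onoo
        "because": "OO",              # Ooo
        "astonish": "ON",             # Oon
    }

    def word_class(word):
        req = lexicon[word]
        return "N" if req == "" else "O" + req.lower()

    for w in ("Mary", "wear", "assert", "because", "astonish"):
        print(w, "->", word_class(w))

The classification is thus self-characterizing: a word's class is nothing but the dependence properties of its required arguments.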
depth, of A). When A>B and there is no C such that A>C>B, we say that A
is the operator on B (A covers B), and B is the (immediate) argument of A.
When A covers both B and D, we call B and D co-arguments of A. The co-arguments of an operator occurrence are linearly ordered: A on B, D does
not form the same sentence as A on D, B. In John expects Mary to come we
have John, come as co-arguments of expect, and Mary as argument of come.
The ordering is a semilattice; and no word-occurrence can be an immediate
argument of two operator occurrences (6.3).
If one wishes to think of the making of a sentence as a process, one can
view the semilattice, that is the dependence order of the words in a sentence,
as the partial order of entry of the words into the making of the sentence. No
word can enter into a sentence, which is being constructed out of words and
previously made (component) sentences, unless its prerequisite is available—
this means, unless its required argument is present and is (under the lattice
condition) not serving as immediate argument of any other operator in the
sentence. The prerequisite is any one word of each of these classes on which
the given word depends. Then each entry of an operator makes a sentence,
out of a non-sentence, namely N, or out of a sentence (that is, something
that has been made into a sentence by a prior-entering operator). Thus come
made a sentence by operating on Mary, while expect made a sentence by
operating on the co-argument pair John and come. The linear ordering of
the co-arguments means that expect may not be defined (and indeed is not)
on such a pair as come, John (no Mary's coming expects John). And though
both John saw Mary and Mary saw John are sentences, they are not the same
sentence, nor is one derivable from the other or paraphrastic to it. The
similarity of operators to functors in categorial grammar within logic is
evident.
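The partial order of entry can be sketched directly as a small data structure. The helper names below are invented; the point is only that each operator enters over arguments that are present and not yet covered, and that co-arguments are ordered.

    class Node:
        def __init__(self, word, args=()):
            self.word, self.args, self.covered = word, tuple(args), False

    def enter(word, *args):
        """An operator enters over its ordered co-arguments; the lattice
        condition forbids an argument from serving two operators."""
        for a in args:
            if a.covered:
                raise ValueError(a.word + " already serves an operator")
            a.covered = True
        return Node(word, args)

    def linearize(node):
        # rough English order: first argument, operator, other arguments
        if not node.args:
            return [node.word]
        head, *rest = node.args
        return linearize(head) + [node.word] + sum(map(linearize, rest), [])

    john, mary = Node("John"), Node("Mary")
    come = enter("come", mary)            # a sentence: Mary comes
    s = enter("expects", john, come)      # expects on the pair John, come
    print(" ".join(linearize(s)))         # John expects Mary come
    # enter("is-probable", come) would now raise an error:
    # come already serves as the immediate argument of expects.

(The morphological shape, John expects Mary to come, is left aside here.)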
3.1.4. Predication
mind independently of their expression in language, but this could not be the
case for concepts which are essentially about language. Therefore sentence-
hood can only have arisen not as a vehicle for an otherwise conceptualizable
relation of predicating, but as a concomitant of some other, more behavioral,
relation which then yielded predication as an interpretation. We may
suggest that this underlying relation was the practice of not saying certain
words of a set a except in the company of certain others of a set b, because
the a-words were not specific enough except when used only with, hence
about, a b-word.
3.2.0. Introduction
If we consider the various words which have the same argument class, e.g.
see, eat, etc. in Onn, we find that they have different inequalities of
likelihood in respect to the individual argument words in the class: see has
greater likelihood, but eat less likelihood, on John/water than on car/gasoline. In fine, we can state the inequalities of likelihoods of each word in
an argument class in respect to the words in the operator class over that
argument: e.g. the likelihood of water to be in the second N position under
see, under eat, and under other Onn.
The distinction between argument-requirement and likelihood is not
immediately observable in the data. It is a choice of analysis to isolate the
combinations whose frequency not only is zero in the given data, but can be
expected to remain zero in any further data, and whose distinction from the
likelihood-based combinations is expected to be useful for constructing a
theory. The dependence of an operator on an argument is thus a construct. It
is based on a particular criterion, of what arguments they in turn depend on,
which has the merit that the arguments in turn can be characterized by the
same criterion. In contrast to this construct, the likelihoods of an operator in
respect to its argument words are the direct data of word-combination
frequency within the dependence structures, in (a sample of) the sentences
of a language (3.2.3, 3.2.6). These likelihoods form a vast body of data,
which cannot be surveyed except in limited sublanguages (ch. 10).
The inequalities of likelihood of a given operator word in respect to an
argument class amount to a rough grading of likelihood, though not in
general to a specifiable partial ordering thereof. If we check the likelihoods
for sleep, for example, we find many N words which have reasonable
likelihood (much higher than the average of their class) of appearing as its
argument. These have been called, e.g. for sleep, the normal co-occurrents,
or selection, of that operator in respect to its argument. Some of these words
may occur frequently, e.g. you, I, the baby, in How did you sleep?, The baby
slept well, etc. Others may be very infrequent, as in A thrips was sleeping on the leaf; but even these are immediately acceptable if they are included
under a classificatory word which in turn is a reasonably frequent argument
of the given operator (see below). For other argument words the operator
sleep may have, in various degrees and respects, less than selectional
likelihood: trees, flowers, bacteria. Yet other N may be even more infrequent as
arguments of sleep, or may appear only when the sleep is under particular
further or neighboring operators. For example, The saucer slept may occur
(not even metaphorically) in a fairy-tale, in which neighboring sentences will
be of appropriate restricted types; The universe slept, The winter slept
involve metaphor, which is analyzed at the end of 4.2 as reduced from a
sentence-sequence under the operator as. And almost any N can be the
argument of sleep when sleep is under not (because it is on say, 5.5.3), as in
Rocks don’t sleep. All the latter N would not be included in the selection of
sleep.
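Selection in this sense is a frequency notion and can be sketched as such: the normal co-occurrents of an operator are the argument words whose likelihood under it is well above the class average. The counts below are invented.

    # Invented counts of first arguments (subjects) observed under 'sleep'.
    counts = {"I": 40, "you": 35, "baby": 20, "thrips": 1,
              "tree": 0, "saucer": 0, "rock": 0}

    average = sum(counts.values()) / len(counts)
    selection = [w for w, c in counts.items() if c > average]
    print(selection)    # ['I', 'you', 'baby']: the selection of 'sleep'
    # 'thrips' falls outside the selection yet is immediately acceptable,
    # being classifiable under a normal argument (an insect, an animal).

The cut is of course fuzzy, which is why selection is not used directly to define the structures of the language.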
The statuses of the various argument words under an operator do not in
general suffice for setting up any precise subsets within the argument class. It
is even impossible to define a set of ‘human’ or ‘human-like’ N as the first-
position selection under say, know, believe, expect, and other such Ono,
since we find not only I know that he left and The dog knows he left, but also
for example The virus knows when it is in an environment containing material
for replication and The dog said yes with its tail and Here are ten dollars that
say you are wrong (in laying a bet) and These rocks can expect to last another
5 billion years.
Because of its imprecision and instability, as well as its vastness, the data
concerning selection, or likelihood in general, has not been used directly in
defining the structures of language. But there are stable properties and
structures, which can be defined in terms of the likelihood inequalities or
gradings. For one thing, since reductions will be seen not to affect the
likelihood gradings among the words of a sentence, these gradings are
preserved under reductions and transformations (the latter being mostly
products of reductions, 8.1.2). Indeed, transformations can be phenomenologically defined as subsets of sentences throughout which the likelihood
gradings are preserved (8.1.3). More important, the reductions will be seen
to be based on extremely high likelihood (or, equivalently, low information)
in the operator-argument relation (3.3.2). Similarities and contrasts in
likelihood grading also yield many of the grammatical categories, semantically
important though somewhat fuzzy as to membership, that give texture to
each language grammar: e.g. durative and momentaneous verbs, indefinite
pronouns, etc. The more or less stable likelihood differences—the fine
dependence—thus constitute an additional constraint on word-co-occurrence,
beyond the dependence of 3.1.
While the likelihood constraints differ from the dependence constraint of
3.1, they do not conflict with it, because the likelihood of an operator is
postulated not to reach permanent zero in respect to any word in the
argument class. Also, the likelihood constraints do not repeat any of the
dependence constraints, which are stated on argument classes; the likelihoods are stated in respect to the argument word, as a further restriction on what the dependence admits. While virtually everything in language structure
will be found to be encased in the relation between an operator and its
There is a rather small set of words, e.g. say, whose arguments carry no
likelihood restriction. Thus, say can be said of any sentence, hence on every
operator, as in I say he's late or He said she saw me. This holds also for deny
(source of not), ask (source of the interrogative), and request (source of the
imperative), although some stative operators such as exist may be rare under
request (G321,331,351). This set of words will be found to have the status of
metasentential operators (5.5). The further likelihoods, of operators to their
arguments, under deny, ask, request, may differ from those under say itself:
A stone sleeps, Vacuum sleeps may be unusual under say, but less so in A stone does not sleep (from deny), Does vacuum sleep? (under ask). But the
partial-order constraint of 3.1 holds throughout: Going does not sleep and
Does going sleep are about as unacceptable as Going sleeps.
There is also a small closed set of ‘indefinites’ whose union has all
operators in its selection. For example, something is in the argument
selection of very many operators, and any operator under which something
does not normally occur will be found to have somebody (and someone) in
place of something. Since they have to fit under every operator, such words
as the something-somebody set unavoidably carry the meaning of indefinites:
they do not fit one operator rather than another; hence they mean no more
than being the argument of whatever operator is on them. (Although he,
she, it seem also to constitute such a set, this is because they arise as
referential reductions of all arguments (5.3), whose union necessarily has all
operators in its selection.)
Another case is broad selection, when an operator has in its selection not
all the words in its argument but very many of them, far more than most
operators have. This is the case for some Ooo (operators on a pair of
sentences) such as because, entail; also operators with mostly space-relation
meanings originally, such as near, to, before. It is also the case for certain
operators of exceptionally broad applicability, such as in a . . . manner (an
earlier English form for which was reduced to -ly as in He walked slowly, He spoke agitatedly), or such as the condition of (an earlier English form for
which was reduced to -hood as in childhood ‘the condition of being a child’).
Also, operators of time and aspect (i.e. the onset and durational properties
of an event) occur readily on almost all operators, or on all those which are
durative, and so on.
A special kind of broad selection is seen in quantifiers, and somewhat
differently in classifiers. Words for numbers and amounts appear readily as
arguments under almost every operator (Two came late, He ate two, Much gets forgotten), where they seem first to adjoin and then paraphrastically to replace the words whose count or measure they are giving (Two men came late, He ate two apples, Much that is known gets forgotten). But in addition
these words have a selection of their own, such as plus, more than: Two plus
two equals four. Much is more than little. Words for sets, containers, and
fragments are similar: The team played well from The players (in the team)
played well, but in addition The team was formed last year, also The whole
room applauded from The audience (in the room) applauded, but in addition
The room is dark; also This piece is too sweet from This (piece of) cake is too
sweet, but in addition This piece is only one-sixteenth of the cake (G260).
Somewhat differently, classifiers are words each of which has (with certain
modifications) the selection of the union of a unique subset of other words
(its classificands). Thus the selection of operators over snake includes the
selections of operators over viper, copperhead, etc. In addition, however,
there is a small selection of other environments (not applicable to their
classificands), as in Snakes constitute a suborder. When all these classificatory
words are analyzed it is seen that they are co-arguments, under quantifying
and classificatory operators, of the words whose selections they can take
over (e.g. A viper is a snake).
In addition to unrestricted and broad selection, there is ‘strong’ selection,
which is seen when a word occurs exceptionally frequently with a particular
operator over it or argument under it: e.g. verbs under particular prepositions
(as operators on them), in wash up as against the less frequent wash down
and the rare wash under; also the frequent stop in and stop over as against the
rare stop out; and, differently, the frequent occurrence of hermetic with seal
as argument (hermetically sealed from The sealing was hermetic in manner).
An extreme case of strong selection is the narrow or quasi-idiomatic, as
seen for example in the operator loom, where we have by the side of loom
itself (as in It loomed out of the dark) also It loomed large (roughly derivable
from an artificial It loomed into its being large) but not It loomed small nor It
loomed dangerous, etc. We can consider as strong selection also those words
that do not combine grammatically with anything, and so occur only under
metasentential and metalinguistic operators (ch. 5): e.g. ouch, hello in He
said Ouch, Hello is an English word.
The major case of narrow selection is seen in idioms and frozen expressions.
These may have unusual word combinations as in hit upon (He hit upon a
new idea), or unusual reductions as in make do or The more the merrier, or
unusual meaning of the whole expression as in old hat or kick the bucket in
the general sense of ‘die’ (these are often due to metaphor (4.2), which is a
product of reductions).
There is also a strong ‘negative’ selection. Comparable to the demand that
certain words or repetitions appear under a given operator is the demand
that certain words not appear, that is, the rejection of certain argument
choices. Thus while one can readily say He is truly coming, He is certainly coming, He is undoubtedly coming, one cannot say He is falsely coming, He is uncertainly coming, and perhaps not He is doubtfully coming. The reason
is that He is truly coming is a reduction of the sentence-pair He is coming;
that he is coming is true; hence (1) He is falsely coming would be a reduction
of (2) He is coming; that he is coming is false, which would generally be
rejected. The difficulty lies not in That he is coming is false, which can readily
be said, but in conjoining the two sentences, which is not likely when one
sentence denies what the other sentence asserts. The contradictory combination can physically be said, as in (2) above; but because of its unlikelihood (negative selection) it does not participate in the reduction to adverb,
falsely in (1) above (G306).
The types of likelihoods seen above, from unrestricted, broad, and strong to
narrow and negative, are all observable directly in the data. There are
also other situations, in which the regularization and simplification of
the description of the data (2.3, 2.6) make us reconstruct certain word-
occurrences which in turn show that certain word combinations had high
likelihood in the reconstructed source.
For example, as noted above, expect in most cases has the argument pair
N, O, as in I expect Mary to win, where the On word win has Mary as its
argument in turn. When we find a shorter second argument, apparently just
N, as in I expect Mary, the problem is to derive it from the more common O
second argument, since in almost all cases when a word has two forms of
argument one can be derived from the other (2.3). In this case it is found that
any further likely operators on I expect Mary are the same as are likely on I expect Mary to come (or to be here). Also the likelihoods of the various N to
appear alone as object of expect are graded in about the same way as the
likelihoods of those N when they are subjects of to come (or to be here)
(alone, or when these last are objects of expect): I expect time is less likely than I expect Mary, as I expect time to be here (or just Time is here) is less likely than I expect Mary to be here (or Mary is here). We conclude that I
expect Mary is a reduction of / expect Mary to be here, etc. In the source
forms, once every N second argument under expect is reconstructed to N to
come (or N to be here), the locally synonymous set come, be here, etc., now
has greater likelihood as second argument of expect than any other word or
locally synonymous set in that position (G161).
Another example of high likelihood in reconstructed sentences is seen in
the operators that are called reciprocal verbs. We consider John and Mary
met, by the side of (1) John met Mary and Mary met John, whereas there is
no John and Mary saw by the side of (2) John saw Mary and Mary saw John.
We note first that when a sentence like (2) exists in the language then a
sentence like John and Mary saw each other also exists, with each other
being taken as a reduction of Mary and John respectively. Then from (1) we
have John and Mary met each other. What is special about met is that the
each other is zeroable. This happens under all operators of a subset Orec
(reciprocal verbs) of Onn, which are definable by the condition that given (3) N₁ Orec N₂ (where N₁ and N₂ are in the argument selection of the Orec) in a
text one can always add (4) and N₂ Orec N₁ without affecting the likelihoods
of the further operators, i.e. without affecting what the rest of the text says
about (3). This means that we can reconstruct every occurrence of (3) N₁ Orec N₂ into an occurrence of (3, 4) N₁ Orec N₂ and N₂ Orec N₁, without
skewing the structural description, or distorting the information, in the
utterances of the language. With Orec operators, the contribution of (4) is
implicit; introducing it does not add information and is indeed not commonly
done (hence the rarity of (1) John met Mary and Mary met John). Every
occurrence of John met Mary (and so for every Orec sentence) can then be
considered as a case of John met Mary and Mary met John, which is
transformable to John and Mary met Mary and John respectively, which is
always reducible to John and Mary met each other. The each other is thus an
alternative which is available on every occurrence of John and Mary met,
and is zeroable (3.3.2). Note that when (4) is not likely, as in John met his
doom, where doom is not in the first-argument selection of met (i.e. not a
normal subject of met), so that His doom met John is most unlikely (though
one cannot claim that it has no meaning), we rarely if ever have the
reduction to John and his doom met each other, and hardly ever the further
reduction to John and his doom met (G314).
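The Orec reconstruction just described amounts to a pair of string operations, sketched below with an invented verb list: expansion of N₁ Orec N₂ to the conjoined pair (which adds no information), and the converse reductions through each other.

    RECIPROCAL = {"met", "married"}            # invented sample of Orec

    def expand(n1, verb, n2):
        """N1 Orec N2 -> N1 Orec N2 and N2 Orec N1 (implicit for Orec)."""
        assert verb in RECIPROCAL
        return f"{n1} {verb} {n2} and {n2} {verb} {n1}"

    def reduce_(n1, verb, n2):
        """... -> N1 and N2 Orec each other -> N1 and N2 Orec."""
        assert verb in RECIPROCAL
        return [f"{n1} and {n2} {verb} each other", f"{n1} and {n2} {verb}"]

    print(expand("John", "met", "Mary"))
    print(reduce_("John", "met", "Mary"))

As the text notes, the expansion is licensed only when both orders lie in the selection of the verb, which is why John and his doom met hardly occurs.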
A major type of high likelihood in reconstructed sentences is to be found
in the reconstruction of word-repetition. A simple example is seen in
repetition under and, or, but. To see it, we first consider the conditions for
-self. If we compare I introduced myself, You introduced yourself, which occur in English, with I introduced yourself, You introduced myself, I introduced me, You introduced you, all of which do not, it is clear that -self is
adjoined to a pronoun (which is a reduced repetition of a neighboring
‘antecedent’ argument) as object (second argument) of an operator if and
only if that pronoun repeats the subject (first argument) of that operator:
-self is adjoined to I introduced me and not to I introduced you. If now we wish to derive (1) I came up and introduced myself, contrasted with the non-occurring I came up and introduced yourself, simply by extending the
conditions for -self, we would have to say that the antecedent could also be
the subject of the verb preceding the verb whose pronominal object receives
the -self. This is not a simple extension, because many other words can
intervene between the antecedent subject and the pronoun with -self.
Instead, we can reconstruct a subject for the second-operator (introduced)
by assuming this subject to have been zeroed on grounds of its having
repeated the first argument (I) of the first operator (came up). The reconstructed sentence would then be (2) I came up and I introduced myself,
where no extension of conditions is needed to explain the -self.
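The two statements involved here can be sketched as explicit rules over clause triples: reconstruct a zeroed subject under and as a repetition of the preceding subject, and adjoin -self only to an object that repeats the subject of its own operator. The representation is an invented simplification.

    def reconstruct_subject(first, second):
        """A missing subject after 'and' repeats the preceding subject."""
        subj, verb, obj = second
        return (first[0] if subj is None else subj, verb, obj)

    def add_self(clause):
        """-self adjoins to an object repeating its own operator's subject."""
        subj, verb, obj = clause
        return (subj, verb, obj + "-self") if obj == subj else clause

    first = ("I", "came up", None)
    second = reconstruct_subject(first, (None, "introduced", "I"))
    print(add_self(second))   # ('I', 'introduced', 'I-self'), i.e. myself

Once the zeroed subject is reconstructed, the -self rule needs no extension beyond a single operator and its own arguments.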
This reconstruction is justified independently. In all sentences consisting
of a subject followed by two operators connected by and, or, but, both
operators have the class of that subject as their required subject class (e.g.
no N On and Oo: John phoned and is probable); and both operators have the
given subject word in their subject selection. If the second operator does not
have the first subject in its own selection the unlikelihood of the sentence is
immediately noticeable, as in John is unmarried but pregnant. Put differently:
the likelihood grading of operators after John is unmarried but ... is
approximately the same as after John is unmarried but John . . . (e.g. is
pregnant would be equally unlikely in both cases). Description of data such
as the occurrence of -self and the likelihood of second operators above
would require additional grammatical statements unless we assume that the
missing subject of the second operator is a zeroed repetition of the subject of
the first operator. By the criterion of least grammar (2.2) we reconstruct this
repetition, obtaining (2). When we do this, (1) I came up and introduced
myself is a case of (2) I came up and I introduced myself. Partly similar
analyses can be made for missing second operators (e.g. plays in John plays
piano and Mary violin), or objects of the second operators (e.g. the new keys
in John misplaced, and Mary found, the new keys), under and, or, but. When
these missing words are reconstructed as repetitions of the correspondingly
placed word before the and, or, but, we obtain filled-out second (component)
sentences after the conjunctions, without any missing words. In the set of
sentences, including component sentences, which have no missing words, it
is then seen that under these conjunctions it is more likely that at least one of
the positions after the conjunction is filled by the same words as in the
corresponding position before the conjunction, than that every position
after the conjunction differs from the corresponding position before. This
property, of expecting some word-repetition in corresponding positions, is
part of the selectional characteristics of and, or, but. As such, it contributes
to the meaning of these conjunctions, which are readily used to collect the
various acts or events done by a given subject or to a given object, or the
various subjects and objects which participate in a given act (G139).
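The selectional property is in principle countable. As an illustrative sketch, with invented example pairs rather than corpus data, one can test whether filled-out component sentences share a word in some corresponding position:

```python
# Sketch: under and/or/but, the filled-out component sentences tend to
# share a word in some corresponding position. Toy (subject, verb, object)
# triples; the pairs are invented for illustration.

def shares_position(s1, s2):
    return any(a == b for a, b in zip(s1, s2))

pairs = [
    (("John", "plays", "piano"), ("Mary", "plays", "violin")),  # verb repeats
    (("John", "phoned", "Mary"), ("John", "wrote", "Mary")),    # subject and object repeat
    (("I", "stay", "here"), ("you", "go", "there")),            # no repetition
]
repeating = sum(shares_position(a, b) for a, b in pairs)
print(f"{repeating} of {len(pairs)} conjoined pairs repeat a corresponding word")
```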
Another type of repetition preference, not restricted to corresponding
positions, holds under many other conjunctions. When we consider bi-
sentential operators (Ooo), we find what seem to be semantic differences
between arbitrary-sounding or nonsensical sentences such as I sharpened my
pencils because the snow was soft on Tuesday and ones that sound ordinary,
like People are corrupted by (their) having access to power. It is notable that
in many cases the latter, more ‘acceptable’ kind of sentence has a word
repeated in its two component sentences (e.g. people and they) while the
former kind does not. The relevance of this observation is supported when it
turns out that the former lose their eyebrow-raising effect when a third
component sentence is adjoined which creates a word-repetition bridge
between the original component sentences: e.g. I sharpened my pencils,
having promised him a pencil sketch of a snowy tree, because the snow was
soft on Tuesday. True, there are many bi-sentential sentences which lack
repetition but are nevertheless immediately acceptable, such as I will stay
here when you go. But in such cases we can always adjoin a third component sentence that supplies the word-repetition bridge.
object of the higher operator. One could say that this is just a reflection of
the meanings of these higher operators. But almost all selection can be
considered a reflection of meaning. Nevertheless, if we start with the
meanings—especially given our lack of an effective universal framework by
which to formulate such subtle meanings—we could not reach the particular
classification of words which we have found here, in respect to particular
grammatical processes such as the position of omission: how zeroings under
offer differ from under order, and how French défendre has the repetition
likelihoods and zeroings (and meanings) of both defend and prohibit in
English.
To treat the auxiliaries as a primitive category of their own would introduce new and ad hoc entities into English grammar, and would violate the fundamental
partial ordering of 3.1. Alternatively, we may say that can and the others are
Ono operators of the prefer, want type above, in which the subject of the
lower operator is omitted when it repeats the subject of the higher operator.
Then the only serious difference is that under can, etc., for the subject of the
lower operator to be different from the subject of the higher operator has
not just a lower probability but zero probability: one does not say I can for
John to play violin as one can say I prefer for John to play violin and even
I want for John to play violin. In earlier periods of English, some of the
auxiliaries indeed could have different subjects for their lower operator, so
that we have here not a principled restriction on these higher operators but a
lowering of probability to zero (G298). The result in modern English is the
same whatever the history, but the history provides support for saying that in
addition to the massive zero-probability constraint that creates the operator-
argument classes (3.1), a few words within an established operator class may
lower to zero the probability for part of their argument class (this being then
viewed as a limiting case of the likelihood constraint).
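Schematically, can then differs from prefer and want only in one entry of a likelihood table being zero. A sketch with placeholder numbers, not measured likelihoods:

```python
# Sketch: 'can' as an Ono operator whose probability for a non-repeated
# lower subject is zero; the numbers are placeholders, not measurements.

DISTINCT_LOWER_SUBJECT = {
    "prefer": 0.3,  # 'I prefer for John to play violin' occurs
    "want":   0.1,  # 'I want for John to play violin', marginal
    "can":    0.0,  # 'I can for John to play violin' does not occur
}

def admits(operator, higher_subject, lower_subject):
    if lower_subject == higher_subject:
        return True   # repeated subject: zeroed under all three operators
    return DISTINCT_LOWER_SUBJECT[operator] > 0.0

print(admits("prefer", "I", "John"))  # True
print(admits("can", "I", "John"))     # False: the limiting case
```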
The fluidity and secondariness of this zero grade in the likelihood
constraint, in contrast to the large-domain zero-probability constraint that
underlies the argument relation, is seen in a few words such as order, merit,
which approach the zero grade seen in can, but fall short of the definite
grammatical exclusion that we find under can. The fluidity of the can
situation is also seen in the attempts to circumvent the exclusion, as in the
case of try. Many speakers of English would consider subject-repetition to
be required under try: I will try to shave myself, but only questionably (1)
I will try for him to shave himself; however, the questionable form is
circumvented by a virtually meaningless use of have: I will try to have him shave
himself can mean the same as (1) in the sense of I will try that he should shave
himself, aside from its other meaning of ‘I will try to arrange for him . . .’ (the
latter from I will try on I will have him shave himself).
The various special selections noted above yield certain subsets of words,
mostly such that the words in a subset have selections that are similar,
complementary, etc., but also of those words which together comprise the
selection of particular interesting operators (3.2.2). For example, there are
operators whose argument selections are virtually identical. Such are the
various operators expressing duration, onset, and the like, such as continue,
last, throughout, which appear on approximately the same selection of
arguments (which are themselves operators, such as sleep, exist, grow): He
slept throughout the night but not He arrived throughout the night (G77).
This situation has the effect of creating a set, very fuzzy in the case of
English, of durative verbs.
There are other words whose selection is similar to (mostly, has a large
intersection with) the selection of some other word, or includes the other
selection, e.g. classifier words as compared with their classificands. Also,
there are words whose selection contrasts with, or is virtually complementary
to, the selection of certain other words: sleep, grow, are said normally under
was throughout, lasted, etc., and not so readily under was sudden, occurred
at, etc.; but arrive, depart, say, etc., are readily arguments of was sudden,
occurred at, and rarely of was throughout, lasted. Such contrastive selections
may partition the words of a class into two (or more) selectional subsets
(e.g. in some languages imperfective or durative as against perfective or
momentaneous verbs, G275). There are also words such as speak, think,
which occur as normally with lasted as with was sudden. Hence ‘aspect’ in
English is a graded property, which does not easily create subsets of
operators, and does not partition them as having sharply either durative or
else momentaneous selection.
In almost all cases the selectional subsets are fuzzy—imprecise or in
various degrees unstable as to membership; some words have some but not
all the properties of the subset, or new words enter the subset, and so on.
Thus the pair something, somebody (or someone) can be considered as
variants of a single indefinite noun; the word people has some of the
properties of somebody but not all. The auxiliaries can, may, etc., seem at
first like a well-defined English subset; but, for example, need, ought have
only some of the properties common to the set (G300). The property of
falsely above (3.2.2 end) is shared by uncertainly, improbably, doubtfully,
and a few others, but not with equal definiteness. Setting up such subsets is
not necessarily precluded by such complexities of conditions. For example,
an operator which is not in the selection of throughout may enter that
selection if it has a plural subject (first argument): there is no The guest
arrived throughout the afternoon, but there is The guests arrived throughout
the afternoon. This does not mean that the ‘momentaneous’ operator arrived
has become ‘durative’, but that a momentaneous operator with a plural
subject (The guests) can constitute a durative sentence. Durativity is
selectional; it is thus a property of a word together with its arguments; the
definition of selection allows it to be affected by lower parts of the argument
chain. But the collecting of operator words into (fuzzy) subsets is still
possible if we choose controlled conditions with maximal distinction: English
durativity would be defined on operators with subjects in the singular (with
certain durativity restrictions on their objects too, such as between the
continuous or ‘imperfective’ line and the completive or ‘perfective’ circle as
objects of draw).
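The controlled collection of operators into fuzzy subsets can be pictured as set operations over their selections. The word lists below are a small invented sample, not survey results:

```python
# Sketch: fuzzy 'durative' and 'momentaneous' subsets read off from the
# selections of throughout/lasted vs. was sudden/occurred at.

durative_selection     = {"sleep", "exist", "grow", "speak", "think"}
momentaneous_selection = {"arrive", "depart", "say", "speak", "think"}

print("durative only:    ", sorted(durative_selection - momentaneous_selection))
print("momentaneous only:", sorted(momentaneous_selection - durative_selection))
print("graded (both):    ", sorted(durative_selection & momentaneous_selection))
```

The non-empty intersection is the point: English ‘aspect’ grades rather than partitions the operators.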
All the word-class names used above (other than the N, On, Oo, etc.
defined in 3.1) are definable in respect to selectional (3.2) or morphological
(3.3) properties on the N, On, etc. This holds, for example, for metasentential
operators, indefinites, pronouns, adjectives, prepositions, auxiliaries,
reciprocals, and durative-momentaneous verbs.
The difficulty in establishing word subsets shows that, in general, languages
have few if any definite subsets in respect to likelihoods (3.2). In science
sublanguages (10.3-5) it will be seen that selectional subsets are possible
and of great importance, when subject matter is restricted.
The semantic factors that we can distinguish are more varied yet, and are not
organizable in any objective way, and not measurable. Among such factors
in determining frequency are: the informational usefulness of particular
word combinations and the non-language occasions for them to be said, the
particular aspects of experience that it is useful or customary to note (time,
place, evidentiality, etc.), euphemisms and avoidances of particular
expressions, expressive and communicational exaggerations (e.g. a preference for
stronger terms for time, such as may have led to development of the
periphrastic tenses, e.g. from took to has taken), preference for argot and
in-group or colorful speech, and so on. Aside from these more or less semantic
factors determining likelihood and the change and extension of likelihood,
there is the massive effect of analogic extension and change in selection
which is determined by complex grammatical factors (of similarity, etc.),
only part of them semantic.
3.3. Reduction
3.3.0. Introduction
In reaching the syntactic theory presented here, the core of the work was to
find a small set of relations between sentences in terms of which we can say
that one observed sentence has been derived in some explicit and non ad
hoc way from another. Here, to ‘derive’ is to find a constant difference
between two formally characterizable sets of sentences. The intent in
establishing such derivations is that the meaning be preserved in corresponding
sentences (or sentential segments) of the two sets: the meaning of John
phoned and wrote is the same as in John phoned and John wrote, and the
meaning of John phoned is the same in the sentence John phoned as in I deny
that John phoned. It was found that such derivation can be effected by
reductions and transformations which have an important property: if we
undo these, the residue, consisting of the pre-reduction (‘base’) sentences
(4.1), has the transparent structure of dependence and likelihood (3.1-2).
Furthermore, it was found that the reductions which suffice to produce this
residue are not simply an arbitrary conglomeration of inter-sentence relations
which had been so chosen as to leave this residue of transparently constructed
sentences. Rather, most of the reductions fall into a coherent and not very
large set of reductions, merely in the phonemic composition of words, which
take place when those words contribute little or no information to their
sentence. The fact that language is the resultant of two coherent systems,
each with its own structural definition (and not one being just whatever is left
as the residue of the other) gives credibility to each of the systems. It is
therefore important not merely to note a few examples of reduction, but to survey the main types of reduction.
The most widespread reduction is to zero. In John plays violin and Mary
piano or in John left, and Tom too it is clear that plays has been zeroed after
Mary, and left after Tom; similarly I will go if you will is produced by zeroing
go after you will. If we compare I plan to see for myself with I plan for Tom to
see for himself it is clear that for me had occurred after the first I plan and had
been zeroed there (3.2.3). It can be shown that Never send to ask for whom
the bell tolls is in principle from Never send someone to ask . . . ; and I don’t
have time to read from I don’t have time for me to read things (since elsewhere
read has a second argument, its object, as in He reads comics). The zeroing
of to come, to be here under expect, as in I expect John from I expect John to
be here, has been noted in 3.2.3. Complicated forms, such as metaphor, the
passive (1.1.2), and the zero causative (3.3.2, as in He sat us down from He
made us sit down), can be derived from attested kinds of zeroing. That these
are indeed zeroings, not merely ‘implicit meanings’, can be seen in languages
which have gender agreement. Thus in a cafe one asks for un blanc, from un
verre de vin blanc, but in a wine shop for une blanc, from une bouteille de vin
blanc.2
Another widespread reduction is to affixes. These are short morphemes
attached to words. In English and in many languages most prefixes have the
meaning of an operator whose second argument is the word to which they
are attached, their ‘host’: He is anti-war comparable to He opposes war. In
some cases the prefix is visibly a reduced form, as in mal- (in maltreat) from a
word meaning ‘badly’, or be- (in because, beside) from by. As to the suffixes
of English, almost all have the status of operators, or rarely of co-arguments
on their host, as in earthly from earth-like, from like earth (leaving aside here
the specialization of meaning in earthly). For most English suffixes, there is
no evidence of their having been historically reduced from free-word
operators; but some suffixes are visibly such reduced forms, as in sinful, or in
kingdom from Old English dom ‘law, jurisdiction’, or in childhood (older
child-had) from Old English had ‘state, condition’. Note also the adverbial
-ly, as in slowly, from the earlier lice (the like above) in dative case, meaning
originally ‘with form, with body’. Reduction to affix involves reduction in
the stress relation to the host and in phonemic distance from it (called
grammatical juncture), as in -dom, -hood, or in childlike from like a child.
(For the position of affixes see below.)
Less common in English than the above are shortenings of exceptionally
widespread words. This is seen in the prepositions, e.g. of from off, and
beside from by (the) side of, about from earlier on plus by plus out. Most
English prepositions have been exceptionally short words throughout their
known history, but some have been reduced, even in recent times. Other
recent colloquial shortenings are, for example, will in He’ll go soon, is in He’s
here, has before verbs in He’s taken it (and before nouns, only marginally in
America, in He’s two jobs), going to before verbs in They’re gonna make it
this time.
Another kind of reduction is seen in the pronouns. These have been
considered to be words whose meaning refers the hearer to the meaning of
some other word in the discourse. But such a characterization would
constitute an addition to the grammatical apparatus, these meanings being
different in kind from those of non-pronominal words. For a simpler system,
we seek to derive pronominal meaning from the kinds of meaning found in
other words. To this end, we note that in any case we have to recognize the
existence of ‘metasentential’ words (ch. 5), and among these such as cite
something in the same discourse in which the metasentential words occur;
this holds both for words like latter which refer to a location in the discourse,
and for words like say followed by quotes, which refer to the material in the
quotes. The pronouns can be derived from words of this kind. As noted in
5.3, we would consider, say, I met John, who sends regards to be a reduction
from, roughly, (1) I met John; John—the preceding word has the same
referent as the word before—sends regards. Here John—the preceding word
has the same referent as the word before—is reduced to who (or rather to -o,
the wh- being a connective, 3.4.4). The word referent here simply carries its
dictionary meaning, and is not itself a referential. Of course, no one ever
said (1). This complex sentence is not the historical source of the pronoun.
However, (1) shows that the referential relation of who to the preceding
word (the first John) can be represented by a repetition of that preceding
word, plus a secondary metasentential ‘sameness statement’ (between the
dashes above). All the apparatus needed for (1), namely for the secondary
sentences, and the metasentential words and sentences, is attested
elsewhere in the language (5.2-3).
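The derivation can be displayed as a pair of rewrites, from the unreduced source with its sameness statement to the wh- form. A sketch over plain strings (the analysis itself operates on operator-argument structures, not strings):

```python
# Sketch: the relative pronoun as repetition plus a metasentential
# sameness statement, then reduced to 'who'.

SAMENESS = "the preceding word has the same referent as the word before"

def unreduced(antecedent, tail):
    return f"I met {antecedent}; {antecedent} ({SAMENESS}) {tail}"

def reduced(antecedent, tail):
    return f"I met {antecedent}, who {tail}"

print(unreduced("John", "sends regards"))
print(reduced("John", "sends regards"))
```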
All the above are reductions in situ: prefixes are in many cases reductions
of operators which had preceded the host word (the host being their second
argument). Most suffixes can be obtained as reductions or equivalents of the
last word in a compound word (G233); in many cases the suffix is originally the final member of such a compound.
3.3.2.0. Introduction
The base must be able to state all the information, within its grammatical universe, in the sentences of
the language, including such categories as time and number which are not
expressed overtly in operator form. In any case, these categories, which
have very broad selection, are expressed by short morphemes attached as
affixes to the words on which they act.
As to the other affixes, it is relevant that almost all have one or another of
just a few very general meanings, such as the verbalizing set including -ize,
-ify, -ate, and en- in enrich (all of which mean ‘cause to be’) or the nominalizing
set including -ness, -ment, -ion, -ure, -hood, -ship (all of which mean
‘condition’ or the like). In the case of the past and plural, it is generally
considered now that the various forms of each are morphophonemic variants of
a single morpheme: a single past in leaned, went, stood, sang, cut (here,
zero) and one plural in books, children, sheep (here, zero).
In -ness, -ment, -ion, etc., it is commonly thought that there are various
nominalizing morphemes. However, it is very hard to state a difference in
meaning, or in host domain (which words get which affix), or in sentential
environment (i.e. further operators) that would distinguish all occurrences
of one of these from all of another. Even in the relatively few cases of a word
taking two different affixes of a set, yielding two different meanings, as in
admission and admittance, the general meanings of the two affixes (in all
their occurrences on admit) are close. The main difference is the
grammatically uncharacterized choice of word per affix: advancement,
improvement, but progression, admission; or purify, solidify but realize, materialize.
The selection in respect to higher operators is roughly the same for all affixes
within a set, except for special problems as in -ion, -ance on admit. Hence we
can consider -ify and -ize, for example, to be complementary variants of a
single variant-rich morpheme A whose combined selection (i.e. all the stems
on which all variants of A occur) is much broader than the selection of any
one verbalizing affix. Seen in this way, virtually all English affixes either
themselves have exceptionally broad selection of host (their argument), or
else are variants of a suffix-alternation (e.g. the plural or the verbalizing
above) which in total has such selection. It is this broad selectional property
that gives the variant-rich morpheme a low information status in the sense of
3.3.2.0, which justifies its reduction to affix (in varying forms). Many problems
nevertheless remain in analyzing the selection and source of the individual
affixes.
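On this analysis, -ify and -ize amount to a single mapping whose domain is the combined selection of the variant-rich morpheme. A toy sketch, with stems cited in combining form and the sample purely illustrative:

```python
# Sketch: -ify and -ize as complementary variants of one verbalizing
# morpheme A; the variant is selected per stem, while A itself has the
# combined (broad) selection.

VARIANT_OF_A = {"pur": "ify", "solid": "ify", "real": "ize", "material": "ize"}

def verbalize(stem):
    return stem + VARIANT_OF_A[stem]          # one morpheme, two shapes

combined_selection = sorted(VARIANT_OF_A)     # the broad selection of A
print([verbalize(stem) for stem in combined_selection])
```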
The broad-selection word set and its synonyms can also be zeroed in
certain situations, when they occur with words for their members. For
example, (1) Gilbert and Sullivan wrote operettas cannot be formed from (2)
Gilbert wrote operettas and Sullivan wrote operettas, since anyone acquainted
with the matter would not say (2); this, differently from Mozart and
Beethoven wrote operas. However, from (1) we can reconstruct (3) The team
(or set) containing Gilbert and Sullivan wrote operettas, from A team wrote
operettas; the team contained Gilbert and the team contained Sullivan. In (3),
it is the same as the word in the corresponding position of the first sentence:
John plays piano and Mary violin.
A less obvious case of zeroing the repetitions in fixed position is seen in the
bi-sentential operators (O00) whose selectional preference is that in their
arguments there be some word in the first sentence occurring also in the
second. It is seen in 3.2.3 and 9.1 that such conjunctional sentences which
seemed nonsensical became acceptable in an ordinary way if we interposed a
bridging sentence that related a word from the first component sentence to
one from the second: if in I sharpened my pencil because the snow was soft we
interpose, say, since I had promised a pencil sketch of a snowy tree. These
observations suggest that when we see conjunctional sentences that are
acceptable despite their lacking word-repetition we should postulate an
interposed bridging sentence that was common knowledge to speaker and
hearer and hence reconstructible and zeroable: e.g. for I'll stay if you go we
reconstruct an intervening component sentence like which means, I won’t go.
There are various fixed positions in English in which we find zeroing of a
particular word which may be the most frequent and is certainly the most
expected, in that it, or in certain cases also synonyms to it in the given
position, can be readily reconstructed by hearers of the sentence in
agreement with the speaker. Among the examples, presented below, are: which
is; also for and its synonyms before time and distance words; come and its
synonyms under expect; each other under ‘reciprocal’ verbs.
When a sentence under a secondary intonation (which can be marked by
semicolon) follows a preceding sentence, or enters as interruption after a
word in the preceding sentence (3.4.4), the first word of the interruption is
frequently which, who(m), etc., as below. This wh- word is a pronoun (i.e. a
repetition) of the preceding noun. The great frequency of the wh- pronouns
means that more interruptions had begun (before reduction) with a repetition
of the word or sentence before them than with any other word. This high
likelihood explains why the repetition is reduced to a pronoun, and also why
the which, whom, is zeroable when not separated from its antecedent by a
comma: A boy whom I knew ran up to A boy I knew ran up. One verb, is,
occurs much more commonly in interruptions than does any other verb, and
the sequence which or who plus is (even if introduced by comma) is
zeroable: The list which is here is incomplete reduced to The list here is
incomplete.
Given that the zeroability of which is arises as noted above, the decision as
to whether a particular occurrence of which is is zeroed depends on an
interesting further consideration of likelihood, which creates the ordering of
adjectives before a noun. The consideration here is that given An . . . A1 N,
i.e. a noun with ordered preceding modifiers (mostly adjectives or adjectival
nouns) that have accumulated before the noun, a following which is is
zeroed only if the adjective following which is has good likelihood of
There are various other cases in which words with highest expectability
in respect to their argument or operator are zeroed. For example, operators
expressing duration have exceptionally high likelihood on arguments which
refer to periods of time and are then zeroable in appropriate situations:
He talked during (or for) a full hour is reducible to He talked a full hour,
and I saw him during one morning (or this morning or mornings) to I saw
him one morning (or this morning or mornings); but I saw him during the
(or a) morning is not reduced in English. Similarly for space-measuring
operators on measured-space arguments, which is a high-likelihood or
preferential combination: He ran for a mile reducible to He ran a mile, and
He paced through the halls to He paced the halls; but He circulated through
the halls is not reduced, since circulate is less likely to be here a measured
activity.
More specifically, certain words are conventionally the favored or
‘appropriate’ arguments or operators of certain others, and are zeroable when
they occur with those others. An example is come, be here, or the like, under
expect discussed previously. Another example given in 3.2.3 is the zeroing of
each other under ‘reciprocal’ verbs, as in John and Mary met each other
(where if John met Mary it is almost certain that Mary also met John), but
not elsewhere, e.g. not in John and Mary saw each other (which is not
reduced to John and Mary saw). Also scale-operators (G245) are zeroable in
the presence of scale-measurements: e.g. These chairs begin at $50 from The
costs (or prices) of these chairs begin at $50, or This chair is $50 from This
chair costs $50 (where the operator cost is zeroed, leaving the tense -s which
is then carried by be). Such favored-word zeroing can also arise marginally in
nonce-forms, colloquialisms, or special subject-matters: e.g. at the chemist's
from at the chemist's place or the like, She’s expecting from She's expecting a
baby, Coming? from Are you coming?, and Coming! from I’m coming (the
latter two despite the fact that the subject of a sentence is not otherwise
zeroed in English). These words are reconstructible by the hearer from the
environment, whether or not they are the most frequent there. Their
meaning is assumed even in their absence, hence their presence contributes
no information.
A different kind of low-information zeroing is seen in sentence-segments
which engender in the sentence an event that expresses their information,
making them superfluous in the sentence. This zeroing is seen in the
performative segments I say (but not I said or He says) on a sentence, where
the saying of the sentence carries the same information as I say on it (5.5).
There are many kinds of evidence that a sentence as said is derived from its
form plus a preceding / say or the like. In the case of other performatives,
such as I ask, where again saying I ask on a sentence constitutes asking it,
one can consider ask to be derived from say plus as a question or the like,
with I say zeroable and as a question imposing the interrogative intonation.
Note that the performative is zeroable only in its performative occurrences:
knew that Mary came: in the latter cases, quickly cannot refer to came. The
only simple and non ad hoc explanation is that the permuting of quickly
(from Mary came quickly to Mary quickly came and Quickly Mary came)
took place before knew operated on the pair John, came; hence this quickly
could not move into the then non-present John knew segment.
Furthermore, John slowly knew that Mary would recover is rare, and if it is
said its slowly refers to John's knowing, and not to Mary’s recovery; it is not
a permutation of John knew that Mary would slowly recover or John knew
that slowly Mary would recover. That is, if slowly is said on (1) John knew
that S1 (where S1 is Mary would recover), it can be permuted among the
arguments and operator in (1) but cannot enter inside an argument of (1).
Reductions can thus be located between fixed points in the semilattice of
word-entry—i.e. of operator-argument relations—in the sentence (6.3,
8.3).
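The restriction can be mimicked by computing an adverb's possible positions at the moment it enters, before the higher operator does. A sketch only, not the semilattice construction of 6.3 and 8.3:

```python
# Sketch: 'quickly' enters on the component sentence 'Mary came'; its
# possible positions are fixed then, before 'knew' enters, so it can
# never appear inside the later 'John knew' segment.

def adverb_positions(component, adverb):
    # the adverb may permute only within the component it entered on
    return [component[:i] + [adverb] + component[i:]
            for i in range(len(component) + 1)]

for variant in adverb_positions(["Mary", "came"], "quickly"):
    # 'knew' enters afterwards, embedding the already-permuted component
    print("John knew that", " ".join(variant))
```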
Since reductions are carried out before any further operator acts on the
sentence, any further operator acts on the reduced sentence. This does not
affect the content on which the further operator is acting, since the
reductions are substantively paraphrastic, and indeed any zeroed material is
reconstructible and can be considered to be still present though in zero form.
But it may alter the formal conditions for the further operator, and indeed
some operators cannot act on particular reduced forms. If there are cases in
some language of an operator not being able to act unless its operand
sentence has been reduced, that would compromise the ability of the base to
carry all the information in the language (4.1, 11.3).
As to how reductions are ordered with respect to each other:
Two or more reductions may be applicable as a result of a single operator
entry. For example, when and operates on the pair John phoned Mary, John
wrote Mary (with sameness statements, as in 3.3.1, on both John and Mary),
it simultaneously makes both John and Mary zeroable in the second sentence,
yielding John phoned and wrote Mary. These two zeroings are independent
of each other; neither, either, or both may take place, with no indication of
their being ordered in respect to each other. Somewhat similarly, in (1) John
came home, which was because John was tired (with sameness statement on
John) we have available a pronouning of the second John (to John came
home, which was because he was tired), and independently of this a zeroing
of the which is (to John came home because John was tired or . . . because he
was tired), followed by a possible permutation to Because John was tired
John came home or Because he was tired John came home. In the former
case, pronouning of the second John is still available, now yielding he in a
different position: (2) Because John was tired he came home. (Pronouning
the first occurrence, as in They were as yet unsure of themselves, when the
doctors first suspected a viral cause, is so limited as to require more stringent
conditions than the second-occurrence pronouning discussed here.) It is
not necessary to state a time order for these events, which would require that
pronouning be able to take place both before and after the permutation. A
least-restriction grammar can obtain the existing forms by saying merely that
both the pronouning and the which is-zeroing plus permutation are
available in (1) above until some further operator acts on the whole of (1). Such a
further operator is seen, for example, in John’s coming home, which was
because he was tired, interrupted our conversation. In (1), the permutation, if
it takes place, creates a new pronouning-possibility in John came home, if
pronouning had not already taken place in because John was tired, yielding
(2). All we need say, then, is that the conjunction which brought together
John came home and John was tired, expressed in (1) by commas, admits two
independent reductions which are available, unordered among themselves,
until a further operator enters the sentence.
In contrast with these, there are cases of reductions which are ordered,
simply because the conditions for one of them to take place require the result
of the other. An example is moving an adjective to before the host, after its
which is has been zeroed, as in costs which are allowable to costs allowable to
allowable costs; the permutation requires zeroing of which is (there is no
which-are-allowable costs). Another example is zeroing a high-expectability
operator, or a classifier noun, after it has become the last (weak-stressed)
member of a compound: a quick drinking of coffee to a quick coffee-drinking
to a quick coffee (in Let’s stop for a quick coffee); a fly like a blue bottle to a
blue-bottle-fly to a blue-bottle.
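The ordering can be exhibited as a two-step rewrite in which the second step presupposes the output of the first. A sketch over strings, for illustration only:

```python
# Sketch: the adjective can be preposed only after 'which is/are' has
# been zeroed; the permutation's condition requires the other
# reduction's result.

def zero_which_is(s):
    return s.replace(" which are ", " ").replace(" which is ", " ")

def prepose(noun_phrase):
    noun, adjective = noun_phrase.split()   # expects exactly 'noun adjective'
    return f"{adjective} {noun}"

source = "costs which are allowable"
reduced = zero_which_is(source)   # 'costs allowable': zeroing must apply first
print(prepose(reduced))           # 'allowable costs'
# prepose(source) would fail: there is no 'which-are-allowable costs'
```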
A reduction is described as altering the shape of a word, even if to zero,
but nothing is said about its altering the presence of the word in the given
occurrence, and indeed the word continues to be present in the sentence.
This is evident in the meaning, where a blue-bottle is a fly. It is also evident in
various grammatical features, such as the presence of modifiers of zeroed
words, as in the momentarily of I expect John momentarily from I expect
John to come momentarily; momentarily is clearly on come.
In view of all this, it is not merely a figure of speech to say that reduction
does not affect the basic syntactic status of the words which have entered
into the partial order that creates a sentence. When we consider reductions
as variant forms of a sentence, then, the reductions do not add constraints on
equiprobability except, slightly, in the case of permutations. But if we want
to account directly for the observable combinations of words that constitute
sentences, then pronouning (which replaces many words by the same he, or
it, etc.), and especially zeroing (no matter what traces are left), have
complex effects on the departures from word-equiprobability. In any case, a
transformed sentence has to be described by describing its source sentence,
plus the change; if we try to describe the transformed sentence de novo, we
have to repeat for it almost everything that was said for the source sentence.
3.4. Linearizations
3.4.0. Introduction
In all languages, speaking is an event in time only, and thus overtly linear,
even though the linear form can contain simultaneous components, and can
even contain long components which are simultaneous with a sequence of
short ones.3 The partial order established in 3.1 is descriptively prior to the
linear order; that is, it is independent of the linear order in a description of
language structure, whereas the linear order is not useful toward a structural
description without reference to how it embodies the partial order (or the
grammatical relations, which can be defined in terms of that partial order).
This is not to say that the partial ordering exists in time or in human activity
prior to the linear. There is no evidence against saying that sentences are
constructed as word-sequences, with the partial ordering bringing in an
additional degree of freedom (and so an additional dimension) in the choice
of words, recognized now by educated speakers as ‘grammatical relations’
among the sequenced words. Within the sequence, the partial order is
realized by two linear orderings: that of an operator to each part of its
argument, and that between co-arguments (John ate a fish being not the
same sentence as A fish ate John).
The relation of linear to partial order differs for speaker and hearer.
The speaker chooses certain words in a partial order which satisfies their
dependences, and says them linearly in time. Since much of the partially
ordered set, particularly the arguments of operators said so far, is available
to him when he is saying the linearly earlier words, there is room for his
making the kind of slip called anticipatory contamination, such as Spoonerisms
(as notoriously in our queer old dean for our dear old queen). The partial
order, which underlies the linear, suggests that slips which are unintentional
should appear chiefly in operator words drawing upon their linearly later
arguments, or in argument words drawing upon their linearly later
co-arguments; this is indeed found to be the case. We would not expect slips in a
word to draw as frequently upon linearly later words that are also higher in
the partial order, for these higher words are not necessarily in the speaker’s
mind when he says the earlier word. Intentional (jocular) contaminations do
not show this distinction, since the speaker is then drawing upon a complete
sentence or segment.
The hearer receives the words in linear order of time or writing and has to
reconstruct their partial order as grammatical relations. In this he meets the
well-known problems of degeneracies, especially those due to homonymy
and zeroing. Upon occasion he also has to reinterpret an earlier word on the
3 Z. Harris, Methods in Structural Linguistics (pb. title: Structural Linguistics), Univ. of Chicago
Press, Chicago, 1951, pp. 45, 50, 92, 117, 125, 299.
basis of its linearly later operators or arguments, as in She made him a good
husband because she made him a good wife. Here the expectation that the
second made has the same partial-order relation to its following him as does
the first one is counteracted by the gender of wife: the first made him is from
made him into, the second is from made for him. Somewhat similar is
K. S. Lashley’s example, ‘Rapid righting with his uninjured hand saved from
loss the contents of the capsized canoe’, where the hearer normally assumes
rapid writing until he hears the last words.4 In reconstructing the partial
order, as grammatical relations, out of the linearly ordered words, the
hearer uses the residual traces of each reduction which has taken place in the
partially ordered material. In addition, certain changes are made in respect
to the linear order: phonetic assimilation and dissimilation in neighboring
words.5
The fact that there is a normal linearization of the partial order does not
preclude there being alternative linearizations of it at different points or in
certain situations. Indeed, one of the advantages in describing sentences in
terms not of a direct linear ordering, but of a partial order which is then
represented linearly, is that a limited freedom in the linear representation
can account for important differences in sentence structure without
constituting a difference in the grammatical relations which characterize the
sentence: In many languages certain parts of a sentence which are normally
not at its beginning have an alternative linearization at the front of the
sentence, without changing the grammatical relations: This I will say (or I
this will say) instead of I will say this. There is no evidence that this
alternative linearization (‘fronting’) is a permutation taking place after the
normal one has been formed; rather, it may be simply a different linear
representation of the partial order (here say as operator on I, this), used for
stylistic reasons or to make the front word stand out as the ‘topic’.
That which is subject to alternative linearization (i.e. that which is
linearized) is not simply particular words of the sentence, but rather
particular items of the partial order, as will be seen. Furthermore, certain of
these items have their adjuncts (modifiers, 3.4.4) and even their second
arguments with them in any alternative position. In English, a noun always
has its adjuncts (Long books I cannot read; never Books I cannot read long);
an operator optionally (Read I will immediately, Read immediately I will).
Prepositional operators always have their second arguments (Near the lab he
has a house, never Near he has a house the lab); verbs optionally (Sign I will
any and all petitions, Sign any and all petitions I will). The distant items do
not conflict with contiguity: arguments and their operators and their
co-arguments all remain contiguous (except for later inserts). Since adverbs are
secondary-sentence operators on their host verbs, they remain next to either
the verb or the arguments of the verb (which were present with the verb
when the adverb acted on it).
The main alternative linearization in English is the front position of non-
first arguments and of tenseless operators: specifically, the second argument
(This I say), the third argument (as in On the table I put the book), or the
operator with its non-first arguments (Explain this l will not. Tired he was).
There is a rarer placing of the second argument before the operator (I this
doubt), and a limited interchange of the two arguments without losing their
identity in the partial order (i.e. in the grammatical relations, as in This say
we all). In longer sentences there are certain partial-order positions which
do not have the alternative available to them: they are too far grammatically
(too complexly down in the partial order) to appear in the front. This applies
to certain positions in sentences which are second arguments of certain
conjunctions: for example, the last argument (Hiroshima) does not have the
choice of occurring up front in I didn't mention Nagasaki because of your
detailed description of Hiroshima; but it has that choice when it is deep in a
chain of arguments, as in I remember his describing how they made the
decision to bomb Hiroshima, where one can say, if awkwardly, Hiroshima
I remember his describing how they made the decision to bomb.
Another set of alternative linearizations has certain adverbs, especially
those (such as scarcely) which operate on the metasentential I say (5.5.3),
appear at the front of the sentence, followed by the tense: Scarcely had I
taken it. Slightly different is the order in Tired though he was, and in Here
arises an important question, and also in the moving of tense when certain
front words are dropped: Were he there I would go as compared with If he
were there I would go, and I ask: Will he return? from I ask whether he will
return.6
6 Both fronting and interruption (3.4.4), together with their semantic effects, can perhaps be
obtained without assuming a special added freedom by deriving them from adverbs on say
(5.5.3): e.g. This I will do from I say of this that I will do this, with zeroing of the second this
(3.3.2) and of I say (5.5.1).
3.4.3. Permutation
sentence which has been shortened can move to before a linearly preceding
item from the partial ordering (with any adjuncts, and possibly with its
arguments) usually on condition that the preceding item is longer, and above
all is not shortened to a pronoun. Thus John clamped the lid down, where
down is reduced from a secondary sentence John's clamping the lid was
down(ward) (secondary to John clamped the lid), has an alternative order
John clamped down the lid, provided the lid is not pronouned to it (no John
clamped down it). In contrast John slid down the pole, where down is simply
an Oon operator on the pair slid, pole, has no permutation for down, and
permits pronouning of pole (to John slid down it).
Somewhat differently, when the car which is new (where which is a
pronoun for the preceding car) has its which is zeroed, the residual new
moves necessarily to before car: the new car. Permuting the residue after
zeroing which is holds for all uni-morphemic adjectives and for stated other
words, including the compound-word structure (after zeroing the
characteristic for or the like, as in books for school to school-books, and grey like
stone to stone-grey). When which is is zeroed in I have decided to leave now,
which is because my time is up, we obtain I have decided to leave now,
because my time is up; and I have decided, because my time is up, to leave
now; also Because my time is up I have decided to leave now. Another
moving of a shortened conjoined sentence can follow zeroing under and: in
John phoned and Mary phoned, zeroing the second phoned (yielding John
phoned, and Mary) permits (and virtually requires) and Mary (the residue of
the second sentence) to move to a point where all that follows it is the
antecedent for the zeroing (namely, phoned): hence, John and Mary phoned.
This condition accounts for the other and-residues, e.g. John composes, and
Mary plays, light piano music; and it explains why there is no permutation in
John plays piano and Mary violin (because the above stated point, which has
to be before the antecedent plays but not before the non-antecedent piano,
does not exist).
A slightly different length-permutation permits a final verb to move to
before the interruption (3.4.4) on its subject: My friend whom I told you
about is coming to My friend is coming, whom I told you about. And under
some higher verbs, a verb can move to before its subject: He let the book fall
to He let fall the book. Calling these permutations is a descriptive convenience;
their histories can be different. And some apparent permutations are not
that at all (e.g. the passive, 1.1.2).
4
Language Structure:
The System Created by the Constraints
4.0. Introduction
The methods of 3.1-2,4 yield a base set of sentences from which the other
sentences of the language are derived. The dependence of 3.1, together with
the linearization of 3.4.1-2,4, characterizes the set of possible sentences; it
eliminates what cannot occur in the language. The likelihood gradation of
3.2 creates not-well-defined sets of ‘normal’ and marginal sentences, on
which the reductions of 3.3 operate. Separating out the different constraints
that contribute to the departures from equiprobability in the word-sequences of
all sentences distinguishes the base subset of sentences, in which reductions
have not taken place. This subset is structurally recognizable, because the
operator-argument relation in it has not been obscured by zeroing,
reduction to affix, length permutation, and other transformations. It is also purely
syntactic, without having morphology carry part of the syntactic functions,
since all affixes that can be derived from free words have been separated off,
leaving the vocabulary to consist of syntactically indivisible words. That is
not to say that one may not see internal construction, as in the con-, per-,
-ceive, -sist of conceive, perceive, consist, persist or even the -le of dazzle,
settle, etc., but that such constructions cannot be usefully derived from
separately functioning morphemes or words, joining in the operator-argument
relation.
Since reductions are optional, except as in 3.3.3 (where the missing
sources in the base can be created), the unreduced sentences remain in the
language by the side of their reduced forms: I prefer for me to go first and
I prefer to go first. And since reduction is paraphrastic, changing only the
shape of a word but not its existence or its operator-argument status in the
sentence, the sentence that results from it does not have different substantive
information than the sentence on which it acted. Hence the information of
reduced sentences is present in the unreduced ones from which they have
been formed. It follows that the base set of syntactically transparent
unreduced sentences carries all the information that is carried by the whole
set of sentences of the language.
In the base, the syntactic classification of vocabulary is into the N, On, Ono,
etc. categories of 3.1. The syntactic operations in it are the operator-
argument partial ordering, with optional interruption by secondary sentences
and optional front-positioning of words. In English, operators carry operator-
indicators (the ‘present tense’ -s), which are replaced or overruled by
argument-indicators (-ing, that, for . . . to) when they become the argument
of a further operator. This holds for all sentences. Similarly, all sentences
can carry the metasentential I say with certain appended material on it such
as not, and, or, in query, in command, and also the ‘aside’ intonation which
attaches a sentence as secondary to its primary one. When an operator is
defined as acting on an operator (as its argument), it acts on the whole
sentence that the lower operator had created, since all operators in a
sentence carry their arguments with them (by 3.1), so that each creates
a sentence of its own. Given destroy on the pair bomb, atoll (yielding (1)
Bombs destroyed the atoll), then think on the pair we, destroy, yields We
think bombs destroyed the atoll (for plural, tense, the, cf. 8.2). Further
operators and transformations on the higher operator (here, think) do not
enter into the lower sentence, which remains as an unchangeable component
sentence in the larger one. Thus in Secretly we think the bombs destroyed the
small atoll completely, the completely, operating on destroy, can move only
in the contained sentence (1), and secretly, on think, can move only outside
the contained sentence. The completely cannot move into we think (no We
completely think the bombs . . . ), for any moving had to be done right after
completely entered its sentence (the component one), and we think had not
yet acted upon that sentence then.
In the base, few words are multiply classified, i.e. belong to more than one
of the N, On, etc. classes; most of the apparent multiple classification arises
in the derived sentences, and is due to zeroing. And few base sentences are
ambiguous, other than for homonymy and for unclarities in the meaning-
ranges of words; the great bulk of ambiguous sentences are due to
degeneracies in reduction—to two different sentences being reduced to the same
form.
It follows from the above that this relatively simple grammatical apparatus is
adequate for all the substantive information which is carried in language.
This would not be so if the meaning of further operations depended on
whether the operand had been reduced, for then the meaning that required a
reduced form would not be available in the (unreduced) base. But this
apparently does not happen.
Certain situations present difficulties for the base, but can be resolved.
For example: as has been seen, the auxiliaries in English differ from all other
operators in that they act only on a subject-less verb: John can go first, but no
John can for him to go first, and no John can for her to go first, in contrast to
John prefers to go first by the side of John prefers for her to go first. The
description of can in the base was regularized in 3.2.4 by saying that like
other Ono operators it operates on a whole sentence, but one in which the
subject must be the same as that of can, and must then be zeroed: a non-
extant but well-formed John can for him to go first is then necessarily
reduced to John can go first. The cost of this derivation is that the source
sentence does not remain in the language, hence is not in the base, since its
reduction is required; the information of the reduced sentence cannot then
be expressed in the base. A resolution of this difficulty can be reached by
finding some word or sequence which is approximately synonymous to can,
and selectionally close to it, which we can consider a variant of can in this
form, and under which the above zeroing is not required. We can consider
that some such sentence as (1) John is able for him to go first, which is in the
base, occupies there the place of the non-extant (or no longer extant) (2)
John can for him to go first, while the reduced forms of both, John is able to
go first and (3) John can go first both remained in use. Such a resolution is
supported by the fact that historically some auxiliaries indeed were able to
have a repeated subject under them. A more general support is that change
in word use consists in principle not in a word being dropped or replaced
within a sentence, but in sentences which contain that word competing with
approximately paraphrastic sentences containing another word, and losing
out to them. Given (3) we have to posit a source (2), and if (2) does not exist
it must have been replaced by an informationally roughly equivalent sentence
such as (1).
This situation is particularly common in the case of affixes, which are
analyzed, to the extent possible, as being reductions from free words, mostly
operators, in the base. Even when such a reduction is historically evidenced,
the free word may have meanwhile been lost; affixes are less amenable to
historical replacement than free words. Thus, as noted, the -hood of
childhood was once a compound form of a free word had ‘state, condition’
which was lost from the vocabulary, whereas the affix occurrences of it were
not lost. But had was not simply lost. Sentences containing some other word,
such as state, came to be used more than corresponding sentences containing
had. When free had was no longer said, some word such as state remained in
its place, like a suppletive variant of it, so that the unreplaced -hood is now
the affixal reduction of state rather than of the old had. Such established
historical cases are rare. For most cases in English, saying here that a given
affix is a reduction of a particular free word in a particular position is not a
claim of history but a statement that the operator-argument relation of that
affix to its host is the same as that which the given free word would have in
the stated position, and that the selection (relative likelihoods) of the affix,
as to host and to further operators on it (and so its meaning), is approx¬
imately the same as that of the given free word in the stated position.
From the base sentences, the reductions (3.3) create the other sentences of
the language, which can thus be considered to be derived from the base.
These reductions are changes in particular word-occurrences, not recastings
of the whole sentence. As noted, they are made on the basis of the likelihood
of the word in respect to its argument or operator in the sentence, and they
are made as soon (in constructing the sentence) as these conditions are
satisfied. Reductions are thus located at specified points inserted in the
semilattice of the source sentence which they contain (8.3). Since reductions
are only changes in shape, the unreduced sentence is not merely a source,
but is actually contained in the reduced sentence. Although the types of
reduction are similar in many languages, the specific reductions differ; and it
is here that we find the main difference among individual language structures.
When a reduction is optional, each sentence it creates is an image of a
source in the base. When a reduction is required, each sentence claimed to
result from it (e.g. one containing can) has a source in the base only if we can
assign to the reduced word (e.g. can) a suppletive word in the base (e.g. is
able to). There are also cases where the reduced form and its source both
exist, but intermediate reduced forms do not. Thus He rotated the globe on
its axis (by the side of The globe rotated on its axis) can be derived regularly
from a source He caused the globe to rotate on its axis, going through length-
permutation to He caused-to-rotate the globe on its axis and compounding-
permutation (3.4.3) to the non-extant He rotate-caused the globe on its axis,
where zeroing of the broad-selection last element of the compound (cause)
yields He rotated the globe on its axis. In such cases we need only say that
intermediate steps in the succession of transformations are required, in this
case the zeroing after the compounding; the compounding itself was not
required. That is, what is optional here is the product of steps, not each step
alone.
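The derivation, with its required intermediate steps, can be laid out as a pipeline in which only the first and last members surface. A sketch with the steps as labeled rewrites:

```python
# Sketch: the derivation of 'He rotated the globe on its axis'; once the
# compounding step is taken, the zeroing of broad-selection 'cause' is
# required, so the intermediates never surface.

steps = [
    ("source",                      "He caused the globe to rotate on its axis", True),
    ("length-permutation",          "He caused-to-rotate the globe on its axis", False),
    ("compounding-permutation",     "He rotate-caused the globe on its axis",    False),
    ("zeroing of cause (required)", "He rotated the globe on its axis",          True),
]
for label, sentence, attested in steps:
    status = "attested" if attested else "intermediate only, non-extant"
    print(f"{label:28} {sentence}  [{status}]")
```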
The reductions leave room for degeneracy, producing ambiguities: two
different base sentences, each undergoing different reductions, may end up
with identical visible word-sequences. For example, the ambiguous John has
shined shoes is obtained from two sources: (1) John has the state of John's
shining shoes, via repetitional zeroing of John’s and reduction of the state
of . . . ing to -ed, this being roughly the historical source of the ‘perfect’ tense
(G290); (2) John has shoes; these shoes are in the state of one’s shining them,
via zeroing of the indefinite one's and of the repetitional them, and reduction
of the state of . . . ing to -ed (this is offered as the source of the passive, 1.1.2),
and reduction of semicolon plus these shoes to which (yielding John has
shoes which are shined) followed by zeroing of which are and permutation of
the residual shined to before the host. All these are widespread and well
attested English reductions. The two derivations admit different reductions
of further operators: for example, (1) appears in John has shined his shoes;
(2) in John has well-shined shoes.
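The degeneracy can be exhibited directly: two derivations from distinct sources converge on one word-sequence. A sketch in which each route is collapsed into simple string rewrites:

```python
# Sketch: two sources, two reduction routes, one surface form; each
# route is collapsed into single string rewrites for brevity.

def derive_perfect():
    s = "John has the state of John's shining shoes"
    return s.replace("the state of John's shining", "shined")   # zeroing + -ed

def derive_passive_relative():
    s = "John has shoes which are shined"
    s = s.replace(" which are", "")                   # zero 'which are'
    return s.replace("shoes shined", "shined shoes")  # prepose the residue

print(derive_perfect())                               # John has shined shoes
print(derive_passive_relative())                      # John has shined shoes
print(derive_perfect() == derive_passive_relative())  # degeneracy: ambiguity
```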
If a given sentential word-sequence has more than one source from which
it can be derived by known reductions, then for each derivation the given
sentence has the meaning of the corresponding source. The ambiguity of
such a sentence is not a matter of vagueness, nor of a broad range of
meaning, but of a choice between two or more specific meanings, those
of the specific source sentences.
There may be other disturbing situations in deriving sentences from the
base. One is that the reduced sentences may lose their semantic relation to
their source. This can happen if a reduced word, for example a word with an
affix, has come to be used with different likelihoods for the operators on it,
no longer the likelihoods (selection) of its component words: e.g. romance
‘story’ from ‘in Roman (language)'. To the extent that this happens, the
reduced sentences can carry information that is unavailable in the base
1 The sentence-generating system of ch. 3 does not suffice directly for an analysis ('recognition')
procedure, which requires an orderly accounting of all similarities among the generated forms.
A computer program using a detailed grammar of English, based on the contiguity of
grammatical constructions, is in operation, analyzing scientific texts in English and some other
European languages, and transforming them into informational data bases: N. Sager, Natural
Language Information Processing, Addison-Wesley, Reading, 1981. An algorithm, with the
dictionary written in the form of the algorithm, has been produced by D. Hiz. An algorithm for
the operator-argument recognition of sentence structure has been developed and implemented
by S. B. Johnson.
This is metaphor, which can be derived from two sentences under the
conjunction as, provided everything beyond what is needed for one sentence
is zeroable (indefinites or classifiers). It is seen for example in I can see his
wanting to go (where see appears as Ono instead of its base status as Onn)
from (roughly) I can treat his wanting to go as one's seeing things (via the
artificial I can treat-as-seeing his wanting to go, G405); in the source, see is
correctly Onn. Such a derivation explains for example how one can say The
car smashed into the wall without getting smashed (or: without smashing): we
obtain The car smashed into the wall from The car moved into the wall as a
thing's smashing (via the non-extant The car moved-as-smashing into the
wall), to which without (the car) smashing can be adjoined. It also explains
how one may say He peppered his lecture with asides but not He peppered his
lecture, though one can say He peppered his soup (from He put pepper into
his soup similarly to the zero causative, 3.3.2). The source is He put asides
into his lecture as one's putting pepper into things, which is reducible to the
non-extant intermediate He put-as-putting-pepper asides into his lecture,
thence into He peppered asides into his lecture, whence He peppered his
lecture with asides (G369). (All these are elsewhere attested zeroings in
English.) The fact that as is involved in the derivation can be seen in such
forms as the French Le juge a ravacholisé X, where one would not say
(except perhaps as a joke) (1) Le juge a ravacholisé Ravachol: we see that
ravacholiser is 'to treat as one treated (the executed anarchist) Ravachol'
(which cannot be said of Ravachol himself as in (1)); it is not ‘to do what was
done to Ravachol’ (which could be said to Ravachol, and which therefore
would permit (1)). The source here would be in English The judge treated X
as one's treating (i.e. as in the treatment of) Ravachol (G406).
contains second-level operators in addition. The base words of the language
(excluding whole-sentence words such as Hello) fall into a three-level
partial order in respect to what they require in a sentence. And the words of
base sentences constitute a semilattice in respect to their arguments in their
sentence (6.3, 8.3). The reductions in a sentence are locatable as additional
points in that semilattice.
Since operators appear in sentences only given the presence of their
arguments (before reduction), it follows that when a second-level operator
appears with its argument, which is itself a first- or second-level operator,
the arguments of that first- or second-level operator are also present. Hence
the sentence which is made by the first- or second-level operator is imbedded
in the sentence made by the higher second-level operator. Thus every
sentence consists ultimately of one or more elementary sentences, possibly
with second-level operators on them. And since reduction and transformation
only alter the phonemic shape or the position of words, a reduced sentence
contains its (usually longer) unreduced source. The contained sentence is
the same as it is in its independent occurrences, since the further operators
on the words and thus their meaning is largely unchanged—aside from the
fact that the meaning of certain words, and of certain reduced word-forms,
differs under different operators (11.1). Various word-sequences may remain
as idioms: irregular reduction (the more the merrier), specialization of
meaning (kick the bucket), etc.
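
The three-level partial order and the imbedding of sentences can be sketched in a few lines. In the Python fragment below, the slot labels 'n' and 'o' follow the operator-argument notation used here, but the data structures are an editorial simplification: an operator whose requirement includes an 'o' slot necessarily receives a whole sentence, arguments and all.

    # A minimal sketch of the operator-argument partial order: 'n' slots
    # take zero-level words, 'o' slots take whole sentences, so a
    # second-level operator imbeds an elementary sentence intact.
    from dataclasses import dataclass
    from typing import Tuple, Union

    @dataclass(frozen=True)
    class Word:
        text: str
        requires: Tuple[str, ...] = ()   # one 'n' or 'o' per argument slot

    @dataclass(frozen=True)
    class Sentence:
        operator: Word
        arguments: Tuple[Union[Word, "Sentence"], ...]

        def words(self):
            out = [self.operator.text]
            for a in self.arguments:
                out += a.words() if isinstance(a, Sentence) else [a.text]
            return out

    def apply(op, *args):
        assert len(args) == len(op.requires), "argument requirement not met"
        for a, slot in zip(args, op.requires):
            assert slot == ("o" if isinstance(a, Sentence) else "n")
        return Sentence(op, tuple(args))

    john = Word("John")
    sheep = Word("sheep")
    grass = Word("grass")
    eat = Word("eat", ("n", "n"))      # first-level operator (Onn)
    know = Word("know", ("n", "o"))    # second-level operator (Ono)

    inner = apply(eat, sheep, grass)
    outer = apply(know, john, inner)   # the elementary sentence enters whole
    print(outer.words())               # ['know', 'John', 'eat', 'sheep', 'grass']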
It should be understood that saying that every non-base sentence contains, or
is derived from, a base sentence does not mean that such a base sentence was
in use first and then reduced. I like tennis is derivable from I like for me to
play tennis or from I like people’s playing tennis (by zeroing play as high-
frequency operator on tennis, and people as indefinite argument, or me as
repeated argument). This is so not in the sense that the base forms were said
first, or that the speaker thought first of the base form, but in the sense that I
like tennis says only what the two base forms say more regularly (unless the
environment supports other zeroings). And to say the four-color problem
does not imply that anyone first said the problem of four colors, only that the
latter is sayable in the language with the same meaning, and that if the
former was not indeed reduced from the latter then it was built on the model
of compound words which are reduced in this manner (a four-door sedan
from a sedan with four doors).
All sentences created by the partial ordering and reductions can in
principle be said in the language. Some are extremely rare, either because of
very low likelihood of a word combination, or because a reduction is heavily
favored as against its source (e.g. I want to win as against I want for me to
win). Since the constraint of likelihood (3.2) is defined on the resultant of
3.1, it follows that no matter how low is the likelihood of a particular word in
respect to a given operator on it, the likelihood is presumed never to reach
zero;2 for if it did, we could not say that the occurrence of words in sentences
depends only on the dependence properties of other word-occurrences
there. As was argued in 3.2.1, this accords with the data about language,
since even those members of an argument class which are least likely to
occur under a particular operator (say, vacuum under cough) cannot be
entirely excluded from what is possible. (But outside the base, required
reductions may preclude the visibility of words, as under the auxiliaries,
3.3.3.)
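
The presumption that likelihood never reaches zero has a familiar computational analogue in smoothed frequency estimation. The Python sketch below is an analogy only, not Harris's procedure; the counts, vocabulary size, and floor value are invented. It keeps every operator-argument pair possible, however unlikely:

    # A minimal sketch: additive smoothing keeps the estimated likelihood
    # of any argument under any operator above zero, as with 'vacuum'
    # under 'cough'. All numbers here are invented.
    def likelihood(counts, operator, argument, vocabulary_size, floor=1e-6):
        pair = counts.get((operator, argument), 0)
        total = sum(c for (op, _), c in counts.items() if op == operator)
        return (pair + floor) / (total + floor * vocabulary_size)

    counts = {("cough", "man"): 40, ("cough", "child"): 25}
    print(likelihood(counts, "cough", "child", 50000))   # substantial
    print(likelihood(counts, "cough", "vacuum", 50000))  # tiny, but not zero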
Certain other properties of language are due to the reductions. First, the
base, i.e. unreduced, sentences have the least constraints on word
combinations. This arises from the fact that each reduction, defined as operating
on some operator-argument or sentential condition, has as domain either
all sentences containing that condition or else a proper subset of these,
where stating a proper subset constitutes a restriction on word-occurrences.
Hence the set of sentences resulting from a reduction cannot be less
restricted than the set existing before the reduction is defined.
Second, the phonemic possibilities in reduction leave room for the
creation of morphology, as a structure (and hence a restriction) beyond what
is found in the base. In many but not all languages, certain reduced words
are attached as affixes (mostly operators attached to their arguments), thus
creating pluri-morphemic ‘derived’ words, which do not exist in the base.
Third, reduction of certain operators to zero makes their arguments seem
to take over the operator’s status (and vice versa): e.g. the apparent noun
take in today's take, derived from (roughly) the amount of today’s taking
things, or the apparent verb piece in to piece together, reduced from
(roughly) to put the pieces together. Such reductions also cause an apparent
‘lexical’ shift in the meaning of the words (as between the take and to take, or
between to piece and a piece), although the word can be considered to have
the same range of meanings in both environments, with the difference being
merely the meaning of the zeroed word (amount, put)—zeroed in its
phonemics but still present in grammar and meaning.
Lastly, many reductions create degeneracies (ambiguity) in sentences,
when two reductions, acting on two sentences which differ in form and
meaning, produce the same sequence of words.
There are also certain properties which follow from linearization. The
alternative linearizations create the syntactic relation of modifier. As noted
in 3.4.4, if we interrupt a sentence after a particular (‘host’) word, and if the
interrupting sentence begins with the same word, the two occurrences of the
word are successive (neighbors) so that the second can be pronouned
2 As noted in 3.2.4 and 4.1, there may be cases of an argument which is excluded under a
given operator (e.g. can). The generality of 3.1 may then be salvaged by finding synonymous
words in the base which are free of this exclusion and which serve as variants of the restricted
word.
position of higher operators are generally the same as for lower operators, as
noted above (so that, for example, the subject precedes the verb both in The
refugees crossed the border and The refugees' crossing of the border angered
the government); and (2) a secondary sentence, given that it contains an
argument that has already appeared in the primary, can interrupt the
primary right after the repeated word, so that it creates a modifier (relative
clause) that is contiguous to the antecedent (host) word (so that, for
example, the phrase refugees from El Salvador occupies the same position as
the single word refugees in The refugees crossed the border, The refugees
from El Salvador crossed the border). In contrast, the interruptions that do
not form relative clauses do not merge into the previously existing sentence
structure, as in Tomorrow—so I hear—will be cold, Tomorrow will—so
I hear—be cold.3
A fourth condition met in languages, which is related to the least-structure
condition, is that in various respects the structural possibilities of a language
are not fully utilized. For example, both the partial ordering and the
linearization make it possible to state an address for each word in a sentence.
However, languages do not use complete addressing systems, and the only
cases in which the location of a word is cited are those where the location can
be identified with the aid of a few simple addressing words such as ‘prior’ (for
relative pronouns), ‘co-argument’ (for -self, as in John washed himself),
‘same partial-order status’ (for corresponding-position zeroing, as in John
plays piano and Mary violin). Another example is the fact that every
language admits only a small choice of reductions, leaving unreduced many
a word that could have been reduced, as having little information in respect
to a given operator: thus to be here or the like is zeroable under expect (I
expect John to be here), and books or the like is zeroable under read (John
reads a lot), but clothes is not zeroable under wear (no The natives wear).
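
The contrast between a full addressing system and the few addressing words actually used can be shown schematically. In the Python sketch below, the relation names paraphrase 'prior', 'co-argument', and 'same partial-order status', and the resolution function is invented for illustration; the point is that a referential is resolvable only through one of these simple relations, never through an arbitrary positional address.

    # A minimal sketch: the only antecedent 'addresses' in use are a few
    # simple relations; '-self' is resolved by the co-argument relation.
    SIMPLE_ADDRESSES = {
        "relative pronoun": "prior (the immediately preceding word)",
        "-self": "co-argument of the same operator",
        "corresponding-position zeroing": "same partial-order status",
    }

    def resolve_self(arguments, reflexive_index):
        """Antecedent of -self: the co-argument of the same operator."""
        others = [a for i, a in enumerate(arguments) if i != reflexive_index]
        assert others, "no co-argument available"
        return others[0]

    # John washed himself: 'himself' is the second argument of 'washed'
    print(resolve_self(["John", "himself"], reflexive_index=1))  # John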
A fifth condition, which may be due to the independence of selection from
phonemic form (5.4), is the possibility of various similarities among words,
in their meaning-range, phonemic form, or reduced form, sufficient to
create structural details in various languages, beyond the major structure
created by the fundamental constraints. There are frequently used reductions
(or combinations of reductions) in particular kinds of word-sequences. Such
situations create specialized sentence types: e.g. those that result in the
passive form, or in metaphoric uses of words (4.2, end), or in new conjunctions
such as because, provided, providing (below). Similarities in treatment
of I, you, he, etc., or of secondary sentences that indicate time relations, can
create in many languages ‘person’ (subject, object) and ‘tense’ affixes which
get attached to operators. Such constructions as tenses and subject-object
affixes in verb conjugations (or possessive affixes on nouns) can become
3 A widespread condition is the saying of incomplete sentences. These fragments generally
turn out to be not merely arbitrary combinings of words, but specifiable segments of
sentence-construction.
required combinations (on all verbs, or on all nouns) and thereby take on
great importance in the grammar of a language.
The high likelihood of particular operators to occur with particular time
modifiers can lead to subclassification of operators into verb, adjective,
preposition (4.2), and within verbs into perfective (momentaneous) and
imperfective (durative) verbs (3.2.5)—morphologically distinguished in some
languages but not in others. Similarly, prepositions, quantity words, and
other frequent additions can be reduced to such common affixes as cases and
plural on elementary arguments, and even on higher-level arguments,
creating nouns and 'nominalized sentences' (sentences appearing as arguments).
Here too differences of likelihood in respect to plural, etc., can
make a distinction in some languages between ‘mass’ and ‘count’ nouns.
There are also similarities and differences in the (morphophonemic) form
of these important affixes on various verbs, or nouns, yielding different
conjugational and declensional subclasses of verbs and nouns; some may be
associated with meaning-differences (e.g. declensions for gender), others
may have no semantic association (e.g. conjugations). In many languages,
these morphophonemic similarities do not cover all verbs and nouns, leaving
various exceptions (‘strong’ verbs, etc.). Such syntactic constructions can
have great weight in the structure of an individual language or language
family.
Finally, two processes are found in language which are quite different
from those involved in chapter 3. One is analogy: the saying of a word in an
operator-argument combination on the basis of another word, which has
some correspondences to the first, appearing in such a combination. This is a
widespread process, restrained by its own conditions, and almost always
dependent on the prior existence of a grammatical construction on the
model of which the new form is made. Because of this dependence, many
forms which historically have resulted from analogy can be analyzed as
though derived by regular reductions from existing sentences: the only cost
may be extending the selection of a word, or extending the domain of a
reduction.
The other process is change through time. All languages change. These
changes are primarily of certain types. One type is a slow shift in the basis of
articulation, which changes certain sounds, and in particular situations
changes their phonemic relations. Another type is of a word or affix
extending its selection, sometimes into that of another, so that it competes
with the other for the same selectional and semantic niche, and may replace
it. Yet another type is extension of the domain of a reduction (i.e. which
words are subject to it), or of its conditions or frequency of application. And
there are changes due to the borrowing or invention of new words. These
changes are largely related to the structure of the existing language, so that
major alterations in the structure of language are rare indeed.
The change in selection, i.e. in which words have higher than average
5
Metalinguistic Apparatus
within Language
5.0. Introduction
Two features underlie this capability of language. One is the fact that
phonemes are types and not merely tokens; that is, the set of word-sounds is
effectively partitioned (by distinctions, which create types) into phoneme-
sequences (7.1-2). The other is the lack of any relation, in respect to
meaning, between phonemes and words; that is, the meaning of words is not
referable to any meaning of their phonemes.
As to the first: the status of phonemes as types means that every occurrence
of a word-sound can be named, as being a case of one known phoneme or
another. We should not take this property for granted, as something which
holds for every system that could seek to represent (as language does) the
information of the world. The objects and events of the world could be
referred to not only by words but also by, say, pointing at them, or drawing a
picture of them. However, the latter kinds of representation cannot be
reused, to represent the representations themselves. True, we can point at
(someone) pointing, or draw (someone) drawing; but we do not have a
general way of uniquely pointing at all pointings which are specifically at a
given set of things, or drawing all drawings of a specific set of things. In
contrast, we can take all word-occurrences that name a given set of things
(say, the word chair for certain seats or the word bench for others, or walk
for certain perambulations or stroll for others) and we can uniquely identify
The basic metalinguistic sentences, from which all others can be constructed,
are made of the names of phonemes with operators on them. The operators
specify the sound-differences among the phonemes, their location relative
to each other in utterances of the language, and which phoneme-sequences
constitute words in the language. These metalinguistic sentences are not
about sounds as such, for individual sounds (tokens) are in general evanescent
events which cannot be named—that is, cannot appear as words—unless
they have been classified into fixed types. But the names of the phonemes
and their classes (e.g. the word phoneme itself), and the names of those
phoneme-sequences which satisfy the conditions of being a word, are all
words or word-sequences, of zero level (i.e. primitive arguments), and they
occur under a particular selection of operators. We then have sentences such
as: Speech sounds with property A (or differing by A from other speech
sounds) are called occurrences of em; Em is a phoneme, written m; Some
phoneme-sequences constitute words; Mary is a word; The letter-sequence m,
a, r, y constitutes the word Mary. The metalinguistic sentences are thus
sentences of the language itself: C precedes a in the word cat (or: Cee
precedes ay in the word composed of the letters named cee, ay, tee) is not
structurally different from The band precedes the marchers in the parade.
The metalinguistic sentences are a separate subset, and a part of a
metalanguage, only in that their elementary arguments refer to segments of
utterances, ultimately to phoneme-sequences. Thus the arguments used in
metalinguistic sentences are mentions (i.e. names) of the words (and
phonemes) used in the sentences of the language. Other than this the
metalinguistic sentences are themselves sentences of the language.
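
That names of phonemes and words serve as ordinary zero-level arguments can be illustrated with a toy fragment. In the Python sketch below, the phoneme names (cee, ay, tee) are the conventional English letter names, and the representation itself is an editorial invention; the 'precedes' statement is built like any other sentence of the language.

    # A minimal sketch: mentions (names) of phonemes and words serve as
    # zero-level arguments of ordinary operators such as 'precedes'.
    PHONEME_NAME = {"c": "cee", "a": "ay", "t": "tee"}

    def mention(word):
        """The name of a word: the sequence of names of its phonemes."""
        return [PHONEME_NAME[p] for p in word]

    def precedes(word, i):
        names = mention(word)
        # structurally like 'The band precedes the marchers in the parade'
        return f"{names[i].capitalize()} precedes {names[i + 1]} in the word {word}"

    print(mention("cat"))      # ['cee', 'ay', 'tee']
    print(precedes("cat", 0))  # Cee precedes ay in the word cat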
Given sentences which identify particular phoneme-sequences as
constituting words, we can have sentences whose arguments are the words thus
constituted. Such sentences can state (morphophonemically) that different
phoneme-sequences constitute the same word, as alternative forms of it
(e.g. is and am, or initial e and initial iy in economics), or that a single
phoneme-sequence can, in different occurrences, constitute different words
(homonyms). Such sentences can also state the location of particular words
in respect to each other in utterances of the language, and also their
meaning, in terms of other words, either in isolation or in particular
combinations.
A phoneme is defined for the utterances of a given language (or possibly a
specified set of languages), but—crucially—not for sounds outside language
or in other languages. This restriction of domain holds necessarily for
metalinguistic sentences, which have been described above as sentences
whose arguments ultimately are phonemes, or phonemic distinctions. When
the domain of utterances is unspecified, the metalinguistic statements can
only assert the existence of phonemes, words, and their combinations.
When the domain is all utterances of a language—all that are observed or all
that are predicted—or a specified subset of them, we can state regularities of
form and of occurrences within the domain. These regularities may be
of phonemic shape, as in similarities or transformations among word-
occurrences; or they may be of word combination, as in defining operators
and arguments. We thus reach statements about phoneme classes, word
classes, grammatical structures, and types of meaning for words and sentences.
A new situation arises when the metalinguistic statement is made about a
particular utterance. Such statements can be used freely, as in John is
brash—and that last word is an understatement. The utterance in question
has to be identified—that is, cited—within the metalinguistic statement; the
utterance cannot be referred to unless it (or a pronoun for it) is grammatically
included in the metalinguistic statement itself. This means that the utterance
in question and the metalinguistic statement about it form together a larger
hearer knows (3) or the like, so that for him (3) can be zeroed. Indeed, any
two sentences can be acceptably conjoined if we add a repetition-supplying
sentence. We can now consider a similar treatment for the structural
properties of single sentences. Clearly, all the properties of a given sentence,
including definitions of the words, can be stated in metalinguistic statements
about that sentence which are adjoined to it: e.g. (1) John bought a book,
where the word John is first argument of the Onn word bought and the word
book is second, with John being the name of a man, bought meaning
purchased, and book meaning a printed volume. (An n-fold ambiguous
sentence would have n alternative sets of metalinguistic statements
adjoinable to it.) When the grammatical status and the definition of each word is
understood by the hearer, the conjoined metalinguistic sentences which
state them are zeroable. Evidence for their presence and their zeroing may
be seen in the following observation: virtually any arbitrary phonemic
sequence satisfying English syllabic constraints becomes an English sentence,
acceptable and understandable to the hearer, if it carries adjoined
metalinguistic statements identifying its parts as words satisfying their argument-
requirement: e.g. stating that its initial sub-sequence is the name of some
foreign scientists, its final sub-sequence is the name of some new gas, and the
sequence between these (which would have to end in s, or ed, or t) is a
scientific term for some laboratory operation perhaps derived from the
name of the scientist who developed the operation. In this last case,
differently from (1), the grammatical and definitional statements are
presumed not known to the hearer and hence not zeroed, as they can be in (1).
It follows that all sentences can be thought of as originally carrying
metalinguistic adjunctions which state all the structural relations and word
meanings necessary for understanding the sentence, these being zeroed if
presumed known to the hearer. Every sentence, such as in (1) and (2) above,
is thus a self-sufficient instance of what is permitted by the metalanguage
(the grammar and dictionary definitions of the language). We can thus
append to a sentence in a language all the metalinguistic statements necessary
for accepting and understanding it, with the whole still being a sentence of
that language. This is possible because the grammatical and dictionary
descriptions of a language are stated in its metalinguistic sublanguage (10.2),
somewhat as in computer programs from von Neumann’s original plan and
on, the instructions have the same computer structure as the data on which
they operate.
One might ask where these metalinguistic sentences about a sentence are
adjoined, since many of them disappear in the course of constructing the
sentence. The observations of 3.3.4 and the considerations of 8.3 suggest
that they enter into the making of a sentence as soon as the words they are
speaking about have entered the sentence (since the reductions which some
of these engender are carried out then). This clearly holds for those
metalinguistic adjunctions which are made known to the hearer only by their
being carried out, namely permutations and reductions, including those that
are based on a sameness statement (e.g. pronouning): these adjunctions are
replaced morphophonemically by the act of carrying them out. In the case of
the relevant grammar and dictionary statements, which are in most cases
known to both speaker and hearer, the issue of their location is moot since
these are zeroed on grounds of being known.
5.3. Reference
5.3.0. Introduction
5.3.1. Repetition
Repetition may occur in a position where it has high likelihood but where
the first occurrence (the antecedent) is not in a fixed favored position in
respect to the repetition. In such cases, the repetition can be reduced to a
pronoun rather than to zero. For example, given Ooo under which repetition
is expected, but without relative positions being specified (9.1), we find e.g.
The mail did not come though everyone was awaiting the mail reducible to . . .
though everyone was awaiting it (but not to everyone was awaiting). And
under and, where repetition has highest expectancy in the corresponding
grammatical position, any repetition in other positions is reducible only to
pronoun: The play was good and the audience liked it. But when repetition
occurs in the fixed high-expectancy position in respect to the antecedent, it
can (and under some operators must) be reduced to zero. Under and this
happens when the grammatical position of the repetition is the same in the
second sentence as that of the antecedent in the first: The play was good and
the play was applauded is reducible to The play was good and it was
applauded and also to The play was good and was applauded. Under Ooo,
everything after the tense is zeroed in the second sentence if it is the same as
in the first sentence: I will support John for the job if you will support John for
the job is reducible to I will support John for the job if you will. Under Onn (as
above), if the second argument has the same referent as the first, it is
reducible to pronoun plus -self: The man heard himself on the tape (from
heard the same man). But under particular Onn where sameness of the two
arguments may be especially frequent, it is also reducible to zero: The man
washed himself and The man washed. In the case of the Ono, each operator
has a particular relative position for zeroing repetition (3.2.3). For example,
I promised him to go is from for me to go, but I ordered him to go is from him
to go, while I offered him to go can be from either. But under Oon, nothing is
reduced to zero: the second argument of surprise is not zeroed either in His
meeting her surprised him or in His meeting her surprised her.1
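
The conditions just surveyed can be tabulated. The rule table below (Python) paraphrases the examples above; it is a summary aid, not a complete statement of the conditions:

    # A minimal sketch: where a repetition reduces to pronoun and where
    # to zero, keyed by the operator class under which it occurs.
    REDUCTION_OF_REPETITION = {
        "Ooo (if, though)": [
            ("corresponding position after the tense", "zero"),
            ("other positions", "pronoun"),
        ],
        "and": [
            ("same grammatical position as antecedent", "pronoun or zero"),
            ("other positions", "pronoun only"),
        ],
        "Onn (hear; wash)": [
            ("second argument, same referent as first", "pronoun plus -self"),
            ("a few operators such as wash", "zero also possible"),
        ],
        "Ono (promise, order, offer)": [
            ("operator-specific argument position", "zero"),
        ],
        "Oon (surprise)": [
            ("second argument", "no zeroing"),
        ],
    }

    for op, rules in REDUCTION_OF_REPETITION.items():
        for condition, reduction in rules:
            print(f"{op:30} {condition:45} -> {reduction}")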
antecedent and its repetition are already next to each other, we must judge
that what makes it possible to reduce the repeated word to a wh- pronoun is
its nextness to its antecedent, where the fact of repetition is most obvious
and the naming of the antecedent location is easiest: the preceding word.
Somewhat similarly, in the zeroings, under Ooo, Ono, and Onno (above),
which have a fixed position for their antecedent, that position is easy to
recognize and state in terms of the basic grammatical relations (the partial
order of dependence): the corresponding position under Ooo, and a fixed
semilattice-relation of lower to higher argument under Ono and Onno.
Thus the only referentials whose antecedent is always at a fixed relative
position (-self, relative pronouns, and zeroed repetitions) are ones for which
the antecedent position is easily indicated, linearly or in the partial order.
This suggests that the ability to reduce the repetition in these cases was based
on this ease of locating the antecedent. That means that in the formation of
referentials, i.e. in the reducing of repeated material, the development of
the language and also the recognition by the hearer do not use the general
partial-order and linear addressing systems that are inherently possible in
language. If either or both of these addressings were usable over their full
range by the speaker and the hearer, every phoneme- and word-occurrence
could be identified and referred to, including the position of any antecedent
relative to the referential. There would then be no reason for referentials
with fixed antecedent to occur only in situations where the location of the
antecedent relative to the referential is especially easy to state and to
recognize.
Conversely, we must judge that the other referentials, whose antecedent
is not limited to a single position, indeed do not refer to any particular
position (which, as seen, could not be addressed or named within the use of
language as we see it). For example, in John prefers that he be appointed
now, the he can be a repetition of or refer to the preceding John or any other
human masculine singular noun in a nearby preceding sentence. One might
think that these free pronouns refer to a disjunction of all preceding
appropriate nouns: either to John, or else to the masculine singular noun
before it, or to the one before it, and so on. But each one of these
alternatives (except the first John) would have to be identified in a complex
address utilizing the apparatus of a general addressing system. Since no fixed
antecedent uses the (in any case, non-existent) large addressing apparatus,
we have to judge that the non-fixed antecedents are not simply a disjunction
of fixed antecedents. All this means that we have to understand such free
pronouns as he and it to be not a reduced repetition of one or another
preceding noun but a word meaning 'aforementioned (human) male' (or:
‘male mentioned nearby’), 'nearby-mentioned thing’, etc. This, even though
the speaker knows which word is the antecedent, and the hearer may guess it
on some grounds.
5.3.3. Sameness
Up to this point, 5.3 has analyzed the cross-reference of words and zeros
(word-absences) referring to word-occurrences in the same discourse. There
remain a few pronouns which are demonstrative or deictic, i.e. refer overtly
to an object or situation not mentioned in the text but present to the speaker
or hearer: e.g. That is a beautiful tree, This tree is beautiful. These can be
derived from the already-established intra-text cross-reference pronouns by
assuming a zeroable indefinite noun attached to the zeroable metalinguistic I
say (5.5.1): e.g. I say of something here that this is a beautiful tree (where this
is a cross-reference to something here), or I say of something at which we are
looking (or of which we are speaking) that that is a beautiful tree.
Such words as here and now can be derived from at the place in which I am
speaking and at the time at which I am speaking. And I, you can be
considered reductions of speaker, hearer when these words occur in a
discourse as repetitions from a zeroable The speaker is saying to the hearer,
which one might presume to introduce every discourse (5.5.1): I will phone
you as though derived from a reconstructed The speaker is (hereby) saying to
the hearer that the speaker will phone the hearer.
Before analysis, the kinds of meanings of all these words differ from the
kinds of meanings of the other words of the language. In the other words, the
meanings or referents may vary with the environing words (the meaning of
child in a child of his time as against a child prodigy), but they do not vary
with the time or place in which the sentence is said. But in the words
discussed in 5.3.4, the referents vary with the particular bit of speaking:
which tree this is and who is meant by I depend on the speaking situation.
The information of demonstratives, including the tenses (below), is for this
reason unique also in not being transmissible beyond face-to-face
communication. The derivation from an I say component underscores, and does
not eliminate, this property. Nevertheless, it is a gain for the compactness
and simplicity of grammatical description if we can eliminate the new kind of
meaning required for 5.3.4; and this can be done by assuming a zeroable I
say of some thing that that . . . (as source of demonstrative that, this), or The
speaker is saying to the hearer at the place and time of speaking that . . . → here
and now. It will be seen in 5.5.1 that there are many grammatical reasons for
assuming such an introduction to be implicit in each discourse or sentence
that is said (not merely composed). Note that in sentences that are composed
but not really said (e.g. in grammar exercises), I, now, and that tree, and the
like do not have specific referents, as they do in each said sentence.
The derivability of tenses from before, after connecting a said sentence to
the I say operator on it (G265) means that the tenses are equivalent to
demonstrative time-adverbs.
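
In a shorthand not used in the text, write $t(S)$ for the time referred to by the operand sentence $S$ and $t_0$ for the time of the (zeroable) performative I say. The equivalence then amounts to
\[
\textit{past}(S) \approx \textit{before}(t(S),\,t_0), \qquad
\textit{present}(S) \approx \textit{at}(t(S),\,t_0), \qquad
\textit{future}(S) \approx \textit{after}(t(S),\,t_0),
\]
each tense being a reduction of the corresponding before/after conjunction between $S$ and the statement that $S$ is being said. The notation is an editorial gloss, not part of the grammar.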
5.4. Use/mention
We now consider the second feature that underlies the internal
metalinguistic capacity of language (5.1), namely, the fact that the meaning of a
word is not referable to any meaning of its phonemes. The independence of
word meaning from its phonemes is necessary for language, for if the
meanings of words were composed regularly out of the meanings (or the
effects upon meaning) of their phonemes, the number of distinguishable
word meanings would be limited by the combinations and permutations that
one could construct, in reasonably short sayable sequences, out of the
meaning contributions of the twenty-odd phonemes that the language has.
Given this necessary independence, what we say about a word (i.e. its
selection) when it is taken as indicating its referent differs from the selection
of operators which appears on the phoneme-sequence of that word: e.g.
Mary phoned me, but Mary consists of four phonemes. Note that one cannot
pronoun a word’s occurrence under the one selection from its occurrence
under the other (5.3.3): we do not pronoun John and Tom to make I saw
John and Tom whose second phonemes are identical, but we do in I saw John
and Tom, in whose names the second phonemes are identical.
The sameness statements do not support pronouning when the selections
and meanings of a repeated word are sufficiently different, as in play of 5.3.3
or heart below. Going beyond that, when certain occurrences of a word have
a coherent selection totally different from that of other occurrences of
that word, we say that those two sets of occurrences belong to different
homonymous words: e.g. the occurrence of sound in the senses of ‘noise’ as
against ‘strait’ and ‘healthy’; but no homonym for the occurrences of heart in
the senses of ‘organ’ and of ‘center (as of lettuce)’ (which we consider to be
different meanings of a single word). In the case of the two sets of operators
on Mary, we hesitate to say that, for example, (1) Mary in Mary has (or:
consists of) four letters, which is not pronounable from (2) Mary in Mary
phoned me, is simply a homonym of (2) in the way that applies to the various
occurrences of sound, for then every word would be a homonymous pair.
That is, we do not want to say that Mary is written with a capital m and Mary
phoned have two homonymous words Mary. A more special treatment is
clearly required here.
Aside from the difference in selection between the phoneme-sequence of
a word, and a word as bearer of a referent, there are occasional other
differences. On the one hand, there are cases of different occurrences of the
same phoneme-sequence constituting different words as referent-bearers:
the homonyms, above. On the other hand, there are cases of different
phoneme-sequences constituting the same word. This happens not only in
what are recognized as suppletive variants of a word (e.g. is, am as variant
forms of be) but also in what are considered alternative pronunciations of a
word. For example, the semantically single word written spigot consists of
two alternative phoneme-sequences, one with medial phoneme g and one
with medial phoneme k (and with older spelling spicket). Both this problem
and that of the homonymous pairs can be avoided if the underlying sentences
are taken as, for example, The phoneme-sequence that constitutes the word
Mary consists of (or has) four letters, reducible to Mary has four letters, as
against The object referred to by the word Mary phoned me (or, for that
matter, has four letters to mail), reducible to Mary phoned me. (The
evidence that what was reduced was indeed the phoneme-sequence is that the
predicate on it, consists of four letters, is in the selection of phoneme-
sequence.) In the reduced forms, the distinction is expressed by saying, in
Quine’s terminology, that in the latter case the word Mary is ‘used’ in the
sentence, while in the former case it is only ‘mentioned’: ‘mentioning’ a
word is ‘using’ its phonemes. These useful terms isolate and name the
distinction; to derive or explain the distinction grammatically, we need
underlying sources as above, while noting that the zeroing of the phoneme-
sequence that constitutes leaves quotation marks as its trace (in writing:
‘Mary' has four letters). This derivation explains why a mentioned plural
noun is singular: Bookworms is on p. 137 in this dictionary (from The
phoneme-sequence that constitutes the word bookworms is on ... ); contrast
Bookworms are all over in this dictionary (the latter from The objects
referred to by the word bookworms are . . . ). The derivation above also
explains why one does not say Mary has a little lamb and four letters, which
would be a possible zeroing from Mary has a little lamb and Mary has four
letters, but not from The object referred to by the word Mary has a little lamb
and the phoneme-sequence constituting the word Mary has four letters. The
importance of this explanation is that it remains within the universe of word-
occurrences and their reductions, without appealing to such considerations
as whether Mary is ‘taken’ as a meaning or as a spelling. When the source
sentences are reconstructed, the use/mention distinction, and the Bourbaki
'abus de langage', become redundant.
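
A toy reduction makes the two sources and their traces explicit. In the Python sketch below, the source templates paraphrase those just given, and the zeroing function is invented for illustration; zeroing the mention source leaves quotation marks as its trace, as described above.

    # A minimal sketch of the use/mention sources and their reductions.
    MENTION = "The phoneme-sequence that constitutes the word {w} {pred}"
    USE = "The object referred to by the word {w} {pred}"

    def reduce_source(template, w, pred):
        source = template.format(w=w, pred=pred)
        if template is MENTION:
            return source, f"'{w}' {pred}"   # quotes remain as the trace
        return source, f"{w} {pred}"

    for template, pred in [(MENTION, "has four letters"), (USE, "phoned me")]:
        source, reduced = reduce_source(template, "Mary", pred)
        print(source, "->", reduced)
    # Because the two reduced forms have different sources, the conjunction
    # zeroing that would give 'Mary has a little lamb and four letters'
    # is not available.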
As to the failures in one-to-one correspondence between phoneme-
sequence and word-referent, they can be covered by appropriate underlying
5.5.0. Introduction
A distinction related to use/mention appears under the word say, which has
a unique status in language structure. In various languages, say appears with
two forms of second argument (object): He said (or: I say) 'Mary is here'; He
said (or: I say) that Mary is here. There are syntactic differences between the
two forms. Under the bi-sentential operator but, which requires in principle
two contrasts between its arguments, we can find I said 'Franklin Roosevelt
died in 1945' but he said ‘President Roosevelt died in 1945’; but we would not
normally find (even with contrastive stress on President) I said that Franklin
Roosevelt died in 1945 but he said that President Roosevelt died in 1945. We
ask if these two forms of the argument of say can be derived in a way that
would explain why the two forms differ in respect to but. With an eye to the
use/mention sources of 5.4, the obvious sources to propose are: for the first,
I said the words (or word-sequence) Franklin Roosevelt died in 1945 but he
said the words President Roosevelt died in 1945 (with the words replaced by
intonation of quotation marks); and for the unlikely second, I said words
which mean that Franklin Roosevelt died in 1945 but he said words which
mean that President Roosevelt died in 1945 (with zeroings of words which
mean), where any form of that name and indeed any synonym of it or of died
could have been what was actually said. In the second source, but is clearly
less likely than in the first. In both of the proposed sources, the second
argument of say is word, thus eliminating the problem of having two forms of
the argument (above).
The relation between a quoted sentence and that plus the sentence is also
seen in the difference between Yesterday in Boston the candidate said to the
strikers 'I have come to support you here and now' and Yesterday in Boston
the candidate said to the strikers that he had come to support them there and
then. Such differences as here/there hold not only under say, but also under
tell, state, repeat, assert, claim, insist, promise, think, agree, and other words.
Among all these there is one, I say or One says, in the present tense, that has
a peculiar property which serves to explain the relations noted above, and
also several other exceptional forms in language.
This property is a metalinguistic analog to the performatives discussed in
logic, as when a judge says I (hereby) sentence you to . . ., in which his saying
that he is sentencing constitutes the act of sentencing. For a speaker to
compose the sentence I say ‘John left' or I say that John left (or possibly One
hereby says that John left) is the same as for him to actually say John left.
Hence, the I say or One hereby says contributes no information, and is
zeroable. Indeed, the sentence John left, and any sentence that is said (aloud
or to oneself) or written (except e.g. as an example of a sentence) can be
considered a reduction from: I say, plus that or quotes (or equivalent marks
such as a colon), plus the sentence in question. The I say is not zeroed when
it is not used performatively, as in I say it's spinach and I say to hell with it, or
when it is used contrastively, as in I say John left but Mary denies it, or I say
that he left, but I don't mean it. In such cases, it may even be repeated, as in I
say that I say that he left, in order to make it clear. But a sentence cannot be
suspected of carrying an infinite regress of zeroings of I say, from an
infinite repeating of the performative I say, because that zeroing takes place
upon the occasion of the saying, and once the sentence has been said, it
cannot repeat the performative I say and its zeroing.
5.5.1. I say
There is varied evidence of the original presence of I say or One hereby says
and their subsequent zeroing. To any sentence one can add such implicitly
metalinguistic phrases as to tell the truth, or to be specific, or not to make too
fine a point of it. The zeroing of the subject in these phrases means that there
must have been an earlier occurrence of that missing subject (of tell, etc.)
which had served as repetitional antecedent for the zeroing of that subject.
This other occurrence of the subject could not have been present except as
an argument of some operator; and the combination of that earlier subject
with its operator had to have been zeroed on some grounds, since they do
not appear. The only candidate for such zeroing that would have been
possible in every sentence and that would have left no trace (except for the
very saying of the sentence) is the performative I say or One hereby says; no
other subject and tense of say would make it performative, nor would any
other verb with different meaning.
In many cases there is direct evidence of a metalinguistic I, as when the
insertable metalinguistic phrase is not to repeat myself (where myself could
arise only from for me not to repeat me), or to make myself clear; also in I
hasten to add or at least I think so or and I say again. Here, the repeat, say
again, add, and at least each implies in different ways an I say or One
(hereby) says on the primary sentence: John left and, I hasten to add, not
secretly from I say John left and I hasten to add . . .; John left, at least I think so
from I say John left, at least I think so; He's wrong. I say again: He's
absolutely wrong from I say he's wrong, I say again . . . .
The widespread zeroing of I say explains certain forms where the analysis
shows that I say has been zeroed by mistake. For example, sentences such as
He's back, because his car is in the garage, which do not make sense as said,
clearly have been reduced from I say he's back, because his car is in the
garage, where the I say should not have been zeroed, because it operated
only on the first sentence and is in turn operated on (as first argument) by the
because. Another example is in sentences such as The Times says that our
stupid mayor will be speaking tonight; since the newspaper said neither
stupid nor our, the source here has to be I say that the Times says that the
mayor who I say is our stupid one . . . Here, zeroing the second I say left the
who is stupid to appear as modifier on mayor, which is incorrect since mayor
is under the Times says. Consider also the sentence in which a reporter
writes, about a woman speaking to her husband, She said Mary, their
daughter, would go to the police station. Had the woman said to her husband
Mary, our daughter, . . ., this would have been reported as She said Mary,
their daughter, . . .; but of course she did not say our daughter. She must have
said Mary will go in which the reporter inserted something like who I say is
their daughter, where I say should not have been zeroed because (as with
stupid above) it is one speaker’s insert in reporting another person’s remark.
other word category. They are also unique in language as being basic
operators of set theory and logic. In addition, not and the interrogative do
not preserve the selection of the words under them, whereas operators in
general preserve such selection: thus the stone spoke and I want the stone to
speak are rare and peculiar combinations, but The stone didn't speak and
Did the stone speak? are much less so. Instead of disregarding these
peculiarities and also the logic status of these words, we can to some extent
explain them by deriving the words from modifiers on the I say (e.g. for each
one in the source of the interrogative, 5.5.2, and in the source of any,
below). This can be expressed in artificial restructured sources, such as I say
negatively (or I deny, gainsay, contradict, disclaim, claim as false) that John is
here, reduced upon zeroing of I say (or -dict or claim) to John is not here; I
say both (or jointly, or co-state) Mary is here, John is here, where I say both
operates on a pair of sentences and is reduced upon zeroing of I say or I state
to Both Mary is here and John is here with both then zeroable; also I say
either (or alternatively) Mary is here, John is here, reduced to Either Mary is
here or John is here with either then zeroable. (Possibly, also, the intonation
that marks a secondary sentence, written as semicolon or dash, could be
obtained from I say with an aside operating on a sentence-pair: e.g. Mary is
coming, She phoned, yielding Mary is coming; she phoned or Mary—she
phoned—is coming, ultimately Mary, who phoned, is coming.)
As to any: in Anyone can go the word means roughly 'every', whereas in I
didn't see anyone go or in Will anyone go? it means 'even one'. Also, there is
the peculiarity that any rarely occurs with only a single operator. In spite of
the popularity of phrases like Anything goes, the more customary form has
two operators, as in Anything small will do, or Anyone who can goes there.
All these forms and distinctions can be obtained within the existing grammar
if we take any to be an adverbial form of one, in particular (one) by one
(which is roughly its etymological source) operating specifically on the say.
Then Anyone can do it is from, roughly, I say for each one that this one can
do it (hence 'everyone can'); and I didn't see anyone do it is from I deny for
each one that I saw this one do it (hence 'not even one did'); and Will anyone
do it? is from I ask for each one whether this one will do it (hence 'will even
one do it?’). The selectional preference of any for two operators is thus in
order to relate the two predicates in respect to the one-by-one matching:
For each one if he can then he goes there, above (G327).3
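
The three derivations amount to letting the one-by-one quantification take scope over the performative rather than within the operand sentence. In a logical shorthand that is an editorial gloss, not part of the grammar (say, deny, ask standing for the performative with its respective adverbial modifier):
\[
\begin{aligned}
\textit{Anyone can go} &\;\approx\; \forall x\ \mathrm{say}(x \text{ can go})\\
\textit{I didn't see anyone go} &\;\approx\; \forall x\ \mathrm{deny}(\text{I saw } x \text{ go})\\
\textit{Will anyone go?} &\;\approx\; \forall x\ \mathrm{ask}(\text{will } x \text{ go})
\end{aligned}
\]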
Finally, one of the most important grammatical advantages of recognizing
the metalinguistic say operator is the explanation which it offers for the
tenses. If we consider a sentence such as A person who registered as late as
3 There are also less clear cases of words whose meanings or positions can be explained by
saying that they operate on the performative say. One such is seen in hardly, scarcely, whose
occurrence in many sentences such as John hardly deserves it, John scarcely deserves it would be
more understandable if the hardly, scarcely referred to the saying of the sentence: that is, it
would be hard for the speaker to say that John deserves it.
4 Note that since a relative clause is a secondary sentence, the lower operator registered
above receives -ed from being ‘before’ in respect to the primary operator able to vote. The
higher able to vote receives will from being ‘after’ in respect to the time of saying the sentence,
i.e. in respect to the say on the sentence.
5 There are many subtle grammatical considerations in the syntax of tense and of before and
after which support the convoluted derivation offered here. In addition, taking tense not as an
ad hoc primitive construction (outside the operator-argument framework) but as a derivation
from time or order conjunctions between a sentence and the statement that it is being said
leaves room for various non-temporal uses of tense, deriving from non-temporal uses of those
ordering-conjunctions: e.g. using the past for 'contrary to fact', as in If he were here I would
know it; or using the future for reducing the speaker's responsibility for what he is saying, as in I
think what you want will be in the fourth aisle. These considerations are surveyed in Z. Harris,
Notes du cours de syntaxe, ed. M. Gross, Éditions du Seuil, Paris, 1976, p. 158.
In sum: say and the like can carry certain adverbial modifiers (as false,
both, either, demand, querying, of something (5.3.4), one by one, before,
after, etc.) which can impose respectively not, and, or, imperative and
interrogative forms, deictic pronouns, any, tense, etc., on the operand
sentence. Of these, both and either (the latter includes the interrogative
whether) require the operand to be a sequence of sentences rather than a
single sentence. Uniquely, the and and or differ from the two-sentence Ooo
operators of grammar in that they are not restricted to being binary, and in
that they are semantically associative and commutative: He left because she
phoned him because it was late means different things in different binary
groupings, but He went and she phoned and I stayed does not; also She stayed
because he left and He left because she stayed mean different things, but She
will stay or he will leave and He will leave or she will stay mean substantially
the same, though not in nuance. Also the disjuncts of the interrogative are
commutative. All of these adverbial modifiers relate the operand sentence
to the performative operators on them. In particular, not, both, either are
relevant to logic and set-theory because they deal with the saying of
sentences.
Most generally, some of the operators on sentences are metalinguistic in
that they can operate on the words of the sentence as phoneme-sequences
and not only on those words as indicators of particular referents. Of these,
one operator, say, or a small set (say, ask, etc.), with I or one as subject, has
performative effect and is therefore zeroable upon the saying of the operand
sentence.
As indicated at the beginning of 5.5.3, the adverbs on say require more
investigation before they can be adequately formulated in a theory of
language structure. The least of the problems is that the reconstructed
sources are clearly non-historical. Whereas the imperative and interrogative
can be seen to be related to their proposed sources, the sources for tenses
and and/or/not are convoluted beyond what would be said in natural
language. What remains relevant, however, is, first, that all of these (and
such words as any in English) have combinatorial properties and levels of
meaning that differ from the rest of language, and, second, that these
grammatical and semantic properties can be obtained, as suggested above,
from entities and relations of a syntactic theory that suffices for the rest of
the language. The serious problems arise in accounting for all the
environments in which these proposed adverbs on say occur and do not occur, and
above all in tracing how these derivations function together in the kinds of
complex sentences that many languages have.
6
On the Mathematics of Language
1 The expectation of useful mathematical description of the data of language stems from
developments in logic and the foundations of mathematics during the first half of the twentieth
century. One main source was the growth of syntactic methods to analyze the structure of
formulas, as in Skolem normal form and Löwenheim's theorem, and in the Polish School
of logic (as in the treatment of sentential calculus in J. Łukasiewicz, and the categorial grammar
of S. Leśniewski and later K. Ajdukiewicz, cf. ch. 1 n. 6), and in W. V. O. Quine's Mathematical
Logic (Norton, New York) of 1940. Another source is in the post-Cantor-paradoxes
constructivist views of L. E. J. Brouwer and the Intuitionist mathematicians, and in the specific
constructivist techniques of Emil Post and Kurt Gödel, in recursive function theory, and from a
somewhat different direction in the Turing machine and automata theory. Cf. S. C. Kleene,
Introduction to Metamathematics, van Nostrand, New York, 1952, and Alonzo Church,
Introduction to Mathematical Logic, Princeton Univ. Press, Princeton, 1956. In linguistics, the
'distributional' (combinatorial) methods of Edward Sapir and Leonard Bloomfield were
hospitable to this approach. Cf. also Nelson Goodman, The Structure of Appearance,
Bobbs-Merrill, New York, 1966.
2 This is called the autonomy of syntax, i.e. that no phonetic or semantic property of words is
considered in determining what constraints on word combination create sentences. The syntactic
structure nevertheless conforms to meaning-differences, but the phonemic structure of words
does not in general.
3 Z. Harris, [Methods in] Structural Linguistics, Univ. of Chicago Press, Chicago, 1951,
pp. 26-33, 60-4, 70-1.
4 For the distinction between such repetition and imitation, cf. Kurt Goldstein, Language
and Language Disturbances, Grune and Stratton, New York, 1948, pp. 71, 103.
(h) Contiguity. It was seen in 2.5.4 that all relations and operations in
language are on contiguous entities. This is unavoidable, since language has
no prior defined space in which its phoneme- or word-occurrences could be
located. This condition of contiguity does not preclude successive segments
of an operator from being placed next to different (but already present)
parts of its operand (as in the case of discontinuous morphemes, 2.5.4), or
between parts of its operand as in the nested sentence example of (f) above.
(i) Finite length. There is nothing to indicate that utterances, i.e.
occurrences of speaking or writing, must in principle be finite in length. However,
the sentences, whether these are defined structurally by some relation
among segments of the utterance, or stochastically by sequential regularities
of the segments (7.7), are necessarily finite in length; for until a segment of
an utterance is finished we cannot be sure that it has satisfied throughout its
length the conditions for being a sentence. In actuality, sentences are
reasonably short except for tours de force of various kinds.
(j) Number of sentences. For a given number of phonemes, and also for a
given number of words, the set of sequences is recursively denumerable. But
for the subset of these sequences that constitutes a language, the question as
to their number is more complex, and depends on how we characterize the
criteria for their being in the subset. In any case, the set of possible
sentences as fixed in the grammar is not known to be finite, since to every
sentence one can in principle add without limit some further clause.
(k) Finitary grammar. Aside from the non-finiteness and recursivity of
language as a defined object, one should also consider the finiteness of the
speakers and hearers. The phonemes and vocabulary, and the ways of
combining words into sentences, all have to be used by the speakers of a
language in a way that enables them to understand each other. The users,
and their experiences with the language, are all finite. Hence it must be
possible to characterize a language by a finitary grammar: a finite stock of
elements and also of rules (constraints) of combination, with at least some of
the rules being recursively applicable in making a sentence.
We now consider the number of event types (as against tokens) in a
language, i.e. of occurrences which are linguistically distinguishable from
each other. The number of items that have to be described is affected by the
fact that languages change. In particular, new words and new combinations
enter the language, and some go out of use. Over a very long time, with
successive generations of speakers, the vocabulary of the changing language
may be large to an extent that we cannot judge. However, there are no doubt
limits—barely investigated so far—on the growth of the vocabulary, and on
the rate of change in the structure of language. The number of elements and
of rules of combination available at any one time is not very large. It will be
seen in 12.4.5 that in respect to an appropriate theory of language one can in
principle count everything that a user has to know in using a language.
(l) Computability. It follows, from the discreteness, linearity, contiguity, and finitary grammar, that the structure of sentences is computable. That is, finite algorithms can be created that can synthesize, and much more importantly analyze, any sentence of the language. Although constructing such algorithms, especially for analysis (recognition), is a daunting task, it has not proved impracticable (ch. 4 n. 1).
It was seen in 6.1 that many properties of language data are amenable, with
various exceptions, to description in mathematical terms. This makes it
desirable to regularize the properties not only for efficiency and elegance of
theory, but also in order to reach a description of language in terms of such
elements and relations as permit interesting decompositions, mappings, and
constructions that are closed over relevant domains. The criterion is that any
mathematical structure which is posited should apply in a specified way to
observables in the data and to their stated relations, and not to abstract
concepts whose precise relation to the regularities of the data has yet to be
established.
Such regularization means finding effective ways to treat each ‘exception’
as a non ad hoc special case of the regular relation. This may require a
modified formulation of the relation. For example, not all words need to be
pre-set, i.e. known to the hearer, in order to be understood: a sentence can
contain an unknown word whose boundaries are given by its neighbors and
whose meaning can be guessed from the rest of the sentence. Or the
regularization may be accomplished by finding special conditions that
together create the apparent exception. Thus in linearization of phonemes,
the nasalized vowel in the French don (‘gift’)—overtly a single sound—is
linearized as vowel plus n (in the absence of following vowel), in order to
agree with the vowel plus n sequence when the word appears with following
vowel, as in donner (‘to give’). Also, in defining words (or morphemes) as
being each a sequence of contiguous phonemes, allowance has to be made
for cases where the phonemes are not contiguous. An extreme case is in
what is called ‘agreement’ (‘government’, ‘concord’) in grammar: e.g.
plural-agreement in The stars rise as against The star rises. Instead of
defining here an ad hoc grammatical relation between the verb and its
subject, additional to the partial-ordering relation which creates sentences,
we can say that in English the plural morpheme (and also and) on the subject
In a regular way here means in the same relative position in all sentences, including the
shortest, after correcting for any reductions (3.3) and linearization differences (3.4.2-4).
(and only on) any pair of words which in themselves do not depend on any words (equivalently, that depend circularly only on the first-level operators), namely two ‘zero-level arguments’ N; ‘second-level’ operators (e.g. because) act on (and only on) any pair of words that in turn depend on other words.
Every word which has stated dependence properties can form a sentence, by combining with words whose dependence properties satisfy it.
Every sentence is a sequence which satisfies the dependence-on-dependence
of every word in it; and every such sequence is grammatically a possible
sentence, even if nonsensical (e.g. Vacuum chides infants) or artificially
complex (e.g. the proposed restructured source for tense, as in I state his leaving; his leaving is before my stating it as source for He left).
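The dependence-on-dependence constraint is concrete enough to check mechanically. The sketch below is a minimal illustration, with an invented toy lexicon and informal class labels; it tests whether a proposed operator-argument tree satisfies every word's stated argument requirement, accepting Vacuum chides infants as a possible (if nonsensical) sentence.

```python
# A minimal sketch, with an invented toy lexicon, of the dependence-on-
# dependence constraint: each word states the levels of the arguments it
# requires ("N" = a zero-level word, one that depends on nothing; "O" = a
# word that itself depends on other words). A tree of applications is a
# possible sentence iff every word's requirement is met.

LEXICON = {
    "vacuum": [],            # zero-level word (N): requires no arguments
    "infants": [],           # zero-level word (N)
    "sleeps": ["N"],         # first-level operator on one zero-level word
    "chides": ["N", "N"],    # first-level operator on two zero-level words
    "because": ["O", "O"],   # second-level operator on two operators
}

def level(word):
    return "N" if LEXICON[word] == [] else "O"

def satisfied(tree):
    """tree = (word, [argument subtrees])."""
    word, args = tree
    need = LEXICON[word]
    if len(args) != len(need):
        return False
    return all(level(arg[0]) == n and satisfied(arg)
               for arg, n in zip(args, need))

# Grammatically a possible sentence, even if nonsensical:
print(satisfied(("chides", [("vacuum", []), ("infants", [])])))      # True
# An operator cannot fill a zero-level argument slot:
print(satisfied(("chides", [("sleeps", [("vacuum", [])]),
                            ("infants", [])])))                      # False
```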
As noted previously, the dependence relation creates the meaning ‘predication’ (11.2), or carries that interpretation, which holds for every system
having this dependence, such as mathematics, even if the system lacks the
other features of language (e.g. likelihood differences). The predicational
meaning does not require any of the other features of language, nor does it
require the particular physical content (sound, ink marks) or meanings of
the words involved. The dependence on dependence is defined on arbitrary
objects, not necessarily words at all; and the abstract system it creates is thus
a system of categories.
(c) This mathematical object is not the whole of language, but it underlies
language in that the other sets and relations which are essential to language
are defined on its products. To obtain the structure of natural language we
need two additional relations: the differences in frequencies of co-occurrence,
and reductions in phonemic content.
The distinction between zero and non-zero frequency of combination, in
terms of which the dependence was defined, leaves open the question of
randomness, or stable or variable differences, in frequencies of each word
relative to the various words on whose set it is dependent. What gives
language its unique informational capacity is that there are indeed such
differences and that they are roughly stable, although marginally changeable over time; and they correlate with—and to some extent create—the
individual meanings of each word involved (3.2). These stable differences
can be expressed as inequalities, or even a grading of likelihood (or
probability), of each operator in respect to its possible arguments (and vice
versa).
The reductions are constraints whereby word-occurrences in a given
sentence which have highest likelihood (or otherwise no informational
contribution) in respect to their argument or operator there can be reduced
in phonemic content (3.3). The reductions are relations between sentences
because every occurrence of an operator on an argument constitutes a
sentence, and the reduction changes that into a physically (phonemically)
different sentence. There may also be some stated other grammatical
7 The common uses of the term ‘tree’ include cases that are not applicable to language. For
example, in genealogical trees, the relation of two cousins to their six instead of eight
grandparents does not occur in the dependence relations among words of a sentence: as noted
below, a word-occurrence cannot be the argument of two different other word-occurrences.
The sameness statement (5.3.3) presents a problem here, because it refers to two words
which are already the argument of other operators. The problem is avoided by a solution
proposed by Danuta Hiz, which excepts the metalinguistic statements from the lattice of the
sentence (8.3).
operation; one can consider the null operator as identity. The object here is
not the operator itself, but the application of it to its argument. For present
purposes, given A applied to (operating on) an argument X, and then B
applied to A (with its argument), we consider B applied to A without
argument as a complex operator BA applied to the argument X. A product
of two operator-successions in the semilattice is itself an operator-succession;
the multiplication is associative.
The binary second-level operators, i.e. those whose arguments include
precisely two operators, form a set of binary compositions on the set of
sentences. In the base set, each binary second-level operator can act on
every pair of sentences, although its likelihood of occurrence is higher on
sentence-pairs which contain in their base form a word in common. In the
whole set of sentences, certain reduced forms (e.g. question) do not appear
under certain binary operators: not in I am late because will you go?; this may be resolved by defining here some required transformations which would replace these sentences by their non-prohibited source, in this case I am late because I am asking you if you will go. Products of these binaries are
in general not associative in their semantic effect, and correspondingly in the
history of making the resultant sentence. Here the structure of the base
semilattice, which is contained in the word-and-reduction semilattice of the
resultant sentence, is crucial.
(g) The most important algebraic structures in the set S of sentences are
those which arise from equivalence relations in S in respect to the particular
base (operator-argument) semilattice in each sentence, and in respect to the
highest operator (the upper bound of all words in the semilattice), or to
the word-and-reduction semilattice. These equivalence relations identify the informational sublanguage (the base) and the grammatical transformations, as will be seen below.
We note first that the resultant of every operator is a sentence. Every
unary second-level operator acts on a sentence (and possibly also an N) to
make a further sentence; and every binary second-level operator acts on two
sentences (and possibly an N) to make a sentence. Every reduction acts on a
sentence to make a (changed) sentence. All of these, in acting on a sentence,
preserve the inequalities of operator-argument likelihoods in the operand
sentences (except in statable circumstances). The unary second-level operators
are a set of transformations on the set of sentences: each maps the whole set
S of sentences into itself (specifically, onto a subset of sentences which have
that second-level operator as their latest entry, i.e. their universal point);
and the binaries map S×S into S. The reductions and other transformations
are a set of partial transformations on S, each mapping a subset of S
(sentences whose last entries are a particular low-information operator-
argument pair) onto another subset of S (sentences containing the reduction
on a member of that pair).
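These statements about mappings can be transcribed almost literally. The sketch below is purely illustrative (sentences as strings, invented operator and reduction names); it is meant only to show the difference in type between the total maps on S and on S×S contributed by second-level operators and the partial maps contributed by reductions.

```python
# Illustrative transcription of the mappings just described: a unary
# second-level operator maps the whole set S of sentences into S; a binary
# maps S x S into S; a reduction is a partial map, defined only on sentences
# of a stated form. All names and the string manipulations are invented.
from typing import Optional

def unary_op(op: str):
    """A unary second-level operator: total on S (sentences as strings)."""
    return lambda s: f"{op} that {s[0].lower() + s[1:-1]}."

def binary_op(op: str):
    """A binary second-level operator: maps S x S into S."""
    return lambda s1, s2: f"{s1[:-1]} {op} {s2[0].lower() + s2[1:-1]}."

def zero_that(s: str) -> Optional[str]:
    """A reduction as a partial transformation: defined only on sentences
    of the form 'I know that ...'; elsewhere undefined (None)."""
    prefix = "I know that "
    if not s.startswith(prefix):
        return None
    return "I know " + s[len(prefix):]   # crude sketch of a reduction

surprise = unary_op("It is surprising")
because = binary_op("because")

print(surprise("He left."))                    # It is surprising that he left.
print(because("He left.", "She phoned."))      # He left because she phoned.
print(zero_that("I know that he left."))       # I know he left.  (defined)
print(zero_that("He left."))                   # None (outside its subset of S)
```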
Grammatical Analysis
7
Analysis of Sentences
The theory of syntax presented in the preceding chapters has words as its
primitive elements. Since one might hesitate at a theory whose primitives
are undefined or established only circularly, it is necessary to consider
whether these words are determinable by objective and effective methods,
independently of the syntactic theory. To do so we have to see how linguistic
methods have segmented speech to obtain phonemes as classes of sound
distinctions, words as sequences of phonemes, and sentences as sequences
of words.
Traditional grammar was an informal description of the shape and meaning
of words and sentences. In contrast, descriptive and structural linguistics
recorded the shape of successively larger segments of sentences. The search
for coherent and regular analyses of sentences in terms of their segments was
satisfied by successive steps of classification and sequencing. At various
stages, the sets of entities obtained were partitioned into equivalence
classes. The set of equivalence classes then constituted a new set of 'higher'
entities whose behavior in respect to the sentence-segments was far more
regular than that of the earlier entities. Thus: the phonemically distinct
sounds are collected (as complementary variants, or free variants) into
phonemes (7.2); the word-size segments are collected as complementary or
free variants into words (‘free morphemes’) or bound morphemes (7.4); and
words are collected, by similarity of environments, into word classes (7.5).
Particularly constrained sequences of the entities that have been obtained
are defined as constituting new ‘higher’ entities, which have new regularities
in respect to each other. These are phoneme-sequences constituting word-
size segments (which have the status of words or bound morphemes in their
sentences, 7.3), word-class sequences constituting either constituents or else
strings (by two different styles of grammar, 7.6), and string sequences (or
constituent sequences) constituting a sentence as a regular segment of
utterances (7.8). In certain important cases the sequences can be characterized
by stochastic (hereditary) processes that proceed through them, stating for each first n members of a sequence what are the possibilities for the (n+1)th position in it. Such methods can define certain kinds of sequences as
constituting next-higher entities (words, sentences, 7.3, 7).
It will be seen that all the entities can be identified by objective repeatable
procedures, and defined on the basis of the relations in which they participate,
without appeal to other properties. It is of interest that all the major
segmentations are recognizable in writing, in particular in alphabetic writing.
Phonemes correspond roughly to letters of the alphabet. Words are marked
by space, and bound morphemes can be considered word forms in which the
bounding-space has been zeroed. Virtually all punctuation will be seen to
mark the boundaries of sentences; the non-sentence segments marked by
comma and dash will be seen in 8.1 to be reduced forms of sentences. But
many sentences, reduced or not, which are contained in larger sentences are
not bounded by punctuation: e.g. small in A small stone fell. We sleep in He
works while we sleep.
For the methodological considerations of 2.3, it may be noted that at each
stage the classifications allow for simpler statements of the sequencing
constraints (on the domain of the equivalence classes) than would have
applied to the domain of the entities themselves, before the partitions. And
in the case of special problems, such as the exceptions and special forms so
common in language, the peculiarities are moved down from the domain of
the sequencing relations to the domain of the partitions where they can be
treated much more easily.
The use of language takes place not only in speech but also in writing and
thinking; but these are in different ways secondary to speech, and in
particular writing is relatively recent. A search for the ultimate minimal
elements of language, therefore, necessarily turns to the sounds and sound-
features that play a role in speech. Phonemic analysis holds that in every
language certain sets of sounds, called phonemes, are distinguished from
others by users of the language, with each language having a fixed set of
phonemes.1 The existence of hearers' linguistic distinction of sets of sounds,
and the actual list of sounds thus distinguished in a language, can be seen in
the following simple test with users of the language.
We begin with two speakers of the language, and two utterances, preferably short and preferably similar to each other, e.g. heart (an organ) and hart (a deer), or heart and hearth (a fireplace), or economics with initial e and
with initial iy. We ask one speaker to pronounce randomly intermixed
repetitions of the two words—the organ and the deer, or the organ and the
fireplace—and ask the other, as hearer, to guess each time which word (or
1 See chiefly Ferdinand de Saussure, Cours de linguistique générale, Paris, 1916; Edward Sapir, ‘Sound Patterns in Language’, Language 1 (1925) 37-51, reprinted in David G. Mandelbaum, Selected Writings of Edward Sapir, Univ. of California Press, 1958, pp. 33-45; Leonard Bloomfield, Language, Holt, New York, 1933.
pronunciation) was being said. It is found that the hearer’s guesses are correct about 50 per cent of the time for some pairs of words, such as heart-hart, and about 100 per cent for other pairs such as heart-hearth, or for the two pronunciations of economics; and the results are much the same for any users of the language. In the cases where the hearer’s guesses are correct close to 100 per cent of the time we say there is a phonemic distinction between the two words (heart, hearth, or the two pronunciations of economics);
otherwise not. In some cases the results of this pair test are problematic, and
the decision as to whether a phonemic distinction holds is adjusted on the
grounds of later grammatical considerations. But the direct results of this
pair test provide a starting-point, a first approximation to the set of ultimate
elements adequate for characterization of language.
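The decision rule of the pair test is simple to state operationally: classify a pair as phonemically distinct when the hearer's guesses cluster near 100 per cent correct, and as not distinct when they cluster near chance. A minimal sketch with invented trial data follows; the numeric threshold is an arbitrary illustrative choice, not from the text.

```python
# Sketch of the pair-test decision rule: a hearer guesses which of two words
# is being pronounced over randomly intermixed repetitions; guesses clustering
# near 100% correct indicate a phonemic distinction, near 50% indicate none.
# Trial data and the 0.9 threshold are invented for illustration.
def phonemically_distinct(guesses, truths, threshold=0.9):
    correct = sum(g == t for g, t in zip(guesses, truths))
    rate = correct / len(truths)
    return rate >= threshold, rate

# heart vs. hart: the hearer does no better than chance
yes, rate = phonemically_distinct(
    guesses=["heart", "hart", "heart", "hart",
             "heart", "hart", "heart", "hart"],
    truths=["heart", "heart", "hart", "hart",
            "hart", "hart", "heart", "heart"])
print(yes, rate)   # False 0.5

# heart vs. hearth: the hearer is (essentially) always right
yes, rate = phonemically_distinct(
    guesses=["hearth", "heart", "hearth", "heart"],
    truths=["hearth", "heart", "hearth", "heart"])
print(yes, rate)   # True 1.0
```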
The ability to distinguish utterances in the pair test does not depend on
any recognition or agreement by speaker and hearer in respect to the
meaning of the utterances, or even just whether the meanings are the same
or not—the more so as a single utterance may have more than one
meaning, and there are cases of homonyms (e.g. heart and hart), which are
considered different words with different meanings, but are phonemically
identical; also, the two pronunciations of a single word, economics (with
initial e or else iy sounds), are phonemically different and are distinguished
in the pair test. Rather, then, the distinguishability depends upon, and
illustrates, the behavioral fact that when a person pronounces a word he is
not imitating its sound but repeating it (ch. 3 n. 4), that is, saying his own
version of the phonemic distinctions which he has heard. The role of
repetition as against imitation will be discussed in Chapter 12, but the
clustering of pair-test responses near 50 per cent and 100 per cent shows that
utterance-recognition is based on repetition: an utterance either is or is not a
repetition of another. The ability to recognize what is a repetition means
that the phonemic distinctions of a language are pre-set by convention. They
differ from language to language but can be acquired—in many cases
perfectly—when a person learns a new language. Even a predisposition to
recognize phonemic distinction in the abstract, or repetition as a relation
between behaviors, need not be assumed as an inborn or a priori preparation
for a person’s being able to recognize phonemic distinctions when he meets
them. It may be simply that various pronouncings are maintained at such
distances or contrasts from other pronouncings that a person learns to
maintain the distance or contrast even if his own pronunciation (or that of a
new person he hears) is somewhat different.2,3 This is an unusual relation in
human behavior, so that the search for analogs to phonemic distinction in
other human behavior has not been fruitful.
2 The pair test will equally well distinguish tez from taz, where no meaning or meaning-
difference is present.
3 Cf. E. Sapir in n. 1 above. Phonemic distinctions are distinctions among timbres, i.e. as to
which overtones have particularly high amplitudes.
7.2.0. Phonemes
4 Cf. Z. Harris, [Methods in] Structural Linguistics, Univ. of Chicago Press, Chicago, 1951.
word—such as a bull-head (’alpu) used for the glottal stop, a house (betu) for b, a camel (gamlu) for g (the Greek gamma), etc.
That the original alphabet was only the application of a technique, rather
than a full understanding of sound-segments, is seen in the fact that it did not
develop marks for the vowels. The acrophonic reading of the pictographs
yielded only the sounds that were word-initial, which were always consonants
(in Semitic as in Egyptian). The marks were not thereupon extended to
vowels. This was not because of the usual explanation, namely that ‘vowels are unimportant in Semitic’—they are essential for distinguishing grammatical forms—but because the acrophonic method which was used to obtain the alphabet could not yield the vowels, since no Semitic or Egyptian
word began with a vowel. Thus the acrophonic basis of the original alphabet
explains not only the introduction of alphabetic marks but also their original
vowellessness. The Greek borrowing of the alphabet, which retained its
acrophonic property by keeping the Semitic letter-names (alpha, beta, etc.),
yielded the vowels indirectly because Greek disregarded the Semitic laryngeal
consonants, leaving their marks for the vowels which followed the unheard
laryngeals. (Intentional additions of vowels—and additional consonants
and even consonant clusters—took place later in certain extensions of the
original alphabet, and at the end of the Greek alphabet.)
To summarize: the alphabet was not made by any phoneme-like decomposition of words into their successive sounds: such decomposition does not appear as a normal final stage of writing development. It was made by
utilization of phonemic distinctions in word-initial position. The successive
sounds of each word—to the extent that pictographs were available for
them—were written with the pictograph for a word whose beginning was
that sound; in the original alphabet, this supplied only the consonants of the
word.6 Pictographs became established to the extent that they had distinct
beginnings under repetition; and the retention of their names (alpha, beta,
etc.) was essential to their acrophonic use, which is why they were retained
in Greek.7 The vowels, in the words being written, were not written, because there was no picture whose word began with those sounds. Complementary variants of phonemes were not come upon, because they did not occur in the initial position. But as many marks were established as there
were phonemically distinct word-initials in the language. The fact that
successive segments of a word were marked by the beginnings of various
other words was thus not a new recognition of phonetic segmentation: the
Sinai writers were doubtless doing only what they thought the Egyptians
6 We cannot tell whether the early users of the alphabet assumed that the required vowels for
a given word could be drawn from the various vowel values of the Egyptian signs (which did not
hold for the specific Semitic letter-names 'alpu, betu, gamlu, etc.), or whether they simply
wrote for a word as many relevant marks as were available to them, which were only those that
represented word-initial sounds (consonants).
7 Later borrowings of the alphabet lost the acrophonic names while keeping much of the
original order: e.g. the Latin alphabet and the Arabic.
were doing. And the Egyptians continued with their non-alphabetic system,
just writing some long words with the successive marks for short words
whose sounds were similar to successive segments of the long words.
The alphabet was thus neither an invention nor a discovery, but a
development whose product was novel because existing materials had
unused potentialities: the apparent acrophony of marks for one-syllable
words. This development has a certain distant similarity to aspects of the
evolution of language (12.4.4).
FIG. 7.1. Word boundaries in a sentence. Dogs were indisputably quicker. Dots
have been inserted between the phonemes, to show where a morphemic segmentation
would have been made on syntactic grounds. These dots were, of course, not
involved in the test.
Such results have been obtained by this procedure, going both forward
and backward, in various disparate languages. Similar results were also
obtained for the more difficult case of morpheme boundaries within one
word at a time, each initial sequence being matched for its next-neighbor
variety, both forward and backward, in a sampling from Webster’s unabridged
English dictionary, together with various science dictionaries, alphabetized
both forward and backwards. In this case, spelling was used unavoidably
instead of phonemic sequence, but the results remained in good accord with
grammatical segmentation, as can be seen from Figure 7.2.
8 The figures, and various sections of the text, in 7.3-8 and 8.1 have been taken from
Z. Harris, Mathematical Structures of Language, Interscience Tracts in Pure and Applied
Mathematics 21, Wiley-Interscience, New York, 1968, where more details are given.
FIG. 7.2. Morphemic segmentation of dis•turb•ance, dis•em•body, di•sulf•ide, de•form•ity, and apple. [The successor and predecessor counts shown above and below each letter in the original figure are not reproduced here.]
phonemes and utterances (or, rather, as will be seen, sentences) is that the
constraints on freedom of combination which we seek as characterizers of
utterances, and the meanings which these constraints carry, can be stated on
a far larger body of elements—the morphemic segments (including words)
—than just the twenty to thirty phonemes that each language has. This
makes it possible to expect a much simpler system of constraints than if
everything language does had to be stated in terms of the few phonemes it
has. By the same token, the segmentation means that phoneme combination
short of the peaking constraint (e.g. in syllables) is not directly relatable
the main, and information-bearing, constraints of utterances—those of
Chapter 3. That is to say, the phonemic composition of morphemes is
arbitrary, as are phonemes themselves, in respect to most of the rest of
grammar; and phonemes do not carry meaning. The peaks of v which
determine the segmentation result from two facts: that not all phoneme
combinations are utilized in making morphemes, and that the number of
morphemes in a language is some three orders of magnitude larger than the
number of phonemes. The boundaries of a morpheme are therefore points
of greater freedom for phoneme combination.
We thus have not only a distinguishing property of those phoneme-
sequences which constitute initial segments of sentences, but also evidence
that sentences (differently from non-sentences) contain a certain segmentation
into ‘morphemic’ sub-sequences of phonemes; and we have the boundaries
of these morphemic segments.
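The successor-variety computation behind these boundaries can be sketched directly: for each initial segment of a word, v is the number of distinct phonemes that follow that segment anywhere in the corpus, and boundaries are placed where v peaks. The corpus, the use of letters for phonemes, and the particular peak criterion below are all simplifications for illustration; the ratio refinement discussed below is omitted.

```python
# Sketch of successor-variety segmentation: v(prefix) = number of distinct
# next letters after the prefix anywhere in the corpus; a morphemic boundary
# is placed where v peaks (rises and then falls). Letters stand in for
# phonemes; the corpus and the peak criterion are illustrative only.
def successor_variety(prefix, corpus):
    return len({w[len(prefix)] for w in corpus
                if w.startswith(prefix) and len(w) > len(prefix)})

def segment(word, corpus):
    v = [successor_variety(word[:i], corpus) for i in range(1, len(word))]
    cuts = [i + 1 for i in range(1, len(v) - 1)
            if v[i] > v[i - 1] and v[i] >= v[i + 1]]
    pieces, last = [], 0
    for c in cuts:
        pieces.append(word[last:c])
        last = c
    return pieces + [word[last:]]

corpus = ["disturbance", "disturbing", "disturbed", "dismiss", "distance",
          "display", "dispose", "disband", "disarm", "distill", "dance"]
print(segment("disturbance", corpus))    # ['dis', 'turb', 'ance']
```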
For the user of a language, the morphemes are known not, of course, from
this comparison of phoneme-sequence successors, but from being learned,
some in isolation and some by their regularity of combination in utterances.
They are thus pre-set as the user becomes able to speak or understand the
language, except insofar as the meaning of an unknown morphemic segment
in an utterance can be guessed from knowledge of the environing ones.
Various adjustments can be made in this procedure, particularly in order
to get a morphemic segmentation that has simpler syntactic regularities.
One adjustment eliminates the occasional case in which the rise, which
marks the end of the morpheme, is gradual rather than abrupt as it should
be. For this, we take v not as the number of phonemic followers of each initial sequence p1, . . ., pm (where pm is some phoneme a and pm-1 is a phoneme b) but as the ratio of that number to the number of different phonemes that follow the phoneme a (or the sequence of phonemes ba) in all neighborhoods in which a (or ba) occurs.9 In this ratio of hereditary to
Markov chain dependence, we obtain the hereditary effect more sharply;
i.e. the restriction on the followers of pm=a due to that particular pm being
9 We may consider the pair ba, rather than just the last phoneme, because many syllable
restrictions depend on pairs of neighboring phonemes. The way in which the phonetic property
of phonemes restricts their successor does not depend on the predecessors back to the
beginning of the utterance.
10 In the examples above, the two cases in which v rises gradually to the peak, 9-14-20 for dis
and 11-6-9-28 for kwik, would thus be replaced by decreasing ratios. Fewer phonemes can
follow a consonant, especially a pair of two consonants, than can follow a vowel. Thus 9 is a
smaller percentage of the total followers of i than 6 is of the total followers of w.
To go from words and bound morphemes of 7.4 to the set of all possible word
combinations in utterances is a large step. This is especially so in view of the
great number of words, with almost every one being unique in some of its
combinations and transformations, in keeping with its unique meaning and
its stylistic character. In 3.1-2, it was found that this large step could be
decomposed into two crucial successive stages for each word. The first stage
determined what word-occurrences the given word never combines with in
all base utterances including the shortest (where observation is easier);
this creates the argument-requirement word classes. The second stage
characterizes the graded likelihood of each word in respect to the various
other words in whose environment it has probability >0 in the shortest
utterances.
Meanwhile, in traditional grammar, an approximation to these word
classes is obtained by related if less global considerations. The traditional
into the kind of constituents described above (e.g. what is the status of that
he go in It is important that he go?).
A more adequate and more regular decomposition of sentences into sub¬
sequences is to find in each sentence a minimal component sentence (the
'center string’: generally the tensed verb with its subject and object) and to
show that everything else in the sentence is adjoined as ‘adjunct string’ to a
‘host string’ (word or sentence) which is in the sentence (i.e. to the center
string or to a previously entering adjunct). In each sentence, the strings
consist of sequences of one or more words in the sentence. In a theory of
sentence structure, the strings are defined as particular sequences of word
classes, each originally continuous, which can adjoin particular other types
of strings at stated points—before them, after, or next to particular host
words in the string, i.e. in all cases contiguously at the time it adjoins. Each
adjunct string is defined as capable of adjoining any host string of a given
type, the resultant being again a string of the class of that host string. Thus A
is (in English) a left adjunct string of a host N (within a string containing that
N). We consider the A as a string adjoined to the string containing the N,
rather than as an adjunct to the N alone. D (adverb) is a left adjunct string of
A or D (or an adjunct of V in some string) producing DA as an A string
(entirely empty), DD as a D string (almost entirely), as well as DV (almost
fell). The DD on A is an A string, as is D on DA (both as in almost entirely
empty); AN and DV are not in themselves strings, since the N and V are
merely words within a string.
That the adjunctions are to a string and not to a word in it can be seen in
the cases when an adjunct referring to a word is at a distance from the word,
but not at a distance from the string of which the word is part: e.g. in Finally
the man arrived whom they had all come to meet, where the adjunct whom . . .
depends on the occurrence of man (i.e. occurs only in the presence of certain N including man); or in Latin, where the adjective (adjunct) that agrees with
and refers to a noun can be at a distance from the noun. In both cases, the
adjunct is always contiguous to the string of which that noun is a part. Thus
adjunction is a relation between strings, even though the grammatical
dependence and semantic connection can be between one string and a host
word in the other string.
The main string types sufficient for English utterances are found to be:
Center strings (S): chiefly noun plus verb plus tense plus the object
required by that verb; also the interrogative and imperative forms of
this;
Adjuncts on center strings, entering at right or left or at interior points of
the host string and often separated by comma: e.g. D (perhaps,
certainly), preposition-plus-noun PN (in truth), certain idiomatic strings (no doubt), and C (conjunction) followed by a center string (because they
rioted) or P followed by a 'nominalized’ center string (upon the army's
arrival, with defeat certain);
Right adjuncts on virtually any string or string-segment, consisting of and,
or, but, or comma, followed by a string or string-segment of the same
type: e.g. N or N (pen or pencil), S but S (He went but I didn't).
Left adjunct on host nouns: e.g. A (wooden expression), N (woodstove);
or on host adjectives, prepositions, adverbs: e.g. D on A (very wooden), D
on P (completely under), D on D (almost entirely);
Right adjunct on nouns: e.g. PN (of matches after a box), relative clauses
(whom we met), APN (yellow in color after a box); on adjectives: e.g.
PN (in color after yellow); on verbs: e.g. PN (with vigor after oppose);
Left or right adjuncts on verbs: e.g. D (vigorously before or after oppose).
(By right or left adjunct on a noun or a verb is meant an adjunct to the
string that contains that noun or verb, the adjunct being placed to the
right or left respectively of the host noun or verb in the string.)
These strings are not merely constructional artefacts, but an aspect of the
total structure of utterances. This is seen in the fact that many grammatical
phenomena are describable in terms of the internal structure of strings or in
terms of a host and its adjunct string. This holds for the restrictions as to
which word subclass (within one class) occurs with which other word
subclass, or which individual word of one class occurs with which individual
word of another, or the locating of a grammatical agreement on a pair of
classes. Examples are the restrictions between the members of a string (e.g.
plural agreement between its N and its V) or between part of a string and an
adjunct inserted into that string (the restriction of a pronoun to its antecedent,
or of an adjunct to the particular word in the host string to which the adjunct
is directed), or between two related adjuncts of a single string (as in People
who smoke distrust people who don't where the zeroing of smoke in the
second adjunct is based on its parallel relation to the first adjunct).
The string property of language arises from the condition that language
has no prior-defined space in which its sentences are constructed (2.5.4).
Within this condition, there is room for a certain variety. For example, a string b1b2 could be inserted at two points of its host a1a2 instead of at one, yielding e.g. a1b1a2b2 instead of the usual b1b2a1a2 or a1b1b2a2 or a1a2b1b2; this happens in the rare intercalation He and she play violin and piano respectively. But one adjunct c of a host a1a2 could not be inserted into another adjunct b1b2 of that host: we would not find a1b1cb2a2 or the like
part of a string cannot be dependent on an adjunct to that string, or on
anything else which is outside that string itself. Any sequence of classes
which satisfies the definitional conditions above is called a string; and what is
empirical is that it is possible in language to find class-sequences (of rather
similar types in the various languages) which indeed satisfy these conditions.
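The exclusion just stated can be expressed as a direct check on the linear order of string segments. The sketch below is illustrative only: given which string each adjunct is adjoined to, it flags the excluded configuration in which an adjunct of a host sits inside a co-adjunct of the same host, while permitting an adjunct to interrupt its own host.

```python
# Sketch of the exclusion above: with host_of naming the string to which each
# adjunct is adjoined, an adjunct c of host a may not occur strictly inside a
# co-adjunct b of the same host (the excluded a1 b1 c1 b2 a2), though b may
# interrupt its host a (a1 b1 b2 a2). Single-letter string names are assumed;
# the rare intercalation case is not treated here.
def violates(seq, host_of):
    pos = {}
    for i, seg in enumerate(seq):
        pos.setdefault(seg[0], []).append(i)   # string name -> positions
    for b, hb in host_of.items():
        for c, hc in host_of.items():
            if b != c and hb == hc:            # co-adjuncts of the same host
                lo, hi = min(pos[b]), max(pos[b])
                if any(lo < p < hi for p in pos[c]):
                    return True
    return False

hosts = {"b": "a", "c": "a"}                   # b and c both adjoined to a
print(violates(["a1", "b1", "b2", "c1", "a2"], hosts))   # False
print(violates(["a1", "b1", "c1", "b2", "a2"], hosts))   # True: excluded form
```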
What string analysis characterizes directly is not the set of all utterances of a
language, but the set of all sentences, these being a segmentation of
utterances. The reason string analysis produces only sentences is not that we
started intentionally with sentences—we started with whatever was a short
utterance—but that we did not include sequences beginning with a period
among the adjunct strings: such a sequence as and perhaps she left is an
adjunct string (as in He left and perhaps she left) but period + Perhaps she left
11 Proposed by Henry Hiz.
(as in He left. Perhaps she left) is not. If we look only at the succession of
these sequences, we may see little difference, e.g. between He left and
perhaps she left and He left. Perhaps she left. But if we check for restrictions,
we find important differences. For example, under and there is a rare
zeroing to He left and perhaps she, but hardly under a period (He left.
Perhaps she. is dubiously accepted). More importantly, sequences under
and can be the subject or object of further words, but not sequences under a
period: His leaving and her leaving are clearly related, but no His leaving.
Her leaving are clearly related; similarly That he left and she left surprised us,
but no That he left. She left surprised us. For such reasons, the period is not
included in the set C of conjunctional strings. Hence, although the minimal
‘center’ strings include any minimal utterance, the result of string adjunction
is not the set of all utterances but the set of certain segments of utterances
called ‘sentences’.
The existence of sentences as a unique grammatical construction is shown,
differently for different languages, by various considerations such as above.
In a more general way, string analysis can be used for a recurrent stochastic
process which locates the sentence boundaries within a discourse. Just
as the process proposed in 7.3 shows that a segmentation into morphemes
exists (and locates the specific boundaries), so the process proposed below
shows that a segmentation into sentences exists (and locates the sentence
boundaries).
We consider the word-class sequences which constitute sentences, and we think in terms of distinguishing in them all positive transitional probabilities from those which are zero. We can now state a recurrent stochastic dependence between successive word classes of each sentence form in respect to the string status of each of those words.12 In this hereditary dependence of the nth word class, wn, of a discourse on the initial sequence w1, . . ., wn-1, sentence boundaries appear as recurrent events in the sequence of word
classes. This is seen as follows: if a discourse D is a sequence of word (or morpheme) classes and x, y are strings (as in 7.6) included in D, then:

a. the first word class of D is:
   (1a) the first word class of a center string,
   or (2a) the first word class of a left adjunct which is defined as able to enter before (1a), or before (2a);

b. if the nth word class of D is the mth word class of a string x containing p word classes, p>m, then the (n+1)th word class of D is:
   (3) the (m+1)th word class of x,
   or (4) the first word class of a right adjunct which is defined as able to enter after the mth word class of x,
   or (2b) the first word class of a left adjunct which is defined as able to enter before (3), or before (4), or before (2b);

c. if the nth word class of D is the last word class of a left-adjunct string x, where x is defined as entering before the mth word class of a string y, then the (n+1)th word class of D is:
   (5) the mth word class of y,
   or (6) the first word class of a left adjunct defined as able to enter before the mth word class of y, and such as is permitted to occur after x,
   or (4c) the first word class of a right adjunct defined as able to enter after x,
   or (2c) the first word class of a left adjunct defined as able to enter before (6), or before (4c), or before (2c);

d. if the nth word class of D is (7) the last word class of a right-adjunct string x which had entered after the mth word class of a string y which contains p word classes, p>m, or (7') the last word class of a right-adjunct string x which had entered after (7), then the (n+1)th word class of D is:
   (8) the (m+1)th word class of y,
   or (6d) the first word class of a right adjunct defined as able to enter after the mth word class of y.
Thus we are given two possibilities for the string standing of the first word class of a discourse; and given the string standing of the nth word class of a discourse, there are a few stated kinds of possibilities for the string standing of the (n+1)th word class. The possibilities numbered (2) are recursively defined. This necessary relation between the nth and (n+1)th word classes of a sequence holds for all word-sequences that are in sentences, in contradistinction to all other word-sequences.
We have here an infinite process of a restricted kind. In cases 1e and 1f, 2e' and 2f', 9e and 9f, the nth word class of D is the end of a sentence form,13 and the (n+1)th word class of D, if it exists, begins a next putative sentence form. The transitions among successive word classes of D carry hereditary dependencies of string relations. But 1e, f and 2e', f' are identical with 1a and 2a, respectively. The dependency process is therefore begun afresh at all points in D which satisfy 1e, f or 2e', f'. These points therefore have the status of a non-periodic recurrent event for the process.
In this way, sentence boundaries are definable as recurrent events in a
dependency process going through the word classes of a discourse; and the
existence of recurrent events in this process shows that a sentential segmenta¬
tion of discourse exists. Given a sequence of phonemes, we can then tell
whether the first word is a member of a first word class of some sentence
form, and whether each transition to a successor word is a possible string
transition. That is, we can tell whether the phoneme-sequence is a sentence
—is well formed in respect to sentencehood. In so doing we are also stating
the string relation between every two neighboring words, hence the gram¬
matical analysis of the sentence in terms of string theory.
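This recurrent-event characterization can be sketched computationally: if the well-formed sentence forms (center strings with their adjuncts) are given, the points at which a discourse's word-class sequence decomposes into a concatenation of such forms are the recurrent events. The grammar below is a drastically reduced stand-in for the string table of 7.7, invented for illustration: center strings N V and N V N, an optional article T before each N, and an optional adverb D after V.

```python
# Sketch of sentence boundaries as recurrent events. VALID holds the word-
# class sequences of a toy set of sentence forms: center strings N V and
# N V N, with an optional left adjunct T before each N and an optional right
# adjunct D after V. segmentations() returns every way of covering a
# discourse's class sequence by a concatenation of these forms; the listed
# end positions are the recurrent events (sentence boundaries).
def expand(form):
    seqs = [[]]
    for c in form:
        if c == "N":
            seqs = [s + t for s in seqs for t in (["N"], ["T", "N"])]
        elif c == "V":
            seqs = [s + t for s in seqs for t in (["V"], ["V", "D"])]
    return seqs

VALID = {tuple(s) for f in (["N", "V"], ["N", "V", "N"]) for s in expand(f)}

def segmentations(classes, start=0):
    if start == len(classes):
        return [[]]
    out = []
    for end in range(start + 1, len(classes) + 1):
        if tuple(classes[start:end]) in VALID:
            out += [[end] + rest for rest in segmentations(classes, end)]
    return out

# 'The dog barked loudly. The cat saw him.'  ->  boundaries after 4 and 8
print(segmentations(["T", "N", "V", "D", "T", "N", "V", "N"]))   # [[4, 8]]
```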
13 This holds also if the (n+1)th word class of D does not satisfy any of the possibilities for cases e, f, since only 6e, 4e, 2e, and 4f, 2f can continue a sentence form beyond a point where it could have ended. In this case the (n+1)th word class of D is part of no sentence form. Aside from this, if the mth word class does not satisfy the table above, then D is covered by sentence-form sections only up to the sentence-end preceding m.
(obtaining AAN1 (e.g. dark ominous sky) in the class N1); and N1L = N2 if one cannot substitute N1L for N1 throughout (we cannot put a plural noun N1L for N1 before L to obtain NLL, i.e. a noun with double plural). At each point where recursion does not hold, we raise the power of the host. The
point where recursion does not hold, we raise the power of the host. The
hierarchical system of equations presents several stages of substitutability:
the combining of morphemes to take the place of single words (mostly level
1), the repeatable adjoining of modifiers (remaining at the same level), the
non-repeatable expansions (e.g. plural, the, and the compound tenses)
forming levels 2 and 3; and it keeps track of how a whole sentence is pushed
down into being a modifier (e.g. relative clause). For a given sentence, the
formula which is obtained, e.g. the sequence N4V4 (in English), points to a
particular ‘central' elementary sentence within it (the level 1 material which
is contained in that N4, V4). And the successive equations which reach this
formula for the sentence point to the way N1 and V1 of that elementary
sentence were expanded to make the given sentence. In this manner, it is
possible to house all structural sub-sequences of a sentence in one relation,
and also to show that each sentence contains a central elementary sentence
underlying in an orderly way the hierarchies of expansion and replacement.14
In string analysis, these expansions are then found to constitute a regular
system of contiguous adjunction of originally continuous strings. Each
sentence is obtained from its sentential string plus various adjunct strings on
it or on previous strings.
It is also possible to construct a cycling automaton with free-group cancellation to determine sentence well-formedness. For this purpose, each
word is represented by the string relations into which it enters. We then have
an alphabet of symbols indicating string relations, and each word of the
language is represented by a sequence of these new symbols. On any
sequence of these new symbols we can, by means of a simple cycling
automaton, erase each contiguous pair consisting of a symbol and its
appropriate inverse, in the manner of free-group cancellation. Then each
sequence which can be completely erased represents a sentence.
There are three string relations (i.e. conditions of well-formedness) that
hold between words of a sentence: they can be members of the same string;
or one word is the head of an adjunct and the other the host word to which
the adjunct is adjoined; or they can be the heads (or other corresponding
words) of two related adjuncts adjoined to the same string. Furthermore,
since sentences are constructed simply by the insertion of whole strings into
interior or boundary points of other strings, it follows that the above
relations are contiguous or can be reduced to such. For each string consists
of contiguous word classes except in so far as it is interrupted by another
string; and each adjunct string is contiguous in its first or last word to a
distinguished word of its host string (or to the first or last word of another
adjunct at the same point of the host).
This makes it possible to devise the cycling automaton. In effect, we check
each most-nested string (which contains no other string within it), i.e. each
unbroken elementary string in the sentence. If we find it well formed (as to
composition and location in host), we cancel it, leaving the string which had
contained it as a most-nested, i.e. elementary, string. We repeat, until we
check the center string.
In view of the contiguity, the presence of a well-formed (i.e. string-
related) word class B on the right (or: left) of a class A can be sensed by
adding to the symbol A a left-inverse b' (or: a right-inverse 'b) of B. If the class A is represented by ab', the sequence AB would be represented by ab'b; and the b'b would cancel, indicating that the presence of B was well
formed. For example, consider the verb leave in the word class Vna (i.e. a
verb having as object either NA, as in leave him happy, or NE, as in leave
him here). It would have two representations: va'n' and ve'n'. Then leave
him happy could be represented by va'n'.n.a,15 which would cancel to v,
indicating that the object was well formed.
Specifically: for any two word classes (or subsequences of word classes) X, Y, if (a) the sequence XY occurs as part of a string (i.e. Y is the next string member to the right of X), or (b) Y is the head of a string which can be inserted to the right of X, or (c) Y is the head of a string inserted to the right of the string headed by X—then, for this occurrence of X, Y in a sentence form, we set X→y' (read: X is represented by y', or: the representation of X includes y') and Y→y, or alternatively X→x and Y→'x.16 Here y' is the left inverse of y, 'x is the right inverse of x, and the sequences y'y, x'x (but not, for example, xx') will be cancelled by the device here proposed.17
It should be stressed that what is essential here is the fact that the inverse
symbols represent the string relations of the word in whose representation
they are included. When a most-nested symbol pair is cancelled, we are
eliminating not a segment of the sentence but a string requirement and its
satisfaction.
It follows that if a word class Z occurs as a freely occurring string, i.e.,
without any further members of its own string and without restriction to
particular points of insertion in other strings, then its contribution to the
well-formedness of the sentence form is that of an identity (i.e. a cancellable
15 The dots separate the representations of successive words, for the reader’s convenience,
and play no role in the cancellation procedure.
16 Correspondingly, if Y is the end of a string inserted to the left of X, etc., then Y→x', X→x; or Y→y, X→'y.
17 In this notation, it will be understood that (xy)' = y'x', '(xy) = 'y'x, and (x')' = x = '('x).
T      A      N
t      a      n
       'ta    'tn
              'an
Then the above examples would cancel down to n in the following sequences of representations: n, t. 'tn (the star), a. 'an (green star), t. 'ta. 'an (the green star); but no representations could cancel green the star (ATN).
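The cancellation of these examples can be reproduced mechanically. In the sketch below, the representations for T, A, N are those of the table above; the pair rules are y'y → null and x 'x → null (but not xx'), and the simple rescanning loop stands in for the cycling automaton, ignoring the delay and conjugate adjustments discussed below.

```python
# Sketch of the cancellation procedure on the T/A/N example above. A left
# inverse y' cancels with a following y, and a plain x cancels with a
# following right inverse 'x; xx' does not cancel. Representations per word
# class are those given in the text; the loop is a simplification.
from itertools import product

REPS = {
    "T": [["t"]],
    "A": [["a"], ["'t", "a"]],              # 'ta: right inverse of t, then a
    "N": [["n"], ["'t", "n"], ["'a", "n"]],
}

def cancel_pair(x, y):
    return (x == y + "'") or (y == "'" + x)  # y'y, or x followed by 'x

def reduce_seq(symbols):
    symbols = list(symbols)
    changed = True
    while changed:                 # cycle until no contiguous pair cancels
        changed = False
        for i in range(len(symbols) - 1):
            if cancel_pair(symbols[i], symbols[i + 1]):
                del symbols[i:i + 2]
                changed = True
                break
    return symbols

def is_sentence_form(classes, target=["n"]):
    """True if some choice of representations cancels down to target."""
    for choice in product(*(REPS[c] for c in classes)):
        flat = [s for rep in choice for s in rep]
        if reduce_seq(flat) == target:
            return True
    return False

print(is_sentence_form(["T", "N"]))        # True:  t . 'tn cancels to n
print(is_sentence_form(["T", "A", "N"]))   # True:  t . 'ta . 'an cancels to n
print(is_sentence_form(["A", "T", "N"]))   # False: green the star
```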
Permuted elements require separate representation. Thus saw as a verb requiring N object is represented vn'; but since the object can be permuted (as in the man he saw) the verb also has the representation 'nv.
Some consequences of language structure require special adjustment of
the inverse representation:
Delays. Sequences of the form x'x'xx will cancel only if scanned from the
right. If the scanning is to be from the left, or if the language also yields
sequences of the form yy'y'y which can only be cancelled from the left, then
it is necessary to insert a delay, i'i, to the right of every left inverse (and to
the left of every right inverse) which can enter linguistically into such
combinations. We obtain in such cases x'i'ix'xx and yy'yi'i'y, which cancel
from either direction. This occurs in English when a verb which requires a
noun as object (and the representation of which is vn'), or certain string
heads like which (in a representation ending in n') meet a compound noun (the representation of which is n.'nn). The representation vn', for example, is therefore corrected to vn'i'i, so that, e.g.
take book shelves
vn'i'i n 'nn
cancels from the left (as well as from the right) to v.
Conjugates. There are also certain rare linguistic situations (including
intercalation) which yield a noncancellable sequence of the form z'x'zx. An
x' which can enter linguistically into such an encapsulated situation has to be
representable by its Z conjugate zx'z', which enables the x' and z to permute
and the sequence to cancel. If a string AB, represented by a'a, encircles X
(i.e. X is embedded within AB), once or repeatedly, then the relation of
encirclement requires each A to be represented by xax': then, e.g., AAXBB
yields xax'.xax'.'a.'a, cancelling to x. (Dots are placed to separate the
representations of each capital letter.) In some cases, a particular word X
(such as markers discussed below) requires the presence of particular other
words or classes Y at a distance from it, i.e. with intervening Z, after all cancellable material has been cancelled: YZX occurs, but not ZX without Y. The representation of X then has to include 'z'yz so as to reach over the Z
and sense the presence of Y. Thus the representation for than will check for a preceding -er, more, less; the representations for neither, nor will check for a preceding not (or negative adverb like hardly) or a following nor: More people came than I had called; He can hardly walk, nor can he talk.
Exocentrics. Another problem is that of a string XY which occurs in the
position of a class Z, i.e. which occurs where a Z was linguistically expected:
here the X would be represented by zy', so that zy'.y would cancel to z.
(This XY includes grammatical idioms.)
18 In XY and XY, it suffices to provide for X and X, since each X cancels its Y. + represents and.
8

Analysis of the Set of Sentences

8.0. Introduction
1 A crucial fact here is that the set of all relative clauses is the same as the set of all sentences
(including resultants of ‘fronting’, 3.4.2), except that an initial argument (or one statably near
front position) has been replaced by a pronoun (3.4.4). The same applies, with certain
modifications, to some subsidiary clauses. These can therefore all be best analyzed as bound
transforms of free sentences.
boundaries (7.7), but also from the fact that transformations, which are a
further property of sentences, can be located in respect to the string
structure of sentences (as is seen in 8.1.4 in the case of English).
However, all word-class components of sentences fall short in certain
respects, as noted in 8.0. For one thing, they do not recognize important
similarities in word class, word-choice, and meaning between certain word-
class sequences. For example, neither string nor constituent analysis provides
in a principled way for the similarities between certain adjunct (modifier)
strings and center strings: e.g between AN and N is A (the long article and
The article is long), or between a relative clause and a sentence minus a noun
(as in which he gave to me yesterday and This he gave to me yesterday). For
another, word-class sequences as components produce not sentences but
only sentence structures: to produce the actual sentence would require
characterization of the specific word-choices, which has always been con¬
sidered a body of data so vast and unstable as to be beyond the powers of
grammar. Finally, many problems concerning the meaning of sentences
were touched upon only tangentially: for example, paraphrase and ambiguity.
To introduce this system of inequalities, we note first that not every word-
sequence is either definitely a sentence of the language or definitely not one.
Many word-sequences are definitely sentences, e.g. / am here, and others
are definitely not, e.g. ate go the. But there are also many word-sequences which are recognized as metaphoric, or as said only for the nonce, i.e. as
2 Large sections of 8.1.2-3 and 8.1.5-6 are revisions from Z. Harris, Mathematical Structures of Language, Interscience Tracts in Pure and Applied Mathematics 21, Wiley-Interscience, New York, 1968.
extended by the present speaker on the basis of ‘real’ sentences, e.g. Pine
trees paint well, perhaps analogized from such sentences as Spy stories sell
well. Other word-sequences are recognized as containing some grammatical
play; and some are pure nonsense sentences (but may be accepted): The cow
jumped over the moon.
It is also easy to find a word-sequence for which speakers of the language
disagree, or readily alter their judgement, or cannot decide, as to whether it
is a grammatical sentence of the language or not: e.g. He is very prepared to
go; He had a hard crawl over the cliff. There can be various degrees and types
of marginality for various word-choices in a form: e.g. the gradation from He gave a jump, He gave a step to dubious He gave a crawl, He gave a walk, to objectionable He gave an escape.
Alternatively, instead of speaking of different acceptabilities for word-
sequences as sentences, we can speak of different sentential neighborhoods.
For each one of the marginal sentences can be an acceptable sentence in a
suitable discourse, or under not; whereas the completely nonsentential word-sequences, i.e. those which do not belong to sentence forms (such as go wood the), appear without acceptability ordering, and only as subjects of
special predicates such as the metalinguistic ones (e.g. in this case,. . . is not
a sentence). For extending the grammar, the neighboring sequences in the
discourse can supply the source of the extension: While you had a walk along
the valley, he had a crawl over the cliff. For nonsense, the neighborhood
might be nonsensical in the same way, as in a fairy-tale or nursery rhyme.
Aside from the obvious cases of marginal sentencehood, many sentences are
to be found only in particular types of discourse, i.e. with particular
environing sentences; and many of these would indeed be but dubiously
acceptable sentences outside of such discourses. Thus The values approach
infinity is sayable in mathematical discourse but rather nonsensical outside
it.
Acceptability does not depend on the truth of the sentence: The book is here is a grammatically acceptable sentence even if the book is in fact not here. It does not even depend in any simple way on meaning (as in The cow jumped over the moon). While there is a certain connection between meaningfulness, in the colloquial sense of the term, and acceptability, the connection is merely that a sentence which is acceptable in a given neighborhood has meaning in that neighborhood. We cannot assert that it is acceptable because it has meaning (i.e. because the combination of its word meanings makes sense), for in many cases a sentence is acceptable not on the basis of the meanings that its words have elsewhere. If Meteorites flew down all around us is acceptable, it is not because this is a meaningful word combination, since flew means here not 'flew' in the usual sense but some movement distantly similar to it. Nor can we say that acceptability exists for every word combination which makes sense in any way, for there are word combinations which can make sense but are not acceptable sentences: Man sleep; Took man book. The air swam around me may have low acceptability, although it can have meaning.
The standing of a word-sequence (carrying the required intonation) in respect to membership in the set of sentences is thus expressible not by two values, yes and no, but by a spectrum of values expressing the degree and qualification of acceptability as a sentence, or alternatively by the type of discourse or neighborhood in which the word-sequence would have normal acceptance as a sentence.
The complicated and in part unstable data about acceptability can be made available for grammatical treatment by means of the following pair test for acceptability. Starting for convenience with very short sentence forms, indicated by ABC, we make a particular word choice in each word class except one, in this case A: say p in B and q in C, written as B_p, C_q. Then for every pair of members A_i, A_j of A we ask how the sentence formed with one of the members, A_i in (1) A_iB_pC_q, compares as to acceptability with the sentence formed with the other member, A_j in (2) A_jB_pC_q. If we can obtain comparative judgements of difference of acceptability, we have for each pair A_i, A_j either (1)>(2) or (1)<(2) or roughly (1)=(2).3 In this case the relation is transitive, so that if A_iB_pC_q > A_jB_pC_q and A_jB_pC_q > A_kB_pC_q it will always be the case that A_iB_pC_q > A_kB_pC_q. We would then have an ordering of the members of A in respect to acceptability in AB_pC_q. If the judgements which we can obtain in this test are not quantitative but rather in terms of grammatical subsets (metaphor, joke, etc.) or language subsets (fairy tales, mathematics, etc.), then some reasonable classification of the subsets could be devised so as to express the results in terms of inequalities (A_i being acceptable in the same subset as A_j, or in a subset which has been placed higher or lower). The results of the test might also take the form of some combination of the above. It is possible that we would obtain a relation in the set A such that for each pair either equality or inequality holds in respect to AB_pC_q, but without the relation being transitive. The system of inequalities, obtained from such a test, among sentences of the set AB_pC_q will be called here a grading on the set AB_pC_q; each sentence A_iB_pC_q which has a grading in respect to the other sentences of the set will be called a (sentential) proposition. As seen in note 3, there may be more than one grading over the A in AB_pC_q, if there is more than one transformational source for ABC.
Other gradings can be obtained for all members of A in respect to every
3 We may obtain more than one answer as to the acceptability difference between A_iB_pC_q and A_jB_pC_q. For example, The clock measured two hours may be judged more acceptable than The surveyor measured two hours, in the sense of measuring off, i.e. ticking off, a period of time, but less acceptable than the latter in the sense of measuring something over a period of two hours. In all such cases it will be found that there are two or more gradings (in the sense defined immediately below) over all members of A in AB_pC_q; each grading is associated with a sharply different meaning of AB_pC_q and, as will be seen later, a different source and transformational analysis of AB_pC_q. Thus the sense in which surveyor is more acceptable than clock here is the case where measured two hours is a transform of measured for two hours.
word-choice in BC. In the same sentence form ABC we can also obtain a grading over all members of B for each word-choice in A and C, and so for all C for each word-choice in A, B. We may therefore speak of a grading over the n-tuples of word-choices in an n-class sentence form as the collection of all inequalities over the members of each class in respect to each word-choice in the remaining n-1 classes.
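As an illustration only, the pair test can be simulated computationally. The following minimal sketch (in Python, with invented comparative judgements standing in for speakers' responses; neither the language nor the data is part of the theory) sorts the members of A into a grading, on the assumption that the judgements are transitive:

from functools import cmp_to_key

# Hypothetical judgements for the form "__ measured two hours":
# +1 means the first member yields the more acceptable sentence.
judgement = {
    ("clock", "surveyor"): 1,
    ("clock", "rock"): 1,
    ("surveyor", "rock"): 1,
}

def compare(a_i, a_j):
    # Negative return value: a_i precedes a_j, i.e. is more acceptable.
    if a_i == a_j:
        return 0
    if (a_i, a_j) in judgement:
        return -judgement[(a_i, a_j)]
    return judgement[(a_j, a_i)]

members_of_A = ["rock", "clock", "surveyor"]
grading = sorted(members_of_A, key=cmp_to_key(compare))
print(grading)  # ['clock', 'surveyor', 'rock']: one grading over A in A B_p C_q

If the judgements are not transitive, no such sort is meaningful; this corresponds to the non-transitive case noted above.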
Given the graded n-tuples of words for a particular sentence form, we can find other sentence forms of the same word classes in which the same n-tuples of words produce the same grading of sentences. For example, in the passive sentence form N_2 t be V-en by N_1 (where the subscripts indicate that the N are permuted, with respect to N_1 t V N_2), whatever grading was found for A dog bit a cat, A dog chewed a bone, A book chewed a cup is found also for A cat was bitten by a dog, A bone was chewed by a dog, A cup was chewed by a book. And if The document satisfied the consul may be found in ordinary writing, but The set satisfies the minimal condition only in mathematical writing, this difference in neighborhood would hold also for The consul was satisfied by the document and The minimal condition is satisfied by the set.
This need not mean that an n-tuple of words yields the same degree of acceptance in each of these sentence forms. The sentences of one of the forms may have less acceptance than the corresponding ones of the other, for all n-tuples of values or for a subset of them. For example, Infinity is approached by this sequence may be less acceptable than This sequence approaches infinity; but if the second is in mathematical rather than colloquial discourse, so would be the first.
In contrast, note that this grading is not preserved for these same n-tuples in N_2 t V N_1 (i.e. a sentence structure obtained by interchanging the two N): A cat bit a dog is normal no less than A dog bit a cat; A cup chewed a book is nonsensical no less than A book chewed a cup; but A bone chewed a dog is not normal, whereas A dog chewed a bone is.
To summarize:
1. Inequalities. Using any reasonable sense of acceptability as sentence, and any test for determining this acceptability, we will find, in any sentence form A(X_1 . . . X_n) of the variables (word classes) X_1 . . . X_n, some (and indeed many) n-tuples of values for which the sentences of A do not have the same acceptability. Alternatively, given this A, we will find some n-tuples of values for which the sentences of A appear in stated different types of discourse.
2. Preservation of inequalities. Often, we can find at least one other sentence form B(X_1 . . . X_n) of the same variables, in which the same n-tuples yield the same inequalities; or alternatively in which the same n-tuples show the same difference as to types of discourse.
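Point 2 can likewise be pictured as a simple computation. In the sketch below (again Python, with invented three-way ranks; the numerical coding is an assumption for illustration), two forms preserve the grading over a set of n-tuples if every pairwise inequality among those tuples is the same in both:

from itertools import combinations

grading_active = {                 # N_1 t V N_2
    ("dog", "bit", "cat"): 1,      # 1 = normal, -1 = nonsense
    ("dog", "chewed", "bone"): 1,
    ("book", "chewed", "cup"): -1,
}
grading_passive = {                # N_2 t be V-en by N_1, same n-tuples
    ("dog", "bit", "cat"): 1,
    ("dog", "chewed", "bone"): 1,
    ("book", "chewed", "cup"): -1,
}

def sign(x):
    return (x > 0) - (x < 0)

def preserves(g1, g2):
    # True if every inequality between two n-tuples is the same in both forms.
    return all(
        sign(g1[s] - g1[t]) == sign(g2[s] - g2[t])
        for s, t in combinations(g1, 2)
    )

print(preserves(grading_active, grading_passive))  # True: inequalities preserved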
The degree of acceptance, and sometimes even the kind of acceptance or neighborhood, is not a stable datum. Different speakers of the language, or different checks through discourse, may for some sentences give different evaluations; and the evaluations may change within relatively short periods of time. Even the relative order of certain n-tuples in the grading may vary in different investigations.4 However, if a set of n-tuples yields sentences of the same acceptability grading in A and in B, then it is generally found that certain of the instabilities and uncertainties of acceptance which these n-tuples have in A they also have in B. The equality of A and B in respect to the grading of these n-tuples of values is definite and stable.
In this way, the problem of unstable word-choices is replaced by the stable fact that the same (perhaps unstable) word-choices which order the acceptability of sentences in one sentence form A do so also in some other one B.
A sentence may be a member of two different gradings, and may indeed have a different acceptability in each (e.g. The clock measured two hours, below and n. 3 above). It will then have a distinctly different meaning in each. Some sentences have two or more different accepted meanings; certain ones of these have normal acceptance as members of two or more different graded subsets, in which case they will be called transformationally ambiguous. Consider the ambiguous Frost reads smoothly (in the form N t V D). We have one grading of a particular subset of sentences, one in which Frost reads smoothly and Six-year-olds read smoothly are normal, while This novel reads smoothly is nonsense; this grading is preserved in the form N tried to V D: Frost tried to read smoothly, Six-year-olds tried to read smoothly, but not This novel tried to read smoothly. In another grading, of another subset of sentences, Frost reads smoothly shares normalcy with This novel reads smoothly, while Six-year-olds read smoothly is dubious; this grading is preserved in One can read Frost smoothly or Reading Frost goes smoothly, One can read this novel smoothly, and only dubiously One can read six-year-olds smoothly.
The relation between two sentence forms in respect to preservation of acceptability grading of n-tuples is not itself graded. We do not find cases in which various pairs of forms show various degrees of preservation of acceptability grading. The preservation of the gradings is a consequence of this. In most cases, over a set of n-tuples of values, two sentence forms either preserve the acceptability grading almost perfectly or else completely do not. The cases where grading is only partially preserved can be reasonably described in terms of language change, or in terms of different relations between the two forms for two different subsets of n-tuples.
When we attempt to establish the preservation of the grading between two sentence forms, we find in some cases that it does not hold for all the n-tuples of values in both forms, but only for a subset of them. For example, the
4 There are special cases involving language change or style and subject-matter difference. For example, people may accept equally The students laughed at him, The students talked about war, but may grade He was laughed at by the students more acceptable than The war was talked about by the students.
N_1 t V N_2 / N_2 t be V-en by N_1 relation holds for The clock measured two hours / Two hours were measured by the clock but not for The man ran two hours (no Two hours were run by the man). We must then define the relation between the two forms over a particular domain of values of the variables. This raises immediately the danger of trivializing the relation: there are many more word n-tuples than sentence forms, hence for any two sentence forms of the same word classes it is likely that we can find two or more n-tuples of words whose relative grading is the same in the two forms. For example, given only A dog bit a cat, A hat bit a coat, and A cat bit a dog, A coat bit a hat, we could say that the grading preservation holds between N_1 t V N_2 and N_2 t V N_1 for this domain. Interest therefore attaches only to the cases where a domain over which two forms have the same grading is otherwise syntactically recognizable in the grammar: e.g. because the domain covers all of a given morphological class of words (all adjectives, or all adjectives which can add -ly); or because the domain, or the set of n-tuples excluded from the domain, appears with the same grading also in other sentence forms and is too large a domain for this property to be a result of chance.
One can specify the domain of one grading preservation in terms of the domain of another. For example, the passive is not found for triples which appear with the same relative grading both in N_1 t V N_2 and in N_1 t V for N_2 (checking all N_2, with a given V):
The man ran two hours, and
The man ran for two hours, whereas
Two hours were run by the man is rare or lacking;
as against:
The clock measured two hours, and
Two hours were measured by the clock, whereas
The clock measured for two hours is rare or lacking.
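This cross-form specification can be sketched as follows, with the above examples as invented data (the set names are hypothetical, and membership in them would in practice come from the pair test):

# Triples acceptable in N_1 t V N_2 and in N_1 t V for N_2, respectively.
acceptable_plain = {("man", "ran", "two hours"), ("clock", "measured", "two hours")}
acceptable_with_for = {("man", "ran", "two hours")}  # no The clock measured for two hours

def passive_expected(triple):
    # The passive is not found for triples graded alike in both diagnostic forms.
    return triple in acceptable_plain and triple not in acceptable_with_for

print(passive_expected(("clock", "measured", "two hours")))  # True
print(passive_expected(("man", "ran", "two hours")))         # False: no passive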
One can also specify the domain in terms of stated subsets of words. Some of the sets of n-tuples whose grading is preserved as above have some particular word, or an inextendable subset of words, or the complement set of these, in one of the n positions. For example, the passive is never formed from n-tuples whose V is be, become: no An author was become by him.5
In other cases, a subset of words in one of the n positions is excluded from the domain of the grading preservation, but only in its normal use; the same words may occur in that position in a different use. In the preceding example, we can say that the passive does not occur if N_2 names a unit of duration or measurement (mile, hour) unless V is one of certain verbs (measure, spend, describe, etc.). But in many, not all, of these cases, it is not
5 It will be seen later that most of the problems of restricted domains apply to an aberrant case: the analogic transformations. These domains are partly due to the compounding transformation. For the base transformations, the domains are unrestricted, or restricted in simple ways.
The term 'propositional sentence' or 'proposition' as defined here differs from 'proposition' in logic, where it represents the set of all sentences that are paraphrases of each other. However, it is an approach in natural language to the 'proposition' of logic, for paraphrases in language can be defined only on the basis of propositional sentences. Propositions will later be defined as particular sentence pairs (8.1.6.2).
10 The one difference between sentences which goes beyond the difference between their corresponding propositional forms obtains if a word of an n-tuple has one form in one sentence and another in its transform, i.e. if the shape of a word changes under transformation (e.g. -ing, below). This is covered by the morphophonemic definition of the word; if the change is regular over a whole grammatical subclass of words, it is included in the morphophonemic transformations (below). A similar situation is seen when the regularity of morphemic difference applies in certain cases to families of syntactically equivalent morphemes rather than to individual morphemes. Thus in some cases -ing and zero are equivalent nominalizing suffixes on verbs (imposed by certain operators): e.g. He felt distrust or trust, in contrast to a dislike but a liking. At the same time -ing occurs for all these verbs as a different 'verbal noun' suffix (imposed by other operators): He kept up his distrusting or trusting or disliking or liking.
grammar is made more regular and compact if we assume in these cases that
there was an indefinite noun in the given position. If we do not, then the
many verbs like read have to be both transitive and intransitive; and verbs
like watch have to have not only a nominalized sentence (people’s dancing)
as object but also a verbal noun by itself (dancing).
Finally, in particular environments, there is optional zeroing of words which are unique or highly favored there (G158). For example, the relative pronouns which, whom can be zeroed in the book which I needed (to the book I needed), the man whom I know (to the man I know). It can also be argued that I ask has been zeroed in obtaining Will he go? from I ask: Will he go?, from I ask whether he will go, and that I request you has been zeroed in Come! from I request you that you come (this being the only way to explain the yourself in Wash yourself!). Also it can be suggested that to come or to be here is zeroed in I expect him from I expect him to be here; otherwise we would have to say that expect can have as object not only a sentence (I expect John to forget, I expect John to come), but also a noun alone (with possible modifiers), as in I expect John. Furthermore, as has been noted, the gradation of likelihoods of nouns as objects of expect is roughly the same as that of the same nouns plus to come or to be here as objects of expect. And the meaning of expect with a noun object is the same as the meaning of expect with the same noun plus to come or to be here as object.
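The zeroing of such highly favored words can be sketched as follows, with invented likelihood figures and an assumed threshold standing in for 'unique or highly favored there':

# P(argument | operator): invented figures for illustration only.
likelihood = {
    ("expect", "to come"): 0.9,    # to come adds little information under expect
    ("expect", "to resign"): 0.05,
}

THRESHOLD = 0.8                    # an assumed cut-off for zeroability

def reduce_arguments(operator, arguments):
    # Keep only arguments that are informative enough to escape zeroing.
    return [a for a in arguments
            if likelihood.get((operator, a), 0.0) < THRESHOLD]

print(reduce_arguments("expect", ["John", "to come"]))    # ['John']: I expect John
print(reduce_arguments("expect", ["John", "to resign"]))  # both kept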
because it was raining, was given inside; Because it was raining, the concert was given inside. If C is and, or, but, or certain conjunction-like words such as so, then CS_2 cannot move to before S_1. The conjunction can also be be P (or be AP, e.g. is due to): His departure was with (or: due to) my approval, He departed with (or: due to) my approval. The PS_2, APS_2 can then move to in or before S_1: Due to my approval, he departed.
There is also an important conjunction wh-, though it is not generally recognized as such (3.4.4). Wh- can join S_2 to S_1 provided there is some segment occurring in both. This segment (a noun with its modifiers, or PN, or a nominalized sentence) in S_2 is then pronouned into -ich, -o, -en, -ere, etc., adjoined to the wh-, and the CS_2 thus transformed usually moves to immediately after the segment's first occurrence, in S_1 (and to before it when which is, who is is zeroed): given A man telephoned and The same man would not leave his name, we obtain A man telephoned, who would not leave his name and more commonly A man who would not leave his name telephoned. A necessary condition is that the repeated segment have the same referent, as above. The wh-S_2 is called a relative clause.
Types 1-3 above transform free sentences into free sentences, with type 3
and some cases of type 2 adding material to the original sentence. Types 4, 5
transform free sentences mostly into bound sentences (i.e. component
sentences), although conjunctions can be looked upon as transforming two
free sentences into one free sentence. In any case they add material to the
original.
There are transformations which can be analyzed as products of elementary transformations of the above types. For example, adjectives before nouns are the product of a wh- conjunction (type 5) acted on by zeroing of wh- (type 1), whereupon the residual adjective is moved (type 2, noted under type 5), as in I saw a man who was tall reduced to I saw a tall man.
This derivation of the question makes it possible to obtain both the yes-no
question (Will John go?) and the wh- question (as in Who will go?) from the
same transformational sequence on two slightly different kinds of source
(G331). Many other transformational relations among sentences have partial
similarities which suggest that they are composed in part of the same
elementary processes. Such is the passive (1.1.2, G362), which contains the
same -en as the perfect tense (The book was written, John has written it) and
the same by as in nominalization (The book was written by John, the writing
of the book by John). Such also is the set of nominalizations based on -ing
listed in 4 above. The composition of these transformations is resolved by
the more powerful methods of 8.2 below.
8.1.6.0. Introduction
3" z: of N2 —» zero.
1"
K: Frost reads N2.
m: Sn is A -*SAly.
z: N{s —» zero.
Oa: is smooth.
1A K: Ni reads Frost.
10. z: on K2, 3.
9. Ooa: and.
7. Ooa\ and.
3. m: slowly.
2. Oa: is slow.
1. Kl: A man walked.
For a simple example, we consider A man walked and talked slowly. One decomposition is given in Figure 8.4. The resultant sentences after each point in the semilattice are:
After point 2: 1, A man's walking was slow.
After point 3: 1, A man walked slowly.
After point 5: 2, A man's talking was slow.
After point 6: 2, A man talked slowly.
After point 7: 1, A man walked slowly and 2, a man talked slowly.
After point 9: 1, A man walked slowly and 2, a man talked slowly, and 3, man in 2 refers to the same individual as man in 1.
After point 10, where z replaces K3 by zeroing those words in K2 which are identical with corresponding words in K1 (in this case a man, slowly) and the permuting of the residue of K2: A man walked and talked slowly.12
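The zeroing and permutation at point 10 can be carried out mechanically, as in the following sketch (whitespace tokens only; the index arithmetic is merely a stand-in for the trace of the transformation):

k1 = "A man walked slowly".split()
k2 = "A man talked slowly".split()

# Zero the words of K2 identical with corresponding words of K1 (a man, slowly),
# leaving the residue, and adjoin the residue with "and" after its host in K1.
residue = [w2 for w1, w2 in zip(k1, k2) if w1 != w2]               # ['talked']
i = next(i for i, (w1, w2) in enumerate(zip(k1, k2)) if w1 != w2)  # host position
result = k1[: i + 1] + ["and"] + residue + k1[i + 1 :]
print(" ".join(result))  # A man walked and talked slowly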
The other decomposition is given in Figure 8.5. The resultant sentences after
each point are:
After point 3: 2, A man's talking was slow.
After point 4: 2, A man talked slowly.
12 The kernel-sentences are numbered in order of appearance in the lattice.
[Figure 8.5 here: the alternative semilattice decomposition, with points including 5, 7: O_oo: and; 8: z on K2, 3.]
form n subsets of sentences, each subset consisting of that sentence and the
relevant distinguishing partial sentences. For the examples above, the
sentence-pairs are:
1. (Frost reads smoothly; Frost reads N),
2. (Frost reads smoothly; N reads Frost);
1. (A man walked and talked slowly; A man walked slowly),
2. (A man walked and talked slowly; A man walked).
We can form a set of subsets of sentences as follows: For each ambiguous
sentence, a subset is formed for each different decomposition of the sentence
and the differentiating partial sentences of that decomposition, these being
in some cases simply the kernel-sentences and in other cases the kernel-
sentences carrying a particular transformation. For all non-ambiguous
sentences, the subset is simply the given sentence itself. This set of subsets
of sentences contains no ambiguities, and maps isomorphically onto the set
of propositions of the language.13 This set of subsets can be obtained from
the set of sentences by means of the elementary transformations, and
provides us with a set of sentence-pairs (or more generally sentence subsets)
containing no ambiguities, which is expressed purely in terms of sentences
(subsets of sentences) without explicit reference to the analyses or gradings
of the sentences.14
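The construction of this set of subsets can be sketched as follows, using the sentence subsets above as data (the dictionary layout is merely illustrative):

# Each sentence maps to one list of differentiating partial sentences per
# decomposition; an unambiguous sentence has one decomposition and no differentia.
decompositions = {
    "Frost reads smoothly": [["Frost reads N"], ["N reads Frost"]],
    "A man walked and talked slowly": [["A man walked slowly"], ["A man walked"]],
    "A dog bit a cat": [[]],
}

propositions = [
    (sentence, *differentia)
    for sentence, decomps in decompositions.items()
    for differentia in decomps
]
for p in propositions:
    print(p)  # each tuple is unambiguous: one tuple per proposition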
13 The isomorphism with the set of propositions depends on the fact that no word-sequence
can appear twice in an acceptability-graded subset of sentences. If a sentence is n-way
ambiguous, it appears in n different acceptability-graded subsets of sentences, i.e. in n
partitions of the set of propositions.
14 Since ambiguous sentences have various transformational decompositions corresponding
to the various grammatical meanings, and since the traces of these various transformations
affect the word-repetition which is required in all regular SCS . . . CS and discourse
neighborhoods in which the ambiguous sentence appears (9.1), it follows that each n-way
ambiguous sentence appears in n different sets of SCS . . . CS or discourse environments. The
neighborhood therefore differentiates the different meanings of an ambiguous sentence (i.e. in
most cases a sentence is not ambiguous in its neighborhood, and can be disambiguated by
appeal to its neighborhood), unless the differentiating parts of the neighborhood have been
zeroed.
The grammars of all languages are not arbitrarily different from each other.
The similarities in basic structure are sufficiently great as to make it possible
to take the similarly formulated grammars of any two languages and
combine them into a single grammar which exhibits that which is common to
the two languages and relates it to the features in which the two grammars
differ. Explicit and rather obvious methods for doing this can be formulated,16
and can be so devised as to exhibit the structural differences between any
two grammars.
In particular, the types of elementary transformations of various languages
are quite similar one to the other. The elementary transformations of a
particular type in one language have approximately the same semantic effect
15 For example, one can state finitely the meaning of any number of iterations of very at any
point in which it occurs; one can state in a (finite) sentence the meaning of n iterations of, say,
son’s or of son’s daughter’s as a function of n. The sentences which describe the information of
each iterable part can then be conjoined by logical and to make one sentence giving the
information that is contained in all iterations of all parts of U.
16 Z. Harris, ‘Transfer Grammar’, International Journal of American Linguistics 20 (1954)
259-70.
is avoided. Words which have several different types of object are shown to have a single object-type with transformations on it (e.g. by the side of I expect John to be here, the form I expect John contains not a different object-type for expect but merely a zeroing of to come). Nominalized verbs and
adjectives (e.g. absence, and so most abstract nouns) are verbs and adjectives
(absent) under a further operator. And much of morphology is treated as
residues of transformations (e.g. His boyhood was harsh is a transform of
some such sentence as His situation of being a boy was harsh—as indeed it
was historically). The success in doing all this shows that transformations are
not an artifice or short-cut of structural description but a real relation among
sentences and a real property of language.
The inadequacy of word-sequence as sole characterizer of sentences has
led all grammatical models to characterize sentences as (regular) sequences
of (regular) sub-sequences of words, these sub-sequences being in general
not themselves sentences (e.g. constituents, strings). In transformational
grammar these entities which are in general larger than words but in general
smaller than sentences are shown to be simply the elementary sentences of
the language.
With the preservation of word-choice comes meaning-preservation.
Somewhat as two words which have the same selection (co-occurrences) are
synonyms, so are two grammatical constructions which support the same
selection; and indeed the transformations are found to be paraphrastic,
aside from considerations of style and nuance. Transformational grammar
also states grounds for a sentence being ambiguous (if it results from
different transformations on different sources)—this is in contrast to lexical
synonymity and lexical ambiguity.17 Furthermore, because the domains of
many transformations are fuzzy, transformations create various types of
marginal sentences, intermediate between the word-sequences which are in
the language and those which are not. In these ways transformational
analysis brings some meaning-distinctions into grammar, and—what is
more important—separates the meaning-bearing arrangements of words
(e.g. in the structure of each elementary sentence) from the non-meaning-bearing arrangements.
In respect to the structure of each sentence, any transformations that take
place in it are located in the partial order of the words that compose the
sentence, since each transformation takes place precisely at the point at
which all the conditions for which it is defined have been satisfied.
In respect to the structure of the set of sentences, sentences can be defined
as objects which are reached from other sentences by a directed path,
which is a transformation defined as being able to operate at the initial
point of that path. In some situations, it may be possible to define a
grid of transformational differences rather than directed paths; in that case,
17 Differently from lexical ambiguity, this grammatical ambiguity is explicit: an alternative
source.
18 Thus He painted two hours and He painted for two hours are transforms of each other, but He painted two boys and He painted for two boys are not. (Hours here is in a different subclass of N, namely that of time words, than is boys.) The artist's painting was of two boys and Two boys the artist painted are transforms of The artist painted two boys; but Two boys painted the artist and The boy painted two artists are not.
8.2.0. Motivation
. . ., producing (as a sentence in its own right) I know sheep eat grass, where the status of know between its two arguments (I and eat) is like the status of eat between its two arguments (sheep and grass). (These are 3, 4, 5 in 8.1.4.)
As to the type (b) transformations, which delete words, or insert a fixed
word or affix, or permute words, these are found to be the resultants of a few
kinds of change in form: primarily reduction of the phonemic shape of words
in the underlying sentence, and the moving of certain words toward the
beginning of the sentence. The underlying sentence, with its word-choices,
remains in the transformed sentence. (These are 1, 2 in 8.1.4.)
The particular development from a transformational to an operator-
argument theory is motivated as follows. We note that each transformation
from simpler source sentence-forms as domain to more complex resultants
as range has more restrictions in the range than in the domain, because each
transformation acts on all or a proper (restricted) part of its domain. Thus in
the resultants of the passive, America was left by me last year is dubious, i.e. somewhat restricted, in contrast to the normal America was left by the best writers of the generation and the uncertain The office was left by me at 5, whereas in the active source I left America last year, The best writers of the generation left America, I left the office at 5 are all equally normal. In the English interrogative, if we compare Who met her?, Whom did he meet?, Is he coming?, Did he come? with the sentences of the source domain He met her, He is coming, He came, we see in the interrogative form restrictions in the position of whom (compared to that of her) and in is and the tenses.
This greater restrictedness in the resultants of transformations leads us to seek other pairs of sentence-forms which satisfy the conditions for transformation and in which one form is less restricted than the other and is therefore to be considered the source. One such case is the many affixes for which an affixless source can be found. A trivial example is pro- before nouns, derivable from for: He is pro-war from He is for war or He is in favor of war. Here, 'X (e.g. pro- above) is derivable from a source Y (e.g. for)' means, in the observational terms of hearer acceptability, that for all sentences of the given form containing X, the sentences with Y in place of X are ordered (as to acceptability) roughly like those containing X: Women are less pro-war, Children are pro-air, Rocks are pro-universe have among them much the same acceptability ordering as do Women are less for war, Children are for air, Rocks are for the universe. In most cases, the sentences with Y replacing X are a subset of the sentences containing X. Another example is un- before adjectives. Under this condition—before adjectives—un- has the same combinations as less than rather than as not: untrue (but not unfalse), unhealthy (but not unsick), unlike (but not undifferent; hence also unscholarly for unlike a scholar) are replaceable by less than true (but no less than false), etc.; hence we would consider these occurrences of un- to be reductions of less than. A suffix example is -less, e.g. in hopeless, derivable from a free word less, 'without' (e.g. in a presumed source less hope), which
is still visible in a year less a day. Less obvious is the derivation of -hood from state of (being) or the like, as in boyhood from state of being a boy, where it is known that -hood is a suffixal form of a Middle English free word had which in free position must have been replaced historically by some such word as state (G175-6).
In establishing such derivations, we consider the current affixes, restricted in their positions and in the words to which they are attached, as reductions of currently existing free words lacking these restrictions. As another example, note that when we establish adequate derivations for the interrogative forms, e.g. I ask whether he came as source of Did he come?, we find that the source is free of the special restrictions of the interrogative (G331).
As we establish least restricted sources from which the various more restricted forms can be derived, we note that these least restricted forms have a common structure. The structure consists of certain words, chiefly (1) verbs, (2) adjectives, (3) prepositions, and (4) conjunctions, all of which will be called operators, appearing after or between (a) nouns or (b) sentences (including nominalized sentences), which will be called their arguments: (1a) John coughed, John poured water; (1b) John's coughing stopped, John knows she left, John's coughing caused her to leave; (2a) John is tall; (3a) John is for war; (3ba) John left for Europe; (4b) Her leaving is because of John's coughing, She left because John coughed.
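This classification can be sketched with a toy lexicon typed by argument classes (n for a noun argument, o for a sentence argument; the entries are assumptions for illustration):

# Operator classes: O followed by its argument types in order.
lexicon = {
    "coughed": "On",      # (1a) John coughed
    "poured":  "Onn",     # (1a) John poured water
    "stopped": "Oo",      # (1b) John's coughing stopped
    "knows":   "Ono",     # (1b) John knows she left
    "is tall": "On",      # (2a) adjective as operator
    "because": "Ooo",     # (4b) She left because John coughed
}

def well_formed(operator, argument_types):
    # A word-sequence satisfies the dependence if the operator's class
    # matches the classes of its arguments, in order.
    return list(lexicon[operator][1:]) == argument_types

print(well_formed("knows", ["n", "o"]))    # True
print(well_formed("coughed", ["n", "n"]))  # False: coughed takes one noun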
This structure is so widespread that it becomes of interest to take every sentence form that does not fit this structure and seek to derive it from one that does, even if no restriction is circumvented by the derivation. An example is the deriving of adjectival adjuncts and relative clauses from secondary sentences: e.g. I saw a tall man from I saw a man who was tall from I saw a man; the man was tall, where the secondary-sentence intonation, written as a semicolon, is the conjunction between the two sentences (type 4b above).
To see an extreme example of a word that does not fit this operator-argument structure, consider yes. This word seems to be outside the grammatical classes and their co-occurrence conditions. However, note the following: to the question Did you close the door? a person who has done so can truthfully answer Yes, or its equivalent here Yes, I closed the door. But if, upon hearing the question, he closes the door and then says Yes, or Yes, I closed the door, he is dissembling and not quite truthful. This means that the Yes is not related to the tense of the answer (I closed the door, which he had indeed done before answering), but to the tense of the question: Yes could be said truthfully only if the door had been closed before the question was asked, since the question was in the past tense. Hence yes is a referential word (5.3), containing a repetition of the question (and not a newly made I closed the door). Yes is thus an O_o (information-giving) operator on the question-sentence, something like I say yes to your saying 'Did you close the door?'. It is thus not unrelated to the operator-argument structure.
It turns out that we can find operator-argument sources for almost all words and sentences of the language, with few definite restrictions on word-combination other than those of the operator-argument structure. Furthermore, the transformations that produce the derived sentences out of the sources are found to consist almost entirely of reductions in the phonemic composition of words, specifically of those words which add little or no information to their operator or argument in the sentence. (Nevertheless, various languages have important and common transformations (deformations) which are not readily analyzable as reductions, such as are seen in The organizing of the event was his, and Her cooking is mostly of vegetables, or in pairs such as John is the author, The author is John.) We thus approach a situation in which, first, the only restriction on words in the source sentences is the operator-argument relation—i.e. this relation creates sentences—and, second, virtually all other sentences can be considered to be merely reduced (or otherwise transformed) forms of their source sentences. This is the system described in Chapters 3-5.
Compare (1) I locked the door after washing with (2) I locked the door after washing myself. In (1) it might seem that I serves as first argument (subject) for both lock and wash; but in (2) we see from -self that wash had its own first argument, which was zeroable as being a repetition of the one for lock. Thus auxiliary-like verbs ('preverbs', G289) such as take in I'll take a look, and even can in I can go, and have in I have gone are not taken here as some special expansion of the following verb, but are analyzed as being operators with their own subject, which is however the same as the zeroed subject of the following verb (which is the second argument of that take, can, have).
We thus arrive at a theory not merely of sentence transformation, or of
derived sentences, but of sentence construction ab initio: it deals not with
relations among already-extant elementary sentences, but with the formation
of word-sequences, first into elementary sentences and then in the same way
(i.e. by satisfying the dependence) into all base (unreduced) sentences, and
finally with the reduction or deformation of these sentences to yield the
other sentences. This theory represents first a dependence process of word
juxtaposition (operator on argument), with inequalities of the likelihood
(from zero up) for individual words to occur with particular other ones. On
these word juxtapositions there acts the shape-changing process (kindred to
morphophonemes, 7.4), which consists primarily in reducing words whose
likelihood in the given operator-argument situation is very high. These
shape-changes on words, when suitably chosen, can produce the same
'derived' sentences as can the transformations on sentences. The 'derived' sentences are thus the 'same' sentence, with the same meaning.
When this is done, it is seen that the cumulative construction of a
sentence, and the likelihoods in word-choices, and the word changes, are all
defined in terms of the finite (though large) set of operator-argument word-
pairs (or triples) rather than on the unbounded set of sentences. It is seen
that the high likelihood on which word-reducibility depends is apparently
definable, in a complex way, in terms of word-pair frequency in specified sets
of discourses, whereas the transformations and the well-definedness of the
set of sentences depended on subjective evaluations of sentence acceptability.
And since the changes are only in the shape of a word (even if to zero) and its
position, but not in its having been chosen, or in its operator-argument
relation to other words in the sentence, it follows that the choices and
operator relations of the words in the sentence—on which the meaning of
the sentence depends—are not altered by the changes, something that does
not clearly follow in the case of transformations.
The two characteristic processes, of operator-argument dependence and
reduction in the shape of high-likelihood words, leave room for additional
processes or entities which may appear in one language or another. For one
thing, the operator-argument dependence creates a partial ordering of
words. Since speech is in essential respects linear, mapping the operator-
argument order onto a linear one leaves room for cases of alternative
20 This analysis is needed because some words when under particular operators change their
further co-occurrents (3.2.2). Hence one would seek a zeroed intervening operator in the
reduction.
on the base; it is indeed reasonable to suppose that such may have been the
distant history. Such a solution is clearly appropriate in Semitic, where the
simplest verb form can be the source from which other verb forms and
related nouns are obtained; in addition, some simple nouns would be taken
as sources for verbs.
More generally, the operator theory does not intrinsically require that the
source form of operators and arguments be free words. The fact that to some
extent affixes can be derived from free words makes it possible to see the
similarity in operator-argument structure between languages which have
morphology and languages which do not. But it is not essential for the
present theory.
The problem is different if we wish to construct words like the English
conceive, perceive, contract, etc., let alone dazzle, frazzle, etc., out of two
free words, and more generally out of an operator-argument relation.
There is not enough regularity to permit this in English. Furthermore, the
small number and encapsulated nature of these composite words allow us to
consider them as indivisible entities in the syntax; their composition is then
a semantically relevant pre-syntactic structure in the vocabulary.
Similarly, if nouns consist of a stem plus a required case-ending, we would
take the ending, at least in the nominative, as a required indicator of
argument position. Then for the operator-argument analysis the stem
would function as a free word, though in being said it would carry the
indicator. Likewise, if nouns have gender affixes with no consistent male-
female pairing, the affixes could be considered argument indicators; but if
they have paired male-female affixes, one of those affixes (depending on
morphophonemic convenience) would be considered as (male or female)
operator on the base noun (which, in the absence of this operator, would
mean the other gender and would contain the other affix as noun indicator
rather than as gender indicator).
Another aberrant situation is that of reductions which are required rather than optional. When the required change a follows on an optional one b, we can say that the ordered pair ba together is optional. In (1) the glass which is broken, the zeroing of which is is optional, since (1) exists; but once which is is zeroed certain residues must move to before the host, yielding (2) the broken glass; the two changes together change (1) optionally into (2). When the required change acts directly on the source sentence, we cannot make it part of an optional event. John thinks that he is right must be derived from (3) John thinks that (the same) John is right, but (3) is not normally said. The ultimate sources, the elementary base sentences—what were called the kernel-sentences in transformational theory—indeed do not have required reductions, but the extended sentences of the base, such as (3), may. This means that not all base sentences are actual sentences of the language, even though complicated ad hoc restrictions would be needed if we were to formulate a grammar that would routinely exclude (3). As long as required
reductions hold for only a few grammatical situations, such as in (3), we can
still claim that the base is a subset of the sentences of the language (i.e. that
all source sentences exist in the base)—aside from the few required reductions
—and that all information statable in language is statable in its base subset.
If, however, required reductions in base sentences were many and diverse,
this claim would have to be given up; the base would then become a
grammatically regular set of forms that underlie the sentences of the
language rather than being a subset of it.
Another problem can arise in the case of analogy. In this process, given a form A_i together with A_j differing from it by certain reductions (or additions or replacements of words), and given B_i (but no B_j), there is formed a new B_j having the same overt difference from B_i. In most cases, B_j is then understood as though it had the same reductions or other grammatical differences from B_i as A_j has from A_i. In some cases, however, the words of B do not admit of the changes made in A, or B_j may have been made on the model of A_j without B_i existing in the language. If these latter difficulties are very numerous, the explicatory value of the present theory for the individual sentences of the language is decreased.
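Analogy in this sense is a purely overt operation on word-sequences, as the following sketch shows (the template extraction is deliberately crude, and the forms are those of the pro-/for example above):

a_i = "He is in favor of war"
a_j = "He is pro-war"              # A_j differs from A_i by a stateable reduction
b_i = "He is in favor of peace"    # B_i exists; B_j is formed on the model of A

# Apply the overt difference of the A pair to B_i to form B_j.
prefix = "He is in favor of "
b_j = "He is pro-" + b_i[len(prefix):]
print(b_j)  # He is pro-peace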
There are also certain properties which are common in languages, but which are not essential to the adequacy of the theory, although some of them increase its utility or simplify its recognition of sentence analysis. One is the fact that reductions are ordered in respect to the operator-argument ordering. Another is the availability in many languages of operator indicators and argument indicators; e.g. case-endings are argument indicators for the zero-level arguments N. A third is having different words in the different dependence classes N, O_n, O_a, etc., with only a few words multiply classified. If a language had a great many words which are members of several classes, then the classes, which in any case are defined as relations among word-occurrences ('word tokens'), would lose the support of being distinguished by their word members ('word types').
Some features of traditional grammar are readily describable in operator-argument terms. Such are pronouns (as repetitions); the infinitive (as an argument indicator making operators into arguments); the status of adjectives, prepositions, and conjunctions (as operators); the interrogative and imperative (as metasentential operators); the absence of to under auxiliaries such as can (as zero variant of the infinitive). Other features are more difficult. Such are, in English: the various nominalizations; the borderline between 'indirect' objects and adverbial modifiers as in rely on something (object), sit on something (object or adverbial phrase?), stand on something (adverbial phrase); the requirement for a before 'count' nouns in the singular. There are also problems with many individual words whose use is restricted or specialized in various irregular ways: e.g. afford (He can afford to buy it; no He affords to buy it), supposed to (which differs in selection and meaning from suppose: I'm supposed to be there tonight is not paraphrastic
unreduced, or because the vocabulary needed for the unreduced form is not
(now) available.
In some cases historical replacement has led to this vocabulary lack, as in
the reconstructed base state of (being a) child for childhood, where -hood is
the modern form of a once-free word meaning ‘state’. But in many cases it is
not a question of historical loss. Rather, it is a matter of the partial
independence of syntax and vocabulary. True, the syntactic structure is
arrived at as a relation among words in utterances. But when that structure is
stated, it is not in respect to specific words—which in any case can drop out
of the language or enter into it—but in respect to the ability of variously
related words to fill that structure. In particular, the N, O_n, etc., classes of
the present theory are defined by certain dependence properties of words,
not by a particular list of available words. Given a particular sentence,
and a reconstructed one from which the given can be reached by known
reductions, the reconstructed form may be peculiar because a particular
word in the given sentence does not normally occur (or no longer occurs) in
the position it is supposed to occupy in the reconstructed sentence, or does
not have normal likelihood in respect to the other words in the reconstruction.
This is easiest to see when the word had indeed occurred there and had then been replaced. For example, at the time when the precursor of -hood (above) had still existed as a free word, we would have had by the side of the compound noun child-hood (the vowel in it was then a) also the had of (being a) child, just as by the side of birth-rate we have the rate of birth; the stressed had was the free variant of the unstressed -had. When, however, had ceased to be used in free-word position, the word that replaced it there—state or the like—became the free-word suppletive variant of the unreplaced suffix -hood, yielding the reconstruction the state of (being a) child.
Because the operator-argument theory refers everything in sentence
structure to just a few principles—and ones which can be considered
reasonable, given the use made of language—the theory has an explanatory
value, showing how the properties of the grammar and those of particular
forms—even of some peculiar ones—arise from these first principles. For
example, we see why derived sentences have the same general structure as
do base sentences (8.2.1, G77); how cross-reference from one word-position
to another is managed without a system of word-addressing (5.3.2); why
grammatical constructions are nested one in another (2.5.4) and why
permutations are limited as they are (3.4.3); the distinction between the zero
reduction of a word and non-occurrence of that word (G136, 150); why
certain transformed (reduced) sentences, including metaphors, do not
support further transformations which their apparent form would admit
(G405); why some of these require modifiers (e.g. not He peppered his speech but only He peppered his speech with asides); and why many word subclasses are fuzzy as to membership (e.g. the auxiliaries, the reciprocal verbs). In respect to particular constructions, we see, for example, how
tense (G277) and plural (G244) can be fitted into the rest of the grammar,
and why certain peculiar uses occur in them (ch. 5, n. 5); how compound
tenses fit in grammar (G290); how ‘topic’ permutation takes place (3.4.2,
G109); how the relative clause is formed, and modifiers in general (G227);
what in a sentence can be asked about and what can carry a relative clause
(G121); how yes-no questions and wh- questions have a common derivation
(G331); the grammatical basis for the distinction between use and mention
of words (5.4); how we obtain the peculiar grammatical properties of the
comparative (G380); how the special limitations of the passive arise (G364);
how peculiar sentences arise such as It is true that John left, It is John who
came (G359); how certain words such as any, but, only come to have uses
that are almost opposites (G327, 401); why one can say He truly came on
time or He will undoubtedly come but not He falsely came on time or He will
doubtfully come (G306).
As an analysis of language, the operator-argument theory has certain
overall advantages:
First, it covers a language more fully than has been expected of grammatical
theories. It can in principle yield not just the sentence forms but also the
actual sentences, and this with the aid of a list not of elementary sentences
but just of normally likely operator-argument pairs; to prepare such a pair
list, with the aid of many fuzzy subclasses of words, would be a tremendous
task which no one would undertake, but it can in principle be done. A word-sequence which is n-ways ambiguous on grammatical grounds (e.g. G143) is obtained not once, as a single sentence, but n times, as n sentences originating from different sources via different reductions which degenerately
yield the same word-sequence. The theory can also account for various kinds
of nonce-forms, daggered reconstructions (as above), joke forms, and
fragments, as particular kinds of departures (chiefly by extending the
domain of a reduction) from the main, but not well-defined, body of attested
sentences. At the same time, the theory does not overproduce, does not
yield word-sequences which are not sentences. The product of the theory
includes every contribution to the substantive information of a sentence, as a
fixed interpretation of each step in producing the sentence. And it does this
entirely in accord with the apparatus of the grammatical theory (especially
selection), without appeal to any mental structures.
Second, the operator-argument theory is intended to be maximally
unredundant in that everything that is constructed is then preserved under
further construction. In particular, the existing vocabulary is utilized more
fully, as when affixes are obtained as positionally restricted reductions of
free words (which are already attested otherwise), or when pronouns are
obtained as reduced repetitions of a neighboring word. The existing constructions are utilized maximally, for example in keeping multiple classifications to a minimum, or when the intransitive read is obtained by second-argument zeroing from the otherwise existing transitive read (He reads all
the time from He reads things all the time). The meaning-interpretation is
simplified by showing the reductions and alternative linearizations to be
substantively paraphrastic. All this means that fewest constraints have to be
stated in order to obtain the sentences of the language and to relate them
relevantly to each other.
Third, the theory provides an effective procedure for characterizing
utterances. It does so by locating all its events in the finite though large set of
operator-argument constructions rather than in the unbounded set of
sentences (let alone whole utterances). And it retains its effective property
by having only stated types of deformations—primarily reductions and
alternative linearizations—and by having one necessary though not sufficient
condition for determining the domain of almost all reductions, namely low
information (high expectancy) in respect to the immediate operator or
argument. When grammatical events are formulated in such a way that the
relevant environment for them is the operator-argument pair rather than
the sentence, then the reduction of a word can be based on its low
informational contribution, ultimately referable to its relative frequency in
the given position, rather than on the acceptability of the sentence as
reduced.
Lastly, because the events of construction and of deformation are uniform
in their basis—and because nothing extra-linguistic is presupposed—it
becomes possible to compare structurally, and to measure in respect to
complexity (and the cost in departures from equiprobability), each analysis
of a sentence (especially if more than one analysis is possible), and also the
sentence structure itself, and even the whole language.
Most generally, one reaches a fundamental pervasive structure consisting
of a few conditions (constraints) on word combination, with the successive
or simultaneous satisfaction of these conditions producing the specific con¬
structions of language occurrences. What is gained here is not only the
simplicity of the structure, but also the finding that each of these elementary
constraints makes a fixed contribution to the meaning of every sentence in
which it appears.
The major cost of reaching a simple and regular syntactic system has been found to be moving many form-differences from syntax into morphophonemics, i.e. into the phonemic shapes of morphemes (words). However,
the phonemic shapes of words are ‘arbitrary’, i.e. they can only be listed
since there is no regularity which would enable them to be derived. And
various morphemes in various languages have more than one phonemic
shape (7.4). Therefore when we replace morpheme-combination differences
by phonemic-shape differences (e.g. by saying that -hood is a suffixal
phonemic shape of state, or that different declensions are just matched sets
of different phonemic shapes of the case-endings), we are only adding to
existing lists of the phonemic shapes of words; we are not bringing a new
degree of freedom into the grammar. Furthermore, the total phonemic
content of words is neutral to syntax: it does not correlate with the meanings
or the combinability of the words. Hence moving a form-difference into the
stating of phonemic shapes of words is moving it out of the stating of
syntax—so long as this moving preserves the selection (combinability) of the
words affected, e.g. so long as the selection of -hood as an independent suffix
is included in the selection of state of which -hood is to be considered a
variant.
In evaluating any theory of syntax, certain limitations must be kept in
sight. There exist additional processes of considerable importance in the
formation or availability of sentences in a language, or in the way speakers
understand the sentences (i.e. relate them to others). Such processes are:
the phonemic similarity among syntactically related words, morphology,
borrowing, historical change in phonemes and in use-competition among
words, and above all analogy. These processes are presently external to any
systemic syntactic theory, but they affect considerably the form of many
sentences which the syntactic theory has to account for. Since the present
book offers not a survey of syntactic description but rather a theoretical
framework, it suffices here to note that given the forms which result from
these other processes, we can reconstruct for them operator-argument
‘sources’ in the same general way we use for attested sentences in the
language. The forms are as though derived by the known types of deformation
from the known types of sources. This is the case for all established
sentences except some listable, usually frozen, expressions.
Some features of language remain effectively outside the syntactic framework: for example, the phonemic composition of words, and the quasi-morphemic composition of certain words (e.g. the sl-, gl- words, the -le words, and the con-, per-, -sist, -ceive, etc., set). Other features are only
tangentially disturbing to the operator-argument structure: for example,
popular etymological perception of particular words, morphological paradigms
over classes of words, and the intentional departures in vocabulary and
syntax seen in argot, linguistic jokes, and in some literature.
There remains one analysis which is outside current syntax but which
would simplify the application of the operator-argument construction. This
is factoring words into syntactically more elementary word-components. In
general, syntax starts from the existing vocabulary. Any detailed syntactic theory will have to admit some regularization of the vocabulary, such as combining restricted suppletive words into a single word (is, am, are into be; -s, -z, -en, etc. into a plural morpheme), recognizing some suffixes as variants of free words, etc. The present theory goes beyond this in order to
give operator-argument sources for words that do not seem to accord with
operator-argument structure (e.g. the from the phrase that which is, or
which, who, etc. from wh- plus a pronoun; also able or the like as source for
can). Such analyses are not based on any semantic criterion, which is
not controllable, but on their usefulness in reconstructing appropriate sources.
The considerations of 8.2 show that the words of a sentence have the relation
of a branching process in which the dependence of a on b, . . ., d is
represented by each of b,. . ., d branching out of a, with b,. . ., d linearly
ordered (because likelihood is not preserved among the different orderings
of co-arguments, as in John ate cheese and Cheese ate John). More specifically,
for all word-occurrences a, b in a sentence S, we set a ≥ b for the case where
either b is the same as a, or where b is reachable from a by a sequence of
dependings. When the length of this sequence is not zero, a > b; and when it
is zero, b is identical with a. Here a is an upper bound of a subset Z of word-
occurrences in S if a ≥ c for all c in Z, and it is the least upper bound (l.u.b.) of
Z if a′ ≥ a for every upper bound a′ of Z. The word-occurrences of a base
sentence have the join-semilattice property, namely that every two occurrences
have a l.u.b. in the sentence. If a > b and there is no c such that
a > c > b, then a is said to cover b, or a is the (immediate) operator on b, and b
is an argument of a. Here there is the added property that the word-
occurrences covered by a are linearly ordered; they are called co-arguments
of each other: e.g. John and cheese in John ate cheese. The cover relation is
the previously described dependence relation among word-occurrences in a
sentence, and is the one elementary relation of sentence construction. Also,
every subset of S which contains its l.u.b. is a component sentence of S. A
join-semilattice property of word-occurrences is that no two occurrences a,
b have a greatest lower bound other than a or b themselves. This semilattice
differs in various respects from the transformational lattice of 8.1.5.
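The covering relation can be made concrete in a few lines of code. The following sketch (Python, with hypothetical names; not part of the text) models each word-occurrence with a pointer to the occurrence that covers it, and computes the l.u.b. of two occurrences as their lowest common ancestor:

    # A minimal sketch of the join-semilattice of word-occurrences.
    # Each occurrence records the occurrence that covers it (its immediate
    # operator); the highest operator of the sentence has operator None.
    class Occurrence:
        def __init__(self, word, operator=None):
            self.word = word
            self.operator = operator

        def ancestors(self):
            # The chain of occurrences a with a >= self, self included.
            node, chain = self, []
            while node is not None:
                chain.append(node)
                node = node.operator
            return chain

    def lub(a, b):
        # Least upper bound: the lowest occurrence above both a and b.
        ups = {id(x) for x in a.ancestors()}
        return next(n for n in b.ancestors() if id(n) in ups)

    # John ate cheese: ate covers its co-arguments John and cheese.
    ate = Occurrence('ate')
    john = Occurrence('John', operator=ate)
    cheese = Occurrence('cheese', operator=ate)
    assert lub(john, cheese) is ate   # every two occurrences have a l.u.b.
    assert lub(john, ate) is ate      # ate >= John, so ate is their l.u.b.

That every two occurrences have a l.u.b. is guaranteed here by each occurrence having at most one immediate operator, so the ancestor chains of any two occurrences in a sentence always meet.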
The theory presented here has certain limitations. One is that the theory contains only a necessary but not
a sufficient condition for reduction, so that the specific reductions and
permutations in each language have to be listed. The second is that individual
languages may have many and prominent regularities which are not
germane to the operator-argument structure, such as phonemic similarities
within or among particular subsets of words, or morphological paradigms.
The third limitation is that some languages have created local restrictions
on the general system, by letting certain sentence-making steps become
required rather than optional: in a given situation, a particular word may be
required (e.g. repetition of subject under can, may), or a reduction may
be required (e.g. zeroing of this repeated subject).
Differences in these respects give languages different grammatical features
(aside from the differences in vocabulary—in phonemic content and combinatorial
ranges). Some ('inflectional') languages have interrelated systems
of affixes, others have non-systematized affixes strung onto host words or
onto stems, and yet others have no explicit affixes. For the operator-
argument theory the affixes are of little moment, since they are equivalents
of free-word operators on the host words, but for the grammar of the
individual language these differences are important. Some languages also
have for all or many words special forms which are foreign to the operator-
argument relation, but can be shown not to violate it. Thus in languages
which lack enough free words, because most words have stems or roots
which do not occur without one or another of a set of attached morphemes,
one can (as noted above) consider the combination of stem with one
attached morpheme to constitute the base free word, with the other forms
obtainable from it by known operations of the grammar. All such features,
including the fixed phonemic (and syllabic) properties which words have in
some languages, are important regularities in the description of the language,
but secondary here to its operator-argument and reduction structures.
In addition to these word-structure properties, many languages have word
subclasses with important regularities which again do not violate the operator-
argument and reduction systems but rather are in many cases by-products of
them. Many of these subclasses are fuzzy in their properties (e.g. the
'human' class in English) or in their membership (e.g. the English auxiliaries).
In certain cases, members of a subclass constitute a required or virtually
required participant in sentence composition, above all in the case of
grammatical paradigms: tense and person, with aspect in some languages;
also noun-case, gender, plural, and agreement. Many of these overtly
required constructions create grammatical categories for the language. Such
categories may apply semantically even to simple words which have no
explicit construction, if the absence of the construction means a particular
value of the category (e.g. if a word without plural affix is singular,
in meaning and in agreement, rather than being unspecified as to
number).
IV
Subsets of Sentences
9
Sentence Sequences
1 There is no further constraint on word classes because sentence boundaries were reached
and defined precisely as the point beyond which no constraint was carried over stochastically
from the preceding word classes (7.7).
we need word-repetition among the components. There are cases where the
unexceptionability is retained without word-repetition; but then we can
always find zeroable sentences which would provide the word-repetition and
which we can reconstruct as having been present, but well known and
therefore zeroed. Thus given (4) We stayed at home because John was tired,
where S1OooS2 is as unexceptionable as S1, S2 but without repetition, we can
assume some explanatory zeroed sentence connecting John with we (e.g.
John is one of us, or John is our son).
The zeroed adjoined sentences which provide the missing repetition are
in many cases matters of common knowledge, or definitions, which are
zeroable as adding nothing to the information in S1OooS2. For example, in
(5) I wrote it on Tuesday because I had to hand it in Wednesday morning, the
conjoined repetition-supplying sentence would have been and Wednesday is
the day after Tuesday, or the like (2.2.3, 2.3.2). When our supplying the
presumed zeroed sentence OooS3 raises the acceptance of S1OooS2, as in (2),
it is because the OooS3 of (2) was not known to the hearer, whereas in (4) and
(5) it is known. Thus a sequence S1OooS2Ooo . . . OooSn is generally as
acceptable as the least acceptable Si in it if each component Si contains at
least one argument or operator which is also present in some other component
Sj (before reduction). The above sequence can be shortened without
loss of acceptability by zeroing any S in it which is common knowledge, at
least to the speaker and hearer. Omission of any other S may make the
sequence less acceptable than any of its components. The fact that sentence-
connecting has this property leads to important results about the structure of
discourses and the nature of argumentation, as will be seen below.
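The acceptability condition just stated can be sketched computationally. The following fragment (Python; the numeric scoring and the word-set encoding are assumptions for illustration, not from the text) scores a sequence at the acceptability of its least acceptable component, provided each component shares at least one pre-reduction operator or argument word with some other component:

    # A minimal sketch of the acceptability of sentence sequences.
    # Each sentence is given as (set of pre-reduction operator/argument
    # words, acceptability score).
    def sequence_acceptability(sentences):
        for i, (words, _) in enumerate(sentences):
            others = set()
            for j, (w, _) in enumerate(sentences):
                if j != i:
                    others |= w
            if not words & others:
                return None   # no word-repetition links this component in
        return min(score for _, score in sentences)

    s1 = ({'we', 'stay', 'home'}, 1.0)
    s2 = ({'John', 'tired'}, 1.0)
    s3 = ({'John', 'we', 'son'}, 1.0)   # the zeroable 'John is one of us'

    print(sequence_acceptability([s1, s2]))      # None: no shared word
    print(sequence_acceptability([s1, s2, s3]))  # 1.0: s3 supplies the repetition

Here s3 plays the role of the zeroed explanatory sentence of example (4): adding it restores the word-repetition, after which it can be zeroed again as common knowledge.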
9.2. Metasentences
girls the subject of the word wanted; and He asked for cardizem; the word
cardizem here is the name of a medication (usually reduced by ‘abus de
langage’ to He asked for cardizem, which is the name of a medication).
The metasentence adjunctions can fill many functions for the host sentence.
They can state that a word-occurrence X in one position is the same word
(with the same selection, i.e. in the same sense), or that it refers to the same
individual, as a word-occurrence Y elsewhere; in this case the metasentence
can be replaced by reducing X to a pronoun (possibly zero) of Y. For
example, I saw some high-flying kites, which are hard to keep up from I saw
some high-flying kites; kites—the preceding word refers to the same as the
prior word—are hard to keep up; note that neither the which nor the
sameness statement that is its source would be said if the prior occurrence of
kites referred to the bird ‘kite’, since the predicate in the second sentence is
in the selection only of the ‘toy’ occurrence of kite (5.3.3).
The metasentences can state that the speaker is talking about something,
in which case the word for that thing in the host sentence can be pronouned
as a repetition from the metasentence, which is then zeroed; this gives a
demonstrative (deictic) meaning to what is constructionally just a repetitional
pronoun, as in This coffee is cold from, say, I am speaking about some coffee;
this coffee is cold (5.3.4). The metasentences can also supply grammatical
and dictionary information about all parts of the host sentence. When they
do, the sentence—host plus its metasentences—becomes a self-sufficient
universe requiring understanding only of the given metalinguistic material
(grammar and dictionary) but not of the host sentence itself, since the latter
is explained by the metasentences.
That this is indeed the case has been remarked above (5.2), namely: we
can take an arbitrary sequence of English phonemes that satisfies English
syllabic structure, and can make it an English sentence if we adjoin to it
appropriate metasentences, for example ones stating that an initial subsequence
in it is the name of a scientific institute, and that the middle subsequence
ending in a (spelled) s is a term for some laboratory technique, and
that the final segment is some science object (e.g. a chemical compound).
This means that when we hear a sentence which we accept and understand as
English, it is because the metasentences which performed the above kind of
task for it had been zeroed as being common knowledge. Thus a sentence of
a language is simply a phonemic sequence of it with known or guessable,
usually zeroed, metasentences identifying its successive segments as entities
permitted there by the metalanguage and usually known by the hearer.
Since different grammatical and dictionary metasentences may apply to
the same sentential phoneme-sequence, the zeroing of them may leave the
phoneme-sequence ambiguous. This takes place when the metasentences
state different reductions on different operators and arguments, which
happen to be degenerate as to their resultant phoneme-sequence. Another
kind of degeneracy results when metasentences are erroneously zeroed; for
example, if we zero a metasentence which identified a word in the host as the
name of an object or relation rather than as referring to the object or
relation, we would obtain the effect of ‘abus de langage’ as above.
2 Z. Harris, (a) Discourse Analysis Reprints, Papers on Formal Linguistics 2, Mouton, The
Hague, 1963, pp. 20 ff.; (b) ‘Discourse Analysis’, Language 28 (1952) 1-30 for the general
method; (c) ‘Discourse Analysis: A Sample Text', Language 28 (1952) 474-94.
by

[α] = A / (λ² − λc²)

7 where [α] is the specific rotation and λ the wavelength of the measurement. A and
λc have a certain amount of theoretical significance but will be regarded here as
empirically determined quantities.
8 The dependence of the optical rotation on experimental conditions results
from the fact that A is in general a function of temperature, pH, ionic strength,
9 denaturation, etc. λc, on the other hand, varies only as a result of drastic changes
in the protein system, in particular denaturation by urea or guanidine or titration
10 to a pH between 10.5 and 12.0. The values of λc and [α]D are given in Table I
11 for a number of proteins and related substances. (A may be obtained from
12 [α]D by means of the relation A = [α]D(λD² − λc²).) λc is not dependent on
temperature within experimental error (approximately ±50 Å).
13 The results for gelatin are taken from CARPENTER AND LOVELACE.
14 [Table I was inserted here.]
15 [Note to Table I] Except for insulin the specific rotations were independent of
moderate changes in protein concentration.
16 The substances in Table I are listed in the order of descending values of Ac.
17 The table has been divided into two groups to emphasize an obvious pattern,
namely, all the substances in Group A are native globular proteins, all those in
Group B are not.
Reprinted from Biochimica et Biophysica Acta, 15 (1954) 156-7. Table 1 in this paper has
not been reprinted here, because it is not in language form, and only details the information
given in this paper.
Sentence Sequences 261
18 The entries in Table I are too few to permit any certain generalizations but the
indication is that λc provides a measure of the presence of secondary structure in
polypeptides, having values above 2400 Å for the ordered configurations of
native proteins, and values less than 2300 Å for polypeptides in disordered states.
19 We are here subscribing to the view that heat and chemical denaturation result in
20 the unfolding of the hydrogen bonded structure of a protein. Clupein naturally
falls into the latter group because the internal repulsion due to its high positive
charge makes folded molecular forms unstable.
21 The presence of the oxidized A-chain of insulin in Group B was not expected.
22 In fact it was hoped that the A-chain would exist as an α-helix in solution and
would therefore serve as a model substance of known structure in the study of
23 denaturation. Instead it was found that the A-chain possesses rotatory properties
which resemble those of clupein very closely, but do not resemble those of insulin
24 itself. Most striking is the fact that the specific rotations of clupein and the
A-chain are virtually unaffected by strong solutions of urea and guanidine chloride.
25 Ordinary proteins, including insulin, undergo changes in specific rotations of
26 100% to 300% under these conditions. These results suggest that the oxidized
A-chain is largely unfolded in aqueous solution and are in agreement with the recent
finding that the peptide hydrogen atoms of the A-chain exchange readily with
D₂O, whereas those of insulin do not.
27 A detailed report of this work will appear later.
In the table in Figure 9.1, column 1 gives the sentence number in the original,
column 2 the sentence number in the transform. Each following column contains
an equivalence class; that is, the entries in each column have been shown to be
equivalent to each other by the procedure described above. It turns out that
throughout the whole article, every sentence of the transform consists of the
HRLK equivalence classes (or of two or three out of these four), with intervening
verb equivalence classes.
The interpretation is, of course, not that all members of an equivalence class
are synonyms of each other, but that the difference for the given discourse
between any two members X1, X2 of one equivalence class X corresponds to the
difference between the corresponding members Y1, Y2 of the class Y with which
they occur. That is, the discourse compares X1, Y1 with X2, Y2, or equivalently,
X1, X2 with Y1, Y2.
The word subclasses in question can be determined by the following
equivalence relation (transitive except for the subscript, and symmetric and
reflexive) on word-sequences:
4 After the equivalence classes have been set up and their relative occurrence studied, we
may find reason to say that in a particular case a ≠ a: i.e. that a particular occurrence of the
morpheme-sequence a is not discourse-equivalent to the other occurrences of the morpheme-
sequence a. (This differs from the case of two sentential word-sequences being identical in form
but not in derivation, i.e. source: see 8.1.5.) The a ≠ a case will happen if we find that accepting
the equivalence in this particular occurrence forces us to equate two equivalence classes whose
difference of distribution in the double array described below has reasonable interpretation.
Such situations are rare; and in any case the equivalence chain has to start with the hypothesis
that occurrences of the same morpheme-sequence are equivalent to each other in degree zero.
The equivalence relation then states that a is equivalent to b if for some n, n>0, we have a =n b
by the formula given here.
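Although the defining formula is not reproduced at this point, the chaining idea behind the graded equivalence =n can be sketched in code. In the following fragment (Python; the (item, environment) encoding is a simplification assumed for illustration, and the sample data echo the article above), identical word-sequences are equivalent in degree zero, and each round of merging raises the degree by one:

    # A rough sketch of building discourse-equivalence classes by chaining.
    # Each occurrence is simplified to an (item, environment) pair.
    def equivalence_classes(pairs):
        # Degree zero: identical word-sequences are equivalent.
        classes = {x: {x} for pair in pairs for x in pair}

        def merge(p, q):
            merged = classes[p] | classes[q]
            for x in merged:
                classes[x] = merged

        changed = True
        while changed:
            changed = False
            for a, ea in pairs:
                for b, eb in pairs:
                    if classes[ea] is classes[eb] and classes[a] is not classes[b]:
                        merge(a, b)       # equivalent environments equate items
                        changed = True
                    if classes[a] is classes[b] and classes[ea] is not classes[eb]:
                        merge(ea, eb)     # equivalent items equate environments
                        changed = True
        return {frozenset(c) for c in classes.values()}

    occurrences = [
        ('clupein', 'has specific rotations'),
        ('the A-chain', 'has specific rotations'),
        ('the A-chain', 'possesses rotatory properties'),
    ]
    for cls in equivalence_classes(occurrences):
        print(sorted(cls))
    # clupein and the A-chain fall into one class through their shared
    # environment, and the two verb phrases then fall into another.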
[Figure 9.1 was inserted here: the double array of the analyzed article. Column S gives the sentence number; the remaining columns (P, C, H, V, R, U, L, W, K, Metadiscourse) each hold one equivalence class, with rows such as 'the polypeptide syst. obeyed within experimental error', 'λc varies as a result of denaturation by urea', and 'clupein naturally falls into Group B'.]
which the procedure has to take into account, thus enhancing the ability to
build up equivalence classes. Since the transformations, and more generally
the reductions and position-changes, do not change the operator-argument
selection or the substantive information in the sentences, any chain of
environment-equivalences of word subclasses established for transformed
sentences will be valid also for the original (untransformed) text. And since
the grammatical relations, and the reductions and other transformations,
have been established for the language as a whole, and are not being
proposed ad hoc for the discourse which is being examined, there is nothing
circular in applying them. One might however go beyond the whole-language
transformations in one case: one could use any synonym or
classifier of particular words as an aid in furthering the building up of their
equivalence classes, but only if it is clear to everyone involved that these are
indeed synonyms and classifiers of those words in the subject matter and
style of language use of which the given discourse is a case.
This method segments the discourse into recurring sequences of certain
equivalence classes. These classes and the sentence types made of them
constitute a special grammar of the particular discourse. Thus, although the
discourse has to satisfy the grammar of its language, it also satisfies a
grammar of its own based on its particular word-choice co-occurrences. If
we take a translation of the given discourse into another language, we find
that the sentence grammar is then that of the other language, but the word-
choice subclasses and their sentence types are much the same as in the
original discourse.
It is possible, in a discourse, to collect all the sentences which have the
same classes, so as to see how the members of a class change in correspondence
with the members of the other classes, and so as to compare these
sentences with sentences having in part other classes. In many cases it is
possible to obtain a rough summary of the argument; and it is possible to
attempt critiques of content or argument based upon this tabulation of the
structure of the discourse.
If we consider many discourses of the same kind, we find that their
analyses are partially similar. For example, in scientific articles it is characteristic
that we obtain a set of sentences in the object-language of the
science, and on some of them a set of metascience operators which contain
words about the actions of the scientist, or operators of prior sciences (e.g.
logical relations). In the table the last column (and some of column 3) is of
this nature.
The result of discourse analysis is essentially a double array, each row
being a sentence of the transformed (i.e. aligned) discourse, and each
column a class of sentence segments which are equivalent by the discourse
procedure. All further analysis and critique of the discourse is based on the
relations contained in the double array. The discourse dependence of later
sentences in a paragraph upon the first sentence is different from the
5 Cf. the analysis of James Thurber’s ‘The Very Proper Gander' in Harris, Discourse
Analysis Reprints, Papers on Formal Linguistics 2, pp. 20 ff.
6 An example of the latter is in Harris, 'Discourse Analysis: A Sample Text', Language 28
(1952) 474-94.
10
Sublanguages
10.0. Introduction
related the fact that sublanguages have the dependence of 3.1.1, but not in
general the dependence-on-dependence of 3.1.2.
A sublanguage can differ from the whole language by omitting some
grammatical properties of the language. This is possible because many of the
constraints that create grammar are satisfiable independently of each other.
A sublanguage may happen not to contain the conditions for a given
constraint. The subset of sentences which forms a sublanguage must satisfy
the whole-language grammar, but it can do so vacuously in the case of some
of the constraints, which are thereupon omitted in the sublanguage grammar.
And it can satisfy in addition certain other constraints which are washed out
in the rest of the language.
In Chapter 10 we will see sublanguages based on grammatical conditions
(10.1-2), and those based on subject matter (10.3-4), also formula-sublanguages
of science which can be considered distinct linguistic systems
(10.5). There will be also comparisons of natural language to other systems
of a linguistic character, specifically mathematics (10.6), and to entirely
different systems of symbols, chiefly music (10.7).
English, tense, plural, the/a, referential indication, and possibly the relative
clause in the base sentences. It may be noted that, since the base structure is
more universal than the reductions, the base sentences of one language
translate more easily into those of another than do the transformed sentences,
so that practical considerations are involved here.
There are also various distinguished subsets of sentences which have
grammatical and informational properties but fall short of satisfying the
conditions of being sublanguages. Such are classifier sentences, definitions,
and ‘general sentences’ such as those that express laws of a science. For
example, in classifier sentences it is not always possible to distinguish
formally or semantically between predicates indicating classification and
predicates indicating other properties (e.g. in Cats are mammals. Cats are
hunters), or to specify the ultimate objects of classification (e.g. A horse is a
mammal, A roan is a horse, A stallion is a horse. The stallion is a roan).
In all cases of grammatically based sublanguages, sentences can be
usefully characterized not only by their decomposition but also by the
sublanguages (or more generally the characterized subset of sentences) into
which they fit. To a lesser extent, this holds also for the subject-matter
sublanguages, and for the subsets of sentences which do not quite make the
sublanguage grade, as above.
metalanguage, Li+1, of Li. As noted above, L1 has a structure of its own: its
zero-level arguments include all entities in utterances of L0; its first-level
predicates include, for example, differ phonemically, is a phoneme, is a
word, operates on, occurs with; its second-level operators include highly
likely, is more frequent than. The grammar of L1 is written in L2, which is the
metalanguage of L1; like everything which is not a whole natural language,
L1 has an external metalanguage, and L2 is a sublanguage of L0, not of L1.
L2 in turn has a structure of its own. Its zero-level arguments are not the L0
entities themselves but the classes and relations of them as established in L1,
hence in effect the predicates and operations of L1, such as phonemic
distinction, grammatical events, occurrence with, high likelihood; its operators
include is partially ordered, is less redundant, but not, for example, is more
frequent than, because, differently from whole languages, the relative
frequency of words in a sublanguage (here, L1) has no grammatical relevance
to the sublanguage (10.3-4). Proceeding further, the grammar of L2 is
written in L3, the metalanguage of L2. Here again we find certain differences.
The arguments in L3 include the predicates of L2 such as classification,
redundancy; in addition, certain operators such as is a word, occurs with, will
be found in all grammars including those of the metalanguages. After stating
these elements as a grammar of L3, written in L4, we find that the grammar
of L4 is formally identical with that of L3. Its referents are different, in that
the grammar of L3, in L4, refers to sentences of L3, while the grammar of L4
(written in L5) refers to those of L4. But the kinds of argument and operator
classes needed to describe L4 are not different from those needed to describe
L3. Thus, although there is an infinite regress of referents in the metalanguages,
the form of their grammars remains essentially unchanged after a
certain point. The reason for this syntactic result is that L1 has the form of a
factual description—the grammar of a (natural) language, while L2 has the
form of a theory, describing the syntactic structure of such factual descriptions,
and L3 has the form of a metatheory, describing the syntactic structure of
theories, which then repeats in L4 and thereafter.1
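The stabilization of the grammars' form can be put as a toy model. The following sketch (Python; the three form labels are taken from the text, everything else is an assumed illustration) assigns to each metalanguage the kind of description it writes, and shows the fixed point from L3 on:

    # A toy model of the metalanguage chain: L[i+1] states the grammar of L[i].
    # The referents change at every level, but the form stabilizes.
    def form_of_grammar(i):
        # The form of the description that L[i+1] gives of L[i].
        if i == 0:
            return 'factual description'   # L1: the grammar of a natural language
        if i == 1:
            return 'theory'                # L2: structure of factual descriptions
        return 'metatheory'                # L3 and beyond: structure of theories

    for i in range(5):
        print(f'L{i + 1} (describing L{i}): {form_of_grammar(i)}')
    # L3, L4, L5 all print 'metatheory': an infinite regress of referents,
    # but a fixed point in grammatical form.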
This means that the language of metatheory, that is, the sublanguage
which presents the structure of a theory, itself has a metalanguage which is
structurally identical with itself, rather than being a proper subset of it as is
the case for a whole language, or external to it as is the case for subject-
matter sublanguages. To generate a whole language it may therefore be
1 The chain of metalanguages has a certain similarity to the chain of translated sentences in
the story about the mathematician who knew no English and asked a friend to translate an
article of his into English. In the published form three lines appeared under the translated title:
I thank X for translating this paper and its title.
I thank X for translating the above line.
I thank X for translating the above line.
The change of referent in successive occurrences of above enabled the author to copy line 3
himself from the translated line 2.
we can always define a class or its environment or other relation not by its
constraints but by an external metalinguistic statement. Thus it is possible to
have multiply classified words—elements which in certain environments
belong to one class and in others to another, the distinction being made in
the metalanguage: in the immunology example below (10.4) it will be seen
that in some environments wh- is a form of the special time-connective in the
bi-sentential sentence type of that science, while elsewhere in the science it is
the general subordinating connective that it is in the whole language.
As in discourse grammar, but not in the whole language grammar, a
sublanguage can have phrases as members of a word class. Even in the whole
language it is possible for certain phrases to be considered as members of a
word class. For example, providing, provided, provided that, with the
provision that, all appear as conjunctions, classifiable under Ooo: I will go,
providing he will go too, and correspondingly for the others. But all of these
also have an analysis consistent with their parts, without which they could
not exist in the sentence. We can see this for with the provision that if we
begin with:
Similarly we can see this for the conjunctional provided that, if we begin
with:
And we may see it for the conjunctional providing that, if we begin with
something like:
2 This is also justified by the fact that passive constructions such as is secreted by may not have
in the sublanguage the stative environments and meaning that they have in principle in the
language as a whole. Whereas the restriction of selection and meaning that such constructions
as the passive, or provided, providing, should have may be unnoticed or disregarded in various
sentences of the whole language, in sublanguages they may be systematically disregarded so
that there they have to be accepted not as grammatical constructions but as phrasal members of
a word subclass.
3 The sublanguage subclasses are not merely an extreme case of whole-language combinatorial
likelihood differences. Rather, they provide a classification of the sublanguage vocabulary into
subclasses in respect to combinability, in a way that fits the informational framework of the
subject matter. In contrast, the differences in likelihoods of combination that remain among the
words within a subclass (after synonymy has been eliminated) reflect differences in meaning of
the various terms within the current classification knowledge. Thus the sublanguage classes are
specializations of the dependence relation (3.1.1), not of the likelihood relation (3.2). An
opposite analysis was made for the auxiliaries (3.2.4), whose combinatorial peculiarity (lack of
a different second subject) was taken as zero grade of likelihood rather than as dependence.
The justification there was that auxiliaries were a small set of operators whose combinatorial
peculiarity could be readily ‘cured’ by supplying a synonymous source that was regular; they did
not constitute, as in the sublanguage, a part of a global combinatorial subclassification of words
that proved semantically useful.
(unreduced) form of the sublanguage than there are in the base of the whole
language.4
Aside from the differences between sublanguage and whole language, we
can also consider the relations among subject-matter sublanguages, because
the fact that they are all subclass specializations within the operator-
argument structure makes inter-system relations specifiable, much as one
can specify the cost in respect to the operator-argument system of one or
another analysis of a given sentence. The transformational analysis of 8.1 has already
introduced the notion that sentences can be identified not only by their
components but also by their relations to other sentences, as transforms of
them. The sublanguage analysis shows that sentences can also be identified
by another inter-sentence relation: by having the same sublanguage properties.
Sublanguages can have various relations to each other. First, some
sublanguages may largely share the same word subclasses, others may have
partly similar structures, somewhat as certain fields of mathematics are
abstractly equivalent. Secondly, sentences of one sublanguage may have
special relations to another sublanguage. For example, certain sentences of
one sublanguage may be arguments in certain sentence types of another: in
Digitalis affects the heart contractility (an example of a common sentence type
in pharmacology), the second argument (The heart contracts) is an example
of a sentence type in physiology. This particular syntactic relation in
sentences of different fields creates a relation of ‘prior science’ between
sublanguages, where one science acts upon events of the other. One may
even think of large parts of language as being covered by various sublanguages,
possibly overlapping in their sentences. But in any case, the
whole language—colloquial, general, literary, and so on—is not merely an
envelope of sublanguages. There is always a complete, independently
standing language over and above all sublanguages. Informationally, this
whole language may have a relation to the perceived world (11.6) similar to
that of a science language to its science. But there may be little to be gained
from considering the structure of the whole language to be itself a sublanguage
structure.
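The 'prior science' relation just illustrated lends itself to a small formal sketch. In the following fragment (Python; the tuple encoding and the operator lists are assumptions for illustration), a sentence of one sublanguage is detected as an argument inside a sentence type of another:

    # A sketch of the 'prior science' relation between sublanguages:
    # a sentence is encoded as (operator, arguments), and an argument may
    # itself be a sentence of another sublanguage.
    heart_contracts = ('contract', ('the heart',))  # a physiology sentence

    # A pharmacology sentence whose second argument is the physiology sentence.
    digitalis_sentence = ('affect', ('digitalis', heart_contracts))

    def has_prior_science_argument(sentence, prior_operators):
        _, args = sentence
        return any(isinstance(arg, tuple) and arg[0] in prior_operators
                   for arg in args)

    physiology_operators = {'contract', 'secrete'}
    print(has_prior_science_argument(digitalis_sentence, physiology_operators))
    # True: pharmacology here acts upon events of physiology, its prior science.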
It is also of interest to consider the similarities between the structures of
subject-matter sublanguages and discourse structures. The main similarity is
that both have special word subclasses and sentence types, but for partly
different semantic reasons. Discourse has these because of the constraints
due to maintaining a topic, to giving information and discussion beyond
separate observations. The sublanguages have these structures because all
4 Pure synonyms, which rarely exist in whole natural languages, are distinguishable in
sublanguages. Within a particular research line, for a given two words, if the likelihoods of their
combining with other words become more similar as the corpus of discussion increases, then the
two given words are synonymous for that research line. This does not indicate that they are
synonymous elsewhere. For example, in the immunology material of 10.4, agglutinin and
gamma-globulin are synonyms of antibody, although their specific meanings, and their uses
elsewhere, are not identical.
restrictions. All these appear together in a few sentence types specific to the
science.
In broadest terms, the grammar for the immunology material turned out as
follows (meanings are given here for convenience, but were not used):
GJB, for sentences of the type Antigen is injected into a body part or an
animal.
GMftTT, similarly for Antigen moves from some tissue to some tissue (the
superscript indicates from . . . to).
TW and CW, for A tissue (or cell) has some property or undergoes a
change.
AVC, for Antibody appears in, is produced in, or is secreted from a cell.
CYC, for Some cell is similar to, or is called, some cell.
CYcC, for A cell develops into another cell.
In the cell-transfer studies, in which antigen is injected into one animal,
whereafter lymphocytes are injected (transferred) from that animal to
another, with antibodies then being sought in the second animal, an additional
sentence type is found:
CP′BB, for Cells are injected from an animal into another animal. The P′
with two B's differs from the J above, even though injected appears in both
subclasses.
There is a special conjunction, internal to a particular sequence of
sentence types, which is seen or is implicit in almost all occurrences of the
pair GJB and AVC. This is thereafter and its synonyms, marked here by a
colon. It often carries a time modifier, e.g. three days, five hours, as in
GJB:tAVC, for Antigen is injected into a body part; three days later (or the
like) antibody appears in cells. Also, Antibody appeared three days after
injection of antigen, and the like. This conjunction takes different grammatical
forms (e.g. to in The cell contained antibody to the antigen and even
the relative clause wh- (10.3)); all of these forms synonymously connect
AVC (or CW or TW) to GJB.
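The mapping of sentences onto these sentence types can be sketched mechanically. The following fragment (Python; the mini-lexicon and the simplified word-by-word matching are assumptions for illustration, not the procedure of the study) assigns each word its subclass letter and checks the resulting string against the listed types:

    # A minimal sketch of mapping immunology sentences onto sentence types.
    SUBCLASS = {
        'antigen': 'G', 'antibody': 'A',
        'injected': 'J', 'appears': 'V', 'produced': 'V',
        'footpad': 'B', 'rabbit': 'B',
        'lymphocyte': 'C', 'plasma-cell': 'C', 'tissue': 'T',
    }
    SENTENCE_TYPES = {'GJB', 'AVC', 'TW', 'CW', 'CYC'}

    def formula(words):
        # Concatenate the subclass letters and check against the type list.
        f = ''.join(SUBCLASS.get(w, '') for w in words)
        return f if f in SENTENCE_TYPES else None

    print(formula(['antigen', 'injected', 'footpad']))     # GJB
    print(formula(['antibody', 'appears', 'lymphocyte']))  # AVC
    print(formula(['antibody', 'appears', 'tissue']))      # None: AVT not in this toy list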
Metascience material, giving the scientist’s relation to the information of
the science, can be separated off. Mostly, this is isolated immediately, as the
highest operator in a sentence: Oo and Ono (e.g. as was expected or X and Y
have shown that). But there are also cases where metascience operators
combine with operators of the science language proper, as in Antibody is
found in lymphocytes, which could be analyzed as either from (roughly) It
was found that antibody is in lymphocytes or else from We found antibody in
lymphocytes.
The relation of this sublanguage structure to the information of the
science can be seen in various ways. We note first how the known change of
information through time is seen in the change of word subclasses and
sentence types in the successive articles of this period. The class C (cell in
contrast to T, tissue) first appears at the very end of the earliest article, and
thereafter becomes very common. The class S for intra-cellular ultrastructures
appears only midway in the series and becomes common as more cellular
structures become discernible. In C, lymphocytes and plasma cells appear
early, and later we find a welter of different names for lymphatic cells as
more and more cell distinctions are made; many of these names appear only
in particular articles and not throughout. In the subclass V (operators on the
pair A, C), the earlier articles have primarily the member Vf (is found in,
etc.) but the later articles increasingly have Vp (is produced in), as evidence
of actual production becomes obtainable. Finally, the whole subclass Y
(operator on the pair C, C) puts in a relatively late appearance, both in its
member is the same as, is called, which is used to match various new cell
names that appear in different articles, and in its member Yc (changes into,
develops into), when it becomes recognized that some cell names represent
merely later stages of what other cell names represent.
In sentence types, the earlier AVT (Antibody is found in tissue) is later
replaced by AVC (Antibody is found in a cell), where the neighboring
sentences show that this is an abbreviation or conclusion from AVT1; CWT1
(i.e. Antibody is found in a tissue; certain cells abound in that tissue). Later
yet the AVC occurs without a distantly connected AVT: the antibody is
found directly in the cell. Finally, the later articles introduce a new sentence
type C1YC2 (what has been called by cell name C1 is the same as, or develops
into, what has been called by cell name C2); this becomes necessary as the cell
names proliferate, and especially when it is realized that the plasma cell is
not an alternative to the lymphocyte but a transformation-product of it.
Next, we note how the controversy is reflected in the sentence structure.
The disagreement appears first in the fact that while some papers have
exclusively (1) AVpCy (Antibodies are produced in lymphocytes), others
have AVpCz (Antibodies are produced in plasma cells) to the exclusion of (1)
but frequently accompanied by AVprCy (Lymphocytes have a role in production
of antibodies—as against actually producing them). This superscript r (have
a role in) is in itself an indication of the controversy; it has the same position
(on Vp) in the AVpC sentence formulas as the negative superscript ~ which
denies production, and as the absence of ~ which asserts production; and its
meaning—judging from the English words used and from the different
relations of AVpCy, AVp~Cy, and AVprCy to their neighboring sentences—
is somewhere between denying and marginally admitting production. Finally,
the disagreement appears explicitly when some of the papers that contain
AVpCz also have AVp~Cy (or else AVpCy under negative metascience
operators); e.g. The experiments clearly refute the idea of antibody production
by mature lymphocytes and Antibody is not produced by small lymphocytes.
It is relevant to the ‘which cell’ controversy that several differences in
experimental conditions were common in sentences involving plasma cells
as contrasted with lymphocytes. In the formulas these differences appear as:
P rather than Jx or P (P indicating more massive and repeated injection of
antigen), ,+ rather than (l+ indicating longer time before antibody
finding), and Vp rather than Vf~ (Vp indicating larger amounts of antibody
found). Thus characteristically GPB:'+AVpCz as against GPB:'~AV~Cy.
In addition, we find AVsCy but not AVsCz: Lymphocytes (but not plasma
cells) secrete antibody, which can account for their containing less antibody.
All this is relevant to the issue of whether lymphocytes produce antibody,
and it is visible in the formulas. There is also a relevant consideration that is
not visible in the formulas, but rather requires a combining of these Cz
formulas with other Cz formulas: this is the evidence that the Cz are large
cells with organelles in which antibody could be stored, which could account
for the plasma cells containing more antibody than the lymphocytes. In
contrast, the Cy are small cells with little internal structure which were
widely considered (without adequate evidence) to be the end of their line.
The resolution was that lymphocytes produced and secreted antibody, then
grew into plasma cells which continued to produce, and store, antibody.
One might ask why the sublanguage analysis is needed to show the
disagreement when it is expressed so clearly. The answer is that in the
sublanguage sentence types, such phenomena as disagreement can be found
on grammatical grounds. The immediate purpose here was to show that the
disagreement which we see in the Lymphocytes produce (or: do not produce)
antibody sentences, explicitly in the use of refute and not and implicitly in
have a role, appears upon inspection in the superscript position on the Vp in
the formulas. Furthermore, in that position, after the sentences of the
articles have been mapped onto AVC and the other sentence types, the
disagreement can be located a priori and even mechanically, without much
knowledge of the meanings of words: if the texts contain a sentence formula
both with the tilde (~) and without it, then there is either a disagreement or a
difference in experimental conditions. Thus when the sentences of scientific
articles are mapped into sublanguage formulas, as can in principle be done
by computer programs, denials or doubts or hedgings about a given statement if
present can be found in a priori statable locations of the formula for that
statement, and so for other items and aspects of information. This makes
possible various human and machine inspections and processings of information,
and also a formulation of the structure of information in the
science.8
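The a priori locatability of such disagreement can be shown in a few lines. The following sketch (Python; the four-field encoding of a formula is an assumption for illustration) flags any formula that occurs in the corpus both with and without the negation superscript on the same operator and arguments:

    # A minimal sketch of locating disagreement mechanically.
    # A formula is encoded as (first argument, operator, superscript, second argument).
    formulas = [
        ('A', 'Vp', '',  'Cz'),   # Antibodies are produced in plasma cells.
        ('A', 'Vp', '~', 'Cy'),   # Antibody is not produced in lymphocytes.
        ('A', 'Vp', '',  'Cy'),   # Antibodies are produced in lymphocytes.
    ]

    def disagreements(formulas):
        seen = {}
        for subj, op, sup, obj in formulas:
            seen.setdefault((subj, op, obj), set()).add(sup)
        return [key for key, sups in seen.items() if '~' in sups and '' in sups]

    print(disagreements(formulas))
    # [('A', 'Vp', 'Cy')]: production in lymphocytes is both asserted and
    # denied, indicating disagreement or a difference in experimental conditions.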
Lastly, we come to how the resolution of the 'which cell' controversy
appears in the sentence structures. The information structure that turned
out to be involved in the resolution appears early in the series of articles,
when a new sentence type CYC is introduced. It is introduced for two new
kinds of information: Y without subscript is introduced for is called and is the
same as, for the proliferating cell names; and Yc is introduced for changes
into, develops into, noted above. Implicit in the need for these sentence
formulas is the fact that cell names are artefacts of the scientist's work, and
8 More generally, considerations of formula structure may suggest questions about the
interpretation of data. For example, the mere fact that we find both GJB:AVC and GJB:CW,
i.e. that the colon can connect GJB to either AVC or CW, could raise the question as to what
precise relation—of time or of cellular activity—holds between AVC and CW, and what
mechanism represented by the colon has the effect of leading, at least initially, to one outcome
rather than the other, or one outcome before the other.
that one named cell changing or developing into another (CYcC) is the same
fact as a cell undergoing certain development (CW). When evidence ac¬
cumulated that both plasma cells and lymphocytes produced antibody, the
controversy was resolved in terms of Yc: namely by CyYcCz (Lymphocytes
develop into plasma cells). As before, this is not to say that one would have
known in advance that plasma cells are a later stage of lymphocytes: the
need to have introduced the CYC sentence type does not imply the CyYcCz
case of that type. But it shows that the resolution can be recognized
structurally in the sublanguage, and is related to earlier developments in the
sublanguage; perhaps awareness of the sentence type might have made it
easier to consider the possibility of the CyYcCz case.
In the standardized ways of presenting scientific results, the articles have
different sentence types in different sections: e.g. in the Materials and
Methods section as against the Results section, with certain word subclasses
and also sentence types being common to both sections and certain ones
specific to each section. The Discussion section, in contrast, is not formed
out of new sentence types but mostly out of the Results sentences with special
constraints. These constraints involve primarily quantifiers, classifiers, certain
secondary operators (superscripts, in the formulaic representation), conjunctions
(more generally, Ooo operators), and finally metascience segments
(Ono and Oo operators). Much of the vocabulary of all these further
constraints is common to discussion in many sciences, although certain
classifiers and secondary operators (such as the have a role in above) are
specific to the given sub-science. The science sublanguage structure has
specifiable positions in which we find expressions of the disagreements, the
uncertainties, and the change in knowledge. In the present case, the
disagreements are found primarily in the membership of the C subclass as
second argument of Vp. The sharpest uncertainty is in the r superscript (have
a role in), in superscripts such as + and − for imprecise quantifiers (much,
little), and in secondary operators (as superscripts) and metascience operators
that explicitly express uncertainty (likely, it may be that . . .). Change in
knowledge is expressed by new members of existing word subclasses, by new
subclasses, and by whole new sentence types such as CYC; also by changes in
the uncertainty markers above.
A detailed examination of the sublanguage of the immunology articles
shows that here, as also in other sciences, the boundaries of the sublanguage
have yet to be defined. There are several ways in which we can judge the
boundaries of the early-immunology sublanguage, in each case satisfying the
requirement of closure under the set-theoretic conjunctions and, or of
the whole language, and under the reductions and other transformations of
the whole language.
In the first place, the difference in sentence types between the Methods
and the Results sections could be considered grounds for constructing two
sublanguages, but this is undesirable because of their close and complex
10.5.0. Introduction
The way science sublanguages differ from whole natural languages suggests
that it may be of interest to look at them as distinct linguistic systems rather
than as sublanguages. This view is supported by the fact that when in a given
science we analyze articles written in different natural languages, we obtain
essentially the same sublanguage structure.9 Thus the consistent structure
obtained from all the articles of a field, in whatever languages, is not
necessarily a sublanguage of any one natural language. Furthermore, if we
are freed of the need to state the grammar of the sublanguage in a way that is
9 Thus in the immunology study of n. 7 above, analysis of articles written in French yielded
substantially the same formulaic representation as in the case of English articles. It is also
relevant that many articles written in one language are read and discussed by scientists who
speak other languages, and that the peer pressure which is exerted on article writers to adhere
to precise and standard formulations can come from foreign colleagues as well as from same-
language colleagues.
10 It should be clear that the formulas are simply canonical representations of sentences.
in question has this property (C/) carried by restrictive wh- (n. 8 above); or
again is it simply a secondary fact that the lymphocyte in question was
unrestrictively small—permanently or at the time in question—in which
case the modifier small would be reconstructed into a secondary sentence (a
distinct formula attached here by unrestrictive wh-) The lymphocyte was
small (CW)? The difference between these formulaic assignments can be
decisive for how well the formulaic sentence fits into the sequence of
sentences that constitute scientific conclusion-drawing (10.5.4(d)). Of course,
we may not know which representation is intended or is best. But even to
raise the question, and to be aware of the alternative representations and
their implications for the sentence-sequences, is a step toward a more aware
and efficacious treatment of scientific information in process of accumulation
(10.5.4(d)).
For formulas to be obtained in a controlled procedure directly from the
texts—or for the texts to be mapped in a regular way onto formulas—
assures the relevance of the formulaic representation, and its coverage of the
texts, and also its sensitivity to change in knowledge. An example of the
sensitivity of sentential formulas to informational change: during the early
1970s, when what was indicated (or understood) by the words site in
biochemistry was changing from a location in a cell membrane to a receptor
molecule there, sublanguage analysis had shown, without appeal to biochemical
knowledge, that site, which was at first clearly a member of the
‘cell-portion’ word subclass, was increasingly getting, as predicates on it,
words that were otherwise predicates on molecules rather than being
predicates on cell-portions; the observations made about ‘sites' were becoming
increasingly similar to observations about molecules (later established as
receptors). The gradual shift in experimental observation was visible early in
the details of the formulas. Somewhat as grammatical transformations can in
some cases show the direction of an ongoing grammatical change, so the
changes in operator-argument environments of a given word can in some
cases show the direction of an ongoing change in understanding.
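The kind of shift described for site can be sketched as a comparison of predicate environments across time. In the following fragment (Python; the predicate lists and subclass assignments are invented for illustration, not data from the study), a word's profile is the proportion of its predicates drawn from each subclass:

    # A rough sketch of detecting a word's subclass drift over time.
    from collections import Counter

    PREDICATE_SUBCLASS = {
        'is-located-in': 'cell-portion', 'is-exposed': 'cell-portion',
        'binds': 'molecule', 'is-isolated': 'molecule',
    }

    def subclass_profile(predicates):
        counts = Counter(PREDICATE_SUBCLASS[p] for p in predicates)
        total = sum(counts.values())
        return {c: n / total for c, n in counts.items()}

    early = ['is-located-in', 'is-exposed', 'is-located-in']   # on 'site', early articles
    late = ['binds', 'is-isolated', 'binds', 'is-located-in']  # on 'site', later articles

    print(subclass_profile(early))  # {'cell-portion': 1.0}
    print(subclass_profile(late))   # {'molecule': 0.75, 'cell-portion': 0.25}
    # The drift of the profile toward 'molecule' shows the change in what
    # 'site' designates, without appeal to biochemical knowledge.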
The text-derived formulas also come closer than the text itself to a pure
representation of the information. It is known that both the selection of
scientific problems and also the views and presentations of the information
obtained are affected by socially fostered fashions and expectations, in
addition to the direct pressure of scientific developments. The procedure for
obtaining the formulas cannot correct for the choice of problem, but it can
correct for any language use beyond what is essential for the information,
whether it is teleological or analogical terminology used for its suggestive or
communicative value, or else dramatic presentation or over-reaching conclusions
and the like. The formulas, and the constraints on their conclusion-
sequences, also show something about the particular scientific field, and to
some extent about the character of current scientific work in general.
However, there is here no claim that the formulaic representation gives
they are juxtaposed and confronted with other observations and considerations.
In science languages, this is reflected in the fact that science sentences
do not appear alone or in small independent sets, but in rather extended
sequences, that is, in discourse. Characteristically, there are regularities and
constraints in the discourses, beginning with the general discourse property
of repeating word-class relations (9.3). For example, in the standardized
styles of scientific articles, the articles in certain experimental fields have
three essential divisions: Materials and Methods, Results, and Discussion. It
is found, as has been noted, that the word classes and sentence types of the
Methods section limit what can appear in the Results section, due of course
to the plan of experiment, but visible in the reporting sentences. Also the
sentences of the Discussion section are not independent of those in Results;
indeed the Discussion consists largely of Results (and other) sentences,
modified in certain ways (e.g. by use of classifiers) and arranged under a
restricted hierarchy of Ooo operators which build an argumentation. Thus
the Methods sentences impose constraints on the Results sentences, while
the Discussion sentences are constrained to stay close to the Results
sentences. In addition, the sequence of sentences within the Discussion
follows a special type of discourse constraint (10.5.4).
The informational relations within science texts are thus indicated in the
structure of the sentence types (the rows of the double array), the correlated
changes in the columns, the local constraints on successive rows (10.5.4),
and the more general relations on the sentence types within a discourse (e.g.
immediately above). Within each discourse there may also exist more
complex paths of informational interconnection among sentences; but these
are not reached at present by any a priori method of analysis.
We can now summarize what kind of information we may obtain from the
structure of science languages about the science and its structure, keeping in
mind that such terms as ‘information’ and ‘structure of science’ are undefined
and can only be used loosely. The operator-argument analysis, and the
mapping of science sentences onto the formulas of a science language,
suggest that the latter can contain almost all the language-borne information
in the reports and discussions of the science. This means a better fit than we
have in natural language between differences in form and differences in
information—loosely, a better fit between form and information. What
there is to learn from the structure of science languages can be considered in
four structural contributions: (a) the internal structure of all sentential
formulas, (b) the different types of sentential formulas, (c) the kind of
information that characterizes the given science, and (d) the way the
formulas are combined in discourses of the science.
(a) The semilattice structure of each sentential formula isolates and
identifies the various sources that contribute to the information of the
science: the metascience, the prior science, the procedures and reliability of
observation, the quantities. It also instigates clarification of the relation
between particular subsets of terms: e.g. in a given research line, as to
whether small, or mature, is (as noted above) a subtype of cell (hence a
subscript on C), or a necessary condition of it for the given experiment
(hence a superscript on C), or an incidental (non-restrictive) property (to be
stated in a separate formula, attached by the wh- relative clause).
(b) Since only a few particular types of formula are found in each line of
research, they constitute a framework for its information: anything not
expressible in the stated types of formula is excluded, except as being a
specifiable addition to the field described thus far. Having a framework
creates new conditions for the representation of information. Since the
information is fitted into the available formulaic structures, we can obtain a
tabulation of the information in a text, and know where to look for each kind
of information if it is present. This makes it possible, given a tabular
representation, to inspect it for any particular kind of information, and to
summarize or otherwise process the contents of the text. All this holds only if
the formulaic framework has been obtained not by fiat—by someone’s
understanding of the field—but by the regularization of the text itself, or of a
whole corpus of similar texts, in such a way that it constitutes an objective
best fit for the information in the texts (as in 10.4). Furthermore, the fact
that this framework is not made by one expert or another, but is reached by
objective procedures carried out on the discourses of the science, assures in
principle that mapping the discourses onto the framework can be carried out
by computer programs. Finally, it becomes possible to recognize differences
in respect to the framework—imprecisions, disagreements, change through
time.
(c) As to the kind of information: the word-subclass relations that appear
in the various sentential formulas express the various types of fact and
conclusions with which the given science deals at the given time. This
specification makes it possible also to identify the kind and amount of
difference in subject matter between neighboring sub-sciences, and the
amount of change in a science over time. It also excludes from the science
any irrelevant material; such material cannot be mapped into any of the
sentential formulas. Thus a science language is protected against nonsense,
as is mathematics; but, like natural language, it is not protected against error
and falsification.
(d) The way the sentential formulas are sequenced in discourse is important
for inspectable presentation of methods and data, and above all for controlled
could not even say how to recognize by its structure the truth of an arbitrary
scientific sentence—or of any sentence. But lesser goals may be available,
given the constraints in science languages: for example, consistency, retention
of possibility or plausibility, even the certainty that something follows from
explicitly given initial sentences.
It is of interest here that the syntactic and discourse properties of proof-like argumentation in science are also found in sentences expressing causation.
The term ‘causality’ covers a welter of situations: an agent or a precursor
causes an event; an earlier stage of a process or chain ‘causes’ a later stage;
even membership in a set ‘causes’ a member to have the properties of the set.
But in many cases the reporting of causality, i.e. stating that a causal relation
holds between events, has the structure of a sequence or partial order of
statements culminating in the statement of the caused event, as in the QED
of a proof.
In any case, under the constraints, the final sentence is a consequence of
the initial ones. Any attempt to specify the constraints on succession more
fully, and to formalize scientific argumentation or make it subject to
inspection and control, will meet with great difficulties. Although the
development of science is certainly affected by unpredictable advances in
technology and in methods of observation on the one hand and by social
controls and personal interests on the other, it is also affected by its own
results and conclusions which in themselves are not arbitrary or faddist or
competitive, but rather follow the cumulative direction not only of its
increasing data but also of its own argumentational constraints.
10.5.5. Notation
15 The detailed syntax is a bit different if the notation of mathematics is freed from the
linearity of the symbol-sequences. Certain properties of operations, for example, can then be
stated less redundantly and more essentially. Thus instead of defining commutativity as the
equivalence of two orderings (a+b=b+a), we can say that when the operation + on a set maps
a subset of members onto its image the subset is unordered: the unordered pair a, b is mapped
onto its image c. And instead of expressing associativity by grouping—a+(b+c) = (a+b)+c—
we can say that applications of pairwise mappings of members of a subset onto their image are
unordered: i.e. in a, b, c, any ordered pair (a, b or b, c) is mapped onto its image in the set, and
then the pair consisting of that image with the remaining member of a, b, c is mapped onto its
image in turn.
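A compact restatement of the two formulations in n. 15 (this notation is not given in the footnote itself):

    + : A × A → A, with a+b = b+a            (commutativity as the equivalence of two orderings)
    + : {{a, b} : a, b ∈ A} → A, {a, b} ↦ c  (the operation defined directly on unordered pairs)

Associativity then says that repeated pairwise application to the unordered triple {a, b, c} yields the same image whichever pair is mapped first.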
16 One might also say that the statements of different fields within mathematics and logic are intertranslatable, in their notation, in the case when the fields are abstractly equivalent. André Lentin has called my attention to the possibilities of translation between two mathematical systems which are not isomorphic. Cf. ch. 6, sect. 11, ‘Polymorphisms; Crypto-isomorphisms’ in G. Birkhoff, Lattice Theory, American Mathematical Society Colloquium Publications 25, 3rd edn., Providence, R.I., 1967, pp. 153 ff. Note here the comment (p. 154) about the possibility of defining the same abstract algebra in several non-polyisomorphic ways.
names of the numbers—and variables defined for each text over a specified
domain of values. Signs such as plus operate on two zero-level arguments,
the result being again an argument of the same level; such an operator does
not exist in natural language. The negation sign operates on one sentence
and produces one sentence out of it (like the Oo of language). The equals
and greater than signs operate on two arguments, with the result being a
sentence (like Onn in language). The implication sign operates on two
sentences (propositions), with the result being a sentence (like Ooo in
language).
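These argument-requirement types can be illustrated with a minimal sketch in program form (the class and function names are invented; this is not part of the text's apparatus):

    from dataclasses import dataclass

    @dataclass
    class N:        # a zero-level argument (an individual, a number)
        name: str

    @dataclass
    class S:        # a sentence (an operator with its arguments)
        text: str

    def plus(a: N, b: N) -> N:              # two zero-level arguments yield another N:
        return N(f"({a.name}+{b.name})")    # no analogue in natural language

    def neg(p: S) -> S:                     # like Oo: one sentence in, one sentence out
        return S(f"not ({p.text})")

    def equals(a: N, b: N) -> S:            # like Onn: two N arguments yield a sentence
        return S(f"{a.name} = {b.name}")

    def implies(p: S, q: S) -> S:           # like Ooo: two sentences yield a sentence
        return S(f"({p.text}) implies ({q.text})")

    s = implies(equals(plus(N("a"), N("b")), N("c")), neg(equals(N("c"), N("0"))))
    print(s.text)   # ((a+b) = c) implies (not (c = 0))

The typing makes the contrast visible: plus returns an N, which no natural-language operator does.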
To see more specific differences from natural language, note for example
that the symbol ∈ for membership of an element in a class cannot be Onn, as it would be in language, since under ∈ the second argument (a class) cannot be
of the same sort as the first argument (N: an individual). In contrast, the first
argument of is in such sentences as An opossum is a marsupial can also be its
second argument, as in This is an opossum: this is is closer to is a case of than
to is a member of. In the grammar of logic, we have two choices. We can say
that names of individuals and names of classes are two different sets of zero-
level arguments (something which can be defined in the external metalanguage of logic). Alternatively, we can say that the names of classes are On predicates, with ∈ being an integral part of them (their operator indicator,
G37, 54): is-a-marsupial. Then is-a-member-of-class-X is an operator, as is
is-large or sleeps. The inclusion relation ⊂ then has to be an Ooo operator,
since each of its arguments is a class, hence an O. This makes it syntactically
possible for the logic of propositions to be the same as the logic of classes: the
same notation applies for the ability of a class to include another as for a
proposition to be implied by another. In the case of mathematical notation,
the variables function somewhat similarly to different indefinite nouns, that
is, as zero-level N arguments having no limitation of likelihood in respect to
their operators (G70). Operators that map a set A, or its Cartesian product
A × A, into the same A, can be defined in mathematics on various sets; but in
natural language they can be defined only on sentences, i.e. on operators
with their arguments, and not on zero-level arguments (N).
Because of its differences from natural language, the notation of mathematics is a separate linguistic system, a science language, rather than a
science sublanguage of natural language. But the grammar of mathematics,
its external metalanguage, is a science sublanguage, which has to be stated in
natural language. The natural-language discussion in mathematics, e.g. the
talk around proofs, or about discovering proofs, would presumably also be a
sublanguage. Although mathematics is a discovery of relations which exist
on their own, while language would seem rather to be an invention (even if
evolving and unintended), the system of mathematics requires the existence
of natural language in order to be formulatable; but for it to be exhibitable
(in its own notation) does not presuppose natural language. Note however
that the exhibited notation, like science languages in general, does not have
logic rests on the methods of proof; the problems that have accumulated
around mathematics in the last hundred years, even in the criticisms made by
the intuitionists, have not involved the basic apparatus of proof (other than
the marginal method of the excluded middle). In the structure of proof—
with slightly different strategies in different mathematical areas—the barest
essentials are that in a sequence, certain initial sentences of a fixed kind be
evaluated by axiomatic status or by truth tables, with the constraint on the
sequence (or partially ordered set of sequences) being such that its last
sentence, the QED, is not less true than the initial ones. These essentials can
be stated a priori and inspectably only in conditions such as those that obtain
in mathematics: where the truth value of certain elementary (axiomatic)
sentences, underlying the initial sentences of a proof, can be determined by
their syntactic composition, or by listing, or in case of difficult infinite sets,
by constructive formulation; and then the truth transmission of the fundamental operators is given explicitly in the truth tables. In natural languages,
these conditions are not met, largely because of the likelihood inequalities
among members of a word class, tied to the shifting knowledge or perception
of the real world.
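As a minimal illustration of truth transmission checked exhaustively by truth tables (an invented example; the rule shown is modus ponens, not a construction from the text), one can verify mechanically that no assignment makes the initial sentences true and the final one false:

    from itertools import product

    def implies(p: bool, q: bool) -> bool:
        return (not p) or q

    def preserves_truth() -> bool:
        # initial sentences: p and (p implies q); final sentence: q
        for p, q in product([False, True], repeat=2):
            if p and implies(p, q) and not q:
                return False    # premises true but conclusion false: rule fails
        return True

    print(preserves_truth())    # True: the QED is not less true than the initial sentences

Such exhaustive checking is available precisely because the truth of the elementary sentences is determined by composition or listing—conditions which, as noted, natural language does not meet.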
Even within a science, the likelihood inequalities of operators and arguments in respect to each other are too numerous and too changeable for us to
establish any useful set of axiomatic sentences whose truth value can serve
non-trivial proof-sequences. However, as noted at the end of 10.5.4, weaker
types of sequence-constraints, suggesting consistency and consequence,
may be statable for science languages, largely based on regularities of word
choice within the co-occurring operators and arguments.
In a different way, the language in which computer programs are written,
and the compilers which mediate between them and the binary code of the
computer, all constitute science languages or linguistic systems not identical
with natural language. Each program is a text in such a language, following
constraints which lead—largely in a partially ordered way—from one step
to a later step. The flowcharts that were used to design programs, and the
line indentations used in writing programs, are examples of partial ordering
of the sentences of the text. Despite the similarities of programming
languages to natural language, the lack of ‘logical form’ in natural language
(cf. at the beginning of 10.6), and the present lack of any adequate
constraints that would relate conjunctions to the particular words under
them (10.5.4), exclude any present simulation, representation, or equivalence
of computer programs to natural language and to the thinking it presents.
17 Differences among notes are differences in pitch, i.e. in sound-wave frequency. Differences
among phonemes, however, are, acoustically, differences in timbre, i.e. in amplitude of the
various overtones, as are the differences between musical instruments. Hence the recognizability
of, say, the phoneme a, and its difference from e, is constant no matter at what pitch it is
pronounced (as in men’s and women’s voices). The existence of notes as distinct notations, or
discrete structures in musical instruments, and even the availability of more than one musical
instrument and of different timbres, are all limited to particular musical cultures and traditions.
Both the scope of the present book and the experience of the present writer unfortunately limit
the discussion here to the relatively recent culture of the ‘Western world’.
and in much music, there are initial or underlying sentences or themes which
are repeated and modified in making the discourse or composition; and in
both there are subsections with internal regularities different from how the
subsections relate to the whole. In language we saw the double-array
requirement of repeating the word-relations (9.1,3); in science languages, a
few stronger similarities and progressions; in mathematics, the sharp constraints that characterize proof. In music, on the other hand, a great variety
of complex modifications and developments of the themes and chords enter,
together with other elements, into the total musical work. These ‘discourse’
structures are obvious for example in rounds, fugues, the sonata form, and
differently in the demand to return to the tonic. One vague and second-
order similarity between compositions and texts is that, in both of them, the
structural relations among successive short segments (musical phrases;
sentences) are more stringent and ‘convincing’ than those among long
segments (movements; paragraphs or chapters).
In addition to the specific differences such as those noted above, the
overall structural difference is of two kinds. One is that linguistic systems—
including mathematics—are built by a few and permanent constraints,
whereas in music the constraints are more variegated, and differ from one
culture and period to another.18 The other difference is that in linguistic systems most of the complexity of sentences and discourses arises from non-informational elements—the transformations in shape. The meaning-effect that this complexity contributes is quite different from that of the initial informational sentence-making constraints. These last are simple in themselves. In contrast, in music complexity can arise from the intricacy and interweaving of the initial constraints themselves. The effect of the complexity is not necessarily different in kind from that of the smaller compositional elements on which the complexity is carried out. The nearest
thing to grammatical transformations is in the musical form of variations
upon a theme; but the compositional status of this form is not different from
other compositional complexities.
A major issue at this point is the relation of the structure to the content,
to what is being said. In the case of language, it seems that the content
is achieved by the real-world associations of both the word-choice and
the initial constraints (11.4, 6). We cannot say that something comparable
holds for music, but we know that a composition has an effect, which
may be related to the affective-experience association of its sounds and
sound-sequences, its harmonic structure, and the way its compositional
constraints are developed. We may also suspect that just as certain gram¬
matical features are unrelated to the information of the sentence (11.3.3), so
indicated, say, in a picture. It cannot express in a direct and full manner the
whole range of human meaning—what a person ‘means’ to himself, what an
experience means to him, what precisely is the sensation of pain or whatever
that he has. In particular, language cannot give by its structure any direct
expression to emotion, except by bringing in extra-grammatical material,
i.e. items which have no regularities of co-occurrence in respect to other
items of the language: intonations such as of sarcasm or anger, or marginal
constructions and words such as How wonderful, Gee!, Ouch. Feelings are carried in the language only by informational sentences, possibly false, which state that the speaker has the feeling (I am happy), or else by art-
activity (on which more below) of manipulating language, in ways, even
non-grammatical, that carry emotional or aesthetic impact.
In 11.6 it is suggested that the capacity of language to have an effect or to
‘mean’ is by virtue of the association of its elementary parts with something
in the world of experience, and by the constraints on combination of these
parts—or rather, by the association of these constraints with aspects of these
experiences. Further, the effect that language structure has is not the whole
range of meaning but only information, and more specifically the kind of
public information that can be transmitted with little or no error. Music too
has parts, and constraints on their combination: notes as compared with
phonemes, musical phrases as compared with words, methods of composition
compared with syntax. Despite some physical similarities between the parts
and constraints of language and of music, they differ deeply in their
structural properties and in their content—in the aspects of experience with
which they may be associated. However, the very fact that meaning—one
might say subject matter—in language is limited to information leaves room
for important areas of meaning or expression to be covered elsewhere.
Indeed, if language is specifically an information system (to express or
communicate) rather than generally an expression system (for all content),
then all types of expressing other than informational ones must be lodged in
other modes of expression and other types of activity.20
As to music and the arts, the content which they indubitably have must be
related to human experience, since art is made by people and responded to
by people. Many features in the constraints of art and the response to it, for
example the difference in response to art as against the response to decoration,
suggest that here, as in language, the effect (meaning) may be in response
to the kind and complexity of constraint (or stabilities, instabilities, and
organizings) which is met with in the course of human experience—different
from the constraints relevant to language (11.6), but nevertheless constraints
21 The fact that to this day we cannot say—or agree—what it is that music expresses serves as
a cautionary tale about the otherwise invisible limits of what can be said in language, or at least
about the difference in content between language and music.
straints on the form and on the sequence of predications, none of which are
available to art or to feeling. It is not that these last cannot complement what
we have in language, but they do not supplement it within its own universe
by supplying any immanent material that can then enter into, and correct,
the specific predicational and sequential-argument structure (10.5.4) which
constitutes the informational power of language. Indeed, as language is
important to us in giving us some structuring of the mass of objects and
events which we experience, music and art may be important to us in giving
us some structured expression of the sensings and feelings which we experience
both internally and externally.
In this sense, and given the difference in structure between language and
the arts, Pascal’s ‘The heart has reasons of its own, unknown to reason’ is
questionable: any extra-rational status or validity ascribed to ‘the heart’ (for
Pascal, faith) cannot add to or correct what has been reached by the rational
structure of language. Whatever it is that man finds in art, or for that matter
in art-like response to nature, it cannot contribute to what man finds through
language; it cannot make him wiser in respect to his public problems,
whether material or social. Conversely, art and feelings cannot be said to be
by their nature ‘true’. In the matter of validation, the rationality of language
may be considered to be a way of dealing with the world, with successful
(effective) outcome; but the arts may be considered to be affective responses
to the world, with ‘success’ being either undefined or highly indirect. It is not
the case that what is said may be erroneous or false whereas what is felt is
ipso facto authentic and valid. There may be ‘false’ emotions, and cliché
emotions, which a person may indeed feel but which may be an artefact of
inculcation, or of institutional and public pressures which led him to feel so.
What is more, while the validity of ideas and statements in language can be
tested by reasonably explicit, direct, and public methods of critique (10.5.4-
5), the validity of feelings can be tested only by such complex considerations
as those introduced by Freud and by Marx. Somewhat as one may consider a
discourse superficial, one may consider a composition superficial. And as
one may question an idea stated in language, one might question a feeling,
for example awe, which one might consider no longer befitting man’s
relation to nature and society. More generally, as language cannot directly
express emotions, art cannot directly express ideas. Artists may be influenced
by cultural winds and social movements, but that content appears in their
work only in secondary and often amorphous ways.
The difference between information and feeling may also clarify some of
the difficulties in discussions of mystical experiences. The latter may consist
of real emotional experience associated with some particular informational
(factual) content, but where no publicly relevant connection can be made
between the experience and the informational content (whereas in language
such connection exists, 11.6). The difficulties may not be so great if we understand that the experiences exist as such, due to whatever conditions,
but that the associated factual content which triggered the emotions in that
experience may not be contentually related to that experience, any more
than the words of a song are contentually related to its music.
A similarity between language and music is that in both of these the
creator of the discourse expresses a meaning and the hearer receives a
meaning, an activity quite different from, say, play or work. Music however
is unlike language in the matching of the author's act and the recipient’s
perception. In language, by virtue of its publicly established phonemic
distinctions and pre-set vocabulary, the hearer is expected to get out of a
sentence precisely what the speaker put into it; any appreciable difference
constitutes a failure in language use—in information transmitting. An
important failure, for example, is the inability of the speaker to indicate, in a
general way within the grammatical apparatus, which word is the antecedent
for a particular occurrence of he or the like. There are also types of
mismatches which are not due to grammar inadequacy. One type is when the
speaker uses circumlocutions or euphemisms because of custom, or of
personal or social discomfort with what he has to say, in which case the
hearer is expected to recognize the shift and to correct for it, understanding
the statement to mean what is intended rather than what is actually said.
Another is when the speaker speaks with indirection, hoping to mislead his
hearer into accepting what he actually says rather than perceiving his intent.
But aside from such special cases, a sentence is correctly understood only
when the hearer receives as closely as possible what the speaker meant or
intended.
In contrast, the arts have no apparatus to ensure that what the maker
meant is what the recipient receives. It is accepted in music and the arts that
the recipient—viewer, hearer, performer—may perceive something different
from (though presumably related to) what the maker meant or intended. It
is acceptable (at least to many) that different historical periods have
different ways of playing Bach, and that music written for Baroque instruments may be played on modern ones, and certainly that different performers
may interpret a composition differently—that is, play it differently, see
different things in it—even when the composer’s own performance is
known, as for example in Stravinsky’s conducting of his own works. The
importance of performing tradition and individual performance over and
above the score is not just a matter of incomplete notation.
This difference between reception in language and reception in the arts is
understandable when we keep in mind that language transmits more or less
explicit information from person to person, in a system based on full
repeatability of pre-set material, whereas in the arts, the maker expresses an
emotion or a sense of something in ways that call out feelings or sensings, not
necessarily identical, in the recipient; the recipient’s respondent feelings
depend in part on his own emotional nature and experience. Indeed, the
maker can manipulate materials and symbols as a technical or random
activity, without an intended expression of feeling, and yet the recipient may
react with feeling, as he may react to unintended nature. By the same token,
when the work of art contains expression of feeling or intentional structure
from the maker, it is not necessarily the case that the feelings it evokes, or
the structure that is perceived, in the recipient are precisely the same as the
maker’s, or the same for all recipients. Thus expression and communication
are not identical in art, and transmission is only a secondary use of it. In the
absence of systemic control over repetition and sameness, such as phonemic
distinctions provide for language, authentication of what the maker did or
meant, and the communicational problem of maker-recipient differences,
are not systematically resolvable.22
Several activities other than music should be mentioned here in order to
distinguish them from linguistic systems. One is the use of language itself for
purposes of art. This is not a second function of language, but a re-use or
manipulation of language. The language remains in its original and only
definition: words with meanings and operator-argument statuses, sentences
as informational operator-argument combinations. The art use of language
adds to this—and also alters it—by introducing not only additional features
of discourse structure but also non-grammatical features: sound similarities
(onomatopoeia, alliteration, rhyme), time-intervals (syllable count, stress,
rhythm, rhyme), play on word meanings (including allusion, nonce extension
of selection). Modern literary devices also violate some of the grammatical
requirements, from word combinations that are not within selection to ones
that clash with the operator-argument structure or linearization. All this
may be interconnected with the meanings of language material, that is with
its ordinary use; or it may be virtually independent of that. In any case, it
does not constitute an independent system, but rather a set of structural
modifications whose purpose is tangential to the ‘function’ of language or
totally independent of it. These modifications may, of course, follow a
systematic or even conventionalized way of manipulating language or
playing upon it, with emotional or aesthetic effect. The property of acting
within and against rules of a system is more pronounced in music and the
language arts, as it is in puzzles and games, than it is in such arts as painting
and sculpture where the artist creates his independent arrangements of
objects or symbols.
Another activity that has to be mentioned is animal communication,
whether cries or standardized behavior (postures, etc.). There is little
question that some of these are expressive for the individual, and some are in
effect communicative for the group. The only thing that has to be said here is
that these activities differ from human speech not only in degree—the
vocabulary being minute—but also in kind, because there is no operator-argument relation, no predication, no organized likelihood-inequalities, or
22 On authenticity in art, and also on notations in art (10.5.5 above), cf. Nelson Goodman,
Languages of Art, Hackett, Indianapolis, 1976.
Interpretation
11
Information
11.0. Introduction
Language is clearly and above all a bearer of meaning. Not, however, of all
meaning. Many human activities and states have meaning for us, and only
some of this can be expressed in language: feelings and vague sensings can be
referred to only indirectly (10.7); non-public information, such as proprioceptive sensations, can be named only with difficulty; certain kinds of
non-language information can be translated directly into language (e.g.
graphs and charts); but other kinds (e.g. photographs) can be represented in
language only loosely and selectively. Meaning itself is a concept of no clear
definition, and the parts of it that can be expressed in language are not an
otherwise characterized subset of meaning. Indeed, since language-borne
meaning cannot be organized or structured except in respect to words and
sentences of language (2.4), we cannot correlate the structure and meanings
of language with any independently known catalogue or structure of meaning.
In each language, we do not know a priori which specific aspects of meaning will be referred to by words, and how much will be included in the
meaning of a single word. Even when the meanings are well-defined, it is not
always possible for words of a language to mirror in their structural relations
the relation among the referents. For example, the relation among the
integers is given in Peano’s axioms, but language cannot thereupon name
them as one, successor of one, successor of successor of one, etc. What
languages do in this case is to give arbitrary names to the first ten (or six, or
whatever), and then to use these and other names to indicate successive
multiples of the initial set: the integers are thus named modulo that initial
set, by a language-based decimal or other expansion. Such cases suggest that
we have to study the specific words and structures of a language if we wish to
see what meanings they cover, and how.
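The contrast between Peano-style naming and naming modulo an initial set can be put in a brief sketch (the function names are invented):

    NAMES = ["zero", "one", "two", "three", "four",
             "five", "six", "seven", "eight", "nine"]

    def peano_name(n: int) -> str:
        # names that a language cannot in practice sustain for large n
        return "one" if n == 1 else "successor of " + peano_name(n - 1)

    def decimal_name(n: int) -> str:
        # the ten arbitrary names reused positionally, modulo the initial set
        return " ".join(NAMES[int(d)] for d in str(n))

    print(peano_name(4))       # successor of successor of successor of one
    print(decimal_name(407))   # four zero seven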
As has been seen (5.4), the phonemes are irrelevant to meaning though
they underlie communicability: the phoneme-sequences that constitute
words are not in general constrained in any way that is relevant to the
meanings of the words. The words are thus a fresh set of primitive elements,
which can be identified without phonemes (spoken or written) as in ideographic writing, and which are associated with meanings, in a manner to be
discussed in 11.1 and 11.6. The meanings of words are modified by their
The first question is: what are the least parts (forms) in language to which
meaning adheres? Not phonemes: there are no regular ways in which
the meaning of the words of a language can be obtained as combinations
of meanings assigned to the phonemes of those words. Indeed, phonemes do
not in general have any linguistic meanings; and the pair test which establishes
phonemic distinctions is related not to the meanings of the tested words but
only to the recognizability of their repetitions (7.1).1 If in all the occurrences
of a word the phonemes were replaced by others, we would simply have a
variant form of the same word. But if we replaced some or all of the
occurrences of a word by a word which had different selection, i.e. whose
normal occurrences were different, we would have a different meaning.
Hence onomatopoeia, in which the sounds of a word suggest its meaning, is a
rarity in language. By the same token, the cases in which different morphemes
that are similar in sound are also similar in meaning must be haphazard and
1 As noted at the end of 7.4, the phonemic composition of words is needed because there are
not enough pronounceable and audible distinctions among fixed single sounds to distinguish all
the vocabulary of a language. The structural and semantic properties of words can be carried
even by having a single sign for each word, as in ideographic writing.
not regular; and these sound-meaning correlations are then not usable in
any regular way in the grammar.
However, in each language there are certain listable phoneme-sequences,
called words or morphemes, which carry fixed meanings. It has been seen
that these sequences can be isolated out of the flow of speech even without
any knowledge that they are the words of the language, or what their
meaning is (7.3, 11.3). A sequence of phonemes cannot be established as a
word simply on grounds of being judged to have meaning of its own; it must
also have regularities of occurrence relative to established sets of other
words. There are cases of a phoneme-sequence which seems to have a
characteristic meaning but whose immediate environments do not themselves combine in a sufficiently regular way, as morphemes. The morphemic
status of such phonemic sequences may therefore be border-line. Such are,
in English, the sl- of slick, slip, slither, slide, slimy, slink, sling, slog, slosh, slouch, slow, etc., and the gl- of gleam, glow, gloom, glisten, glitter, glare, glide, and the fl- of flick, flip, flit, float, flash, flare, flap, flop, flutter, fling, flow, flee, fly (7.4). Such is also the case for -le in scores of words: handle,
dazzle, nozzle, nuzzle, muzzle, etc., even though in some words the -le had
been a more regularly combining English morpheme. On the other hand, a
phoneme-sequence may have to be accepted as a morpheme even when it
has no assignable meaning, just on grounds of its regularity of combination;
e.g. the re- of recommend.
In view of all this it is necessary to consider what is the source and status of
word meanings, and how these relate to the combinational regularities of the
words: how do words carry meaning? First, it is clear that most words have
some association with meaning independently of their occurrence in sentences
(i.e. in combination). Since a large vocabulary was presumably in use before
sentences developed (12.3.1), these words must have had meanings in
isolation. In addition, many words that enter the language at various times
get their initial meanings from the experiences that gave rise to them rather
than from any grammatical combinations. This applies to newly made words such as gas and boysenberry, to borrowings whose meanings may be adapted or specialized from the meanings in the source language (e.g. opera, piano),
and also to word combinations that are not made on a syntactic basis (e.g.
flip-flop, wishy-washy, wheeler-dealer). In addition, many words change or
specialize their meanings in the course of time on the basis of cultural and
historical developments, without apparent regard to their etymology or to
the syntactic combinations into which they otherwise enter: examples are
countless, e.g. Quaker, verb, plane. And of course very many syntactic word
combinations have a specialized meaning beyond their composition: whereas
school-books are any books specifically for school, snow-shoes are not merely
any shoes specifically for snow but a particular type of object.
How words are associated with their non-syntax-based meaning will be
discussed in 11.6. Before reaching that question, however, it is important to
3 Even for the many words whose meaning is constant the semantic effect may differ in
different environments. When said of an unknown object, small, large may be used relative to
the size of speakers and hearers; but said of a known object, these are relative to its class of
objects: small elephant, large flea.
Observation of the data makes it clear that the relevant environment for
meaning-specification of a word-occurrence is its immediate operator or
argument, more rarely its farther operators or arguments and its co-argument; modifiers of a word are counted as operators on it in a secondary
sentence (3.4.4). Because the environment of a word delimits its meaning,
the meaning of a syntactic construction is sharper than the meaning-ranges
of the component words taken separately. Thus the correlation of meaning
with syntactic constructions of words is more precise than it is with words
alone; and indeed, a dictionary is frequently forced to refer to grammatical
environments if it is to present word-meanings with any precision. The
meaning is in the words, but the precision for each occurrence often comes
with the combination. Cases in which environment differences do not match
meaning-differences are only apparently such: for example, there is much
similarity in the environments of certain words whose meanings are opposites
(left/right, small/large, etc.); it is relevant here that many opposites (good/bad, young/old, mother/daughter) are known to have important semantic
features in common. Indeed, certain aphasias and slips of tongue can replace
a word by its opposite.
New extendings of environment and meaning are constantly being made.
They are not determined merely by the speaker's intent. In many cases there
are alternative possibilities for the environment, and the extending is
affected by various factors. Thus when a new operator was needed for the
motion of a plane in the air, fly was not the only possible choice (the
Zeppelins sailed); similarly for fly as the moving of a flag. When words are
needed for incapacitating a tank, or for eliminating a bill in Congress, kill is
not the only word with a somewhat appropriate meaning. And steamships
and motor-ships sail, but motor-boats do not. In the many such cases of new
word combinations, the meaning-range of the chosen word (fly, kill, sail,
etc.) changes to accommodate the new use.
A case of this is seen when metaphors, originating as a simile, yield a new
meaning of a word, as in the case of see in the sense of ‘understand’ (I see
what you're saying), or in the meaning of understand itself as it developed
out of its etymological source. Metaphor is an extreme case of an apparent,
or de facto, change of meaning in a word. It arises in a simile when indefinite
nouns and verbs, and also the as of the simile itself, have been zeroed,
leaving the remaining word to carry the meaning of the zeroed material in
addition to its own. If the metaphoric see above is derived from some source
such as I treat what you're saying as one's seeing things (through an unsaid I
treat-as-one's-seeing-things what you're saying, reduced to I treat-as-seeing
what you're saying, G405), then the metaphoric see has the meaning of the
original physical see plus the meaning of as (while the verb-classifier treat and
the indefinites one, things, carry little meaning), ‘I as-though-see what
you’re saying’. Other cases of words which are left to carry the meanings of
zeroed operators on them are seen for example in the ‘zero causative’ of He
walked the dog from roughly He took the dog walking (through an unsaid but
grammatically constructible He took walking the dog). Such situations can
even lead to words having apparently opposite meanings, as in He dusted the
furniture from He treated the furniture for (its) dust as against He dusted the
crops from He treated the crops with (insecticidal) dust. A different case is
that of abstract nouns, as nominalized operators, which arise when the
indefinite arguments of an operator are zeroed in a nominalized sentence
(i.e. a sentence under a further operator): e.g. hope from people’s hoping
things (in Hope springs eternal), vehemence from one’s being vehement (as in
Vehemence is uncalled for here, 11.2). ‘Lexical shift’ (i.e. meaning-shift in
words) is thus simply the effect of environment extension plus zeroing.
The most general factor in the varied and changing meanings of words is
simply the constant though small change in likelihoods—what words are
chosen as operators and arguments of other words, and how frequently they
are thus used. Beyond the argument-requirement dependence, the grammar
does not limit each word to particular other words as its operator or
arguments: any On operator can make a sentence—even if a nonsensical
one—with any N word, and so on. Given this, the particular choice (or
grading of likelihoods) of words depends largely on what makes sense (or
intended nonsense), given the current meanings of the words; but it depends
also on marginal distinctions that the speaker wants to make, on attention-
getting surprise, on euphemism, on the analogy of how related words
combine, and on other factors.
The changing frequency and acceptance of certain combinations affect
what comes to be seen as the main meaning of a word, as do increasing
specializations (i.e. frequencies of particular meanings) of word use. Over
time one sees many changes in combination which produce recognizable
differences in meaning, e.g. as between I was flying in a 747 (an extension in
the meaning of fly) and A moth was flying in the 747. There are words in
which successively extending the frequency of the environment-choices has
resulted in large differences in meaning: for example, within the history of
English, fond from ‘tasteless’ to ‘affectionate’, like from ‘to please’ to ‘to be
pleased by, to favor’; in wear, by the side of the continuing meaning ‘to be
clothed in’ there was also a development to ‘to deteriorate’ and ‘to pass’ (The
night wore on).4
Although the meaning of any word at a particular time is a factor in
contemporary choices of operators or arguments for it, the meaning is
clearly not sufficient to determine those choices. But those choices in turn
affect what is understood as the meaning of that word. After new choices of
operator or argument have been made for a word, we can often find an
4 Changes may be so natural, given the changing culture, as not to be noticeable. Thus, the old verities (literally ‘the old truths’) once referred to what users of the language must have understood as, indeed, ‘the old truths’. By now our suspicions as to the once-accepted truths are such that when we say the old verities we are close to meaning the old falsehoods.
extended core-meaning for that word, such as would fit both its old and its
new environments (e.g. discharge an employee, an obligation, an electric
potential difference). But we would not necessarily have come to this
meaning-core before the new environments had come to be used: consider
to charge the enemy, or the accused, or the purchase, and to charge a battery.
That is, knowing the meaning of a word does not suffice for us to predict
what new environments would or would not be used with that word.
Different resolutions in this respect are found in different languages:
thus although English and French have rather similar systems of kinship
nomenclature, seemingly opposite choices appear in grand-daughter as
against petite-fille, each being a reasonable way of adding ‘one stage farther’ to daughter, but grandmother is paralleled by grand'mère (and aïeule). (Note that great-granddaughter is arrière-petite-fille, and for great-grandmother one can say not only arrière-grand'mère but also bisaïeule.) Extending to
new environments is all the more complex when we consider nonce forms,5
metaphor, analogy (especially that), and literary turns of phrase. All these
many-faceted similarities and differences in meanings of words can create
regularities and patterns of meaning-relation among words, which are a
source of interest in lexicography and literary criticism.
Attempts to categorize and to find regularities in extending and changing
meanings of words have not fared well. But what is important is that such
attempts have succeeded far better when directed to the choices of environment rather than to any inherent dynamics (such as narrowing or widening) in
meanings as such. That is, a word is extended from one environment to
another on environmental grounds, e.g. on the basis of the environments in
which that word already occurs in comparison with the environments in
which variously related words occur.6
The meaning that a word has with a particular operator or argument is
constant for all its occurrences with that operator or argument. Thus not
only is this local meaning more sharply defined than the general meaning-
range of the word, but also it is this meaning that is preserved under further
operators. This meaning is also generally preserved under transformations.
However, a word may seem to change meaning when it incorporates the
meaning of attached words which have been zeroed, as in expect John in the
sense of expect his arrival, and in She’s expecting for being pregnant. In
contrast, when words come to be used in greatly broadened selections, their
meaning becomes less specific (and in some cases their form is reduced)
when they are in the broadened selection. For example, of is derived from
off, and have is weakened from ‘possess’ to ‘be in (a situation)’ when it
5 In nonce forms and certain jokes a word is used in a new environment with the understanding that this is not a precedent, that the meaning of the word should not be modified to fit the new environment.
6 H. M. Hoenigswald, Language Change and Linguistic Reconstruction, Univ. of Chicago
Press, Chicago, 1960.
occurs with all verbs to make the perfect tense. (Note that in the French
perfect tense, e.g. J'ai fini hier, the ai is shown by the hier to be no longer
present tense, but in English I have finished by now, where there is no I have
finished yesterday, the have is largely still present tense.) Furthermore, in
some cases a word combination changes its meaning so that it no longer
follows in a simple way from the meanings of the component words: these
are the idioms (e.g. kick the bucket, fix his feet). This may take place even if
the non-idiomatic meaning of the combination continues to exist, as in take
care of him both in the sense of caring for him and in the sense of neutralizing
him.
It would seem to follow from all this that meaning has to be stated as a
property of word-occurrences (i.e. of a word ‘token’) rather than of word
(i.e. of its ‘type’). But this is wasteful, since very many word-occurrences
have the same meaning. The regularity of meaning is in terms not of word as
such nor of word-occurrence, but of word combination: the word has fixed
different meanings when with different subsets of its selection (immediate,
or somewhat farther).
In sum: first, the meaning of a word in isolation is in most cases the
starting-point, and its meaning affects its current selection of operators and
arguments, though only in a loose way. Second, the operator-argument
selection of a word, while affected by its current meaning, can be changed by
the pattern of the selections of variously related words (especially in
analogic extension), which in turn affects its ensuing meaning. But it must be
kept in mind here that the selection of a word is not simply the set of
operators and arguments with which it occurs, but the set it occurs with in
above-average frequency (hence e.g. not as jokes); we include here cases
when the frequent co-occurrent is somewhat distant in the dependence
chain. It is thus that selection can be considered an indicator, and indeed a
measure, of meaning. Its approximate conformity to meaning is seen in that
we can expect that for any three words, if two of them are closer in meaning
to each other than they are to the third, they will also be closer in their
selection of operators and arguments.
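Once selections are listed, their closeness is directly computable; the following sketch uses invented mini-data (the selection sets are illustrative, not attested counts):

    def jaccard(a: set, b: set) -> float:
        # overlap of two selection sets as a crude similarity measure
        return len(a & b) / len(a | b)

    # hypothetical sets of operators occurring with above-average
    # frequency over each noun:
    selection = {
        "unicorn": {"gallop", "graze", "is-horned", "is-mythical"},
        "centaur": {"gallop", "speak", "is-archer", "is-mythical"},
        "bill":    {"pass", "veto", "pay", "introduce"},
    }

    print(jaccard(selection["unicorn"], selection["centaur"]))  # ≈ 0.33: closer
    print(jaccard(selection["unicorn"], selection["bill"]))     # 0.0: farther

On such data the expectation above becomes checkable: of any three words, the two closer in meaning should show the larger overlap of selections.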
By speaking of a word’s selection instead of a word’s meaning in isolation
we leave room, first, for different meanings (taken from the range in
isolation, or extended out from it) in the presence of different operators or
arguments, and second, for meanings to change and to change differently in
particular environments. The latter has been shown to occur not only on the
basis of processes of meaning change but also on the basis of environmental
similarities and differences. For example, a word can change meaning in
suffix position while it does not change in its free-word occurrences (compare
-ly with like).
Selection is objectively investigable and explicitly statable and subdivid-
able in a way that is not possible for meanings—whether as extension and
referents or as sense and definition. Indeed, one can take a reasonably large
sample of sentences or short discourses and list the complete selections for
various words, although in many cases there may be uncertainty as to
whether a particular operator or argument has selectional frequency for the
given word or is a rarer co-occurrent which does not affect its meaning.
Characterizing words by their selection allows for considering the kind and
degree of overlap, inclusion, and difference between words in respect to
their selection sets—something which might lead to syntax-based semantic
graphs (e.g. in kinship terms), and even to possibilities of decomposition
(factoring) for particular sets of words. Such structurings in the set of words
are possible because in most cases the selection of a word includes one or
more coherent ranges of selection (3.2). The effect of the coherent ranges is
that there is a clustering of particular operators around clusterings of
particular arguments, somewhat as in the sociometric clusterings of acquaintance and status (e.g. in charting who visits whom within a community).
Selection sidesteps certain boundary difficulties that are met with in
considering meaning. For example, words whose referents have at all times
null extension (unicorn, centaur, spiritualists’ ectoplasm, or for that matter
laissez-faire capitalism) present no problem here, because they each have
characteristic and stable selections, and ones which adequately distinguish,
say, unicorn from centaur. Such words can also be characterized by their
definitions. Indeed, the selection of a word can be used to generate the set of
its partial definitions: a chair is something on which one can sit, which in
most cases has four legs, etc. As to different words which are not really
synonyms but have the same referent (e.g. Sir Walter Scott and the author of
Waverley), these have or can acceptably have the same selection—but not in
sentences where their difference is involved (e.g. not in The king did not
know that Sir Walter Scott was the author of the Waverley novels). Words
whose definitions are not readily statable, or which may have no definition
or meaning, such as the spiritualists’ ectoplasm or there in such sentences as
There's a man here, can nevertheless be characterized by their syntactic
position and their selection.
Selection also treats directly the cases where meaning and phonemic
composition do not correlate: in suppletion, different phoneme-sequences
have the same meaning; here we note that they have the same environments
except for being complementary in respect to a particular part of the
environment (e.g. for is, am, are)—a situation which has a distant similarity
to the issue of Sir Walter Scott above. In free variation, partially different
phoneme-sequences have the same meaning; here we note that they have
the same environments (e.g. the two pronunciations of economics, or the not
and -n’t, and will and wo-, of will not, won't). In science languages (10.5) a
given word symbol, with fixed sentence-type positions, has a sequence of
English phonemes in English articles and a different sequence of French
phonemes in French articles. In homonymy, the same phoneme-sequence
can have different meanings; here the phoneme-sequence occurs in two or
particular selection which would support that meaning. Nor can we reach
words in any organized way by starting from their meanings, quite apart
from the fact that there is no known way in which we can objectively and
consensually organize the world of meanings. But for the most part, the
argument-requirement of a word is preserved under translation: that is, On
words (e.g. walk) usually translate into On words, Oo words (e.g. occur) into Oo, and so on. And for each word, what the learner or analyzer of a language
does is not to think deeply about what the word ‘really means’ but to see how
it is used, keeping in mind the fact that the various life-situations in which it
is used are largely reflected in the various word combinations in which it
occurs. The grammar cannot be built out of meanings; but it can be built out
of word combinations.
One can readily see that the meaning of a sentence is more than a selection of
the meanings of its words: John saw Bill does not mean the same as Bill saw
John, and the meanings which we can garner from collecting the words John,
Mary, Tom, Smith, call, tell, will, to, and are not the explicit, and varied,
information that we get from Smith will tell John to call Mary and Tom and
Mary will call John Smith and tell Tom to. The sentence information over
and above the meaning of its words is given by the meaning of the operator-
argument relation. To see what this meaning is we note that the structural
property is not merely co-occurrence, or even frequent co-occurrence, but
rather dependence of a word on a set: an operator word does not appear in a
sentence unless a word—one or another—of its argument set is there (or has
been zeroed there). When that relation is satisfied in a word-sequence, the
words constitute a sentence. This statement does not exclude other words or
combinations, especially zero-level words, from constituting exceptional
types of ‘sentence’, as perhaps in Fire!
The operator is thus not something one says by itself, but rather something
one says with—about—the argument, a predication on it.7 A word that
requires the presence of one or another of certain other words means—in
addition to its own lexical meaning—that it waits on those other words, i.e.
that it is said about them. Hence each occurrence of the dependence relation
contributes a predication meaning (between its participants) to the sentence.
Thus the meaning or information of a sentence is not just a sum or
juxtaposition of the meanings of its words. And since the predication
relations in a sentence are a partial ordering of its words, we see that the
meaning of each partial ordering of words (i.e. each sentence) is that same
7 Predication is a relation between two meanings, and is the meaning of a relation between
two word-occurrences. It is not the same as assertion (5.5). In John's accepting is uncertain, the
accept is predicate on John, but not asserted, as also in John will accept if he is invited.
11.3.0. Introduction
We have seen that when the constraints on form, i.e. on word combination,
are isolated we can distinguish between constraints that carry information
and a (mostly reductional) residue that does not. The constraints on
combination constitute a linguistic system by themselves, the base sublanguage (4.1), but the residue does not. Thus the information in language is
the information in a structurally distinguished sublanguage, which consists
of base forms of each sentence in the whole language.
In the base the relation of form to information is much sharper than in the
whole language, and here we can ask precisely what kinds of form carry what
kinds of information. We can take a base sentence, ultimately an elementary
sentence no proper part of which is a sentence, and adjoin to it various word-
sequences which make it into a longer sentence, and then ask what information
has been added by each additional part. (We disregard here the removal
of parts of a sentence, by zeroing, which does not remove any of its
information—up to ambiguity.) A more organized account of such accretion of
information is given by the semilattice structure of the sentence. Here every
word that enters into the making of the sentence has a fixed point in the
partial order, at which its form and its meaning are contributed. Here, then,
every leg of the partial order contributes its own grammatical (ultimately
predicational) meaning at that location; and here it is found that each type of
meaning has its characteristic location. Typically, these contributions include:
zero-level nouns; first-level operators, as their properties and relations; the
linear order of the first (‘subject’) and second (‘object’) arguments of
a relation; the difference between things (nouns) and states or events
It should be clear first of all that every vocabulary or grammar choice made
in constructing a message (sentence or discourse) is an application of a
constraint defined as being available at the given point in the construction.
Each chosen constraint makes a fixed contribution, at the point of its
application, to the substantive information carried in the message. For a
given type of choice, either the contribution is always null (e.g. transformations, 11.3.3), or it is accretively meaningful. Furthermore, the amount
of choice at a given point may be graded: the more expected a given choice-
decision is at that point, the less information is contributed by the expected
choice. And if what has been chosen can be determined (reconstructed)
from the environing choices, no free choice has been exercised, and no
meaning contributed, at that point.
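A standard information-theoretic formulation of this grading—surprisal, from Shannon's framework rather than from the present text—is that a choice x made with probability p(x) at a given point contributes

    I(x) = −log₂ p(x)

bits of information: a fully expected choice (p(x) = 1) contributes nothing, and a choice reconstructible from its environment was never a free choice at all.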
The meanings of all words and grammatical entities can be indicated via
the constraints listed below, which may be grouped as follows:
(1) words in isolation (with their meanings), viewed as a non-syntactic constraint on phonemes;
(2-5) the constraints involved in environment-based specifications of
word meaning;
8 Cf. the English and Korean examples in Z. Harris, Mathematical Structures of Language,
Wiley-Interscience, New York, 1968, pp. 110, 112.
(6-8) the constraints that underlie the meanings of sentences over and
above the meanings of their words;
(8-10) the constraints that yield the further meanings of sequences of
sentences;
(11) the constraints that yield the meanings of word subclasses and
sentence types in sublanguages.
(11) In natural language as a whole, the constraint that creates syntax, i.e.
sentences, is the operator-argument partition of words or word-occurrences
into argument-requirement classes, while the likelihood constraint on word
combination in (4) above is simply a not-well-defined variation within this
partition. However, in science sublanguages word combination is so sharply
limited that the place of likelihoods is taken by definite constraints which
further partition word-occurrences into subclasses of operators and arguments. This seems a novelty, changing the non-grammatically-differentiating likelihoods of (4) into grammatically-differentiating subclassifications. However, it is actually no more than an extension of the conventionalization of
use which had produced the original operator-argument partition; cf. (5)
above. In the whole language the use of the dependent words was distinct
and conventionally fixed enough to create distinct word classes, but likelihoods were not; in science sublanguages the likelihoods, i.e. the use of
specific words, are distinct and conventionally fixed enough to create
combinatorially and semantically distinct subclasses of words within the
whole-language classes. This is the major information-bearing constraint of
sublanguages, especially of the languages of sciences.
The result is a characteristic structure for the semilattices of sentences in
each science language, with a strong connection to the information in the
science, and with the various word subclasses occupying particular positions
within the semilattices—the meta-science material, if present, being at the
top of the lattice of each sentence as secondary sentence. In addition, a
science language may have new argument-requirement classes which do not
appear as such in the whole language, especially in respect to quantity
relations (e.g. ratio). In all, the word classes, sentence structures, and
discourse structures of science languages provide not only the specific
information of the science but also its characteristic overall information
structure, much as the structure of the whole language provides both the
information reported in the language and also—in the operator-argument
system and its secondary grammatical relations—the structure of language
information in general.
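The semilattice can be rendered as a small data structure. The sketch below (Python; the example sentence and the glosses are invented, purely for illustration) records each word at a fixed point of the partial order and lists the contribution made at that point.

from dataclasses import dataclass, field

@dataclass
class Occurrence:
    # A word at its fixed point in the operator-argument partial order.
    word: str
    contribution: str                 # the meaning added at this point
    args: list = field(default_factory=list)

def accretions(node, depth=0):
    # Walk the partial order, listing each word's contribution in place.
    yield "  " * depth + node.word + ": " + node.contribution
    for arg in node.args:
        yield from accretions(arg, depth + 1)

# 'That John left surprised Mary':
sentence = Occurrence("surprise", "second-level operator on a sentence and a noun", [
    Occurrence("leave", "first-level operator, predicating on John", [
        Occurrence("John", "zero-level noun")]),
    Occurrence("Mary", "zero-level noun")])

print("\n".join(accretions(sentence)))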
-ing, which have been analyzed for the present theory as being merely
indicators of operator status or of argument status, rather than being
reduced words such as enter a sentence by virtue of their argument-
requirement status (G37, 40).
Also non-informational are the reductions, as noted above (3.3). Since
these are changes only in the phonemic shape of words, or in their position,
we can say that the word remains in the sentence even if in altered or zero
shape, and that its grammatical relations are unaffected, so that reduction
when so analyzed makes no change in word combination or in information.
Even if we consider the overt appearance of sentences, where reduction of a
word to zero would be taken as absence of it, the loss of the word would not
entail a loss of information because words that are reduced in this way are
reconstructible from the environing words, and thus had contributed no new
information to their sentence. The non-loss of information is also seen in the
fact that the bulk of transformations, and especially reductions, are optional, so
that both the original and the transformed shapes exist, often in the same
discourse-environments, and the transformed are clearly paraphrastic to the
original. In morphophonemic reductions, too, the paraphrastic character is
clear: I will and I'll, going to and gonna, etc.
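A toy version of such zeroing can make the point concrete. In the sketch below (Python; the rule is deliberately simplified and far broader than any real zeroing domain), words of a second clause that repeat the first are zeroed under and, and the zeroed material remains reconstructible.

def zero_under_and(clause1, clause2):
    # Zero the repeated initial words of clause2; being reconstructible
    # from clause1, they carry no new information in their sentence.
    w1, w2 = clause1.split(), clause2.split()
    i = 0
    while i < min(len(w1), len(w2)) and w1[i] == w2[i]:
        i += 1
    return " ".join(w1 + ["and"] + w2[i:]), w2[:i]

reduced, zeroed = zero_under_and("I read the book", "I read a paper")
print(reduced)   # I read the book and a paper
print(zeroed)    # ['I', 'read'] is recoverable, hence zeroable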
Nevertheless, reductions can indicate meanings, without contributing
new word meanings. This happens because reductions indicate high likelihood of a combination and even, as in the case of the compound form or the
adjective order, the relative stability of a combination (G231, 233). When
operators which have high likelihood of undergoing tense variation get the
tense affixed to them, and so become ‘verbs’ (He aged), while for the other
operators the tense is given in a separate word, leaving them as ‘adjectives'
(He was old), the different localization of the tense creates a fixed distinction
between less and more stative operators (G188).
Some reductions affect or indicate the speaker’s or hearer’s relation to the
information in a sentence, without adding to the information or subtracting
from it. The effect is primarily in the way the information is presented. One
such effect is in the amount of information which comes through at a given
point in the sentence. This results even from some morphophonemics: given
the am, is, are forms of the operator be, the person who hears Am I responsible? knows already from the first word that the argument will be I. (But for the continued presence of I, one could have analyzed am as is
plus I.) Another effect on presentation of information is the degeneracy
of certain reductions, whereby different reductions in different sentences
produce identical word-sequences, which are thereupon ambiguous (homonymous): He drove the senator from Ohio to Washington is produced by
one reduction from He drove to Washington the senator who was from Ohio,
and also by another from His driving of the senator was from Ohio to
Washington. The ambiguity did not exist in the sources, before the reductions.
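The degeneracy can be shown mechanically: two distinct sources, each with its own reduction, map to one surface word-sequence, so inverting the map yields more than one source. A minimal sketch (Python; the reductions are given here simply as a table, not computed):

from collections import defaultdict

# Each unreduced source undergoes its own reduction; here both happen to
# yield the same word-sequence (the example above):
reductions = {
    "He drove to Washington the senator who was from Ohio":
        "He drove the senator from Ohio to Washington",
    "His driving of the senator was from Ohio to Washington":
        "He drove the senator from Ohio to Washington",
}

sources_of = defaultdict(list)
for source, surface in reductions.items():
    sources_of[surface].append(source)

for surface, sources in sources_of.items():
    if len(sources) > 1:     # degeneracy: the surface form is ambiguous
        print(surface)
        for s in sources:
            print("  <-", s)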
The alternative linearizations of the partially ordered words of a sentence
father/brother/sister, or the sl- and fl- words). More important, but still not
explicitly informational, are such literary devices as poetic meter, rhyme,
and alliteration, all of which relate one word to a particular other in the text
(whatever else they do); or allusion to absent words by phonemic similarity
to them in words that are present in the sentence (in literature, jokes, swear-words), and various kinds of literary and popular word-play.
9 One might say that the environing words select this relevant meaning from among the
meanings that the word has. However, the only evidence that the word has the given meaning is
that it has it in the given environment or in reference to such an environment. Hence we cannot
say that the word in and of itself has a stock of meanings from which a given environment
chooses, but only that the word is accorded a given meaning in a given word-environment.
We can now attempt a summary of the relation between form and content in
language. As in so many issues which are clouded by being conducted in
catch-phrases, the resolution here is not one or the other of form and
content or even both form and content together, but a particular relation
between certain forms (produced by certain constraints) and certain content.
Specifically, certain types of content-bearing elements enter into certain
types of meaning-affecting constraints on their combinings, and vice versa.
At this level of generality, the situation is not too different from the relation
between form and expression in music and other arts.
The relation between form and content cannot be a correlation in the
ordinary sense of the word—a covarying of members in two independently
10 Differently, contrastive or emphatic stress does not relate to any particular other word. It
functions as an additional word in the sentence, a synonym of emphatically or of rather than
anything else, or the like.
11 In general, situations which call not for information but for some other kind of expression
may occasion sentences whose overt information is not what is really being communicated. For
example, if people have encountered each other several times in one day, they may say ‘It's a small world’ or simply, with a smile, ‘We meet again’, where what is being communicated is not
the facts that are said, but the recognition of the unlikelihood of so many encounters—the
content of the smile.
12 H. Hiz, ‘On the Rules of Consequence for a Natural Language’, The Monist, 57 (1973),
312-27, and ‘Information Semantics and Antinomies’, in G. Dorn and P. Weingartner, eds.,
Foundations of Logic and Linguistics, Plenum, New York, 1985.
Given all this, we can consider the question: is the theory presented here a
theory of language, or of information? It is in any case a theory of language
structure, since it attempts a reasonably complete survey of the constraints
in language, some of which are essential to it (and must be defined as
independently of each other as possible) because of the lack of any external
metalanguage in which the structure of language could be stated. And it is a
13 Even idioms are included here, their constraint being a special case of a word having
different meanings under different operators.
14 It also has an echo in the use of the term ‘semantics’ as ‘what a system can do' in analyzing
computer programming languages.
theory in which meaning was not used in making the analysis. It also
produces a structure of language-borne information because a distinguished
subset of these constraints—ultimately the operator-argument dependence
and the word likelihood within it—are characteristically associated with an
informational interpretation; they each have the effect of information.
These constraints thus create the essential nucleus of language, and also
create its information. The relevant question is whether in choosing the
methods that yielded this theory, other methods might have been available
that would not have yielded this close informational interpretation. But the
search for a least grammar was justified in 2.2 without any consideration of
meaning. That the constraints on equiprobability which yield the least
grammar carry informational interpretation is due to the nature of information:
both word combination and information are characterizable by departures
from randomness. Features which do not give new information, such as the
linearity of language, or the time and quantity categories that are created by
paradigms, are distinguishable structurally from the informational ones.
There follows from this, what is in any case borne out by experience, that
we cannot in general impose our own categories of information upon
language. In a limited way we can arrive at certain categories of information
independently of language, but we cannot locate these types of information
in any regular way in a language unless the language has a structural feature
whose interpretation this information is. We cannot determine in an a priori
way the ‘logical form’ of all sentences. We cannot determine their truth
except to the degree that we can locate them in a proof-like constraint
on sentence-sequences, in respect to known prior sentences or later consequences. We certainly cannot map them in any regular and non-subjective
way into any informational framework independently and arbitrarily chosen
by us.
The other side of the coin is that we cannot disregard any information that
is carried by a structural feature of a particular language. Various languages
express time-order and quantity with various distinctions and relations. We
need not go only by the most overt expression in the vocabulary but can also
seek second-order evidence, in more subtle restrictions on the environments
of this vocabulary, to see what time relations are expressed in the language.
But we are bound to the regularities of word combination in the given
language. For example, some languages have overt forms (suffixes and the
like) for the ‘aspect’ of verbs (mostly features of duration), while others
mark this only by regular differences in word combination, especially in
respect to time words, as against yet other languages which have no regular
indication of aspect (G274). An informational category of ‘aspect’ either
must be or must not be recognized for the language, accordingly. And
the non-informational constraints (11.3.3) too remain important for each
language.
If in spite of the informational dependence upon the demonstrable
11.6.0. Introduction
What gives language the power to mean something? We have seen that the
meaning of a sentence is not a primitive unanalyzable entity, but rather
the product of accretions from the fixed meanings of its words and the
constraints on their combination. We therefore have to ask what gives the
words and the constraints the property of having these accretive meanings.
The usual view is that words are associated (as symbols) with some selectively
perceived aspects of experience (as referents); a less selective representation,
such as a photograph, would not be considered a symbol of what it represents.
As to the meanings of grammatical relations, they have been little studied
and are presumed to be a new primitive element created by language.
In what follows we will see: first, that the dependence relation of words
reflects what one might consider dependences within man’s perceivable
world (11.6.1); and second, that word meanings and co-occurrence selections
express a categorizing of perceptions of the world (11.6.2).
18 There are nevertheless many similarities in the meaning ranges of words, as among
unrelated languages. Many unrelated cultures have quite similar species distinctions in their
words for flora and fauna.
the occurrence of the word in other new environments which are coherent with those in the new definition. It follows that not only word meanings, but also word likelihoods in various combinations, are associated with aspects of the perceived world, except for those word combinations that are fostered by social and institutional factors (fashion, fiat, etc.) or by intra-linguistic factors such as analogy, or communicational efficiency. The relation to
people's experience of the world holds also for a word's extra-high likelihood in
particular combinations, that permits reduction of the word.
That meanings of a word are for the most part determinable both by
association with the real world and also by the ‘selection' of the word
becomes explicable if we say that it is the sentence, or word combination,
that has meaning by association with experience; and the parts of the
sentence and also their arrangement have meaning by association with
the perceived parts of that experience and with the way these parts co-occur
in it.
On the basic and universal language structure thus formed, with its
component meanings thus rooted in the perceived world, there appear
various further structures, not all of them universal and not all related in a
direct way to properties of the world's phenomena.
Some of these are local developments based on non-informational regularities such as phonemic similarities among words: e.g. morphophonemics, or
the differing declensions and conjugations in some languages. Here we must
include the use of a single word for various meanings, differently in different
languages. Others are conveniences in presenting information. This includes
reductions and other grammatical transformations, which clearly do not
reflect anything in the referents of the words and sentences. It also includes
grammatical developments that bring distinguished kinds of information
into common form: e.g. the cross-reference pronouns (5.3.5), or the paradigmatic arrangements of tense, person, and number. These last categories,
for example, are certainly involved in the physical world, but not in the same
relation to objects and events in the world as they have to nouns and verbs in
the paradigms of many languages. A slightly different case is that of
transformations that are based on word likelihoods but that create new
grammatical distinctions and partitions out of graded differences in experience-
reflecting likelihoods: e.g. the partitioning of operators into verbs and
adjectives (as in English) by establishing two distances for the tense on an
operator, possibly on the basis of how frequent is tense-change within a
discourse for the different operators (5.5.3, G265).
Still other further structures consist of new constraints on co-occurrences,
expressing new information, but not based on any immediately recognizable
constraints in the physical world. These—all of them constraints on sentence-
sequence—are the word-repetition around conjunctions (9.1), the operator-
argument class-repetition discourses (9.3), and the proof-structure sequences
in mathematics and some science writing (10.5.4). One might say that these
sentence and discourse, ultimately the partial ordering of words and constraints thereon, with words themselves characterizable by their status and
likelihoods in the partial ordering. These steps are found to yield in a fixed
way a distinguished part—the lion’s share—of the information carried in
the given sentence and discourse. Thus the information itself is structured
by such accretional steps. Indeed it has been seen that given a system
or message with certain constraints and certain information, the defining
of additional constraints in it makes possible corresponding additional
information (11.3.2, 12.3.3).
First, the choice of a particular word at a given point in constructing a
sentence is a departure from equiprobability of phonemes and words there,
and yields a meaning.
Then, the relation of departures from equiprobability to informational
contribution is seen throughout the grammar. For example, words lessen
their information as they broaden their selection (i.e. as choosing them
constitutes less of a constraint on their argument or operator, 11.1), e.g. in
the process of provided, providing becoming conjunctions (10.3, and n. 2
there). Words with broadest selection add least information, as in the case of
the indefinites (3.2). And words which are reconstructible from the environment, i.e. add no information, are zeroable (3.3.2). In addition to word-
choice, each constraint that creates the partial order of words in the sentence
is also a departure from randomness in this language universe, and yields a
meaning. The information in a sentence or a discourse is thus formed by
departures from equiprobability acting upon departures from equiprobability.
The information is not simply an unordered collection of departures from
equiprobability, but a specific structure consisting of one step acting on
another.19
In the mathematical theory of communication (Information Theory), the
total departures from equiprobability (the redundancy) of a message or a
channel is discussed as its informational capacity, indicating the amount of
information it can carry.20 The analysis of amount does not identify or
1969; A. Kolmogorov, ‘Logical Basis for Information Theory and Probability Theory’, IEEE Transactions on Information Theory, IT-14 (1968), 662-4; G. J. Chaitin, Algorithmic Information Theory, Cambridge Univ. Press, Cambridge, 1987.
12
The Nature and Development of Language
12.0. Introduction
As for the phenomena which show regularities, they are describable, for
each language, by the following relations and processes:
(1) Words. Sequences of phonemes (the latter defined by phonemic
distinctions, 7.1), bearing (in the great bulk of cases) meanings which are
associated with aspects of the perceived world, and are characterized by
their likelihood of combining with particular other words in the dependence
relation. The elements of communicational distinction are phonemes, but
the elements of meaning-carrying are words and their relations. The words,
however, are not simply symbols assigned by their users. Their ability to
have a particular meaning (i.e. to be such a symbol) is identical with
their having particular clustered (coherent) selections of their co-occurring
operators or arguments, (3) below.
these processes suffice for all the information in language. Every sentence in
a language is paraphrastically derivable from a sentence formed only by (1)-(3), or is paraphrasable by such a sentence.
(6) Conditions of speech. There are certain additional properties of
language, which are due not to the processes of making sentences out
of words, but to the conditions of speaking. One is the arbitrariness of
phonemes, i.e. their lack of regular relation to the meaning of words or the
structure of sentences.1 Another is the contiguity of constructions (2.5.4). A
third is the linearity of constructions (2.5.3). Yet another property is the
ability of words to refer not just to extra-linguistic objects and events but
also to words in sentences: this makes a metalanguage possible as a subset of
language. And nothing prevents a word in a sentence from referring to the
occurrence of a word in its neighborhood: this makes cross-reference
(pronouns with antecedent) possible.
(7) Finally, language changes (12.2). Different parts of the structure—
sounds, vocabulary, likelihoods, reductions—change in different ways at
different rates. Languages change somewhat differently in different—
especially in separated—communities, thus creating dialects and ‘daughter’
languages and related languages.
Given the list above, we may note that the processes of sentence-making,
in keeping with the conditions of speech above, not only produce the
sentences of a language but also yield the particular constructions and
grammatical relations of each language, as regularities of those sentences.
These include: major word classes such as conjunctions, adjectives, adverbs;
special small word classes such as quantifiers or the English auxiliaries
(where characterizing them as special combinations of general constraints
circumvents the problem of their peculiar character and fuzzy membership);
modifiers (from secondary sentences); morphology (that this can be derived
via reductions explains how it can be a major yet not universal apparatus);
paradigms and their special categories (person, time, number); special
rules, and exceptions to rules, and in general the special domains of certain
reductions.
In spite of the specificity of the processes and conditions, the set of
sentences in a language is not well defined: there are many marginal
sentences, largely because the normal likelihoods of certain grammatically
important words (e.g. take a —, give a —, 12.2), and the domains of certain
reductions, are only roughly statable. There may also be cases of constructions
and regularities that do not apply to particular words in their expected
domain. Overtly, this situation resembles the ‘exceptions’ so common in the
traditional rules of grammar. However, not only do far fewer ‘exceptions’
remain after reduction analysis than in traditional grammar, but those that
1 The fact that speech is in general continuous plays no role here, because language is built
out of distinctions in speech, which constitute discrete entities.
remain have more assignable causes or systemic properties. Partly, they are
due to lack of regularization, by uneven applications of analogy or by
retention of old, frequently used forms (as in the morphophonemics of is-am-are). And partly they are due to language being constantly in process of
change at one point or another, so that when the description is sufficiently
detailed one is bound to come upon reductions and analogies that are in the
process of extending their domain or of becoming reinterpreted into the
operator framework—and perhaps differently among different speakers.
They may also be due to arrested syntactic change, as in the vagaries of do in
the course of the history of English.2 And they may be due in complex ways
to the overlapping domains of different reductions.
Although non-regular constructions, and exceptions to the regularities,
thus remain, it is necessary to analyze each language in respect to its
operator structure as far as this analysis will go. Only then can we see for
what forms, if any, the operator-argument reconstructions are impossible,
or too costly in complicated use of the basic apparatus, so that we may prefer
to accept the form as part of the finite material that is outside the operator-
argument system. We can also see how numerous these residual forms are—
far fewer than the exceptions and inexplicable forms found in traditional
grammar—and what characteristics and sources they have. And above all,
as is seen below, for the rest of the language we obtain a simple and unified
and procedural—even computable—structure, with everything referrable
to first principles, something which could not have been obtained by eclectic
and episodic analyses or by purely semantic considerations.
In contrast to the constraints of (1)-(4) above, which delimit what can be
and what cannot be in language, the processes described thereafter, including
the special constructions in particular languages, indicate specific structures
in the language without in general saying what is excluded.
In contrast to the derivational theory developed in the preceding chapters, it
is possible to think of a maximally organized accounting of the similarities
among sentences, somewhat along the lines of cladistic classification in
evolutionary theory. Such an approach would not assume the primacy
of initial sentence-formation over analogic back-formation. Rather, all
similarities would be treated on a par. Nevertheless, the fact that most
analogic formations are on the model of existing syntactic word combinations
makes the pre-analogic, derivational, constructions the key syntax. This
derivational syntax has been found to be best described as generated by a set
of constraints, if for no other reason than that the set of word combinations
of a language is too large and fuzzy to be listable, whereas the constraints
that generate them are interrelatable and listable even if sometimes with
fuzzy domains. However, the constraints with their domains do not quite
match the set of word combinations in the sentences of a language, a large
2 Otto Jespersen, Growth and Structure of the English Language, Macmillan, London, 1948,
pp. 218 ff.
part of the mismatch being due to analogic processes over parts of the
domains. We are thus back to recognizing the constraints as primary
processes and analogy as secondary. As to the derivations which describe
the relations among the sentences created by the constraints, they are
largely accretive and recursive: one sentence is derived from another by the
accretion of a constraint on that other, even if the constraint itself consisted
of removing something from the sentence rather than adding to it.
present tense have with state, condition (as presumed ‘source’ of -en) for
object (G290): one can hardly say (1) I have written three letters yesterday evening (as compared with I wrote three letters yesterday evening; but in French one says Il a écrit trois lettres hier soir); however, one can say I have written three letters by now, like I have considerable experience by now, and I have written three letters since yesterday and I have this condition since yesterday. Differently from English, the French perfect has already become a
past tense. Note that in English, the inability of nominalized sentences to
carry tense is circumvented by using have, which is structurally tenseless
(‘present’), to carry adverbs of past time: My having written three letters
yesterday evening left me with fewer chores today, where have must have
been added to the nominalized My writing three letters yesterday, since one
does not say (1).
The English auxiliary verbs also show various traces of their being the
ordinary verbs that they once were (G298).
Nevertheless, some constructions and distinctions may get lost from
language: in languages which have lost their case-endings on the arguments
of a verb, not all the distinctions have been taken over by prepositions or by
relative positions of the arguments. In addition, some new grammatical
situations may be formed. One example is the English auxiliary verbs which
could once apparently accept on the verb under them a different subject
than their own (as though one said I can for John to win); this is no longer
possible, and we have only I can win (from a required non-extant or no-
longer-extant source I can for me to win). Another example is the definite
article, which is apparently a relatively recent development, and which has
different grammatical requirements in different languages. Also, the English
-ing is a conflation of two very different earlier morphemes (a verbal
adjective and a verbal noun), but the single modern phonemic form has been
broadening its selection so as to become increasingly a single construction
(G41, 294).
We see that any theory of language, even just of language structure, has to
provide for change in language (2.5.7). It is also clear that change is partly
related to the existing structure, though not determined by it, and that it in
turn affects the structure in various ways and degrees. In the present theory,
there are intrinsic relations between structure and development (12.3.0).
Formulating the structure in terms of constraints is different from formulating it
in terms of objects or constructions that are defined by lists or by external
properties. The constraints leave room for particular elements to be unclear
as to their satisfying (i.e. being in the domain of) one constraint or another.
They even leave room for particular elements to satisfy a constraint in more
than one way (depending on how it is analyzed)—either because it is in
process of changing, or simply because it is irregular or not fully domesticated
(reinterpreted) into the system.
Furthermore, the history of sentences, words, and constructions is the
history of their use, that is, of their combinations in speech. But it will be
seen below (12.3.5) that it is the very use of language that becomes
conventionalized and is finally standardized into the constraints of language
structure, so that the structure of language is an end-product of the history of
language, which indeed could hardly be otherwise. The relation between
structure and history makes it possible to evaluate and compare the change
in words and in grammatical combinations of words (i.e. constructions): for
example, to say wherein a word or construction is or is not the same as a past
word, or the same as a cognate word (i.e. a historically related word in
another language).
There are certain things which apparently do not change. These are: the
dependence-on-dependence relation as the formative relation for sentences,
the existence of stable though changeable likelihood differences among
words in respect to co-occurrence, the ability of words to refer to words, and
apparently the existence of low-information reductions. One indication of
their permanence is that these are apparently universal, and so must
have gone unchanged since their coming to be. A consideration for this
permanence is that if these relations are what created sentences as we see
them now, then they cannot be replaced by any other process except through
an innovation as basic as the ongoing structure of language—something
which is not likely to happen as long as language proves roughly adequate.
The essential adequacy and stability of these basic relations depends in
part on their being hospitable to changes and differences in words and
constructions. If the dependence relation that makes sentences held just
among fixed classes of words, defined by lists or external properties, and
if likelihoods of co-occurrence were fixed, then new words—entering by
borrowing or cultural change, or altered in their interrelations by phonemic
change—might be unworkable within the existing syntax. A pressure might
then grow to develop other ways of forming sentences, or informational
expressions, with these new words. But the fact that the relation is a
dependence-on-dependence of words, that the fuzzy likelihoods are changeable and not mutually exclusive, that reductions are made by low-information
properties rather than by a fixed listing, enables new words to enter and
function in the system. It is this that makes the system stable, and unchanging in
its fundamentals.
must have been lost in various languages, since not all languages now have
morphology; forgetting is not unknown in language history, as in culture
history and in politics. It is also reasonable to suppose that pre-syntactic
speech had intonations or special words or affixes for command and question
and the like.
words, over and above the meanings of the individual words. The meaning
of combining is that the parts are relevant to each other. And since the
form of the dependence is that the dependent words are said only in the
presence of the others, the interpretation is that the dependent word is said
about those others, that it predicates something about them. The dependent
word is then the predicator or operator on the words on whose presence it
depends; the latter are its arguments. This is not to say that the idea of
predication underlies the dependence relation. Remembering Piaget, one
does not have to understand in order to do. People do not need an awareness
of predication in order to speak or understand language. They need only to
know that certain words are not said except in the presence of certain
others—aside from later reductions—and that this means that the dependent
word (recognized by its position as well as by its likelihoods) is said ‘about’
those others. The awareness of the relation could only arise from experience
with its use in language, and becomes explicit much later, in grammar and in
logic.
The dependence relation creates sentencehood. Every combining of a
word with words on which it depends constitutes a sentence, and saying it
has the meaning of asserting—either asserting the words themselves, or
asserting that the speaker wishes, wonders about, demands, or asks the
contents of those words. It has thus created, out of a particular dependence
relation of co-occurrence among words, a new object (sentence), with a new
kind of meaning (predication), and with new relations (of requirement)
within its set. Non-predicational utterances exist in languages, chiefly one-
word ones such as Fire!, Ouch, Hello, John (the latter may be a case of
calling, or surprise, or pointing and the like). Whether these one-word
utterances should be called sentences or non-sentential utterances of the
grammar is a matter of terminology. It should also be pointed out that non¬
dependent (i.e. non-sentential) word combinations may or may not have
some rough meaning, according to the semantic possibilities of the words:
more in John Bill Mary saw with than in Up vacuum light eats. But a
sentence, i.e. a satisfaction of the dependence relation, always has meaning
and has it explicitly even if nonsensical: John saw Mary with Bill, Light eats
up vacuum.
The evolving of this dependence relation is the major step in the formation of
language structure as we see it now. Not only does it make sentences, all on
the basis of a single relation among words, but it also brings into existence
structural criteria: criteria for recognizing sentences and relations among
sentences, and criteria for distinguishing nth-level word classes. It establishes a
structural constraint whose meaning is relevance or aboutness. In addition,
it establishes the concept of information as a particular (predicational)
relation among word meanings.
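A toy lexicon suffices to show how the dependence relation by itself separates sentences from non-sentences. In the sketch below (Python; the words and their argument-requirement classes are invented for illustration), a combination is a sentence exactly when every operator's requirement is satisfied.

# 'N' marks a zero-level word; an operator lists the classes its
# arguments must have (after the On, Onn, Oo classes of the text).
LEXICON = {
    "John": "N", "Mary": "N",
    "sleep": ("N",),               # On
    "see": ("N", "N"),             # Onn
    "probable": ("O",),            # Oo
}

def level(tree):
    # Return 'N' for a zero-level word, 'O' for an operator all of whose
    # required arguments are satisfied (a sentence), or None otherwise.
    if isinstance(tree, str):
        return "N" if LEXICON[tree] == "N" else None
    op, *args = tree
    required = LEXICON[op]
    if required == "N" or len(required) != len(args):
        return None
    if all(level(arg) == need for need, arg in zip(required, args)):
        return "O"
    return None

print(level(("see", "John", "Mary")))             # O: a sentence
print(level(("probable", ("sleep", "John"))))     # O: predication on predication
print(level(("see", "John", ("sleep", "Mary"))))  # None: dependence violated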
The separation of pre-syntactic vocabulary (12.3.1) from syntax (12.3.2)
makes it possible to think that different parts and functions of the brain were
The arising of structural effects from specializations of language use does not
cease with the initial formation of syntax. Structural development continues, in
long-range trends of change which can be characterized only vaguely. There
is some overall evidence of a trend to syntactic regularization, yielding
simpler grammars. Such simplification may be withstood by some of the
most frequently used entities (e.g. the variants of be: is, am, are, was, were);
this is to be attributed to frequency, not to any special semantic importance
of be. There is also evidence that structural irregularities, usually encapsulated,
are created on various occasions as by-products of historical changes.
Types of irregularities include the disappearance of cumbersome or
stylistically disfavored sentences that are the unreduced source of existing
reduced sentences; thereupon the base set of sentences lacks the material of
the reduced sentences (and so no longer contains all information of the
language) unless artificial source sentences are reintroduced, with the
necessary reductions, in formulating a grammar of the language. Another
type of new irregularity is the restricting of the arguments of particular
operator words. For example, it has been seen that the auxiliary verbs
(which are Ono) restrict the first argument (subject) of the operator under them to be the same as their own subject: in I can solve it, the subject of solve is I and the reconstructed source has to be I can for me to solve it. This is not a
restriction on the argument domain itself, which would violate the mathematically important condition that the dependence of an operator is only in
respect to the dependence properties of the argument words and not in
respect to any external property or listing. And indeed, solve can have as
subject any zero-level argument (i.e. any word which itself depends on
zero), as can can. The only new requirement is that under can both subjects
must be the same. This more specific restriction is still an irregularity, but a
lesser one: rather than restricting the arguments of solve or can, it restricts
the arguments that solve or any other operator has when under can—and
not to a particular list but to being the same as for that occurrence of can.
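The constraint can be stated mechanically: what is checked is not membership of the lower subject in some list but its sameness with the upper one, after which the repeat is zeroed. A minimal sketch (Python; the clause representation is invented for illustration):

def reduce_auxiliary(aux, subject, inner):
    # inner = (verb, inner_subject, *objects). The auxiliary imposes no
    # list of permitted subjects; it only requires the lower subject to
    # repeat its own, and the reconstructible repeat is then zeroed.
    verb, inner_subject, *objects = inner
    if inner_subject != subject:
        raise ValueError(
            "*" + " ".join([subject, aux, "for", inner_subject, "to", verb]))
    return " ".join([subject, aux, verb] + objects)

print(reduce_auxiliary("can", "I", ("solve", "I", "it")))  # I can solve it

try:
    reduce_auxiliary("can", "I", ("win", "John"))
except ValueError as excluded:
    print(excluded)   # *I can for John to win: no longer a possible source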
There are also various other irregularities, as when what is obviously the
same word satisfies the conditions for two different classes (without current
grammatical derivability of one from the other) and has to be multiply
classified: e.g. that as a zero-level argument (in I know that), and that as
argument-indicator (in I hope that he's here) which shows that a sentence
(He's here) is appearing as argument (of hope). An irregularity of a different
type, in the complex conditions for quantity words, is seen in the alternative
forms A small number of people is coming tonight and A small number of
people are coming tonight (in the latter, small number of is a hard-to-derive
adjective on people).
There is another development which could take place only after orderly
sentences came to exist, but which modified in detail the nature of sentence
structure. This is the spread of analogic formation and analogic word
combinations, and other extensions of word use, which made word combination and local structures within sentences (e.g. phrases, paradigms)
more rule-based and less a matter of the semantic appropriateness which
was assumed in 12.3.2 to have underlain the development of the dependence
relation.
In contrast with such slow or small changes, there are great specializations
and further structures in language when the things being talked of (the
subject matter) have specialized relations among their perceived objects.
This is seen in concentrated form in sciences and above all in mathematics,
as noted in 10.2-4. The greater precision and intellectual complexity of
scientific writing is creating minor grammatical innovations, as in the use of
respectively to make cross-connections, e.g. in A and B produce Y and Z
respectively (from A produces Y and B produces Z); such a criss-cross of the
source sentences would be exceptional in natural-language grammar, and
the use of respectively in colloquial English is not restricted to this environment.
Other innovations are larger but still can be housed within a generalized
language structure. One innovation arises out of sentence-sequences, i.e.
out of dependence among words of neighboring sentences; these also have
inherently a limiting feature of having a common subject matter. The
simplest case of this is seen under certain (conjunctional) Ooo, which connect two sentences by operating on the pair of them: here the high
likelihood is that there is some word that occurs in both sentences (9.1). A
larger development is that of a double array of word-class dependences in
the sentence-sequences that constitute discourse (9.3). Another development is the singling out of certain constraints on sentence-sequences,
whereby the last sentence has the semantic status of a conclusion from the
others: proof in mathematics (sayable in English if we wish), and—in a far
less specific way—well-constructed argumentation in science. This creates a
new semantic power for language: a distinguishability of non-loss of truth
value or of plausibility (in science sublanguage).
More fundamental innovations in language structure arising from subject-
matter limitation involve the overall properties of the operator-argument
structure, differently in mathematics and in science languages. In mathematics
and mathematical logic, likelihood differences within an argument or operator
domain have been eliminated. In argument status, the differences are
eliminated by having only an indefinite argument, a ‘variable’, meaning in
effect only ‘something’ or ‘a given thing’ (within a domain), it being understood that within an indicated sequence of sentences each occurrence of the
same variable-symbol is a cross-reference to the first (while each occurrence
of another variable-symbol is not specified as to sameness with the first,
unless stated to be the same or different). And the meaning of operators
there is related not to likelihood differences but to truth tables that specify
the truth value of the resultant sentence from the truth values of its
arguments; likelihood then has no semantic or structural relevance.
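A few lines suffice to make the contrast concrete: the meaning of such an operator is a truth function, computable from the arguments' truth values alone, with likelihood playing no part. A minimal sketch in Python:

from itertools import product

# The meaning of each operator is exhausted by its truth table: the truth
# value of the resultant sentence is a function of its arguments' values.
OPERATORS = {
    "and": lambda p, q: p and q,
    "or": lambda p, q: p or q,
    "implies": lambda p, q: (not p) or q,
}

for name, table in OPERATORS.items():
    for p, q in product([True, False], repeat=2):
        print(p, name, q, "=", table(p, q))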
In science languages, as was seen in 10.3, the consistent restriction of
subject matter, and the attention in each set of texts to particular properties
or relations within it, is reflected in particular operators being used only on
particular words out of their possible natural-language argument domain.
What is new here to language structure is the ‘only’. As noted, in English as a
whole, the Onn eat may never have occurred on the N, N pair vacuum, light; nevertheless, Vacuum eats light is a sentence and we even know what it means, though the meaning may be nonsensical if eat is taken as ‘nourishes itself on’, or simply untrue in a general sense if eat is taken as ‘effects diminution of’. In contrast, in immunology the Onn contain cannot occur on
the ordered N, N pair antibody, cell (though it occurs on the pair cell,
antibody): the possible though nonsensical English sentence Antibody contains
cells is not considered a possible sentence of immunological language.
This creates subclasses, largely superseding likelihood gradation: a novel
language-structure situation in which there are several sentence types
(operator-argument structures) at the same level (e.g. the Onn level, or the Ooo level). And it creates a novel semantic power for language: a distinguishing and eliminating of irrelevance and nonsense.
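Such subclass constraints can be stated as a simple table of permitted operator-argument subclass pairs, quite unlike a gradation of likelihoods. A minimal sketch (Python; the subclass names and the lymphocyte entry are invented for illustration):

# Subclasses of the immunology sublanguage: in the whole language
# 'contain' is an Onn; in the sublanguage it occurs only on the
# ordered subclass pair (cell, antibody).
SUBCLASS = {"cell": "C", "antibody": "A", "lymphocyte": "C"}
ALLOWED = {"contain": {("C", "A")}}

def sublanguage_sentence(operator, first, second):
    return (SUBCLASS[first], SUBCLASS[second]) in ALLOWED[operator]

print(sublanguage_sentence("contain", "cell", "antibody"))  # True
print(sublanguage_sentence("contain", "antibody", "cell"))  # False: excluded,
# not merely of low likelihood, as Antibody contains cells would be in English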
The syntactic systems of science and of mathematics have lost the mathematical-object status (because their dependence is defined on particular lists of argument words), and they have to be described in an external metalanguage, that is, in some natural language. Because of the semantic
differences expressed by these various structural innovations, the formal
notations of mathematics and science languages (as in 10.5) can be translated
into natural language, but natural language as a whole cannot be translated
into them.
Both in the regularizations, irregularizations, and changes in the language
as a whole, and also in the larger innovations of science languages and
mathematics, we see that change and furtherance in structure can go on, that
the development of language is not at an end. We also see that at least the
larger changes and innovations arise from the different external conditions
(e.g. subject matter) and internal linguistic pressures (e.g. analogy) under
which language is used, and from the conventionalization of such usage. The
conventionalization is seen, for example, not only in the development of
explicit rules of mathematical well-formedness and proof, but also in the
peer-pressure criticism by colleagues that keeps scientists’ arguments within
inspectable grounds.4 Both the conjectured initial formation of syntax and
its further development are thus institutionalizations of ways of effective
recording and communicating of information about the world.
4 Cf. B. Latour and S. Woolgar, Laboratory Life, Russell Sage, New York, 1979; K. D. Knorr-Cetina, The Manufacture of Knowledge, Oxford Univ. Press, Oxford, 1981.
5 Of least interest here is the fact that the specific forms of reductions in a particular language
may make it impossible to express certain things in it. For example, a language in which absence
of a plural suffix means 'singular' makes it impossible to leave the difference between singular
and plural unstated. Furthermore, in such a case it becomes unclear whether one can say, for
example, Either John or his co-workers is responsible or Either John or his co-workers are responsible. Another type of situation is seen when we consider That he failed is clear, formed by is clear operating on He failed. If now we want another second-level operator (e.g. is not universally accepted) to act on That he failed is clear, we find that the grammatically ‘correct’
form, as in That that he failed is clear is not universally accepted, is generally avoided in favor of
That he failed is clear is not universally accepted (which can only be described by an ad hoc
reduction of the two occurrences of that to one).
6 Another inadequacy involves the antecedents for pronouns. Consider for example the
following sentence: That night he wrote to Tanya telling her that the gipsy woman was in despair
and that he could no longer marry her, Tanya (A. N. Wilson, Tolstoy, Norton, New York,
1988). Here marry her is ambiguous, and its ability to refer to Tanya would have been even
weaker if the author had written saying that instead of telling her that. The alternative, of not
pronouning here, by writing marry Tanya, is virtually unavailable: He wrote to Tanya that he
could no longer marry Tanya would seem to refer to another Tanya. The easiest solution is the
stylistically forced marry her, Tanya.
7 However, factoring on purely semantic grounds (e.g. kill into cause to die) yields little or
no advantage, and will virtually always run foul of differing combinatorial restrictions or
likelihoods: from The Queen's command caused his words to die unspoken, we cannot obtain
. . . killed his words unspoken; and from I'm just killing time here we cannot obtain I'm just
causing time to die here.
8 The particular meanings of extant words are in part accidental, although there are many
meanings that are found with little difference in a great many languages. Some types of
meanings are harder for words to come by. For example, many abstract concepts such as (a)
understanding, important, or (b) truth, magnitude, are available in single words either: (a) only
as a result of metaphoric specialization of words of concrete meaning (G405), especially of
morphologically complex words; or (b) only as a result of operator nominalization (from
nominalizing a sentence with zeroable indefinite arguments: truth from things' being true). In
addition, the meaning of a word can be greatly modified by adding modifiers to it, which
changes the selection since the further operators on the given word are then determined by the
word as modified (as in The snow house melted). (It should be pointed out that, as in all
derivations in the present book, ‘from’ above means specifically ‘derivable by established
reductions or other transformations in the grammar’.)
in the case of walk under On (any such occurrence being then called ‘not in
language’ or ‘wrong’).
Changes of usage do not only conventionalize tighter structuring. They
can also create word classes that at least in some respects cut across the On, Oo, etc. partition, as in the case of prepositions, which occur in many
operator statuses, and which we can attribute to just one operator class only
by calling upon transformations for deriving all other uses. Indeed, changes
in use can reduce some of the tight structuring of grammar. For example, in
various languages the gradual phonetic subsidence of case-endings, which
had been a structured and required set of operators, left prepositions, which
constitute a less structured set, to fill their syntactic place.
A prominent situation of usage-change-creating tight structure is seen in
the special constraints of special word classes; such processes have some
similarity to what is called phonemicization, wherein changes in the pronunciation of certain sounds can make them phonemically distinct from
other sounds in ways that they were not before. One syntactic case is the loss
of the second subject under English auxiliary verbs such as can. As has been
seen, those verbs may have had a full sentence as their object (second
argument), as though one could say I can (for) him to do it meaning
something like ‘I know how for him to do it.’ In time these verbs ceased to be
used in cases where the subject of the verb under them was different from
their own subject. As the sameness of subject for can and the verb under it
became a requirement, a new constraint arose in the language. These verbs
also have an unremovable tense, and they zero the to under them. Since
several verbs share these requirements, there rises a small subclass of
‘auxiliary’ verbs different from other verbs. This subclass is thus the product
of a set of constraints acting on the same words. The tight structure here is
lessened because the domain of this set of constraints is a bit fuzzy: some
verbs have only some, not all, of the constraints of the auxiliaries (e.g. I
ought not do it by the side of I ought to do it).
Another important constraint which presumably grew out of use preference
is the verb-adjective distinction. Both of those word subclasses are operators,
but, as has been seen, those operators that had higher likelihood of tense-
change within a discourse had the tense attached directly to them (e.g.
walked, aged) and are called verbs, while those that did not had the tense
attached to a separate ‘carrier’ be (e.g. was brown, was old) and are
called adjectives (G188). While operators are graded in respect to tense-
changeability, they are partitioned in respect to tense-attachment: sleep is a
verb even for a hibernating bear; ill an adjective even for an intermittent
illness. Note that the same distinction can be made without creating a
grammatical requirement: Chinese has only optional suffixes for ‘past time’,
but these occur more commonly on momentaneous and active predicates
than on stative ones.
In all these cases we see much the same kind of structure-creating process.
12.4.0. Introduction
It should first of all be clear that language is not directly a more developed
form of animal communication. Not to enter into the discussions of animal
communication and ethology, we may nevertheless judge that the difference
between human and animal ‘language’ is one of kind, not of degree. Animal
communication contains sounds or behaviors which one might consider the
equivalent of words; but, in present knowledge, it has nothing of the
dependence relation which creates sentences and predication. A less obvious
but fundamental difference is that there is little present evidence of the
respondent animal exercising any choice in type of response (to stimulus or
‘symbol’), other than whether (and perhaps how strongly) to respond.
Existing evidence suggests that animals have developed innate propensities
to particular behavior in response to stimuli; the stimuli include behavior
(sounds, postures, etc.) of others in their species. This can be called
rudimentary communication; but we cannot say that the behaviors are
subjectively signals or symbols, and there is no systemic signalling structure with potential for variation and expansion. The apparent behavioral
sequences, as in some aggression and mating behavior, are not innate
sequences within an animal, but are a co-evolved interaction produced by
each given participant responding to the stimulus offered by the other in
response to the animal’s own behavior or presence. The whole is more like
the co-evolution of flowering plants and pollinators such as bees than like the
functioning of human language.9
9 K. von Frisch, The Dance Language and Orientation of Bees, Harvard Univ. Press, Cambridge, Mass., 1967; G. Baerends, C. Beer, A. Manning, eds., Function and Evolution in Behaviour, Oxford Univ. Press, Oxford, 1975; D. J. McFarland, Feedback Mechanisms in Animal Behaviour, Academic, London, 1971; M. Lindauer, Communication among Social Bees, Harvard Univ. Press, Cambridge, Mass., 1971.
10 L. Weiskrantz, ed., Thought without Language, Clarendon Press, Oxford, 1988. In the large and uneven literature on this subject, special reference should be made to two of the early modern contributions: the writings of Jean Piaget, especially Le Langage et la pensée chez l'enfant, Delachaux et Niestlé, Paris, 1924, and Jean Piaget and Bärbel Inhelder, Mémoire et intelligence, Presses universitaires de France, Paris, 1968; and also L. S. Vygotsky, Thought and Language, MIT Press, Cambridge, Mass., 1962. For the status of language in respect to thought, it may be relevant that almost all writing systems, though they start with pictures as direct representations, develop into signs representing words and their sounds.
11 This vague relation of thought to the basic structure of language is quite different from the
unfounded suggestion that the detailed grammar of a particular language affects or determines
the thinking of its speakers, as in the claim that languages which have no grammatical tenses
might block the development of physicists who need t as an essential variable.
control is well known: aside from those pervasive social institutions which
serve solely for control over the people, the bulk of social, cultural, and
personal conventions both facilitate as well as limit—often simultaneously—
the behavior of the people who function within them.
What has been seen in 11.1, 11.6, about the meaning of the structural
entities of language suggests that the system developed in language is
consonant with important features of advanced thinking. In overview, the
relation of language structure to thought may be summarized as follows.
(1) For humans to use words (singly), i.e. sounds that are associated
with (‘mean’) the user’s states or received stimuli, is consonant with
animal behavior (even if the animals are not thereby subjectively ‘sending
messages’); hence, this in itself may not constitute thought as a substantively
new phenomenon peculiar to humans. (2) Using word combinations in the
presence of combinations of the stimuli referred to need not be considered
as essentially different from using single words, even if in the objective world
certain of the words in the combination ‘mean’, roughly, objects, while the
other words mean states or properties of those objects. (3) However, the
fixing of an ‘operator-argument’ relation between the object words and
the property words in each combination, even though it presumably was
only a standardization of a naturally-occurring constraint on the combina¬
tions, created predication as a semantic relation between the words in a
combination, whatever they be. This had two semantic effects. (4) One was
that any new combination had meaning by virtue of the predication relation
between known words, not only by virtue of the association of the combina¬
tion with particular experience; thus, the meaning of words shifted from
being extensional (i.e. the specific associated referents) to being intensional
(i.e. anything that could fit the predicational relations in which the word was
used). (5) The further effect of (3) was that predication could be said
independently of actual experience, since a predication would have the
meaning of one word predicating about another even if there was no
experience with which that word combination was associated. It is reason¬
able to see (4) and (5), both of them due to (3), as important features—
perhaps facilitators—of thought going clearly beyond any pre-human level.
Before stating this explicitly, we survey the structural meanings. As to
words: first, words (or morphemes) are discrete entities, since they are
sequences of discrete phonemes (or of phonemic distinctions): this makes
the meaning-bearing symbols explicit. Second, when discrete entities are
used to represent in a regular way the objects or events (and relations) with
which they are associated (end of 1.4), they necessarily come to mean
selected perceived aspects of the objects or events, rather than the less-
selective representation contained in, say, a picture. Third, since by 11.1 all
words can be used not just for a unique object or event but also for various
ones (any ones that fit the meaning the word already has), each word is in
effect defined not by a specified set of referents (objects or events), but by
the properties that are common either to all its referents or to various subsets
of them. As to sentences: the operator-argument relation over a finite
vocabulary creates sentences that suffice to indicate or represent not the real
world so much as perceptions related to the phenomena of the world.
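
This third point, the definition of a word by properties common to its referents rather than by a listed set of referents, can be pictured in a small sketch. The following Python fragment is offered only as an analogy; the referents and the property test in it are invented: a word defined extensionally is a closed list, whereas a word defined intensionally is an open test that a never-observed referent can satisfy.

# A sketch, for illustration only, of extensional vs. intensional
# word meaning; the referents and the property test are invented.

observed_referents = {"chair-1", "chair-2"}     # extension: a closed list

def fits_chair(properties):                     # intension: an open test
    # True whenever the candidate has the properties common to the
    # word's referents, whether or not it was ever observed before.
    return {"has-seat", "for-sitting"} <= properties

print("chair-3" in observed_referents)                          # False
print(fits_chair({"has-seat", "for-sitting", "three-legged"}))  # True
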
In addition, the informational structure of language provides structural
possibilities that go beyond information, expressing and perhaps even
fuelling the imaginative capacities of thought. This potentiality in language
arises from the fact that words whose meaning comes from association with
perceptions of the world are naturally restricted in their co-occurrences to
the way perceptions in nature co-occur (11.6.1). As a result, the restriction
of combination (selection) among words reflects the meaning of those
words. But this restriction leaves open the possibility of using the unused
combinations. The unused combinations present new meanings, which may
be nonsensical, interestingly imaginative, etc., in respect to the perceptions
that had restricted those combinations. And old words, or newly made
phoneme-sequences, can have new meanings which are determined by their
new selection of word combinations rather than by direct association with
observed phenomena (e.g. unicorn).
A crucial contribution of language to thought is, then, the separation
of predication (grammaticality) from word-choice (meaning), due here to
distinguishing operator-argument dependence (non-zero co-occurrence
likelihood) from all the specific likelihoods. Due to this, when we want to
make a new combination of meanings we find at hand a structure that puts it
into a publicly-recognizable predication, without our having to develop a
new bit of grammar for the new word-combination—a new consensus for
what the new juxtaposition of meanings should mean. This permits one
important type of thought—predication, with the interrogative, optative,
and other modes derivable therefrom—to go beyond the direct meaning-
combinations of our experience. (Compare the lack of useful new meanings
in totally ungrammatical sequences of words.)
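
This separation can be made concrete in a short sketch, given here for illustration only: the class labels echo the book's operator-argument notation, but the vocabulary, the class assignments, and the likelihood figures are all invented. Grammaticality is decided by the dependence requirement alone, so an unused combination is a well-formed predication even where its recorded likelihood is zero.

# An illustrative sketch (invented vocabulary and figures) of the
# separation of dependence (grammaticality) from selection (graded,
# word-specific likelihood). N: zero-level argument; On: operator
# requiring one N; Oo: operator requiring an operator.

DEPENDENCE = {"N": [], "On": ["N"], "Oo": ["O"]}

LEXICON = {"child": "N", "unicorn": "N",
           "sleeps": "On", "gallops": "On",
           "is-surprising": "Oo"}

# Selection: graded likelihoods recorded for particular word pairs.
SELECTION = {("sleeps", "child"): 0.9, ("gallops", "unicorn"): 0.1}

def grammatical(operator, argument):
    # Dependence alone decides: the argument's class must satisfy the
    # operator's requirement, whatever the pair's likelihood may be.
    required = DEPENDENCE[LEXICON[operator]]
    if not required:
        return False                 # zero-level words take no argument
    arg_class = LEXICON[argument]
    return arg_class == required[0] or (
        required[0] == "O" and arg_class.startswith("O"))

# 'A unicorn sleeps' is a publicly recognizable predication even
# though the pair has no recorded likelihood:
print(grammatical("sleeps", "unicorn"))            # True
print(SELECTION.get(("sleeps", "unicorn"), 0.0))   # 0.0
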
A second crucial contribution is the recursive extension of predication to
create predication on predication (more generally, all the second-level
operators, such as Oo). Selectional restrictions on these in respect to their
arguments (especially in the case of connectives) create the machinery for
structured extended thought, discourse, and proof.
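
A minimal sketch of this recursivity, with invented vocabulary, shows how a single construction step serves every level: because a second-level operator accepts any sentence as argument, including one whose top operator is itself second-level, extended predication comes at no new structural cost.

# Illustration only (invented words): one construction step forms all
# sentences, elementary and extended alike.

def predicate(operator, *arguments):
    # A sentence is just an operator over its arguments; the same step
    # builds a first-level operator on zero-level arguments and a
    # second-level operator on sentences.
    return (operator, arguments)

elementary = predicate("sleeps", "child")            # first level on zero level
extended   = predicate("is-surprising", elementary)  # second level on a sentence
further    = predicate("I-doubt", extended)          # second level on second level

print(further)
# ('I-doubt', (('is-surprising', (('sleeps', ('child',)),)),))
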
We see that the projection of phenomena onto perceptions, the ability to
define by properties rather than by denotational extension, the ability to
imagine beyond what has been observed, all arise in the structuring of
language. But these are also part of the apparatus of advanced thinking. It is
not a question of whether these capacities came from language to thought,
or the converse. One can reasonably suppose that these developments in
language capacity facilitated developments in thinking, and advances in
thinking facilitated the utilization of language-structure potentialities.
It should perhaps be noted here that because of the structural differences
between language and logic (primarily due to the different selections that
individual words have, 3.2.1), the attempts to represent the meaningful
structures of language in the structures of mathematical logic have not been
successful. Logic goes beyond language in the capacity to structure proof,
and language goes beyond logic in the capacity to differentiate meaning.
It may also be mentioned that the way complex thinking is expressed in
language structure casts doubt on the claimed potentiality of computers to
carry out an equivalent to thinking. Computers can indeed be programmed
to analyze sentences in respect to the known structure of their language
(ch. 4 n. 1). But, first, we know that this structure is based on the recognition
of phonemic distinctions, the phoneme-sequences that constitute words,
and the operator-argument relation among the words of a sentence. We
have seen (12.3.5) how these could develop in the course of the intricate and
public behavior of conscious living organisms interacting with their environment,
but we do not know how computers could develop such structures
without being given them in a program. Second, nothing known today would
suggest that computers can carry out, sufficiently for any serious purpose,
the selective perception and the subjective response to the outside world, let
alone the awareness, that determine the ongoing use of human language—
the word-choice of a sentence and the sentence-sequence of a discourse. The
fact that this response and awareness must have material grounds within the
processes of the human organism does not mean that the particular man-made
mechanisms we know today should be expected to simulate those
processes.12
It follows that language cannot be understood as a further development of
some existing system, but has to be explained de novo. To do so we have to
go by the properties which seem to be necessary and universal for language.
Before doing so, in 12.4.2-4, we note here which universal properties are
not systematically necessary. Some, such as linearity and the limits on
distinguishability of sounds, are, though universal, not in principle necessary to
language but are due to the fact that language developed in speech rather
than, for example, in gesture or markings. Indeed, language has had to
circumvent some properties of speech: to circumvent the continuous sounds
by means of segmentation, the continuity in differences among sounds by
means of distinctions, the paucity of sound distinctions by using
phoneme-sequences for words, and the linearity of word-sequence by taking it as a
projection of a partial order on words (which is retained in writing even
though there a lattice representation would be possible).
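
The last remark, linearity as a projection of a partial order, can be pictured with a toy linearization, given here as an illustration with invented dependencies: the operator-argument relations of a sentence form a partial order, and any topological ordering projects that partial order onto a one-dimensional word sequence, as speech and writing require. (graphlib is the Python standard-library module for topological ordering.)

# Illustration only: linearity as one projection of a partial order
# on words. The dependency data are invented.
from graphlib import TopologicalSorter   # standard library, Python 3.9+

# Each operator is listed with the arguments it depends on.
dependencies = {
    "is-surprising": ["sleeps"],
    "sleeps": ["child"],
    "child": [],
}

# One admissible linear projection of the partial order:
print(list(TopologicalSorter(dependencies).static_order()))
# ['child', 'sleeps', 'is-surprising']
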
Other universal properties are by-products of the linguistic system rather
than being themselves historically formative factors: such are the capacity of
a language to be its own metalanguage, the capacity for cross-reference, the
12 Against the claims that computers can be ‘taught to think’, publicized under the name of
Artificial Intelligence, see Roger Penrose, The Emperor's New Mind, Oxford Univ. Press,
Oxford, 1989.
To understand the nature of language we may ask what created the order—
the structure—that constitutes that system, and how this structure came to
be. Syntactic order does not come from phonemes or meanings. The
vocabulary by itself has no intrinsic structure that can be stated: the
phonemic structure of words is not relevant to their syntactic behavior, and
their meanings do not fall into a sufficiently sharp and orderly system.
Thereafter, the formation of sentences consists in imposing a meaning-bearing
ordering (dependence) of word-occurrences. This dependence is
generalized by aggregating words into classes in such a way that all sentences
have the same dependence formation. The formation of elementary sentences
the one hand and objects on the other; in the different but relatively stable
likelihoods of particular operators in respect to particular arguments.
2. There are ongoing communicational factors. These are primarily in the
service of effectiveness: in instituting distinctions (first of all, yielding
phonemes); in serving public consensus, so that speaker and hearer understand
language approximately in the same way; in facilitating transmission
beyond face-to-face; and in reducing the length (and possibly, in some cases,
the complexity) of messages without loss in their information. Some communicational
changes may also be in the service of social convenience: e.g.
the introduction of more attention-catching forms, as in the periphrastic
tenses; or the avoidance of socially undesired forms, as in euphemisms.
These motive sources may come into conflict with each other, as when
certain reductions create ambiguities (primarily degeneracies) that complicate
the hearer’s decoding of what the speaker meant, i.e. complicate their
consensus.
3. There are ongoing influences of one set of forms on another, within the
forms of language and their ways of combining. These include the development
of patterns within grammar, and also the many kinds of analogical
formation, including the cases where word-frequency and word combinations
are extended on the basis of how other words are used.
The third type above may be considered ‘internal’, the first two ‘external’
and in various senses ‘adaptive’.
That language is self-contained we know from its containing its own metalanguage,
and from the properties seen in the preceding chapters and in
12.4.1-2. Here we will survey briefly how this condition, though roughly
stable, nevertheless made possible the growth of the complicated systems
which are to be found in language. It will be seen in 12.4.4 that language
structure is accretional, and it has been seen in 12.3.4-5 that both language
structure and language information stay within certain limits. In this sense
we can say that language makes itself, as indeed everything that is not made
or planned by an outside agent must do. The overall self-organizing condition
in language is that on the one hand the existing forms limit what is available
for use, and also to some extent affect (direct) this use, while on the other
hand use determines which forms are preserved or dropped, and in part also how
they change; this includes the ability to reinterpret borrowed and other
external material as though structured by the existing system of the language.
Throughout this study, we have seen that the non-random combinations
of words in sentences are best described by constraints, and that each
constraint is defined on previous constraints, down to the unstructured
(1) The self-organizing process begins after the existence of some (arbitrary)
stock of words. In phonemics we see a structure that developed around its
existing elements. Phonemes could not have existed before words; hence
words must have originally existed as sound-sequences without phonemic
distinctions. The development of phonemic distinctions among these sound-
sequences has taken place in all languages, apparently, and has made
specific distinctions among words. In so doing, it has created fundamental
features of structure: discreteness of the ultimate elements of language
(since a difference is a discrete entity), fixedness of word-form, and repetition
as against imitation in transmission of speech (and writing, in contrast to
pictorial messages).
(2) The chief process (in 12.3.5) begins after the existence of word
combinations, which were presumably at first casual and unstructured. The
institutionalization of the actual non-occurrence of combinations into a
restriction on combining creates a structural requirement (dependence) out
of a meaning-related customary behavior.
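
As a deliberately crude sketch of this step, with an invented corpus: graded facts of what was actually said are frozen into a class-level requirement, and the requirement then licenses combinations that were never observed.

# Illustration only (invented corpus): customary co-occurrence is
# institutionalized into a structural requirement.
from collections import defaultdict

corpus = [("sleeps", "child"), ("sleeps", "dog"), ("gallops", "horse")]

observed_args = defaultdict(set)
for operator, argument in corpus:
    observed_args[operator].add(argument)

# Institutionalization: the words each operator was seen with are
# generalized into one argument class shared by all such operators.
argument_class = set().union(*observed_args.values())
requirement = {operator: argument_class for operator in observed_args}

# The requirement now licenses a combination never present in the corpus:
print("horse" in requirement["sleeps"])   # True
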
that creates the English auxiliary verbs, where some of the reductions acted
on can, may, need, etc., while others acted on can, may but not on need, and
so on. And the utility of repeating a grammatical relation between particular
words in successive sentences of a discourse can lead, over the whole
discourse, to a double array of its sentences.
Furthermore, the accretional property, whereby events act on the resultants
of prior events, has in many situations the effect of preserving the structure
created by the prior events, and even of strengthening its importance by
providing substructures within it—in some cases creating the effect of a
direction of development. For example, it has been seen that second-level
operators act on first-level operators (hence on the elementary sentences
created by the latter) in the same way as the first-level operators had acted
on their zero-level arguments in forming those elementary sentences (and
comparably when the second-level operators act on second-level operators).
This recursivity establishes a single structural process for all sentences.
Thereafter, transformations are able to act in the same way on all sentences,
in terms of this common structure, thus strengthening the importance of the
structure. A different kind of example is the extension of prepositions from
their original argument-requirement status to many other syntactic statuses.
Each such extension was probably fostered by all the previous extensions of
the prepositions.
We thus have a positive feedback, in which the formation of structure
favors accretions—including subsidiaries—of various kinds to that structure,
all of which further establishes the structure.
The further development of language constraints—in discourse structure
(9.3), argumentation (10.5.4,10.6), and sublanguage (ch. 10)—has a property
relevant to this survey. For while all these further structures are clearly
fostered by what is being talked about—by the nature of discourse, argument,
and subject matter—it can be seen that the apparatus which is here developed
is not something new unrelated to the language structure so far, but is rather
a set of further constraints in the spirit of language structure and within
openings for further meaningful constraints that had been available in
language structure.
We now consider the motive sources which, given the existence of words and
structures as in 12.4.3, directed the many steps of structural development,
and affected their survival. It will be seen that these steps have features of
evolutionary processes.
(1) Toward efficacy of communication. First, many of the developmental
steps increased the effectiveness of communication and transmission of
words and then of sentences. When words developed from being sound-
verbs). Without the second level, language would have only elementary
sentences. But by having second-level operators act on second-level operators,
we can get the effect of higher-level operators. The reasons for the length
and depth of arguments stopping at two may thus be not very different from
the evolutionary advantages in biology that let sex differentiation stop at
two.
In respect to the stopping of developments, it may be relevant to note a
similarity and difference between the evolving of language and biological
evolution. In language, communicational ease and distinguishability can be
considered to correspond to the success-in-the-environment in the ‘survival
of the fittest’ model; and developments toward such efficacy may be expected
to go no further than the environmentally imposed needs. In contrast, the
old question of why some evolutionary developments seem to go on beyond
what is needed for survival in the environment can be met in biological
evolution by taking into account a second ‘cutting-edge’ which differs
somewhat from the demands imposed by the external environment: namely,
the effect of intra-species competition, e.g. for territory or in mating
selection, and the effect of co-evolution between species or between individual
behaviors within a species (e.g. in animal communication, 12.4.1). In this
latter cutting-edge, the favoring of particular ones among individuals already
changed by favoring up to that point can continue in one direction without
obvious brakes. This situation does not arise in language, except possibly in
such favorings as for constructions or word uses that are more attention-
getting than the current usages. Language therefore does not seem to have
runaway directions of development.
(3) Lack of plan in detail. Although the development in (1) above may
give a global direction to language, many detailed developments are less
consistently directional. This limits the regularity and simplicity of the
system. For example, many formations are not fully carried out. The
differences among the sounds (‘allophones’) which are complementary
alternants of a phoneme are in part regular over various similar phonemes
and in part unique; similarly for the morphophonemic alternants of a word.
Reductions are also in some cases similar to others, and in some cases unique
(e.g. in zeroing to come under expect).
At any one time, various structural developments are incomplete, and
some remain so. Special constructions such as the auxiliary verbs can be the
product of cumulative centuries-long changes in form and in restrictions of
use; but not all of the words have all the characteristic properties, so that the
subclass remains a fuzzy one. Regularization is a tendency, not a plan or a
law, and it can stop short at certain points (e.g. at some of the most
frequently used suppletions such as is-am-are) or at half-way stations (as
happened in the development of do toward being a pro-verb). Most generally,
there is the conjecture of Henry Hiz that every word may have a unique
arise from the regularizing pressure of the existing system (12.4.2-3). In any
case, the survival of the changes depends in part on their contribution to the
informational and communicational efficacy of the linguistic system. In this
way, the relation of structure and function, which originates in the form-
content correspondence of 11.4 and 11.6, is maintained through the history
of language change.
In (l)-(5) above, we see processes characteristic of evolving systems. The
whole is of course very different from biological evolution, the more so since
language lacks essential properties of organisms, such as necessary death,
descendants, and speciation. The source of variation is not mutational or
‘random’ in the case of language; in language we have to explain not only the
survival-value of a form, but also its source—how it came to be. (The
sources of variation would be considered ‘random’ if they are unrelated to
what affects survival.) Also the driving-force or ‘cutting-edge’ of evolutionary
change is different, though with suggestive similarities. In both cases, a
variant could take hold primarily when it satisfied certain external conditions.
In biological evolution, as noted above, one can distinguish on the one hand
natural selection’s species-survival which includes increased opportunity of
having descendants for the individual or for others in his group (some of
whom may be expected to have much the same as his genetic equipment),
and on the other hand mating-favoring properties which also increase the
chances of having descendants, but which are less limited to what is favored
by the environment, and indeed may in general be still less limited.
In language, a slightly comparable distinction exists between informational
efficacy and competition within an informational niche. The efficacy criterion
holds that survival of a linguistic form (phonemic distinction, word, word
class, construction, reduction, etc.) is favored if it is more efficacious in
communicating information—including here the cost to speaker (especially
in encoding) and to hearer (in decoding). This efficacy can be understood as
success in eliciting a response appropriate to the speaker’s message; this
formulation skirts the subjectivity issue (in animal communication, is the
‘message’-issuing ‘speaker’s’ cry or posture subjectively communicational,
or is it just an expressive behavior which constitutes a stimulus eliciting
a particular response?). In the competition for niche, sentences containing a
particular form may be favored in a particular language community due to a
variety of properties: they may be clearer, more forceful or contrastive,
easier to say or to understand, more comparable systematically to related
sentences, etc.—all in a never-ending series or escalation of innovation.
The changes due to efficacy are for the most part long-range, universal
and basic for language, more regular with very few exceptions, and may
present the long-range picture of a progression. The changes due to niche-
competition are relatively short-range, and particular to individual languages,
language families, or periods. They are less closely bound to—and limited
by—informational efficacy, and if some have a directional character, it is
19 This catalog is not entirely the same as the least-grammar form of characterization
required in the present theory. And neither of these is entirely the same as the way an infant or
an adult learns a language, or the way language came to be historically, or the way language
information is processed in the brain.
20 How the brain processes both the speaking and the recognizing of speech, and what
specific capacities of the brain are involved, is still little known. Clinical research in aphasia and
neurological research have both shed some light on the problems. For relevant evidence from
aphasia, see in particular: A. R. Luria, Traumatic Aphasia, Mouton, The Hague, 1970; M.
Critchley, Aphasiology and Other Aspects of Language, Arnold, London, 1970; K. Goldstein,
Language and Language Disturbances, Grune and Stratton, New York, 1948; H. Head,
Aphasia and Kindred Disorders of Speech, Macmillan, New York, 1926. In any case, what has
been seen in the present book about the relevant structure of language suggests that it is not a
direct reflection of the structure of neural action in the brain. Rather, language may be
understood as a particular relation of its own—constraints, above all dependence—among
neural events and mechanisms.
21 This survey indicates what is needed in order to use a whole language. To understand an
individual sentence, all that is essential is a recognition of the phonemes of the language and the
argument-requirements of the words present, hence (a) and (c) above. This is seen, for
example, in the fact that any finite syllable-satisfying phoneme-sequence of a language is
acceptable as a sentence if we can take an initial sub-sequence in it to be some name or newly
created technical object and the remainder to be some action or state of that person or object;
preferably either the initial segment should end in the phonemes of the plural suffix or the
second segment should end in the phonemes of a singular tense suffix.
22 The distinction made in 12.4.1 between human and animal language does not mean that
human language requires a totally different sui generis principle, nor does it deny the possibility
of evolutionary continuity from animal communication (cf. D. R. Griffin, The Question of
Animal Awareness, Rockefeller Univ. Press, New York, 1981). The fundamental new relation
(dependence) among ‘symbols’ (words), and the far greater stock of symbols (with their varying
phonemic shapes), can presumably all be learned within the general structure of the so much
more massive and complex human brain. It should be noted in this connection that there is no
present evidence as to how old is human language as a syntactic system. Since a large stock of
words may have existed for a long time before the syntax-creating dependence relation
developed, such evidence as there is for the existence of Broca’s area in the brain of early man
(R. E. Leakey and R. Lewin, Origins, Dutton, New York, 1977, p. 190) may mean only that a
large stock of words was then growing. Syntactic language, with its facilitation of complex
sustained thought, might have developed only much later, perhaps at the time of the rapid
growth of material culture and co-operative production.
Index
Zeroed word:
  keeps modifiers 96
  meaning retained 326, 328
  status taken over 116
  still present 116, 399
Zeroing 82, 97, 105, 106, 110, 113, 140, 179, 210-11
  fixed position 89
  operator 238-9
  under and 82-3, 93, 101
  of which is 96