
Lexical bundles from one century to the next
An analysis of language input in English teaching texts

Rachel Allan
Mid-Sweden University

This corpus study compares lexical bundles found in the language input of a
selection of historical and current English language teaching materials to see
what insights they can give into changes in spoken language use. English
teaching texts published between 1905 and 1917 were used to construct a
historical corpus, and a collection of English language self-study texts
published between 2004 and 2014 was used for comparison. Both groups of
texts focused on spoken language. The most frequent three-word lexical
bundles extracted from each corpus varied considerably. The contemporary
texts showed both a greater use of formulaic language and more syntactic
complexity within it, while the historical texts relied on simpler structures.
An exploratory analysis of the lexical bundles in the historical texts suggests,
however, that viewed in conjunction with other historical sources, they can
assist in building a picture of spoken language use of the period.

Keywords: corpus, English teaching texts, lexical bundles, spoken English

1. Introduction

At the turn of the twentieth century, a movement was underway to teach the
English language to non-native speakers in a practical way, as a language to be
spoken. This brought about a change in language teaching materials. Textbooks
were not confined to dictionaries and grammars, but took a more functional
approach, with a focus on phrases and dialogues used in everyday situations –
a focus which continues in present-day English coursebooks. As there is little
recorded dialogic everyday speech from this period, this study follows the
example set by A Corpus of English Dialogues 1560–1760 (CED) (2006) in using
written dialogue as an opportunity to explore the kind of language in use at this
time. Patterns in the language are identified by examining lexical bundles (Biber
2006) – that is, recurring sequences of words – and comparisons are made with
language input in current teaching materials. However, the fact that this is not
raw spoken data, but speech provided by the textbook writers, raises a number of
considerations. Although the aim of these historical textbooks was to teach “the
material of everyday language” (Jespersen 1904: 23), compromises for teaching
ease and perceived “correctness” may have been made. Similarly, modern day
texts do not necessarily reflect authentic spoken language (see Allan 2017). The
target users of the texts and their circumstances must also be considered, as these
may also impact on the language taught. My aim is to explore to what extent
the lexical bundles contained in these textbooks can illustrate language use and
change, considering the constraints outlined.

https://doi.org/10.1075/jhp.00017.all
Journal of Historical Pragmatics 19:2 (2018), pp. 167–185. issn 1566-5852 | e‑issn 1569-9854
© John Benjamins Publishing Company

2. English language teaching past and present

The period from 1880 to 1920 was an important one in language teaching. Known
as the “Reform Period” (see Howatt and Smith 2014 for a comprehensive
summary), it was characterized by a focus on teaching spoken language. This
was not unprecedented, as demonstrated by the language teaching handbooks
included in the Didactic works component of the CED. However, formal language
education in Europe had generally followed classical teaching methods, whereby
modern languages were taught purely as an academic pursuit, like Latin or Greek.
Within the Reform Period, teaching methods were led by the principle that
language is a practical means of communication (Bayley 1998: 43). Comprehending
and speaking the language were prioritized, grammar tended to be taught
inductively, and in Europe, there was a focus on the new science of phonetics (e.g.,
Sweet 1890). In the United States, where there was a strong influx of non-English
speaking immigrants, the “Americanization Movement” began, whereby all
immigrants had to be able to speak English to become citizens (Cavanaugh 1996: 42).
English was needed for daily working life, leading to a growth in evening classes
for adult learners. The “Natural Method” of teaching structured conversations
emerged from this, becoming known as the “Berlitz Method” as it was adopted
and popularized by Berlitz (Howatt and Smith 2014: 83). The same approaches
spread within Europe, where they became generally known as the “Direct
Method” (Howatt and Smith 2014: 84). The historical textbooks used in this study
reflect the features of the period: they focus on dialogues pertaining to everyday
situations; they assume student participation through reading the dialogues
and/or drilling; and grammar is treated as it occurs in context.¹

1. For this study, texts using phonemics extensively have not been used.

In contemporary language teaching, the preoccupation with teaching
language as it is spoken has never been greater. Increasingly, English language
teaching materials draw on large-scale language corpora to access the most
frequently used words and phrases to teach, and input language is based on
native-speaker discourse (see, for example, McCarthy et al. 2014). Language
teaching pedagogy nowadays tends to be learner-centred and takes a communicative
approach; a range of methods may be used, acknowledging the situated nature
of teaching, and different learner needs and preferences. Classroom course books
reflect this to a large extent (see reviews by Masuhara et al. 2008, and Tomlinson
and Masuhara 2013). The self-study English books used here differ from
classroom-based books in some respects. Although they aim to teach language as it is
used, they follow an approach that is broadly in keeping with the “Direct Method”,
focusing lessons around situational dialogues – hence their suitability for
comparison with the historical teaching texts.

3. Formulaic language and lexical bundles

Formulaic sequences of language are word strings which seem to be processed
holistically rather than generated word-by-word (Wray 2002). Although relatively
little is known about how these strings of words are processed (for a recent
comprehensive review, see Siyanova-Chanturia and Martinez 2014), it is clear
that they are fundamental in language production. Corpus analyses have
illustrated how prevalent they are, with one study into their use finding that over 50
percent of fluent spoken and written discourse consisted of formulaic language
(Erman and Warren 2000). The term formulaic language includes a wide range of
items (Wray 2002: 9); the focus here is on lexical bundles, which are identified as
“the most frequent recurring sequences of words” (Biber 2006: 132) occurring in
the corpora. Following Biber’s (2006) approach, they are automatically extracted
by the software and are not edited for pragmatic integrity. This means that the
bundles may not be syntactically complete; however, as Biber (2006: 134) points
out, high-frequency word sequences of this kind tend to work as meaningful units
and have clear pragmatic functions. Bundles in spoken discourse tend to revolve
around the organization and management of conversation and the speaker-listener
relationship; for example, among the top three-word bundles in CANCODE² are
I don’t know, I mean I, I don’t think and do you think (O’Keeffe et al. 2007: 66).
The English teaching texts used in this study are based around everyday
conversations in common situations, and bundles are extracted from this target
language. Scripted discourse of this kind is unlikely to contain prefaces
marking online speech construction like I mean I, but other phrases commonly
used in conversation should be evident in the lexical bundles.

2. The Cambridge and Nottingham Corpus of Discourse in English, which includes five million
words of spoken English discourse.

4. Corpora

Sections 4.1 and 4.2 provide a brief description of the sources used to create the
corpora, and I give full bibliographical details in the References section.

4.1 Historical teaching texts

A range of teaching texts from towards the end of the Reform Period was
identified to create a historical corpus (Corpus of Historical Textbooks, or HIST) for
analysis. Details of the texts and their word counts, calculated using AntConc
(Anthony 2016), are given in Table 1. These books were published between 1905
and 1917, and had an emphasis on teaching conversational English, targeting
adult learners with some knowledge of English. The majority were published in
America, but two British texts of a similar type have been included. The
imbalance arises from the fact that many of the European texts at this time focused
on phonetics, and were written largely using phonemic script. This made them
unsuitable for inclusion.
The majority of the American texts were targeted at adult learners of English
who were newly living and working in the USA, and were learning English in
an evening class or settlement environment. Topics include concepts such as
numbers, the monetary system, time, seasons, and everyday situations like going
to the grocery store, finding a job and visiting the doctor. One of the books
(Austin 1913) was written for women, covering their specific needs and workplaces
(e.g., A Day’s Work at Washing and A Day’s Work at the Cigar Factory). Berlitz
(1917) is a slightly different type of text, as it was proposed more as a general
“method” for teaching, with a focus on travel rather than on work in the US,
although it contains many of the same general topics as the other texts. Of the
two British texts, the target audience for Thorley (1916) is adult learners (of no
specific nationality) “bent on earning their living by what they learn” (i.e., for
work purposes). Tenney’s (1905) book, in contrast, specifically targets Chinese
learners, and contains some Mandarin translation in the text, but is comparable
in terms of topics and its focus on spoken English. These texts have a strong
similarity to those found in the CED, both in terms of their dialogic content and target
audience (see Culpeper and Kytö 2010: 45–49).

All of the texts have a short introduction explaining that the focus should be
on English as a spoken language, with plenty of repetition and drilling suggested.
With regard to the variety of English, there are some differences, as discussed in
Section 4.3.

Table 1. Composition of the HIST corpus, showing variety of English used, year of
publication, and word count. (Note: BrE = British English and AmE = American English.)

Author      Title                                                       Variety  Year  Word count
Tenney      English Lessons                                             BrE      1905      13,479
Wallach     A First Book in English for Foreigners                      AmE      1906      20,897
O’Brien     English for Foreigners                                      AmE      1909      24,848
Houghton    First Lessons in English for Foreigners at Evening Schools  AmE      1911      20,621
Austin      Lessons in English for Foreign Women                        AmE      1913      20,393
Darling     Foreigners’ Guide to English                                AmE      1914      33,406
Jimperieff  Progressive Lessons in English for Foreigners               AmE      1915      16,547
Thorley     A Primer of English for Foreign Students                    BrE      1916      29,595
Berlitz     Method for Teaching Foreign Languages                       AmE      1917      24,930
Total                                                                                     204,716

All of the texts used here were sourced from the Open Library,³ an initiative of
the Internet Archive.⁴ The digitized text versions were proofread against the .pdf
files which are also available online. Appendices relating to grammar, such as
irregular verb lists, and sections containing extracts from literature (e.g., Thorley
1916: Part IV) were not included.

3. See: https://openlibrary.org/.
4. See: https://archive.org/.

4.2 Contemporary self-study texts

Five English language teaching books published around a century after the
historical texts were selected to make a small corpus of self-study texts (the Corpus
of Self-Study Texts, or SEST), to provide a sample of language input in
contemporary materials. These are shown in Table 2. Self-study books were used rather
than classroom texts because they most closely resemble the historical texts, as
mentioned in Section 2. They focus on spoken language in use, and are generally
stand-alone courses, with target language included in the text (although
accompanying recorded material is available in some cases), and with instructions given
in English only. These texts were published between 2004 and 2014, mainly in
the UK, as few US published texts of this genre were available. They target adults
with some knowledge of English; most of the books describe their level in terms
of B1–B2 of the Common European Framework of Reference (CEFR) (Council of
Europe 2001) – that is, broadly intermediate level. Like their historical
counterparts, these books cover essential topics such as getting directions, shopping, and
everyday interaction with friends and strangers, although lifestyle changes in the
intervening century mean that both contexts and target audience have changed
significantly. There are also differences in terms of language variety and length, as
can be seen in Table 2.

Table 2. Composition of the SEST corpus, showing variety of English used, year of
publication, and word count. (Note: BrE = British English and AmE = American English.)

Author / publisher  Title                                       Variety  Year  Word count
King                Colloquial English                          BrE      2004      54,152
Living Language     English: Essential Edition                  AmE      2009      27,476
Woodford & Walter   Easy Learning English Conversation          BrE      2011      35,469
Pelteret            Speaking (Collins English for Life Series)  BrE      2012      21,793
Stevens             Complete English as a Foreign Language      BrE      2014      59,406
Total                                                                             198,296

The books were scanned and the relevant text put into electronic format using
optical character recognition software, with the resulting files manually checked
for error. All language within the core instructional chapters was included, but
reference materials in other parts of the book (e.g., indexes and grammar
resources) were excluded.

4.3 Limitations of the corpora

Although the corpora are broadly similar in size, there are differences in
composition. The language variety is mixed, both within and across the corpora, with
HIST largely based on American texts and SEST almost exclusively British.
Differences between British English (BrE) and American English (AmE) have been
widely researched, both from historical (e.g., Rohdenburg and Schlüter [eds]
2009) and contemporary (e.g., Algeo 2006) perspectives, and the relationship
between them is complex (see Hundt 2009). In HIST, however, there was no
evidence of systematic differences in the variety of English between the different
texts, with few traits of AmE evident. The SEST texts were predominantly British,
facilitating comparison. A more pertinent consideration may be the target
audiences for the two sets of books. The SEST texts are written for people who wish to
travel to an English-speaking country for work or leisure, and are self-motivated
to learn English. In contrast, the majority of the HIST books were written for
immigrants who were assumed to have a relatively low level of education, working
as manual labourers or in trades which required them to learn English to secure a
future, with only Berlitz (1917) targeting a similar audience to the SEST texts. The
type of interaction required by these groups will differ, both at a social and
practical level, and such differences will inevitably influence the language input in the
books.
Within each corpus there is a range of texts of different lengths. I chose to
use complete texts to ensure that the full range of topics was included. However,
this means that some books potentially carry more weight than others when
the corpus is analysed. Similarly, in each corpus there is a different number of
texts, with a broader range in HIST (nine books), as compared to SEST (five
books), to maintain a similar size overall. The broader range of the HIST corpus
may influence bundle counts, particularly given its more heterogeneous nature, as
the individual texts may contain different phrasal language. In terms of teaching
methodology, there is a greater emphasis on language exercises in the self-study
corpus, which is likely to lead to more repetition of input language and a higher
level of lexical bundles. The HIST texts, on the other hand, were intended for
classroom use, and the traditional didactic routines they contain have an influence
on the types of lexical bundles found. Another point to mention is that two of
the SEST books were corpus-informed, Woodford and Walter (2011) and Pelteret
(2012); however, the input for most of the SEST books, like the HIST books, is
selected by the author and/or publisher.

5. Method

For the analysis, the most frequent three-word bundles were extracted from both
corpora using AntConc (Anthony 2016). This length was the most appropriate for
the purposes of this study, making it possible to generate a range of bundles used
in multiple contexts. Four-word bundles occurred in limited numbers because of
the restricted size of the corpora, while it was difficult to identify the functions
of two-word bundles. Contractions were counted as two words. This approach
was adopted because contractions were routinely used in the books in SEST, but
not consistently used in the historical texts. Contracted forms and their
corresponding full forms are identified as separate bundles (e.g., I do not is not equated
with I don’t). Range was specified in each case, to ensure that the bundles were
common to the majority of texts in each corpus. This corresponded to three of the
five texts in SEST (60 percent), and five of the nine texts in HIST (56 percent). The
percentage is lower in HIST, but arguably more restrictive, since the bundles were
required to occur in a greater number of texts.
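
The extraction and range criteria described here can be illustrated with a short sketch. This is not the procedure implemented in AntConc; it is a minimal illustration under stated assumptions. The tokenizer (which splits don’t into do + n’t and ignores punctuation, so sequences may cross sentence boundaries), the function names and the toy texts are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed tokenization (not taken from AntConc): lower-case everything and
# split contractions in two, e.g. "don't" -> "do" + "n't", "I'm" -> "i" + "'m".
TOKEN_RE = re.compile(r"[a-z]+(?=n't)|n't|'[a-z]+|[a-z]+")


def tokenize(text):
    """Return a list of lower-cased word tokens with contractions split."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return TOKEN_RE.findall(text)


def extract_bundles(texts, n=3, min_range=1):
    """Count n-word sequences and keep those found in at least min_range texts.

    texts maps a (hypothetical) text label to its raw string.
    """
    freq = Counter()
    found_in = defaultdict(set)
    for label, raw in texts.items():
        tokens = tokenize(raw)
        for i in range(len(tokens) - n + 1):
            bundle = " ".join(tokens[i:i + n])
            freq[bundle] += 1
            found_in[bundle].add(label)
    return {b: c for b, c in freq.items() if len(found_in[b]) >= min_range}


# Toy input, invented for illustration only.
toy = {
    "A": "What do you do? I don't know.",
    "B": "What do you call this? I don't like it.",
    "C": "I do not know. What do you see?",
}
in_all_three = extract_bundles(toy, n=3, min_range=3)
in_two_or_more = extract_bundles(toy, n=3, min_range=2)
```

In this toy data, raising the range threshold from two texts to three removes the contracted bundle, which occurs in only two of the three texts, while the full form I do not is counted separately, in line with the treatment of contractions described above.
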
As the research focus was specifically on language input rather than all
language used in the textbooks, the bundle lists were edited manually. Certain
bundles were immediately eliminated – for example, proper nouns (the United
States was the most frequent bundle in HIST, reflecting the context of immigration
and integration). The concordance lines were then examined to exclude bundles that
were not used as language input, notably those relating to headings (e.g., Words
to know), instructions (e.g., check your answers) and explanations (e.g., in this
unit or of the verb). In some cases, instructional language and input overlapped
(e.g., end of the): some uses are explanatory or instructional (e.g., “add -ed to
the end of the verb”), while others illustrate input language relating to
directions (e.g., “Go to the end of the street”). Instances of
each use were identified and the frequency count relating to input language was
used. Frequencies were then normalized to a frequency per 100,000 words (f/100,000),
for ease of comparison.
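
The normalization can be checked with a small worked example. The sketch below assumes that the base for each rate is the corpus total given in Tables 1 and 2; the function name is hypothetical, and the hit counts are those reported in Table 3 and Section 6.4.

```python
# Corpus sizes taken from the totals in Tables 1 and 2.
HIST_WORDS = 204_716
SEST_WORDS = 198_296


def per_100k(hits, corpus_words):
    """Normalize a raw hit count to a rate per 100,000 words."""
    return hits / corpus_words * 100_000


hist_what_do_you = round(per_100k(146, HIST_WORDS))   # Table 3 reports 71
sest_i_dont = round(per_100k(306, SEST_WORDS))        # Table 3 reports 154
hist_would_you_like = round(per_100k(8, HIST_WORDS))  # Section 6.4 reports 4
```

Under this assumption the rounded rates agree with the published figures, which suggests that the whole-corpus word counts served as the denominators.
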
The twenty-five most frequent lexical bundles in the two corpora were then
compared. The phrases were explored from a range of perspectives, including
their functions, collocates and the broader contexts in which they were used.
These top bundles were examined selectively rather than exhaustively, as certain
bundles were more revealing than others. To consider their value as indicators
of spoken language use, I consulted grammars and articles from the early 1900s,
along with contemporary corpus studies.

6. Results

6.1 A frequency survey

From a purely quantitative perspective, and before the editing process described
in Section 5 was undertaken, many more bundles were present in SEST than
HIST. In SEST, 631 bundle types (i.e., different bundles) occurred at least ten
times, with an overall token count (i.e., total occurrences of those bundles) of
15,323, while HIST had 382 bundle types and 8,659 tokens. This suggests a greater
use of formulaic language in SEST overall. Following editing, the top twenty-five
language input bundles in each corpus were identified (see Table 3). Frequencies
remain higher for SEST, but much of this is attributable to the use of exercises, as
hypothesized in Section 4.3. I will begin with an overview of the main similarities
and differences found in Table 3 before examining specific shared bundles and
discussing general patterns.

Table 3. Most frequent three-word bundles occurring in the historical and contemporary
English books, with bundles that appear in both lists marked with an asterisk (*)

HIST             Hits (n)  f/100,000    SEST             Hits (n)  f/100,000
what do you *       146        71       I don’t             306       154
what is the         139        68       would you like      207       104
go to the *         116        57       you want to         161        81
is made of           85        42       I’ve got            120        61
are there in         81        40       I’m going           114        57
I do not             76        37       I’m not             108        54
on the table         76        37       I can’t             105        53
this is a            75        37       I’d like            104        52
it is not            72        35       it’s a              103        52
I have a             67        33       ’m going to          98        49
it is a              64        31       don’t know           96        48
do you know          63        31       we don’t             96        48
is in the            63        31       go to the *          91        46
do you do            58        28       what do you *        89        45
in the morning       57        28       don’t like           84        42
I go to              55        27       I’m afraid           83        42
there is a           54        26       could I have         81        41
do you like *        53        26       do you have          81        41
where is the         53        26       you tell me          80        40
where do you         52        25       have you got         76        38
this is the          51        25       do you like *        74        37
to go to             51        25       what’s the           74        37
at what time         50        25       I have to            73        37
I have two           49        24       a lot of             66        33
the names of         47        23       I’m sorry            66        33

Three of the top twenty-five input bundles are present in both corpora: what
do you, go to the and do you like. Three further bundles are present
in their full form in HIST and contracted form in SEST: I do not / I don’t, what is
the / what’s the, and it is a / it’s a. Many of the bundles in the SEST list include
contracted forms, as could
be expected in present-day spoken language. In HIST, none of the top bundles
contain contractions. However, to return to the point made in Section 5 about the
inconsistent use of contractions in HIST, a search for I don’t in HIST generates
twenty-nine hits, predominantly in Thorley (1916) with thirteen hits, but also in
Berlitz (10x), Houghton (3x), Jimperieff (2x) and Tenney (1x); all of these texts also
use the form I do not. All of the other contractions found in the SEST bundles list
in Table 3 are used, but infrequently, in HIST. I can’t is the only contraction with
over ten hits (11x), and it’s a in HIST is comparatively rare, with only three hits.
Although these uses are scattered throughout the texts, most of them are found in
Thorley (1916), one of the later publications, and this may reflect a growing
acceptance of shortened forms as the century progresses.
One of the most noticeable differences between the two lists is the number
of personal pronouns appearing in SEST compared to HIST. Ten of the SEST
bundles contain I compared to only four in the HIST bundles, suggesting that the
expression of personal opinion is prioritized in SEST. The interpersonal nature of
the language input is evident in both corpora, with several bundles incorporating
you appearing in both lists. The presence of question frames in the top bundles
also reflects the dialogic nature of much of the language input. There are both
similarities and differences in these, which will be discussed in greater detail in
Section 6.4.
In the HIST bundles, a limited number of verbs and forms of verbs are
presented, but a greater number of nouns occur. The verbs in HIST are: do, go,
made, be, have, know and like; in SEST these also occur and are supplemented with
would (like), can’t, could, want and tell. Furthermore, have is also used to express
obligation in the form of have to. Only made is unique to HIST. In HIST, all of
the other verbs occur in present simple form, while the perfective and progressive
aspects are represented in SEST. The use of modal auxiliaries is noteworthy, and is
discussed in Section 6.5. In contrast, simple sentence frames predominate in the
HIST bundles, such as this is a / the, there is a. Four nouns occur in HIST in the
phrases on the table, in the morning, at what time and the names of, demonstrating
a focus on description; no nouns or similar descriptive phrases occur in the SEST
bundles. At face value, the formulaic language contained in the contemporary
texts shows greater complexity, both syntactically and pragmatically; whether the
HIST texts contain simplified, non-representative forms for teaching purposes is a
question probed in Sections 6.2 to 6.5. First, two of the bundles occurring in both
corpora, what do you and do you like, will be discussed in Sections 6.2 and 6.3,
respectively.

6.2 What do you

Examining the words following the shared bundle what do you demonstrates that
it is used in a much broader range of contexts in HIST than SEST (see Table 4). In
SEST, most uses occur in phrases like What do you think of / about and What do
you do. While what do you do is at the top of the frequency list in HIST, what do
you think only occurs twice. Conversely, what do you call has a high frequency in
HIST but only occurs twice in SEST. What do you call is typically used to quiz a
student on their vocabulary (i.e., it focuses on form), whereas what do you think
calls for a reaction to something – for example, What do you think of his latest
movie? (Woodford and Walter 2011); in other words, it focuses on meaning.

Table 4. Words following the lexical bundle what do you in HIST and SEST corpus, with
raw frequencies shown in parentheses

what do you…
HIST (146)                                        SEST (89)
do (41)                                           think (40)
call (26)                                         do (24)
say (9)                                           want (6)
hear, see (6)                                     recommend, feel (5)
eat (5)                                           mean (4)
feel, learn, put, wish (3)                        call (2)
drink, give, smell, take, tell, think, use,       get, prefer, reckon (1)
  wash, wear, write (2)
answer, ask, carry, cut, dig, dry, find, go,
  lay, lie, live, look, make, pay, seek, speak,
  stir, suppose, try, wind, work (1)
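
A tally of the kind shown in Table 4 can be approximated by counting the word that immediately follows each occurrence of a bundle. The sketch below is an illustration only: the function name is hypothetical, and the sample string combines a few sentences quoted in this section rather than the corpus data.

```python
import re
from collections import Counter


def next_words(text, bundle):
    """Count the word immediately following each occurrence of bundle."""
    # Allow any run of whitespace (including line breaks) between the words.
    body = r"\s+".join(re.escape(word) for word in bundle.split())
    pattern = re.compile(r"\b" + body + r"\s+(\w+)", re.IGNORECASE)
    return Counter(match.lower() for match in pattern.findall(text))


# Sentences quoted in this section, used here purely as sample input.
sample = (
    "What do you do with your eyes? I see. "
    "What do you do with your ears? I hear. "
    "What do you do in the morning? "
    "What do you think of his latest movie?"
)
following = next_words(sample, "what do you")
```

Applied to the full concordance for each corpus, the same count would yield frequency lists comparable to the two columns of Table 4, although contracted forms would need the same tokenization decisions as in the bundle extraction.
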

Looking more closely at the top bundle from HIST, what do you do is mainly used
to drill students, as in Example (1), although Example (2) shows a potentially more
communicative use.
(1) What do you do with your eyes? I see.
What do you do with your ears? I hear. (Darling 1914)
(2) What do you do in the morning?
I walk in the garden for amusement. (Tenney 1905)

In contrast, in SEST, what do you do is mainly used (nineteen out of the twenty-
four occurrences) to ask about occupation, as illustrated in Examples (3) and (4).
(3) Hello, Marian – what do you do for a job?
I’m a teacher. (King 2004)
(4) Hi, I’m Scott. I’m a friend of Andrew’s.
Oh, pleased to meet you. Andrew has told me a lot about you.
You work with him, don’t you? What do you do, exactly?
I’m a teacher. I teach maths. What about you? (Woodford and Walter 2011)

The HIST texts do not use what do you do to ask about occupation, and in fact,
focus very little on this question. All of the SEST texts include questions about
occupation, but searches of HIST on the keywords work, job, occupation, living
and earn revealed only four questions asking about occupation: What do you work
at? and What work do you do? (Wallach 1906), Where do you work? (Houghton
1911) and What is your occupation? (Darling 1914). Questions on the topic of
work tended to focus on other aspects (i.e., working days, hours, transportation to
and from work, looking for work, and earnings, etc.). The corpora, then, show a
change not only in the way occupation is asked about, but also the extent to which
it is the focus of conversation, which may well relate to the different circumstances
of the learners in the two periods. The HIST questions are more relevant for immi-
grants who may not be working or who work in low-status jobs.
These examples from HIST also raise questions about language authenticity.
Example (1) in particular seems to be far from representing “everyday conversa-
tion” in that there is no exchange of meaning. If bundles are frequent because
they are being used to drill or illustrate language, it is quite possible that they do
not reflect the kind of speech used outside the classroom. The more meaningful
exchanges in the SEST examples model authentic conversation more closely, and
show consistent use of the same phrase for the same function. However, it should
be noted that their authenticity has also been questioned (see Allan 2017).

6.3 Do you like

A second shared bundle, do you like, offers further insights into the differences in
patterns of language use between the corpora. In both corpora it is mainly used
with nouns or pronouns in simple questions such as do you like beer? (Berlitz
1917) and do you like fast food? (King 2004). However, when followed by verb
forms there is a clear difference; in HIST the infinitive form is generally used, for
example, do you like to drive? (Tenney 1905), whereas in SEST, only the gerund
is used, for example, do you like driving? (Woodford and Walter 2011), as can be
seen in Figure 1. Only one of the HIST texts, the British book by Thorley (1916),
consistently uses the gerund, while the other British text (Tenney 1905) uses only
the infinitive form.

Figure 1. Forms following do you like in both corpora. (Figures are percent, based on
fifty-three hits in HIST and sixty-two hits in SEST)

This reflects a well-documented pattern of language variation from infinitive
to gerund (e.g., Fanego 2007; Mair and Leech 2008; Vosberg 2009). However, this
shift in complementation seems to depend on both the specific construction and
variety, with BrE leading in some cases and AmE in others, as Vosberg (2009)
illustrates. In the case of like, Fanego (2007: 178) finds that, along with other
emotive predicates that had not allowed -ing clauses before the eighteenth century,
like regularly takes gerundive complements in late Modern English both in British
and American varieties. With reference to BrE, Egan (2006: 230) comments on
the gerund’s steady rise from the nineteenth century. Fowler’s (1908) (British)
prescriptive grammar points to the gerund’s dominance in general: “the infinitive
being the side that should only be used with caution” (Fowler 1908: 130). This
being the case, Tenney’s (1905) text appears to present a more traditional, perhaps
old-fashioned, use of BrE. The fact that it was published at the beginning of the
period in question, and written while the author was in China and for a Chinese
audience may account for this. None of the AmE HIST texts used gerundive
complementation with do you like. This may reflect common usage at the time.
Alternatively, it could indicate a conservative approach to language change by the
textbook writers, or be attributable to efficiency of teaching – that it was simpler
to teach a single form. The reason(s) for this variation cannot be confirmed, but it
highlights the fact that syntactic change emerges gradually, and at different rates,
not only in different varieties but in different contexts of use.
Having considered two of the bundles that are shared by the historical and
contemporary texts, the following sections will discuss more general features
exhibited by the bundles in the different corpora. First, I will consider question
frames found in the bundles, before looking at the use of modal auxiliaries.

6.4 Question frames and prefaces

Differences in question frames used in the two corpora are evident when examining
the bundles that are unique to each list. Most obviously, the question frame
would you like occurs with very high frequency (104 f/100,000) in SEST but is not
represented in the top bundles of HIST. In fact, this bundle is marginal in HIST,
with eight occurrences in total (4 f/100,000) in just three of the texts (Berlitz 1917;
O’Brien 1909; Wallach 1906). Could I have, one of the top bundles in SEST, does not
appear at all in HIST. Questions using could are very infrequent in the historical
texts; only three occur: But could you not come last night? (Berlitz 1917), How much
could she earn? (Austin 1913) and Could I go too? (Austin 1913). Another frequent
phrase in SEST, you tell me, is used almost exclusively as a question preface with a
modal verb, could / can you tell me, with 6 percent of the occurrences using could.
The bundle you tell me occurs only sixteen times (8 f/100,000) in HIST, with fourteen
hits occurring in the phrase can you tell me, and none using could.
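
The normalised frequencies quoted in this section (f/100,000) can be read as a raw hit count divided by corpus size and scaled to 100,000 words. The short Python sketch below illustrates that arithmetic only. The corpus size of 200,000 tokens is an assumption inferred from the figures above (eight hits reported as 4 f/100,000), not a value stated in this section, and the function and variable names are illustrative.

```python
def per_100k(raw_hits: int, corpus_tokens: int) -> float:
    """Normalise a raw frequency to occurrences per 100,000 tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_hits / corpus_tokens * 100_000


# ASSUMPTION: approximate HIST size, inferred from "eight occurrences
# in total (4 f/100,000)"; the exact token count is not given here.
HIST_TOKENS_ASSUMED = 200_000

# Raw HIST counts reported in the text.
hist_hits = {"would you like": 8, "you tell me": 16}

for bundle, hits in hist_hits.items():
    rate = per_100k(hits, HIST_TOKENS_ASSUMED)
    print(f"{bundle}: {rate:.0f} f/100,000")
```

Under this assumed size, the two counts normalise to the 4 and 8 f/100,000 reported above. The 150 hits for would reported in Section 6.5 would come out at 75 rather than 74, which suggests the actual HIST corpus is slightly larger than the round figure used here.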
Looking at the HIST bundles, question formation is quite different, with a
number of simple interrogative forms occurring: what is the, where is the and
where do you. The majority of questions using what is the and where is the are
display questions, testing knowledge. What is the colour of… is the most
frequently asked question, closely followed by name, difference and, more
communicatively, price. In SEST, the question frames what is the or what’s the are widely
used, but most commonly in meaning exchanges, in what’s the time and what’s
the matter. In HIST, where is the is generally used in asking about the location of
objects, such as where is the book / sun, again with a focus on generating language
rather than meaning. In contrast, in SEST, where is the / where’s the (combined 16
f/100,000) is used to ask for directions. Finally, although it is also used in display
questions, the bundle where do you appears to be used in a more communicative
way in HIST, with questions like Where do you live / come from / work – uses
which are shared in SEST (15 f/100,000).
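
The comparison above rests on counting three-word sequences and, for frames such as what is the / what’s the, on combining full and contracted forms. A minimal Python sketch of one way to do this follows. It is not the tool or the settings used in the study: the tokenisation rule, the small expansion table and the toy samples (assembled from question frames quoted in this section) are all assumptions made for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: a tiny expansion table so that "what's the" is counted with
# "what is the". Real contractions are ambiguous ('s can also mean "has").
EXPANSIONS = {"what's": "what is", "where's": "where is"}


def tokenize(text: str) -> list[str]:
    """Lowercase, normalise apostrophes and expand the listed contractions."""
    text = text.lower().replace("’", "'")
    tokens = []
    for word in re.findall(r"[a-z]+(?:'[a-z]+)?", text):
        tokens.extend(EXPANSIONS.get(word, word).split())
    return tokens


def three_word_bundles(text: str) -> Counter:
    """Count contiguous three-word sequences without crossing sentence ends."""
    counts = Counter()
    for sentence in re.split(r"[.?!]+", text):
        tokens = tokenize(sentence)
        counts.update(
            " ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)
        )
    return counts


# Toy samples built from question frames quoted in the text (not corpus data).
hist_sample = "Where is the book? Where is the sun? Where do you live?"
sest_sample = "What's the time? What's the matter? Where do you work?"

hist = three_word_bundles(hist_sample)
sest = three_word_bundles(sest_sample)

print("shared:", sorted(set(hist) & set(sest)))
print("unique to HIST sample:", sorted(set(hist) - set(sest)))
```

On these toy samples the only shared bundle is where do you, while where is the appears only in the HIST sample. A fuller treatment would also apply a frequency cut-off and a range criterion across texts, neither of which is modelled here.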
Overall, the bundles relating to questions tend to be less direct in SEST and
are often prefaced with hedging expressions or other signals of politeness. In
contrast, the question forms found in HIST pay little attention to face. Again, this
raises the question of how far this was determined by teaching efficiency, whereby
a simpler form was presented rather than a more socially acceptable but
linguistically complex one. It may also stem from the fact that the HIST texts were written
to be used in a classroom environment where social roles and hierarchies were
more fixed, rather than in social situations where they may need to be
negotiated. For the SEST texts, face needs are prioritized, perhaps because they deal with
real-life interaction, where there is a need to attend to face in everyday speech
routines. This is also evident in the high occurrence of other prefaces to speech
acts in SEST, with I’m afraid and I’m sorry being among the top bundles. Both of
these are mainly used here to preface a refusal or soften another potentially
face-threatening statement, for example, The coffee machine’s broken today, I’m afraid
(King 2004) and I’m sorry to bother you (Pelteret 2012). These phrases occur in
HIST, but on few occasions. Sorry has only sixteen hits in total in HIST; I am sorry
occurs three times and the contracted form, I’m sorry, occurs only twice, both
times in Thorley (1916). Only three of these uses have the same sense, for example,
I’m sorry to say that… Similarly, I’m afraid / I am afraid has only seven hits in total,
with only four functioning as a softener. As these are simple, easily-taught fixed
expressions, it can be inferred that hedging, at least using these expressions, was
not considered to be a priority for everyday conversation by the HIST textbook
writers. However, the same function may have been realised in another form, a
point which will be considered in the following section.

6.5 Modal auxiliary use

Not only are modal auxiliaries much more prominent in SEST in question frames,
as shown in Section 6.4, but their use is more frequent in general. Bundles using
modal verbs, notably would, occur with greater frequency in SEST than HIST,
reinforcing the idea that greater attention is paid to face needs in the content of
the SEST books. The single item would has a frequency of 227 f/100,000 in SEST,
whereas it occurs only 150 times in total in HIST (74 f/100,000), and only a small
fraction of these hits correspond to the kind of uses (I would like and would you
like) found in SEST. Fowler (1926: 549) describes the rules applied to the use of
would in BrE, which appear in this data to apply similarly to AmE:

The verbs like, prefer, care, be glad, be inclined, &c., are very common in
first-person conditional statements (I should like to know &c.). In these, should, not
would is the correct form in the English idiom.

In present-day English, should is no longer used with this function, at least not
in everyday spoken English. Furthermore, other less frequent modal forms (e.g.,
shall and ought to) have become even more infrequent in contemporary speech
(Mair and Leech 2006). In view of this, HIST was explored for further relevant
forms, and was found to contain thirty instances of I should like, with further
occurrences of should be glad (3) and should be happy (1). Furthermore, there were
twelve uses of shall be (very) glad. A random selection of these forms in use is
shown in Figure 2. The individual phrases do not have the breadth of
meaning of I’d like, and uses are more fixed; for example, shall be glad is here only
used in response to an invitation. Despite this, many of these forms could be
replaced by the phrase would like / ’d like in contemporary language use. If the
frequency of these bundles is combined, the occurrence of language with a similar
function to I’d like comes within the range of top bundles in HIST (26 f/100,000),
showing that the function, if not a single form, is well-represented in the books.
This highlights two points. First, it warns against assuming equivalence in
function of the bundles from the different periods. Secondly, it returns to the
theme of “which” English was taught in the early-twentieth century, both in
terms of formality and variety. Regarding the former, the data suggests that a
more formal style of language is represented in the HIST books, informed more
by prescriptive grammar than everyday speech. Concerning variety, while the
samples of BrE and AmE do not display differences in this case, a broader range
of historical texts of both varieties is needed to confirm this.

Figure 2. Uses of modal forms with a similar function to the contemporary would like in HIST
have the first choice.” “I 'd like to join a club too,” said Bertha (Austin 1913)
an eclipse of the sun tomorrow, I should be glad to see it. (Tenney 1905)
when my family comes, I should like to have you come and spend (Austin 1913)
join, won’t you? Bob. Well, I should like to, but my pocket money is (Thorley 1916)
them on to see if they fit.” “I should like, also, to see some neckties.” (O’Brien 1909)
answer to your advertisement. I should like a position in your factory.” (O’Brien 1909)
Let us go to the park.” “Yes, I should like to go very much. Let us stop (O’Brien 1909)
worry about you.” “Thank you, I shall be glad to stay.” “Tomorrow my (Darling 1914)
about it sometime,” Lucy said. “ I shall be glad to tell you now if you (Austin 1913)
you come to see us soon ? “ “ Yes, I shall be glad to visit you.” “Good-night.” (O’Brien 1909)
cars.” “When may we see it?” “I shall be very glad to show it to you (Darling 1914)
walk with me on Sunday?” “Yes, I shall be very glad to take a walk with (Wallach 1906)
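
Concordance lines of the kind shown in Figure 2 can be retrieved with a simple pattern search for the forms named in Section 6.5. The Python sketch below is one possible way of doing so, offered as an illustration rather than as the procedure followed in the study. The regular expression, the context width and the three sample strings (taken from Figure 2) are assumptions, and the pattern does not cover ’d like or would like.

```python
import re

# Forms named in Section 6.5; "(?:very )?" covers "shall be (very) glad".
PATTERN = re.compile(
    r"\b(?:I should like|should be glad|should be happy"
    r"|shall be (?:very )?glad)\b",
    re.IGNORECASE,
)


def concordance(text: str, source: str, width: int = 30) -> list[str]:
    """Return keyword-in-context lines for each match, tagged with a source."""
    lines = []
    for m in PATTERN.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(
            f"{left:>{width}}[{m.group(0)}]{right:<{width}} {source}"
        )
    return lines


# Toy input taken from lines quoted in Figure 2 (not the full corpus).
samples = [
    ("Yes, I should like to go very much. Let us stop", "O'Brien 1909"),
    ("Thank you, I shall be glad to stay.", "Darling 1914"),
    ("Yes, I shall be very glad to take a walk with", "Wallach 1906"),
]

for text, source in samples:
    for line in concordance(text, source):
        print(line)
```

Each sample yields one line with the matched form in square brackets. Summing such hits across forms gives the kind of combined, function-based count discussed above, although deciding whether a given hit really has the same function as I’d like still requires manual inspection.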

7. Summary and conclusion

The starting point for this study was the extraction of the top three-word lexical
bundles from language input using two sets of teaching materials from distinct
periods that are a century apart. The resulting lexical bundles were found to
be more different than similar. General features noted were a limitation in the
range of lexis and syntactic forms among the HIST bundles, and a lower level
of contracted forms. Where similarities between the bundles were observed, the
broader contexts were examined and revealed further differences, with the HIST
texts displaying a focus on form over meaning, leading to artificial exchanges. A
further indication that the HIST texts might not reflect authentic use was found in
the use of complementation; gerundive complementation is avoided in the case of
like, whereas research into, and literature of, the period leads us to believe that it
could be expected. Similarly, the simple, direct question forms presented suggest
that teachability may have influenced language input; the lack of indirectness is
notable considering how prominent this is in SEST. Furthermore, modal
auxiliaries occur in several of the top SEST bundles but not in the HIST ones. However,
the HIST bundles were found to contain different modal forms realising similar
functions, reflecting a documented changing use of language.
The main aim of this study was to consider the validity of the lexical bundles
in HIST as representative of spoken language of the period. The study was
exploratory; limitations in both scope and data mean that any findings must be
viewed with caution and triangulated with reference to other sources. Although
the Reform Period aimed to teach everyday conversation, this was not necessarily
authentic everyday conversation. Simplification, prescriptivism and a priority of
form over meaning in language input are evident in the HIST texts. Bearing in
mind the aim and target audience of these texts, it is not surprising that simple,
acceptable forms were prioritized. The SEST texts, in contrast, respond to their
target learners by emulating authenticity in their language input, trying to be
communicative and paying attention to face needs. However, they also present
an edited version of authentic language use. In conclusion, these corpora offer
perspectives on language from their respective periods viewed through the filter
of the writer; the language presented is what is deemed to be appropriate and relevant
for teaching specific audiences. This both limits and enriches the data: whilst
it may not be a direct representation of everyday language use, it can offer
insights into language, society and pedagogy. To broaden these insights, further
diachronic studies using texts of this type are to be encouraged.

Sources

Corpus of Historical Textbooks (HIST)


Austin, Ruth. 1913. Lessons in English for Foreign Women: For Use in Settlements and Evening
Schools. New York, Cincinnati, Chicago: American Book Company.
Berlitz, Maximilian Delphinus. 1917. Method for Teaching Foreign Languages. New York: M. D.
Berlitz.
Darling, Alice I. 1914. Foreigners’ Guide to English. Yonkers-on-Hudson, New York: World
Book Company.
Houghton, Frederick. 1911. First Lessons in English for Foreigners in Evening Schools. New York,
Cincinnati, Chicago: American Book Company.
Jimperieff, Mary. 1915. Progressive Lessons in English for Foreigners: First Year. Boston: Ginn
and Company.
O’Brien, Sara Redempta. 1909. English for Foreigners. Boston: Houghton Mifflin Company.
Tenney, C. D. 1905. English Lessons. London: Macmillan and Co. Limited.
Thorley, Wilfrid Charles. 1916. A Primer of English for Foreign Students. London: Macmillan
and Co. Limited.
Wallach, Isabel Richman. 1906. A First Book in English for Foreigners. New York, Boston,
Chicago: Silver, Burdett and Company.

Corpus of Self-Study Texts (SEST)


King, Gareth. 2004. Colloquial English: A Course for Non-Native Speakers. London: Routledge.
Living Language. 2009. English: Essential Course. New York: Living Language.
Pelteret, Cheryl. 2012. Speaking: B1+ [Collins English for Life: Skills]. London: Collins.
Stevens, Sandra. 2014. Complete English as a Foreign Language. London: Hodder and
Stoughton.
Woodford, Kate and Elizabeth Walter. 2011. Easy Learning English Conversation. London:
Collins.

References

Algeo, John. 2006. British or American English? A Handbook of Word and Grammar Patterns.
Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511607240
Allan, Rachel. 2017. “From do you know to I don’t know: An Analysis of the Frequency and
Usefulness of Lexical Bundles in Five English Language Self-Study Books”. Corpus
Pragmatics 1: 351–372. https://doi.org/10.1007/s41701‑017‑0016‑9
Anthony, Lawrence. 2016. “AntConc. A Freeware Corpus Analysis Toolkit for Concordancing
and Text Analysis”. Tokyo, Japan: Waseda University.
Bayley, Susan N. 1998. “The Direct Method and Modern Language Teaching in England
1880–1918”. History of Education 27 (1): 39–57. https://doi.org/10.1080/0046760980270104
Biber, Douglas. 2006. University Language: A Corpus-Based Study of Spoken and Written
Registers. Amsterdam: John Benjamins. https://doi.org/10.1075/scl.23
Cavanaugh, M. P. 1996. “History of Teaching English”. The English Journal 85 (8): 40–44.
https://doi.org/10.2307/820039
A Corpus of English Dialogues 1560–1760. 2006. Compiled under the supervision of
Merja Kytö (Uppsala University) and Jonathan Culpeper (Lancaster University).
Council of Europe. 2001. Common European Framework of Reference for Languages: Learning,
Teaching, Assessment. Cambridge: Cambridge University Press.
Culpeper, Jonathan and Merja Kytö. 2010. Early Modern English Dialogues: Spoken Interaction
as Writing. Cambridge: Cambridge University Press.
Egan, Thomas. 2006. “Emotion Verbs with to-Infinitive Complements: From Specific to
General Predication”. In Maurizio Gotti, Marina Dossena and Richard Dury (eds),
English Historical Linguistics 2006: Syntax and Morphology, 223–240. Amsterdam: John
Benjamins.
Erman, Britt and Beatrice Warren. 2000. “The Idiom Principle and the Open Choice
Principle”. Text – Interdisciplinary Journal for the Study of Discourse 20 (1): 29–62.
https://doi.org/10.1515/text.1.2000.20.1.29
Fanego, Teresa. 2007. “Drift and the Development of Sentential Complements in British and
American English from 1700 to the Present Day”. In Javier Pérez-Guerra,
Dolores González-Álvarez, Jorge L. Bueno-Alonso and Esperanza Rama-Martínez (eds),
‘Of Varying Language and Opposing Creed’: New Insights into Late Modern English,
161–235. Bern: Peter Lang.
Fowler, Henry W. 1908. The King’s English. (Second edition.) Oxford: Clarendon Press.
Fowler, Henry W. 1926. A Dictionary of Modern English Usage. Oxford: Clarendon Press.
Howatt, A. P. R. and Richard Smith. 2014. “The History of Teaching English as a Foreign
Language, from a British and European Perspective”. Language and History 57 (1): 75–95.
https://doi.org/10.1179/1759753614Z.00000000028
Hundt, Marianne. 2009. “Colonial Lag, Colonial Innovation or Simply Language Change?” In
Günter Rohdenburg and Julia Schlüter (eds), One Language, Two Grammars?: Differences
between British and American English, 13–37. Cambridge: Cambridge University Press.
https://doi.org/10.1017/CBO9780511551970.002
Jespersen, Otto. 1904. How to Teach a Foreign Language. London: Swan Sonnenschein / Allen
& Unwin. [Translation by S. Yhlen-Olsen Bertelsen of Sprogundervisning, 1901,
Copenhagen: Schubotheske Forlag.]
Mair, Christian and Geoffrey Leech. 2008. “Current Changes in English Syntax”. In Bas Aarts
and April McMahon (eds), The Handbook of English Linguistics, 318–342. Oxford:
Wiley-Blackwell.
Masuhara, Hitomi, Naeema Hann, Yong Yi and Brian Tomlinson. 2008. “Adult EFL Courses”.
ELT Journal 62 (3): 294–312. https://doi.org/10.1093/elt/ccn028
McCarthy, Michael, Jeanne McCarten and Helen Sandiford. 2014. Touchstone. (Second
edition.) Cambridge: Cambridge University Press.
O’Keeffe, Anne, Michael McCarthy and Ronald Carter. 2007. From Corpus to Classroom:
Language Use and Language Teaching. Cambridge: Cambridge University Press.
https://doi.org/10.1017/CBO9780511497650
Rohdenburg, Günter and Julia Schlüter (eds). 2009. One Language, Two Grammars?
Differences between British and American English. Cambridge: Cambridge University
Press. https://doi.org/10.1017/CBO9780511551970
Siyanova-Chanturia, Anna and Ron Martinez. 2014. “The Idiom Principle Revisited”. Applied
Linguistics 2014: 1–22. https://doi.org/10.1093/applin/amt054
Sweet, Henry. 1890. A Primer in Phonetics. Oxford: Clarendon Press.
Tomlinson, Brian and Hitomi Masuhara. 2013. “Adult Coursebooks”. ELT Journal 67 (2):
233–249. https://doi.org/10.1093/elt/cct007
Vosberg, Uwe. 2009. “Non-Finite Complements”. In Günter Rohdenburg and Julia Schlüter
(eds), One Language, Two Grammars?: Differences between British and American English,
212–227. Cambridge: Cambridge University Press.
https://doi.org/10.1017/CBO9780511551970.012
Wray, Alison. 2002. Formulaic Language and the Lexicon. Cambridge: Cambridge University
Press. https://doi.org/10.1017/CBO9780511519772

Address for correspondence

Rachel Allan
Department of Humanities
Mid-Sweden University
SE-851 70 Sundsvall
Sweden
[email protected]
