
A classifier-based approach to preposition and determiner error correction in L2 English

Rachele De Felice and Stephen G. Pulman


Oxford University Computing Laboratory
Wolfson Building, Parks Road, Oxford OX1 3QD, UK
{rachele.defelice|stephen.pulman}@comlab.ox.ac.uk

Abstract

In this paper, we present an approach to the automatic identification and correction of preposition and determiner errors in non-native (L2) English writing. We show that models of use for these parts of speech can be learned with an accuracy of 70.06% and 92.15% respectively on L1 text, and present first results in an error detection task for L2 writing.

1 Introduction

The field of research in natural language processing (NLP) applications for L2 language is constantly growing. This is largely driven by the expanding population of L2 English speakers, whose varying levels of ability may require different types of NLP tools from those designed primarily for native speakers of the language. These include applications for use by the individual and within instructional contexts. Among the key tools are error-checking applications, focusing particularly on areas which learners find the most challenging. Prepositions and determiners are known to be one of the most frequent sources of error for L2 English speakers, a finding supported by our analysis of a small error-tagged corpus we created (determiners 17% of errors, prepositions 12%). Therefore, in developing a system for automatic error detection in L2 writing, it seems desirable to focus on these problematic, and very common, parts of speech (POS).

This paper gives a brief overview of the problems posed by these POS and of related work. We then present our proposed approach on both L1 and L2 data and discuss the results obtained so far.

2 The problem

2.1 Prepositions

Prepositions are challenging for learners because they can appear to have an idiosyncratic behaviour which does not follow any predictable pattern, even across nearly identical contexts. For example, we say I study in Boston but I study at MIT; or He is independent of his parents, but dependent on his son. As it is hard even for L1 speakers to articulate the reasons for these differences, it is not surprising that learners find it difficult to master prepositions.

2.2 Determiners

Determiners pose a somewhat different problem from prepositions as, unlike them, their choice is more dependent on the wider discourse context than on individual lexical items. The relation between a noun and a determiner is less strict than that between a verb or noun and a preposition, the main factor in determiner choice being the specific properties of the noun's context. For example, we can say boys like sport or the boys like sport, depending on whether we are making a general statement about all boys or referring to a specific group. Equally, both she ate an apple and she ate the apple are grammatically well-formed sentences, but only one may be appropriate in a given context, depending on whether the apple has been mentioned previously. Therefore, here, too, it is very hard to come up with clear-cut rules predicting every possible kind of occurrence.

© 2008. Licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported license (http://creativecommons.org/licenses/by-nc-sa/3.0/). Some rights reserved.

Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 169-176, Manchester, August 2008.
3 Related work

Although in the past there has been some research on determiner choice in L1 for applications such as generation and machine translation output, work to date on automatic error detection in L2 writing has been fairly limited. Izumi et al. (2004) train a maximum entropy classifier to recognise various errors using contextual features. They report results for different error types (e.g. omission: precision 75.7%, recall 45.67%; replacement: P 31.17%, R 8%), but there is no break-down of results by individual POS. Han et al. (2006) use a maximum entropy classifier to detect determiner errors, achieving 83% accuracy. Chodorow et al. (2007) present an approach to preposition error detection which also uses a model based on a maximum entropy classifier trained on a set of contextual features, together with a rule-based filter. They report 80% precision and 30% recall. Finally, Gamon et al. (2008) use a complex system including a decision tree and a language model for both preposition and determiner errors, while Yi et al. (2008) propose a web count-based system to correct determiner errors (P 62%, R 41%).

The work presented here displays some similarities to the papers mentioned above in its use of a maximum entropy classifier and a set of features. However, our feature set is more linguistically sophisticated in that it relies on a full syntactic analysis of the data. It includes some semantic components which we believe play a role in correct class assignment.

4 Contextual models for prepositions and determiners

4.1 Feature set

The approach proposed in this paper is based on the belief that although it is difficult to formulate hard and fast rules for correct preposition and determiner usage, there is enough underlying regularity of characteristic syntactic and semantic contexts to be able to predict usage to an acceptable degree of accuracy. We use a corpus of grammatically correct English to train a maximum entropy classifier on examples of correct usage. The classifier can therefore learn to associate a given preposition or determiner with particular contexts, and reliably predict a class when presented with a novel instance of a context for one or the other.

The L1 source we use is the British National Corpus (BNC), as we believe this offers a representative sample of different text types. We represent training and testing items as vectors of values for linguistically motivated contextual features. Our feature vectors include 18 feature categories for determiners and 13 for prepositions; the main ones are illustrated in Table 1 and Table 2 respectively. Further determiner features note whether the noun is modified by a predeterminer, possessive, numeral, and/or a relative clause, and whether it is part of a 'there is...' phrase. Additional preposition features refer to the grade of any adjectives or adverbs modified (base, comparative, superlative) and to whether the items modified are modified by more than one PP [1].

[1] A full discussion of each feature, including motivation for its inclusion and an assessment of its contribution to the model, is found in De Felice (forthcoming).

Head noun           'apple'
Number              singular
Noun type           count
Named entity?       no
WordNet category    food, plant
Prep modification?  yes, 'on'
Object of Prep?     no
Adj modification?   yes, 'juicy'
Adj grade           superlative
POS ±3              VV, DT, JJS, IN, DT, NN

Table 1: Determiner feature set for Pick the juiciest apple on the tree.

POS modified           verb
Lexical item modified  'drive'
WordNet category       motion
Subcat frame           pp_to
POS of object          noun
Object lexical item    'London'
Named entity?          yes, type = location
POS ±3                 NNP, VBD, NNP
Grammatical relation   iobj

Table 2: Preposition feature set for John drove to London.

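To make the feature representation concrete, here is how the preposition context of Table 2 might look as a feature dictionary. This sketch is ours, not the system's actual code; the key names are hypothetical, and in the real pipeline the values would come from RASP parses, WordNet lexicographer classes, and named-entity recognition:

    # Hypothetical encoding (ours) of the Table 2 context for 'to' in
    # "John drove to London"; string-valued features suit a maxent model.
    drove_to_london = {
        "pos_modified": "verb",
        "lexical_item_modified": "drive",
        "wordnet_category": "motion",
        "subcat_frame": "pp_to",
        "pos_of_object": "noun",
        "object_lexical_item": "London",
        "named_entity": "yes:location",
        "pos_-2": "NNP",   # John
        "pos_-1": "VBD",   # drove
        "pos_+1": "NNP",   # London
        "grammatical_relation": "iobj",
    }
    label = "to"   # the class the classifier should learn to predict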
In De Felice and Pulman (2007), we described some of the preprocessing required and offered some motivation for this approach. As for our choice of features, we aim to capture all the elements of a sentence which we believe to have an effect on preposition and determiner choice, and which can be easily extracted automatically - this is a key consideration, as all the features derived rely on automatic processing of the text. Grammatical relations refer to RASP-style grammatical relations between heads and complements in which the preposition occurs (see e.g. Briscoe et al. (2006)). Semantic word type information is taken from WordNet lexicographer classes, 40 broad semantic categories which all nouns and verbs in WordNet belong to [2] (e.g. 'verb of motion', 'noun denoting food'), while the POS tags are from the Penn Treebank tagset - we note the POS of three words either side of the target word [3]. For each occurrence of a preposition or determiner in the corpus, we obtain a feature vector consisting of the preposition or determiner and its context, described in terms of the features noted above.

[2] No word sense disambiguation was performed at this stage.
[3] In NPs with a null determiner, the target is the head noun.

5 Acquiring the models

5.1 Prepositions

At the moment, we restrict our analysis to the nine most frequent prepositions in the data - at, by, for, from, in, of, on, to, and with - to ensure a sufficient amount of data for training. This gives a training dataset comprising 8,898,359 instances. We use a standard maximum entropy classifier [4] and do not omit any features, although we plan to experiment with different feature combinations to determine if, and how, this would impact the classifier's performance. Before testing our model on learner data, it is important to ascertain that it can correctly associate prepositions with a given context in grammatical, well-edited data. We therefore tested the model on a section of the BNC not used in training, section J. Our best result to date is 70.06% accuracy (test set size: 536,193). Table 3 relates our results to others reported in the literature on comparable tasks. The baseline refers to always choosing the most frequent option, namely of.

[4] Developed by James Curran.

Author              Accuracy
Baseline            26.94%
Gamon et al. 08     64.93%
Chodorow et al. 07  69.00%
Our model           70.06%

Table 3: Classifier performance on L1 prepositions

We can see that our model's performance compares favourably to the best results in the literature, although direct comparisons are hard to draw since different groups train and test on different preposition sets and on different types of data (British vs. American English, BNC vs. news reports, and so on). Furthermore, it should be noted that Gamon et al. report more than one figure in their results, as there are two components to their model: one determining whether a preposition is needed, and the other deciding what the preposition should be. The figure reported here refers to the latter task, as it is the most similar to the one we are evaluating. Additionally, Chodorow et al. also discuss some modifications to their model which can increase accuracy; the result noted here is the one most directly comparable to our own approach.

5.1.1 Further discussion

To fully assess the model's performance on the L1 data, it is important to consider factors such as performance on individual prepositions, the relationship between training dataset size and accuracy, and the kinds of errors made by the model.

Table 4 shows the classifier's performance on individual prepositions together with the size of their training datasets. At first glance, a clear correlation appears between the amount of data seen in training and precision and recall, as evidenced for example by of or to, for which the classifier achieves a very high score. In other cases, however, the correlation is not so clear-cut. For example, by has one of the smallest datasets in training but higher scores than many of the other prepositions, while for is notable for the opposite reason, namely having a large dataset but some of the lowest scores.

      Proportion of training data  Precision  Recall
of    27.83% (2,501,327)           74.28%     90.47%
to    20.64% (1,855,304)           85.99%     81.73%
in    17.68% (1,589,718)           60.15%     67.60%
for   8.01% (720,369)              55.47%     43.78%
on    6.54% (587,871)              58.52%     45.81%
with  6.03% (541,696)              58.13%     46.33%
at    4.72% (424,539)              57.44%     52.12%
by    4.69% (421,430)              63.83%     56.51%
from  3.86% (347,105)              59.20%     32.07%

Table 4: L1 results - individual prepositions

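To make the classification setup of Section 5.1 concrete: a maximum entropy classifier models p(c|x) ∝ exp(Σ_i λ_i f_i(c, x)) over classes c (here, the nine prepositions) and contextual features f_i. The sketch below is ours, not the classifier actually used (footnote 4); it substitutes scikit-learn's multinomial logistic regression, which belongs to the same family of models, and assumes feature dictionaries like the one sketched after Table 2:

    # A minimal sketch, assuming feature dicts as above; the authors used
    # James Curran's maximum entropy classifier, not scikit-learn.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    PREPOSITIONS = ["at", "by", "for", "from", "in", "of", "on", "to", "with"]

    def train_preposition_model(feature_dicts, observed_prepositions):
        # One training instance per preposition occurrence in the corpus;
        # the label is the preposition the writer actually used.
        vectoriser = DictVectorizer()              # one-hot encodes string features
        X = vectoriser.fit_transform(feature_dicts)
        model = LogisticRegression(max_iter=1000)  # multinomial logistic regression
        model.fit(X, observed_prepositions)
        return vectoriser, model

    def predict_preposition(vectoriser, model, feature_dict):
        # Most probable preposition for a novel context.
        return model.predict(vectoriser.transform([feature_dict]))[0]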
171
Target prep   at      by      for     from    in      of      on      to      with
at            xx      4.65%   10.82%  2.95%   36.83%  19.46%  9.17%   10.28%  5.85%
by            6.54%   xx      8.50%   2.58%   41.38%  19.44%  5.41%   10.04%  6.10%
for           8.19%   3.93%   xx      1.91%   25.67%  36.12%  5.60%   11.29%  7.28%
from          6.19%   4.14%   6.72%   xx      26.98%  26.74%  7.70%   16.45%  5.07%
in            7.16%   9.28%   10.68%  3.01%   xx      43.40%  10.92%  8.96%   6.59%
of            3.95%   2.00%   18.81%  3.36%   40.21%  xx      9.46%   14.77%  7.43%
on            5.49%   3.85%   8.66%   2.29%   32.88%  27.92%  xx      12.20%  6.71%
to            9.77%   3.82%   11.49%  3.71%   24.86%  27.95%  9.43%   xx      8.95%
with          3.66%   4.43%   12.06%  2.24%   28.08%  26.63%  6.81%   16.10%  xx

Table 5: Confusion matrix for L1 data - prepositions (each row shows, for a target preposition, how the classifier's incorrect choices were distributed)

The absence of a definite relation between dataset size and performance suggests that there might be a cline of 'learnability' for these prepositions: different prepositions' contexts may be more or less uniquely identifiable, or they may have more or fewer senses, leading to less confusion for the classifier. One simple way of verifying the latter case is by looking at the number of senses assigned to the prepositions by a resource such as the Oxford English Dictionary. However, we find no good correlation between the two, as the preposition with the most senses is of (16), and that with the fewest is from (1), thus negating the idea that fewer senses make a preposition easier to learn. The reason may therefore be found elsewhere, e.g. in the lexical properties of the contexts.

A good picture of the model's errors can be had by looking at the confusion matrix in Table 5, which reports, for each preposition, what the classifier's incorrect decision was. Analysis of these errors may establish whether they are related to the dataset size issue noted above, or have a more linguistically grounded explanation.

From the table, the frequency effect appears evident: in almost every case, the three most frequent wrong choices are the three most frequent prepositions, to, of, and in, although interestingly not in that order, in usually being the first choice. Conversely, the less frequent prepositions are less often suggested as the classifier's choice. This effect precludes, at the moment, the possibility of drawing any linguistic conclusions. These may only be gleaned by looking at the errors for the three more frequent prepositions. We see for example that there seems to be a strong relation between of and for, the cause of which is not immediately clear: perhaps they both often occur within noun phrases (e.g. book of recipes, book for recipes). More predictable is the confusion between to and from, and between locative prepositions such as to and at, although the effect is less strong for other potentially confusable pairs such as in and at or on.

Classifier choice          Correct phrase
the demands of the sector  demands for...
condition for development  condition of...
travel to speed            travel at...
look at the USA            look to...

Table 6: Examples of classifier errors on the preposition L1 task

Table 6 gives some examples of instances where the classifier's chosen preposition differs from that found in the original text. In most cases, the classifier's suggestion is also grammatically correct, but the overall meaning of the phrases changes somewhat. For example, while the demands of the sector are usually made by the sector itself, the demands for the sector suggest that someone else may be making them. These are subtle differences which it may be impossible to capture without a more sophisticated understanding of the wider context.

The example with travel, on the other hand, yields an ungrammatical result. We assume that the classifier has acquired a very strong link between the lexical item travel and the preposition to that directs it towards this choice (cf. also the example of look at/to). This suggests that individual lexical items play an important role in preposition choice, along with other more general syntactic and semantic properties of the context.

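For reference, row percentages of the kind shown in Table 5 can be computed from (gold, predicted) pairs collected on the test set; a small sketch under that assumption (this is not the authors' evaluation code):

    # Sketch: build Table 5-style row percentages (errors only) from
    # (gold, predicted) pairs gathered during testing.
    from collections import Counter, defaultdict

    def confusion_percentages(gold_predicted_pairs):
        errors = defaultdict(Counter)
        for gold, predicted in gold_predicted_pairs:
            if predicted != gold:          # the table excludes correct decisions
                errors[gold][predicted] += 1
        return {
            gold: {p: 100.0 * n / sum(counts.values()) for p, n in counts.items()}
            for gold, counts in errors.items()
        }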
5.2 Determiners

For the determiner task, we also consider only the three most frequent cases (a, the, null), which gives us a training dataset consisting of 4,043,925 instances. We achieve accuracy of 92.15% on the L1 data (test set size: 305,264), as shown in Table 7. Again, the baseline refers to the most frequent class, null.

Author                  Accuracy
Baseline                59.83%
Han et al. 06           83.00%
Gamon et al. 08         86.07%
Turner and Charniak 07  86.74%
Our model               92.15%

Table 7: Classifier performance - L1 determiners

The best reported results to date on determiner selection are those in Turner and Charniak (2007). Our model outperforms their n-gram language model approach by over 5%. Since the two approaches are not tested on the same data, this comparison is not conclusive, but we are optimistic that there is a real difference in accuracy, since the types of texts used are not dissimilar. As in the case of the prepositions, it is interesting to see whether this high performance is equally distributed across the three classes; this information is reported in Table 8. Here we can see that there is a very strong correlation between the amount of data seen in training and precision and recall. The indefinite article's lower 'learnability' and its lower frequency appear not to be peculiar to our data, as they are also found by Gamon et al., among others.

      % of training data  Precision  Recall
a     9.61% (388,476)     70.52%     53.50%
the   29.19% (1,180,435)  85.17%     91.51%
null  61.20% (2,475,014)  98.63%     98.79%

Table 8: L1 results - individual determiners

The disparity in training is a reflection of the distribution of determiners in the English language. Perhaps if this imbalance were addressed, the model would more confidently learn contexts of use for a, too, which would be desirable in view of using this information for error correction. On the other hand, this would create a distorted representation of the composition of English, which may not be what we want in a statistical model of language. We plan to experiment with smaller scale, more similar datasets to ascertain whether the issue is one of training size or of inherent difficulty in learning about the indefinite article's occurrence.

Target det  a       the     null
a           xx      92.92%  7.08%
the         80.66%  xx      19.34%
null        14.51%  85.49%  xx

Table 9: Confusion matrix for L1 determiners

In looking at the confusion matrix for determiners (Table 9), it is interesting to note that for the classifier's mistakes involving a or the, the erroneous choice is almost always the other determiner rather than the null case. This suggests that the frequency effect is not so strong as to override any true linguistic information the model has acquired; otherwise the predominant choice would always be the null case. On the contrary, these results show that the model is indeed capable of distinguishing between contexts which require a determiner and those which do not, but it requires further fine-tuning to perform better in knowing which of the two determiner options to choose. Perhaps the introduction of a discourse dimension might assist in this respect. We plan to experiment with some simple heuristics, as sketched below: for example, given a sequence 'Determiner Noun', has the noun appeared in the preceding few sentences? If so, we might expect the to be the correct choice rather than a.

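A minimal sketch of that heuristic, assuming lemmatised sentences are available (the function name and data representation are our own; the paper proposes this as an experiment, not an implemented component):

    # Proposed discourse heuristic: prefer 'the' when the noun has already
    # been mentioned in the preceding few sentences, otherwise 'a'.
    def discourse_determiner(noun_lemma, preceding_sentences, window=3):
        # preceding_sentences: a list of lists of lemmas, oldest first.
        recent = preceding_sentences[-window:]
        mentioned = any(noun_lemma in lemmas for lemmas in recent)
        return "the" if mentioned else "a"

    # e.g. discourse_determiner("apple", [["she", "buy", "apple"]]) == "the"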
6 Testing the model

6.1 Working with L2 text

To evaluate the model's performance on learner data, we use a subsection of the Cambridge Learner Corpus (CLC) [5]. We envisage that our model will eventually be of assistance to learners in analysing their writing and identifying instances of preposition or determiner usage which do not correspond to what it has been trained to expect; the more probable instance would be suggested as a more appropriate alternative. In using NLP tools and techniques which have been developed with and for L1 language, a loss of performance on L2 data is to be expected. These methods usually expect grammatically well-formed input; learner text is often ungrammatical, misspelled, and different in content and structure from typical L1 resources such as the WSJ and the BNC.

[5] The CLC is a computerised database of contemporary written learner English (currently over 25m words). It was developed jointly by Cambridge ESOL and Cambridge University Press. The Cambridge Error Coding System has been developed and applied manually to the data by Cambridge University Press.

6.2 Prepositions

For the preposition task, we extract 2523 instances of preposition use from the CLC (1282 correct, 1241 incorrect) and ask the classifier to mark them as correct or incorrect. The results from this task are presented in Table 10. These first results suggest that the model is fairly robust: the accuracy rate on the correct data, for example, is not much lower than that on the L1 data. In an application designed to assist learners, it is important to aim to reduce the rate of false alarms - cases where the original is correct, but the model flags an error - to a minimum, so it is positive that this result is comparatively high. Accuracy on error identification is at first glance even more encouraging. However, if we look at the suggestions the model makes to replace the erroneous preposition, we find that these are correct only 51.5% of the time, which greatly reduces its usefulness.

Instance type  Accuracy
Correct        66.7%
Incorrect      70%

Table 10: Accuracy on L2 data - prepositions. Accuracy on incorrect instances refers to the classifier successfully identifying the preposition in the text as not appropriate for that context.

6.2.1 Further discussion

A first analysis of the classifier's decisions and its errors points to various factors which could be impairing its performance. Spelling mistakes in the input are one of the most immediate ones. For example, in the sentence I'm Franch, responsable on the computer services, the classifier is not able to suggest a correct alternative to the erroneous on: since it does not recognise the adjective as a misspelling of responsible, it loses the information associated with this lexical feature, which could potentially determine the preposition choice.

A more complex problem arises when poor grammar in the input misleads the parser so that the information it gives for a sentence is incorrect, especially as regards PP attachment. In the example I wold like following equipment to my speech: computer, modem socket and microphone, the missing the leads the parser to treat following as a verb and to take it as the verb to which the preposition is attached. It therefore suggests from as a correction, which is a reasonable choice given the frequency of phrases such as to follow from. However, this was not what the PP was meant to modify: impaired performance from the parser could be a significant negative factor in the model's performance. It would be interesting to test the model on texts written by students of different levels of proficiency, as their grammar may be more error-free and more likely to be parsed correctly. Alternatively, we could modify the parser so as to skip cases where it requires several attempts before producing a parse, as these more challenging cases could be indicative of very poorly structured sentences in which misused prepositions are dependent on more complex errors.

A different kind of problem impacting our accuracy scores derives from those instances where the classifier selects a preposition which can be correct in the given context, but is not the correct one in that particular case. In the example I received a beautiful present at my birthday, the classifier identifies the presence of the error, and suggests the grammatically and pragmatically appropriate correction for. The corpus annotators, however, indicate on as the correct choice. Since we use their annotations as the benchmark against which to evaluate the model, this instance is counted as the classifier being wrong because it disagrees with the annotators. A better indication of the model's performance may be to judge its decisions independently, to avoid being subject to the annotators' bias. Finally, we are beginning to look at the relations between preposition errors and other types of error, such as verb choice, and how these are annotated in the data.

An overview of the classifier's error patterns for the data in this task shows that they are largely similar to those observed in the L1 data. This suggests that the gap in performance between L1 and L2 is due more to the challenges posed by learner text than to inherent shortcomings in the model, and therefore that the key to better performance is likely to lie in overcoming these problems. In future work we plan to use L2 data where some of the spelling errors and non-preposition or determiner errors have been corrected, so that we can see which of the other errors are worth focussing on first.

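Concretely, the marking step described in Section 6.2 can be pictured as comparing the writer's preposition with the model's preferred one. The sketch below is our illustration, reusing the hypothetical scikit-learn model from the Section 5.1 sketch; the probability margin anticipates the confidence thresholds mentioned in Section 8:

    # Sketch: flag the writer's preposition as a likely error only when the
    # model prefers an alternative by a clear probability margin.
    def flag_preposition(vectoriser, model, feature_dict, writers_prep, margin=0.2):
        probs = model.predict_proba(vectoriser.transform([feature_dict]))[0]
        p = dict(zip(model.classes_, probs))
        best = max(p, key=p.get)
        if best != writers_prep and p[best] - p.get(writers_prep, 0.0) > margin:
            return best    # suggest this as the correction
        return None        # accept the writer's choice (limits false alarms)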
6.3 Determiners

Our work on determiner error correction is still in the early stages. We follow a similar procedure to the prepositions task, selecting a number of both correct and incorrect instances. On the former (set size 2000) accuracy is comparable to that on L1 data: 92.2%. The danger of false alarms, then, appears not to be as significant as for the prepositions task. On the incorrect instances (set size ca. 1200), however, accuracy is less than 10%.

Preliminary error analysis shows that the model is successful at identifying cases of misused determiners, e.g. a for the or vice versa, doing so in over two-thirds of cases. However, by far the most frequent error type for determiners is not confusion between the indefinite and definite article, but omission of an article where one is needed. At the moment, the model detects very few of these errors, no doubt influenced by the preponderance of null cases seen in training. Furthermore, some of the issues raised earlier in discussing the application of NLP tools to L2 language hold for this task, too.

In addition to those, though, in this task more than in the prepositions task we believe that differences in text type between the training texts - the BNC - and the testing material - learner essays - have a significant negative effect on the model. In this task, the lexical items play a crucial role in class assignment. If the noun in question has not been seen in training, the classifier may be unable to make an informed choice. Although the BNC comprises a wide variety of texts, there may not be a sufficient number covering topics typical of learner essays, such as 'business letters' or 'postcards to penpals'. Also, the BNC was created with material from almost 20 years ago, and learners writing in contemporary English may use lexical items which are not seen very frequently in the BNC. A clear example of this discrepancy is the noun internet, which requires the definite article in English, but not in several other languages, leading to countless sentences such as I saw it in internet, I booked it on internet, and so on. This is one of the errors the model never detects: a fact which is not surprising when we consider that this noun occurs only four times in the whole of the training data. It may therefore be necessary to consider using alternative sources of training data to overcome this problem and improve the classifier's performance.

7 Comparison to human learners

In developing this model, our first aim was not to create something which learns like a human, but something that works in the best and most efficient way possible. However, it is interesting to see whether human learners and classifiers display similar patterns of errors in preposition choice. This information has twofold value: as well as being of pedagogical assistance to instructors of L2 English, it means that, were the classifier to display student-like error patterns, insights into 'error triggers' could be derived from the L2 pedagogical literature to improve the classifier. The analysis of the types of errors made by human learners yields some insights which might be worthy of further investigation. A clear one is the confusion between the three locative and temporal prepositions at, in, and on (typical sentence: The training programme will start at the 1st August). This type of error is made often by both learners and the model, on both types of data, suggesting that perhaps further attention to features might be necessary to improve discrimination between these three prepositions.

There are also interesting divergences. For example, a common source of confusion in learners is between by and from, as in I like it because it's from my favourite band. However, this confusion is not very frequent in the model, a difference which could be explained either by the fact that, as noted above, performance on from is very low and so the classifier is unlikely to suggest it, or by the fact that the contexts seen for by in training are sufficiently distinctive that the classifier is not misled like the learners.

Finally, a surprising difference comes from looking at what to is confused with. The model often suggests at where to would be correct. This is perhaps not entirely unusual, as both can occur with locative complements (one can go to a place or be at a place), and this similarity could be confusing the classifier. Learners, however, although they do make this kind of mistake, are much more hampered by the confusion between for and to, as in She was helpful for me or This is interesting for you. In other words, for learners it seems that the abstract use of this preposition, its benefactive sense, is much more problematic than the spatial sense. We can hypothesise that the classifier is less distracted by these cases because the effect of the lexical features is stronger.

A more detailed discussion of the issues arising from the comparison of confusion pairs cannot be given here. However, in noting both divergences and similarities between the two learners, human and machine, we may be able to derive useful insights into the way the learning processes operate, and into what factors could be more or less important for them.

8 Conclusions and future directions

This paper discussed a contextual feature-based approach to the automatic acquisition of models of use for prepositions and determiners, which achieve an accuracy of 70.06% and 92.15% respectively, and showed how it can be applied to an error correction task for L2 writing, with promising early results. There are several directions that can be pursued to improve accuracy on both types of data. The classifier can be further fine-tuned to acquire more reliable models of use for the two POS. We can also experiment with its confidence thresholds, for example allowing it to make another suggestion when its confidence in its first choice is low. Furthermore, issues relating to the use of NLP tools with L2 data must be addressed, such as factoring out spelling or other errors in the data, and perhaps training on text types which are more similar to the CLC. In the longer term, we also envisage mining the information implicit in our training data to create a lexical resource describing the statistical tendencies observed.

Acknowledgements

We wish to thank Stephen Clark and Laura Rimell for stimulating discussions, and the anonymous reviewers for their helpful comments. We acknowledge Cambridge University Press's assistance in accessing the Cambridge Learner Corpus data. Rachele De Felice was supported by an AHRC scholarship for the duration of her studies.

References

Briscoe, Ted, John Carroll, and Rebecca Watson. 2006. The second release of the RASP system. In COLING-ACL 06 Demo Session.

Chodorow, Martin, Joel Tetreault, and Na-Rae Han. 2007. Detection of grammatical errors involving prepositions. In Proceedings of the 4th ACL-SIGSEM Workshop on Prepositions.

De Felice, Rachele and Stephen Pulman. 2007. Automatically acquiring models of preposition use. In Proceedings of the 4th ACL-SIGSEM Workshop on Prepositions.

De Felice, Rachele. Forthcoming. Recognising preposition and determiner errors in learner English. Ph.D. thesis, Oxford University Computing Laboratory.

Gamon, M., J. Gao, C. Brockett, A. Klementiev, W. Dolan, D. Belenko, and L. Vanderwende. 2008. Using contextual speller techniques and language modeling for ESL error correction. In Proceedings of IJCNLP.

Han, Na-Rae, Martin Chodorow, and Claudia Leacock. 2006. Detecting errors in English article usage by non-native speakers. Natural Language Engineering, 12(1):115-129.

Izumi, Emi, Kiyotaka Uchimoto, and Hitoshi Isahara. 2004. SST speech corpus of Japanese learners' English and automatic detection of learners' errors. ICAME, 28:31-48.

Turner, Jenine and Eugene Charniak. 2007. Language modeling for determiner selection. In NAACL-HLT Companion Volume.

Yi, Xing, Jianfeng Gao, and William Dolan. 2008. A web-based English proofing system for ESL users. In Proceedings of IJCNLP.

