Some corpora of English for linguistic research

Gregory Garretson
Department of English, Uppsala University

Many corpora of English have been compiled over the years. Some of these are freely
available, while others require special permission. Some are linked to an online interface,
while others must be downloaded and searched with corpus tools. This document is meant
to give you some ideas for how to find material for your corpus study.

Places to start
Two good websites with several corpora each are the BYU corpus website and the CQPweb
website. Both have powerful interfaces, though they allow you to do different things:
https://www.english-corpora.org
→ For this site, try the Uppsala University subscription link:
http://ezproxy.its.uu.se/login?url=http://corpus.byu.edu
https://cqpweb.lancs.ac.uk
CQPweb has many different corpora, but some are password-protected. First, you will need to
create a free account on the site. If there is a corpus there that you are interested in but don't
have access to (e.g. ARCHER), let us know and we can see if it's possible to get access.
There are also many corpora available locally in the department. See the handout “What is
corpus linguistics?” for a description of these.
What follows is a list of a few selected corpora in different categories.

Corpora of present-day English

The British National Corpus (BNC)


BYU interface: https://www.english-corpora.org/bnc
CQP interface: http://bncweb.lancs.ac.uk

The British National Corpus 2014 (BNC2014)


http://corpora.lancs.ac.uk/bnc2014 (only the spoken part is finished; access requires an application)

The Corpus of Contemporary American English (COCA)


https://www.english-corpora.org/coca

The International Corpus of English, including the ICE-GB Corpus


(available from the department; look in \\user.uu.se\epit\stud_lib\Wsicame\ICE)
info: http://www.ucl.ac.uk/english-usage/projects/ice.htm

Historical corpora of fairly recent English (extending up to the present day)

The Corpus of Historical American English (COHA)


https://www.english-corpora.org/coha

The TIME Magazine Corpus


https://www.english-corpora.org/time

The Brown Family of corpora


(available from the department)
info: http://www.helsinki.fi/varieng/CoRD/corpora/BROWN

Older historical corpora

The Old Bailey Corpus


http://www.uni-giessen.de/oldbaileycorpus

The ARCHER corpus (requires permission)


https://cqpweb.lancs.ac.uk/archer_untagged
info: http://www.alc.manchester.ac.uk/subjects/lel/research/projects/archer

The Corpus of English Dialogues (CED)


https://cqpweb.lancs.ac.uk/engdia
info: http://www.helsinki.fi/varieng/CoRD/corpora/CED/index.html

The Helsinki corpus


(available from the department)
info: http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus

Corpora of learner English

The ULEC corpus (high-school student writing) and the USE Corpus
(university student writing)
(available from the department)

The ICLE corpora


(available from the department, only in rooms 9-1070 and 9-1064)
info: http://www.uclouvain.be/en-cecl-icle.html

Corpora of academic speech or writing (professional or student)

MICASE (spoken):
https://quod.lib.umich.edu/cgi/c/corpus/corpus?c=micase;page=simple

MICUSP (written): http://eli-corpus.lsa.umich.edu

Corpus of Research Articles 2007: http://rcpce.engl.polyu.edu.hk/RACorpus

Concordancers (search tools)


For corpora that you search on your own computer (i.e. not via a Web interface), you have
access to WordSmith Tools if you work on campus. If you do not plan to work on campus,
you might try AntConc, which is similar, almost as powerful, and free:
http://www.laurenceanthony.net/antconc_index.html

Other resources
For other corpora and tools, this website contains a wealth of information:
http://tiny.cc/corpora
(the same as http://martinweisser.org/corpora_site/CBLLinks.html)
To find lists of corpora on that site, look in the menu under:
CBL Links → Corpora Section
To find lists of corpus tools on that site, look in the menu under:
CBL Links → Software Section
