Some Corpora of English Linguistics
Gregory Garretson
Department of English, Uppsala University
Many corpora of English have been compiled over the years. Some of these are freely
available, while others require special permission. Some are linked to an online interface,
while others must be downloaded and searched with corpus tools. This document is meant
to give you some ideas for how to find material for your corpus study.
Places to start
Two good websites with several corpora each are English-Corpora.org (formerly the BYU corpus website) and the CQPweb
website. Both have powerful interfaces, though they allow you to do different things:
https://www.english-corpora.org
→ For this, try the Uppsala University subscription link:
http://ezproxy.its.uu.se/login?url=http://corpus.byu.edu
https://cqpweb.lancs.ac.uk
CQPweb has many different corpora, but some are password-protected. If there is a corpus
there you are interested in but don’t have access to (e.g. ARCHER), let us know and we can
see if it’s possible to get access. But first, you will need to create a free account on the site.
There are also many corpora available locally in the department. See the handout “What is
corpus linguistics?” for a description of these.
What follows is a list of a few selected corpora in different categories.
The ULEC corpus (high-school student writing) and the USE corpus (university student writing):
available from the department
MICASE (spoken):
https://quod.lib.umich.edu/cgi/c/corpus/corpus?c=micase;page=simple
Other resources
For more on other corpora and tools, this website offers a wealth of information:
http://tiny.cc/corpora
(the same as http://martinweisser.org/corpora_site/CBLLinks.html )
To find lists of corpora on that site, look in the menu under:
CBL Links → Corpora Section
To find lists of corpus tools on that site, look in the menu under:
CBL Links → Software Section