
Corpus Usage Guidelines

Access Policies

To ensure compliance with the licenses of the corpora we have installed, we have instituted the following policies.

  1. CompLing laboratory members are granted access to corpora solely for coursework and research projects in the context of their affiliation with the UW.
  2. Corpora may not be copied from the servers, nor used in commercial applications, unless permitted by the corpus license agreement.
  3. Many of the corpora have additional licensing conditions (see the CompLing Database). Before you access any particular corpus, you are responsible for reading and understanding its license. For LDC corpora, you should also read the general LDC membership agreement.
  4. For some of the corpora, we must maintain a list of individuals granted access and/or have each user sign an individual license agreement. This is indicated in the "Restriction" column in the database. To access these corpora, you'll need to click the "Request Access" button and agree to the license agreement.
  5. Whenever you use a corpus for coursework or for a paper, you should cite the corpus among your references. The proper citation information can usually be found in the corpus's license or README file.
  6. Failure to follow these policies could result in loss of access to the corpora, or to the lab/servers in general.

Available corpora

For a list of currently available corpora, along with their licensing and access information, see the CompLing Database.

(If your browser prompts you with a certificate warning, you need to install the UW root certificate.)

Each corpus in the database is marked with one of the following statuses:

  • Installed means the corpus is currently installed on the server and ready to use.
  • Available means the corpus is immediately available, but not currently installed on the server.
  • Requested means that a request has been put in to LDC for the corpus, but it's not immediately available.

We can obtain any LDC corpus, but there may be a lead time of several weeks for corpora that are not listed in the database.

Requesting additional corpora

Lab members who would like access to a corpus listed as "Available" in the database should send an email to linghelp@u with a request for it to be installed.

Lab members who would like access to a corpus not listed in the database should send an email to Emily (ebender at u) with the request.

Topic revision: r16 - 2015-03-25 - 21:57:36 - olzama

This site is powered by the TWiki collaboration platform. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.