  POS tagging Syntax parsing NER NED/EL SRL iSRL Entity Coref. Event Coref. WSD Quote Attrib.
Reuters Reuters CoNLL-2003[3] AIDA-CoNLL[4] NP4E[5] (partial) NP4E[5] (partial)
WSJ Penn Treebank[6] Constituent: Penn Treebank[6] BBN[7] Constituent: PropBank[8], NomBank[9]; Dependency: CoNLL-2008[10] Pronoun: BBN[7] SemEval-2007 Task 17 (3,500 words)[11] PARC[12][13]
FrameNet 1.5 (WSJ+AQUAINT +MASC+LUcorpus+misc.) FrameNet FrameNet
OntoNotes 4.0 (?) Moor et al. (2013)[14](partial)
OntoNotes 5.0 (WSJ 300K+TDT4+LCD+Web) OntoNotes OntoNotes (PropBank-style) OntoNotes OntoNotes (coarse-grained)
Sherlock Holmes SemEval-2010[15] SemEval-2010[15]
Brown Brown Constituent: Penn Treebank[6] PropBank-style: CoNLL-2005[16]
SemCor (part of Brown) Brown SemCor
WSMT (13 articles) SemEval-2013 Task 12 (BabelNet 1.1.1)[17]
RSS-500 NIF NER N3[18] N3

See also

From Hovy et al. (2006)[19]:

"An example of the latter type is the Salsa project (Burchardt et al., 2004), which produced a German lexicon based on the FrameNet semantic frames and annotated a large German newswire corpus. A second example, the Prague Dependency Treebank (Hajic et al., 2001), has annotated a large Czech corpus with several levels of (tectogrammatical) representation, including parts of speech, syntax, and topic/focus information structure. Finally, the IL-Annotation project (Reeder et al., 2004) focused on the representations required to support a series of increasingly semantic phenomena across seven languages (Arabic, Hindi, English, Spanish, Korean, Japanese and French). In intent and in many details, OntoNotes is compatible with all these efforts, which may one day all participate in a larger multilingual corpus integration effort."

