Table 4 The sizes of corpora about genes and diseases

From: CoMAGC: a corpus with multi-faceted annotations of gene-cancer relations

| Corpus | # documents | # words | Annotation |
|---|---|---|---|
| CoMAGC | 408 | 26177 | 821 sets of four annotation concepts |
| Craven | 1677 | 333845 | 829 gene-disease pairs |
| PolySearch | 522 | 116380 | 341 gene-disease pairs |
| GETM | 150 | 38355 | 267 gene expression-anatomical location pairs |
| MLEE | 262 | 56588 | 6677 events |
| ID | 30 | 153153 | 4150 events |
| CG | 600 | 129878 | 17248 events |

  1. All the corpora contain PubMed abstracts, except for the ID corpus, which contains full-text documents. For the Craven and PolySearch corpora, we show only the number of positive gene-disease pairs.