
Table 1 Characteristics of corpora

From: Investigating heterogeneous protein annotations toward cross-corpora utilization

  

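|        |           | AIMed               | GENETAG             | GENIA                               |
|--------|-----------|---------------------|---------------------|-------------------------------------|
| Size   | abstracts | 225                 |                     | 1,999                               |
|        | sentences | 1,987               | 10,000              | 18,554                              |
| Entity | scope     | human P/G           | P/G/R               | human P/G/R                         |
|        | number    | 4,075               | 11,739              | 34,264 (P) / 10,002 (G) / 944 (R)   |
|        | coverage  | specific occurrence | specific occurrence | all occurrences                     |
|        | type      | no                  | no                  | Ontology                            |
|        |           |                     |                     | 7 types (P) / 5 types (G) / 5 types (R) |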

  Legend:
  1. Size: Number of abstracts or sentences in the corpus used in this research
  2. Entity scope: Types of the named entities identified in the corpus: (P)rotein, (G)ene, (R)NA
  3. Entity number: Number of the annotated in-scope entities in the corpus
  4. Entity coverage: Coverage of in-scope entity occurrences in each sentence
  5. Entity type: Explicit identification of the types of the annotated in-scope entities