Table 1 Characteristics of corpora

From: Investigating heterogeneous protein annotations toward cross-corpora utilization

                   AIMed                GENETAG              GENIA
Size    abstracts  225                                       1,999
        sentences  1,987                10,000               18,554
Entity  scope      human P/G            P/G/R                human P/G/R
        number     4,075                11,739               34,264(P)/10,002(G)/944(R)
        coverage   specific occurrence  specific occurrence  all occurrences
        type       no                   no                   Ontology
                                                             7 types(P)/5 types(G)/5 types(R)
  Legend:
  Size: Number of abstracts or sentences in the corpus used in this research
  Entity scope: Types of the named entities identified in the corpus: (P)rotein, (G)ene, (R)NA
  Entity number: Number of the annotated in-scope entities in the corpus
  Entity coverage: Coverage of in-scope entity occurrences in each sentence
  Entity type: Explicit identification of the types of the annotated in-scope entities