Table 1 Corpora

From: Comparative analysis of five protein-protein interaction corpora

  

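|        |           | AIMed           | BioInfer             | HPRD50     | IEPA             | LLL               |
|--------|-----------|-----------------|----------------------|------------|------------------|-------------------|
|        | size      | 1955            | 1100                 | 145        | 486              | 77                |
| Entity | scope     | human P/G       | P/G/R and related    | human P/G  | Chemicals        | P/G               |
|        | coverage  | all occurrences | all occurrences      | NER system | list of 16 names | list of 116 names |
|        | types     | no              | 111 types (ontology) | no         | no               | P/G               |
| PPI    | types     | no              | 68 types (ontology)  | no         | no               | 3 types           |
|        | binding   | no              | yes                  | no         | yes              | no                |
|        | directed  | no              | yes                  | no         | yes              | yes               |
|        | complex   | no              | yes                  | no         | no               | no                |
|        | negative  | no              | yes                  | no         | no               | no                |
|        | certainty | no              | no                   | yes        | no               | no                |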

 
Legend:
  1. Size: Number of sentences in the corpus
  2. Entity scope: Types of the named entities identified in the corpus: (P)rotein, (G)ene, (R)NA
  3. Entity coverage: Coverage of in-scope entity occurrences in each sentence
  4. Entity types: Explicit identification of the type of the annotated named entity occurrences
  5. PPI types: Explicit indication of the type of the annotated interactions
  6. PPI binding: Identification of the specific text spans that entail the annotated interactions
  7. PPI directed: Specification of the directionality of the interaction (typically identification of agent vs. patient roles)
  8. PPI complex: Annotation includes nested or n-ary (for n > 2) interactions
  9. PPI negative: Annotation of negative interactions
  10. PPI certainty: Annotation of the levels of certainty, or speculativeness, of interactions
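
The size and yes/no PPI rows of Table 1 lend themselves to a simple lookup structure. The following is a minimal sketch, not part of the original article: the names `CORPORA` and `corpora_with` are hypothetical, and the values are transcribed from the table (yes mapped to `True`, no to `False`).

```python
# Table 1 transcribed as a dictionary (illustrative only; names are hypothetical).
# Only the size row and the yes/no PPI rows are encoded here.
CORPORA = {
    "AIMed":    {"size": 1955, "binding": False, "directed": False,
                 "complex": False, "negative": False, "certainty": False},
    "BioInfer": {"size": 1100, "binding": True,  "directed": True,
                 "complex": True,  "negative": True,  "certainty": False},
    "HPRD50":   {"size": 145,  "binding": False, "directed": False,
                 "complex": False, "negative": False, "certainty": True},
    "IEPA":     {"size": 486,  "binding": True,  "directed": True,
                 "complex": False, "negative": False, "certainty": False},
    "LLL":      {"size": 77,   "binding": False, "directed": True,
                 "complex": False, "negative": False, "certainty": False},
}


def corpora_with(feature):
    """Return the corpora (in table order) that annotate the given PPI feature."""
    return [name for name, row in CORPORA.items() if row[feature]]


# Total number of sentences across the five corpora.
total_sentences = sum(row["size"] for row in CORPORA.values())

print(corpora_with("directed"))   # corpora that specify interaction direction
print(corpora_with("certainty"))  # corpora that annotate certainty levels
print(total_sentences)
```

A structure like this makes it easy to check, for example, which corpora could support a study of directed interactions before committing to a common evaluation format.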