Table 1 Corpora

From: Comparative analysis of five protein-protein interaction corpora

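| | AIMed | BioInfer | HPRD50 | IEPA | LLL |
|---|---|---|---|---|---|
| Size | 1955 | 1100 | 145 | 486 | 77 |
| Entity scope | human P/G | P/G/R and related | human P/G | Chemicals | P/G |
| Entity coverage | all occurrences | all occurrences | NER system | list of 16 names | list of 116 names |
| Entity types | no | 111 types (ontology) | no | no | P/G |
| PPI types | no | 68 types (ontology) | no | no | 3 types |
| PPI binding | no | yes | no | yes | no |
| PPI directed | no | yes | no | yes | yes |
| PPI complex | no | yes | no | no | no |
| PPI negative | no | yes | no | no | no |
| PPI certainty | no | no | yes | no | no |
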
Legend:

  1. Size: Number of sentences in the corpus
  2. Entity scope: Types of the named entities identified in the corpus: (P)rotein, (G)ene, (R)NA
  3. Entity coverage: Coverage of in-scope entity occurrences in each sentence
  4. Entity types: Explicit identification of the type of the annotated named entity occurrences
  5. PPI types: Explicit indication of the type of the annotated interactions
  6. PPI binding: Identification of the specific text spans that entail the annotated interactions
  7. PPI directed: Specification of the directionality of the interaction (typically identification of agent vs. patient roles)
  8. PPI complex: Annotation includes nested or n-ary (for n > 2) interactions
  9. PPI negative: Annotation of negative interactions
  10. PPI certainty: Annotation of the levels of certainty, or speculativeness, of interactions
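
For readers who want to query the table programmatically, the yes/no PPI features can be encoded as a small lookup structure. The following is only a minimal sketch: the sizes and boolean values are transcribed from Table 1, while the dictionary layout and the helper name `corpora_with` are illustrative assumptions and not part of the original paper.

```python
# Sizes and yes/no PPI features transcribed from Table 1.
# The layout of CORPORA and the helper corpora_with() are hypothetical,
# chosen only for illustration.
CORPORA = {
    "AIMed":    {"size": 1955, "binding": False, "directed": False,
                 "complex": False, "negative": False, "certainty": False},
    "BioInfer": {"size": 1100, "binding": True,  "directed": True,
                 "complex": True,  "negative": True,  "certainty": False},
    "HPRD50":   {"size": 145,  "binding": False, "directed": False,
                 "complex": False, "negative": False, "certainty": True},
    "IEPA":     {"size": 486,  "binding": True,  "directed": True,
                 "complex": False, "negative": False, "certainty": False},
    "LLL":      {"size": 77,   "binding": False, "directed": True,
                 "complex": False, "negative": False, "certainty": False},
}


def corpora_with(feature):
    """Return the names of corpora whose annotation includes `feature`."""
    return [name for name, props in CORPORA.items() if props[feature]]


if __name__ == "__main__":
    # List the corpora that specify interaction directionality.
    print(corpora_with("directed"))
    # Total number of sentences across the five corpora.
    print(sum(props["size"] for props in CORPORA.values()))
```

A lookup of this kind makes it easy to select the subset of corpora that support a given evaluation setting, for example only those that annotate directed interactions.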