
Table 2 Statistics of the benchmark data sets for the GE and CO tasks.

From: The Genia Event and Protein Coreference tasks of the BioNLP Shared Task 2011

                             Training           Tuning             Test
Item                         Abs.     Full     Abs.     Full     Abs.     Full
Articles                      800        5      150        5      260        4
Words                      176146    29583    33827    30305    57256    21791
Proteins                     9300     2325     2080     2610     3589     1712
Coreferences                 2247        -      463        -      714        -
   Relative pronouns         1193        -      254        -      349        -
   Pronouns                   738        -      149        -      269        -
   Definite NPs               296        -       58        -       91        -
   Appositions                  9        -        1        -        3        -
   Others                      11        -        1        -        2        -
Events                       8615     1695     1795     1455     3193     1294
   Gene_expression           1738      527      356      393      722      280
   Transcription              576       91       82       76      137       37
   Protein_catabolism         110        0       21        2       14        1
   Phosphorylation            169       23       47       64      139       50
      (with Site)            (67)      (0)     (27)     (12)     (81)     (15)
   Localization               265       16       53       14      174       17
      (with Loc)            (116)     (12)     (32)     (10)    (111)      (2)
   Binding                    887      101      249      126      349      153
      (with Site)           (138)     (34)     (50)    (114)     (24)     (79)
   Regulation                 961      152      173      123      292       96
      (with Site)            (57)      (8)     (39)     (17)     (11)      (3)
   Positive_regulation       2847      538      618      382      987      466
      (with Site)           (175)      (7)     (75)     (47)     (37)      (7)
   Negative_regulation       1062      247      196      275      379      194
      (with Site)            (27)      (9)      (6)     (18)     (10)      (7)
  1. The event and coreference annotations are used for the GE and CO tasks, respectively.
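
As a reading aid, the following minimal sketch checks how the rows of the table relate to each other: the per-type event counts appear to sum to the "Events" row in every column, and the coreference subtypes to the "Coreferences" row in the Abs. columns. The numbers are copied from the table; the names `EVENT_TYPES`, `COREF_TYPES` and `column_sums` are illustrative and not part of the source. The parenthesized "(with Site)" and "(with Loc)" rows are left out on the assumption that they count a subset of the row above them rather than additional events.

```python
# Column order for events:
# Training Abs., Training Full, Tuning Abs., Tuning Full, Test Abs., Test Full
EVENT_TOTALS = [8615, 1695, 1795, 1455, 3193, 1294]
EVENT_TYPES = {
    "Gene_expression":     [1738, 527, 356, 393, 722, 280],
    "Transcription":       [576, 91, 82, 76, 137, 37],
    "Protein_catabolism":  [110, 0, 21, 2, 14, 1],
    "Phosphorylation":     [169, 23, 47, 64, 139, 50],
    "Localization":        [265, 16, 53, 14, 174, 17],
    "Binding":             [887, 101, 249, 126, 349, 153],
    "Regulation":          [961, 152, 173, 123, 292, 96],
    "Positive_regulation": [2847, 538, 618, 382, 987, 466],
    "Negative_regulation": [1062, 247, 196, 275, 379, 194],
}

# Coreferences are only reported for the Abs. columns:
# Training Abs., Tuning Abs., Test Abs.
COREF_TOTALS = [2247, 463, 714]
COREF_TYPES = {
    "Relative pronouns": [1193, 254, 349],
    "Pronouns":          [738, 149, 269],
    "Definite NPs":      [296, 58, 91],
    "Appositions":       [9, 1, 3],
    "Others":            [11, 1, 2],
}


def column_sums(rows):
    """Sum the per-type counts column by column."""
    return [sum(column) for column in zip(*rows.values())]


print(column_sums(EVENT_TYPES) == EVENT_TOTALS)  # True
print(column_sums(COREF_TYPES) == COREF_TOTALS)  # True
```

Both checks hold for the figures as listed, which supports reading the indented rows as a breakdown of the "Events" and "Coreferences" totals.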