Table 3 Statistics of annotations in different sections of text

From: The Genia Event and Protein Coreference tasks of the BioNLP Shared Task 2011

Item                 Abstract  Full paper
                               All       TIAB      Intro.    R/D/C     Methods   Caption
Words                267229    80962     3538      7878      43420     19406     6720
Proteins             14969     6580      336       597       3980      916       751
(Density: P/W)       (5.60%)   (8.13%)   (9.50%)   (7.58%)   (9.17%)   (4.72%)   (11.18%)
Event triggers       11057     3280      216       312       2659      136       173
Events               13603     4436      272       427       3234      198       278
(Density: E/W)       (5.09%)   (5.48%)   (7.69%)   (5.42%)   (7.51%)   (1.02%)   (4.14%)
(Density: E/P)       (90.87%)  (67.42%)  (80.95%)  (71.52%)  (81.93%)  (21.62%)  (37.02%)
(Avg. Coord.: E/T)   (1.23)    (1.27)    (1.26)    (1.37)    (1.23)    (1.46)    (1.61)
Gene expression      2816      1193      62        98        841       80        112
Transcription        795       204       7         7         140       30        20
Protein catabolism   145       3         0         0         3         0         0
Phosphorylation      355       137       12        12        101       10        2
Localization         492       47        3         15        22        7         0
Binding              1485      380       16        74        266       6         18
Regulation           1426      371       35        30        281       4         21
Positive_regulation  4452      1385      98        131       1087      15        54
Negative_regulation  1637      716       39        60        520       46        51
  1. The Abstract column shows the statistics of the abstract collection (1210 titles and abstracts), and the following columns show those of the full paper collection (14 full papers). TIAB = title and abstract; Intro. = introduction and background; R/D/C = results, discussions, and conclusions; Methods = methods, materials, and experimental procedures. Some minor sections (supporting information, supplementary material, and synopsis) are ignored. Density = relative density of annotation (P/W = Proteins/Words, E/W = Events/Words, and E/P = Events/Proteins). Avg. Coord. = average number of coordinated events (E/T = Events/Triggers).
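
The derived rows follow directly from the count rows by the ratios defined in the note above. As a minimal sketch of that arithmetic, the Python snippet below recomputes them for the Abstract and Caption columns. The names `SectionCounts` and `densities` are hypothetical helpers introduced only for illustration, and rounding to two decimals is an assumption made to match the precision printed in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionCounts:
    """Raw annotation counts for one column of the table (hypothetical helper)."""
    words: int
    proteins: int
    triggers: int
    events: int


def densities(c: SectionCounts) -> dict:
    """Return the derived rows: densities as percentages, E/T as a plain ratio."""
    return {
        "P/W": round(100 * c.proteins / c.words, 2),    # Proteins / Words
        "E/W": round(100 * c.events / c.words, 2),      # Events / Words
        "E/P": round(100 * c.events / c.proteins, 2),   # Events / Proteins
        "E/T": round(c.events / c.triggers, 2),         # Events / Triggers
    }


# Counts copied from the Abstract and Caption columns of the table.
abstract = SectionCounts(words=267229, proteins=14969, triggers=11057, events=13603)
caption = SectionCounts(words=6720, proteins=751, triggers=173, events=278)

for name, counts in [("Abstract", abstract), ("Caption", caption)]:
    print(name, densities(counts))
```

Any other column can be checked the same way by passing its four counts to `SectionCounts`.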