Table 8 Comparison of concept assignments in E. coli and human abstracts

From: Construction of an annotated corpus to support biomedical information extraction

| E. coli category | Count | (%) | Type | Human category | Count | (%) | Type |
|---|---|---|---|---|---|---|---|
| Gene | 645 | 16.41 | G | Gene | 129 | 11.77 | G |
| Gene_Expression | 350 | 8.91 | G | Protein | 112 | 10.22 | S |
| Regulator | 287 | 7.30 | S | Transcription_Factor | 107 | 9.76 | G |
| Promoter | 255 | 6.49 | S | Gene_Expression | 83 | 7.57 | G |
| Transcription | 200 | 5.09 | S | Cells | 61 | 5.57 | S |
| Regulation | 199 | 5.06 | S | Transcription | 60 | 5.47 | S |
| Gene_Activation | 189 | 4.81 | S | Gene_Activation | 60 | 5.47 | S |
| Protein | 170 | 4.33 | S | Activator | 47 | 4.29 | S |
| Repressor | 158 | 4.02 | S | Regulation | 43 | 3.92 | S |
| Activator | 150 | 3.81 | S | DNA | 33 | 3.01 | S |
| Operon | 148 | 3.77 | S | Promoter | 31 | 2.83 | S |
| Gene_Repression | 136 | 3.46 | S | Transcription_Binding_Site | 31 | 2.83 | G |
| Locus | 99 | 2.52 | S | Protein_Complex | 26 | 2.37 | S |
| Enzyme | 82 | 2.09 | G | Sub_Unit | 23 | 2.10 | S |
| DNA | 79 | 2.01 | S | mRNA | 22 | 2.01 | S |
  1. Separate lists are shown for E. coli abstracts and human abstracts. For each category, the total number of identified concepts assigned to the category is indicated, together with the percentage of all events in the corpus section that this figure represents. The Type column indicates whether each category is a (G)eneral category within its hierarchy (meaning that it has its own child concepts) or a (S)pecific category, indicating that it is a bottom-level category with no child concepts within its hierarchy.