Table 8 Comparison of concept assignments in E. coli and human abstracts

From: Construction of an annotated corpus to support biomedical information extraction

| E. coli category | Count | % | Type | Human category | Count | % | Type |
|---|---|---|---|---|---|---|---|
| Gene | 645 | 16.41 | G | Gene | 129 | 11.77 | G |
| Gene_Expression | 350 | 8.91 | G | Protein | 112 | 10.22 | S |
| Regulator | 287 | 7.30 | S | Transcription_Factor | 107 | 9.76 | G |
| Promoter | 255 | 6.49 | S | Gene_Expression | 83 | 7.57 | G |
| Transcription | 200 | 5.09 | S | Cells | 61 | 5.57 | S |
| Regulation | 199 | 5.06 | S | Transcription | 60 | 5.47 | S |
| Gene_Activation | 189 | 4.81 | S | Gene_Activation | 60 | 5.47 | S |
| Protein | 170 | 4.33 | S | Activator | 47 | 4.29 | S |
| Repressor | 158 | 4.02 | S | Regulation | 43 | 3.92 | S |
| Activator | 150 | 3.81 | S | DNA | 33 | 3.01 | S |
| Operon | 148 | 3.77 | S | Promoter | 31 | 2.83 | S |
| Gene_Repression | 136 | 3.46 | S | Transcription_Binding_Site | 31 | 2.83 | G |
| Locus | 99 | 2.52 | S | Protein_Complex | 26 | 2.37 | S |
| Enzyme | 82 | 2.09 | G | Sub_Unit | 23 | 2.10 | S |
| DNA | 79 | 2.01 | S | mRNA | 22 | 2.01 | S |

  1. Separate lists are shown for E. coli abstracts and human abstracts. For each category, the total number of identified concepts assigned to the category is indicated, together with the percentage of all events in the corpus section that this figure represents. The Type column indicates whether each category is a (G)eneral category within its hierarchy (meaning that it has its own child concepts) or a (S)pecific category, indicating that it is a bottom-level category with no child concepts within its hierarchy.
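
The relationship between the Count and % columns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `implied_total` and `share` are hypothetical, the section totals are not given in the table and are back-calculated here from a single row, and because the published percentages are rounded to two decimals the results are approximate.

```python
# (count, percent) pairs copied from the "Gene" rows of Table 8.
ECOLI_GENE = (645, 16.41)
HUMAN_GENE = (129, 11.77)


def implied_total(count: int, percent: float) -> float:
    """Estimate a section total from one row, using count = percent/100 * total.

    The result is approximate because the table's percentages are rounded.
    """
    return count / (percent / 100.0)


def share(count: int, total: float) -> float:
    """Percentage of the section total that one category's count represents."""
    return round(100.0 * count / total, 2)


if __name__ == "__main__":
    ecoli_total = implied_total(*ECOLI_GENE)
    human_total = implied_total(*HUMAN_GENE)
    print(f"Implied E. coli total: about {ecoli_total:.0f}")
    print(f"Implied human total:   about {human_total:.0f}")
    # Recompute the share of another row from the estimated total
    # (E. coli Gene_Expression, count 350 in the table).
    print(f"E. coli Gene_Expression share: about {share(350, ecoli_total)}%")
```

Recomputed shares may differ from the table in the last decimal place, since the totals are estimated from rounded figures rather than taken from the corpus itself.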