Table 3 List of boundary words

From: Investigating heterogeneous protein annotations toward cross-corpora utilization

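| Category | Word | AIMed N_a | AIMed N_n | AIMed E_b | GENIA N_a | GENIA N_n | GENIA E_b | GENETAG N_a | GENETAG N_n | GENETAG E_b |
|---|---|---|---|---|---|---|---|---|---|---|
| Adjective | constitutive | 0 | 0 | 0.0000 | 12 | 11 | 0.9986 | 2 | 2 | 1.0000 |
| | endogenous | 0 | 0 | 0.0000 | 22 | 11 | 0.9183 | 9 | 9 | 1.0000 |
| | exogenous | 0 | 0 | 0.0000 | 9 | 16 | 0.9427 | 2 | 3 | 0.9710 |
| | inducible | 0 | 0 | 0.0000 | 18 | 17 | 0.9994 | 0 | 0 | 0.0000 |
| | low | 0 | 0 | 0.0000 | 14 | 11 | 0.9896 | 2 | 3 | 0.9710 |
| | major | 0 | 0 | 0.0000 | 25 | 15 | 0.9544 | 1 | 5 | 0.6500 |
| | putative | 0 | 0 | 0.0000 | 15 | 15 | 1.0000 | 0 | 0 | 0.0000 |
| | recombinant | 1 | 8 | 0.5033 | 36 | 24 | 0.9710 | 26 | 2 | 0.3712 |
| | soluble | 1 | 10 | 0.4395 | 14 | 15 | 0.9991 | 1 | 4 | 0.7219 |
| Noun before | factor | 0 | 0 | 0.0000 | 5 | 26 | 0.6374 | 17 | 15 | 0.9972 |
| | plasma | 0 | 0 | 0.0000 | 13 | 1 | 0.3712 | 17 | 12 | 0.9784 |
| | protein | 12 | 34 | 0.8281 | 159 | 18 | 0.4743 | 53 | 10 | 0.6313 |
| Noun after | form | 0 | 0 | 0.0000 | 21 | 14 | 0.9710 | 0 | 0 | 0.0000 |
| | pathway | 0 | 0 | 0.0000 | 0 | 0 | 0.0000 | 8 | 10 | 0.9911 |
| | protein | 40 | 17 | 0.8791 | 794 | 14 | 0.1262 | 241 | 2 | 0.0688 |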

  1. Here, "Noun before" indicates a noun occurring before an entity as a modifier, and "Noun after" indicates a noun occurring after an entity as a head. N_a is the number of annotated occurrences, and N_n is the number of un-annotated occurrences.
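
The table's note does not define the E column. The reported values are, however, numerically consistent with the binary Shannon entropy (in bits) of the annotated versus un-annotated split, with 0 used when a word never occurs. The sketch below reproduces a few rows under that assumption. The function name `binary_entropy` and the `samples` selection are illustrative and not taken from the source.

```python
import math


def binary_entropy(n_annotated: int, n_unannotated: int) -> float:
    """Shannon entropy (bits) of the annotated/un-annotated split.

    Assumption: this is how the E column was computed; the table
    itself does not state the formula. Returns 0.0 when both counts
    are zero, matching the 0.0000 entries in the table.
    """
    total = n_annotated + n_unannotated
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in (n_annotated, n_unannotated):
        if count > 0:  # a zero count contributes nothing to the sum
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# (corpus, word) -> (N_a, N_n, E value as reported in the table)
samples = {
    ("GENIA", "constitutive"): (12, 11, 0.9986),
    ("GENIA", "endogenous"): (22, 11, 0.9183),
    ("GENETAG", "major"): (1, 5, 0.6500),
    ("AIMed", "recombinant"): (1, 8, 0.5033),
    ("GENIA", "putative"): (15, 15, 1.0000),
    ("AIMed", "constitutive"): (0, 0, 0.0000),
}

for (corpus, word), (n_a, n_n, reported) in samples.items():
    computed = binary_entropy(n_a, n_n)
    print(f"{corpus:8s} {word:13s} computed={computed:.4f} reported={reported:.4f}")
```

Read this way, a value near 1 means the word is inside and outside the annotated span about equally often within that corpus, and a value near 0 means the corpus treats it almost uniformly.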