Table 1 Absolute (and relative) frequencies for NEs in each data set

From: NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition

 

| Data set | protein | DNA | RNA | cell type | cell line | All |
|---|---|---|---|---|---|---|
| Training Set | 30,269 (15.1) | 9,533 (4.8) | 951 (0.5) | 6,713 (3.4) | 3,830 (1.9) | 51,301 (25.7) |
| Test Set | 5,067 (12.5) | 1,056 (2.6) | 118 (0.3) | 1,921 (4.8) | 500 (1.2) | 8,662 (21.4) |