Table 4 Species distribution across data sets

From: Multi-stage gene normalization for full-text articles with context-based species filtering for dynamic dictionary entry selection

| Rank | Training Set (32 articles) | Test Set (50 articles)     | Test Set (507 articles)   |
|------|----------------------------|----------------------------|---------------------------|
| 1    | S. cerevisiae (27%)        | Enterobacter sp. 638 (23%) | H. sapiens (42%)          |
| 2    | H. sapiens (20%)           | M. musculus (14%)          | M. musculus (24%)         |
| 3    | M. musculus (12%)          | H. sapiens (11%)           | D. melanogaster (6%)      |
| 4    | D. melanogaster (10%)      | S. pneumoniae TIGR4 (9%)   | S. cerevisiae S288c (6%)  |
| 5    | D. rerio (7%)              | S. scrofa (5%)             | Enterobacter sp. 638 (4%) |
| 6    | A. thaliana (5%)           | M. oryzae 70-15 (4%)       | R. norvegicus (4%)        |
| 7    | C. elegans (3%)            | D. melanogaster (4%)       | A. thaliana (2%)          |
| 8    | X. laevis (3%)             | R. norvegicus (3%)         | C. elegans (2%)           |
| 9    | R. norvegicus (2%)         | S. cerevisiae S288c (2%)   | S. pneumoniae TIGR4 (2%)  |
| 10   | G. gallus (2%)             | E. histolytica HM-1 (2%)   | S. scrofa (1%)            |
| 11   | Other 18 species (9%)      | Other 65 species (23%)     | Other 91 species (7%)     |
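The table shows that the dominant species shifts sharply between data sets: yeast leads the training set, Enterobacter sp. 638 leads the 50-article test set, and human leads the 507-article test set. A minimal Python sketch that encodes the three columns makes this easy to inspect. It assumes the percentages exactly as printed (already rounded). The dictionary layout and the helper names `column_total`, `top_species` and `top_k_share` are illustrative only and are not taken from the paper.

```python
# Values transcribed from Table 4. Each column lists ranks 1-10 in order,
# followed by the "Other" bucket as the final entry. Percentages are the
# printed (rounded) values, so in general a total need not be exactly 100.
DISTRIBUTIONS = {
    "Training Set (32 articles)": [
        ("S. cerevisiae", 27), ("H. sapiens", 20), ("M. musculus", 12),
        ("D. melanogaster", 10), ("D. rerio", 7), ("A. thaliana", 5),
        ("C. elegans", 3), ("X. laevis", 3), ("R. norvegicus", 2),
        ("G. gallus", 2), ("Other 18 species", 9),
    ],
    "Test Set (50 articles)": [
        ("Enterobacter sp. 638", 23), ("M. musculus", 14), ("H. sapiens", 11),
        ("S. pneumoniae TIGR4", 9), ("S. scrofa", 5), ("M. oryzae 70-15", 4),
        ("D. melanogaster", 4), ("R. norvegicus", 3),
        ("S. cerevisiae S288c", 2), ("E. histolytica HM-1", 2),
        ("Other 65 species", 23),
    ],
    "Test Set (507 articles)": [
        ("H. sapiens", 42), ("M. musculus", 24), ("D. melanogaster", 6),
        ("S. cerevisiae S288c", 6), ("Enterobacter sp. 638", 4),
        ("R. norvegicus", 4), ("A. thaliana", 2), ("C. elegans", 2),
        ("S. pneumoniae TIGR4", 2), ("S. scrofa", 1), ("Other 91 species", 7),
    ],
}


def column_total(rows):
    """Sum of all printed percentages in one column, including 'Other'."""
    return sum(pct for _, pct in rows)


def top_species(rows):
    """Name of the rank-1 species (the first entry in the column)."""
    return rows[0][0]


def top_k_share(rows, k):
    """Combined percentage of the k most frequent named species (k <= 10)."""
    return sum(pct for _, pct in rows[:k])


for name, rows in DISTRIBUTIONS.items():
    print(f"{name}: rank 1 = {top_species(rows)}, "
          f"top-3 share = {top_k_share(rows, 3)}%, total = {column_total(rows)}%")
```

Run as-is, each column totals 100%. The top three species account for 59% of the training set, 48% of the 50-article test set and 72% of the 507-article test set.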