Table 4: Species distribution across data sets

From: Multi-stage gene normalization for full-text articles with context-based species filtering for dynamic dictionary entry selection

| Rank | Training set (32 articles) | Test set (50 articles) | Test set (507 articles) |
|---|---|---|---|
| 1 | S. cerevisiae (27%) | Enterobacter sp. 638 (23%) | H. sapiens (42%) |
| 2 | H. sapiens (20%) | M. musculus (14%) | M. musculus (24%) |
| 3 | M. musculus (12%) | H. sapiens (11%) | D. melanogaster (6%) |
| 4 | D. melanogaster (10%) | S. pneumoniae TIGR4 (9%) | S. cerevisiae S288c (6%) |
| 5 | D. rerio (7%) | S. scrofa (5%) | Enterobacter sp. 638 (4%) |
| 6 | A. thaliana (5%) | M. oryzae 70-15 (4%) | R. norvegicus (4%) |
| 7 | C. elegans (3%) | D. melanogaster (4%) | A. thaliana (2%) |
| 8 | X. laevis (3%) | R. norvegicus (3%) | C. elegans (2%) |
| 9 | R. norvegicus (2%) | S. cerevisiae S288c (2%) | S. pneumoniae TIGR4 (2%) |
| 10 | G. gallus (2%) | E. histolytica HM-1 (2%) | S. scrofa (1%) |
| | Other 18 species (9%) | Other 65 species (23%) | Other 91 species (7%) |
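Each column of the table ranks the ten most frequent species and folds the remainder into a single "Other N species" row. The Python sketch below shows one way such a summary could be computed. It assumes a hypothetical input `labels` holding one species name per annotated item; the table itself does not state which unit (gene mentions, gene identifiers, or something else) the percentages are taken over. The function name `species_distribution` and the `top_k` parameter are illustrative only.

```python
from collections import Counter


def species_distribution(labels, top_k=10):
    """Rank species by frequency and fold the tail into an 'Other' row.

    `labels` is a hypothetical list with one species name per annotated item
    (assumption: the unit counted in Table 4 is not stated in the table).
    Returns (name, percentage) pairs, most frequent first.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    ranked = counts.most_common()  # sorted by count, highest first

    # Top-k species, each as a percentage of all items.
    rows = [(name, 100.0 * n / total) for name, n in ranked[:top_k]]

    # Everything beyond the top k is summarized as one aggregate row.
    tail = ranked[top_k:]
    if tail:
        other_share = 100.0 * sum(n for _, n in tail) / total
        rows.append((f"Other {len(tail)} species", other_share))
    return rows


if __name__ == "__main__":
    toy = ["H.sapiens"] * 5 + ["M.musculus"] * 3 + ["D.rerio", "X.laevis"]
    for name, pct in species_distribution(toy, top_k=2):
        print(f"{name} ({pct:.0f}%)")
```

On the toy input this yields H.sapiens (50%), M.musculus (30%) and "Other 2 species" (20%). Because the published percentages are rounded to whole numbers, a column's displayed values need not sum to exactly 100% in general, although each column of Table 4 does.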