Table 2 Statistics of species distribution in the different data sets.

From: The gene normalization task in BioCreative III

Rank  Training Set (32 articles)  Test Set (50 articles)      Test Set (507 articles)
1     S. cerevisiae (27%)         Enterobacter sp. 638 (23%)  H. sapiens (42%)
2     H. sapiens (20%)            M. musculus (14%)           M. musculus (24%)
3     M. musculus (12%)           H. sapiens (11%)            D. melanogaster (6%)
4     D. melanogaster (10%)       S. pneumoniae TIGR4 (9%)    S. cerevisiae S288c (6%)
5     D. rerio (7%)               S. scrofa (5%)              Enterobacter sp. 638 (4%)
6     A. thaliana (5%)            M. oryzae 70-15 (4%)        R. norvegicus (4%)
7     C. elegans (3%)             D. melanogaster (4%)        A. thaliana (2%)
8     X. laevis (3%)              R. norvegicus (3%)          C. elegans (2%)
9     R. norvegicus (2%)          S. cerevisiae S288c (2%)    S. pneumoniae TIGR4 (2%)
10    G. gallus (2%)              E. histolytica HM-1 (2%)    S. scrofa (1%)
11+   Other 18 species (9%)       Other 65 species (23%)      Other 91 species (7%)
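As an illustrative cross-check of the table, the short Python sketch below copies the published percentages and computes how much of each data set the ten named species cover, and whether each column (including the "Other" bucket) totals 100%. The dictionary layout and the helper name `top_k_share` are hypothetical choices made for this example, not something taken from the paper; the numbers themselves come straight from Table 2.

```python
# Percentages copied from Table 2. Each list holds the ten named species
# (ranks 1-10, in table order) followed by the "Other N species" bucket (rank 11+).
DISTRIBUTIONS = {
    "Training Set (32 articles)": [27, 20, 12, 10, 7, 5, 3, 3, 2, 2, 9],
    "Test Set (50 articles)": [23, 14, 11, 9, 5, 4, 4, 3, 2, 2, 23],
    "Test Set (507 articles)": [42, 24, 6, 6, 4, 4, 2, 2, 2, 1, 7],
}


def top_k_share(percentages: list[int], k: int = 10) -> int:
    """Combined share (in %) of the k highest-ranked named species (hypothetical helper)."""
    return sum(percentages[:k])


if __name__ == "__main__":
    for name, values in DISTRIBUTIONS.items():
        # The total includes the "Other" bucket, so it should come to 100%.
        print(f"{name}: top-10 share = {top_k_share(values)}%, total = {sum(values)}%")
```

For these values, the ten named species cover 91% of the training set, 77% of the 50-article test set, and 93% of the 507-article test set, and every column sums to exactly 100%. The lower figure for the 50-article set matches its much larger "Other 65 species" tail.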