Table 2 Statistics of species distribution in the different data sets.

From: The gene normalization task in BioCreative III

| Rank | Training Set (32 articles) | Test Set (50 articles) | Test Set (507 articles) |
|------|----------------------------|------------------------|-------------------------|
| 1 | S. cerevisiae (27%) | Enterobacter sp. 638 (23%) | H. sapiens (42%) |
| 2 | H. sapiens (20%) | M. musculus (14%) | M. musculus (24%) |
| 3 | M. musculus (12%) | H. sapiens (11%) | D. melanogaster (6%) |
| 4 | D. melanogaster (10%) | S. pneumoniae TIGR4 (9%) | S. cerevisiae S228c (6%) |
| 5 | D. rerio (7%) | S. scrofa (5%) | Enterobacter sp. 638 (4%) |
| 6 | A. thaliana (5%) | M. oryzae 70-15 (4%) | R. norvegicus (4%) |
| 7 | C. elegans (3%) | D. melanogaster (4%) | A. thaliana (2%) |
| 8 | X. laevis (3%) | R. norvegicus (3%) | C. elegans (2%) |
| 9 | R. norvegicus (2%) | S. cerevisiae S228c (2%) | S. pneumoniae TIGR4 (2%) |
| 10 | G. gallus (2%) | E. histolytica HM-1 (2%) | S. scrofa (1%) |
| 11+ | Other 18 species (9%) | Other 65 species (23%) | Other 91 species (7%) |