Table 1. Training and test datasets: the datasets used for training and testing the support vector machines. After the organism name, the columns are: (1) the number of cDNA sequences used for training; (2) the number of cDNA sequences with BLAST hits that have GO molecular function (MF) terms; (3) the average number of GO molecular function terms per cDNA sequence, taken from the BLAST hits; (4) the class distribution of the GO terms coming from the hits, where a term is positive if it is similar to the original annotation and negative otherwise.

From: Applying Support Vector Machines for Gene ontology based gene function prediction

Organism      Number of cDNAs   cDNAs with MF GO   GO terms per cDNA   Positive (%)   Negative (%)
Rat                      1039               1036               36.90           25.7           74.3
Fish                     1061               1044               32.10           39.2           60.8
Fly                      5840               5574               25.47           23.4           76.6
Worm                     4272               3458               27.13           39.5           60.5
Plasmodium                274                271               23.67           28.0           72.0
Leishmania                 82                 82               20.51           35.1           64.9
Yeast                    3356               2972               18.60           23.7           76.3
Bacillus                 2729               2577               13.63           35.4           64.6
Coxiella                  931                900               12.33           37.0           63.0
Shewanella               2413               2303               10.78           33.0           67.0
Vibrio                   1832               1804               12.54           31.9           68.1
Arabidopsis              8807               8120               26.66           30.2           69.8
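
Two quantities that can be derived from Table 1 are the share of cDNAs whose BLAST hits carry GO molecular function terms, and the ratio of negative to positive GO terms per organism. The sketch below computes both from the values transcribed from the table. The names `TABLE_1`, `coverage`, `neg_pos_ratio`, and `summarize` are illustrative assumptions, not part of the original study, and the derived figures are arithmetic on the published values rather than results reported by the authors.

```python
# Rows transcribed from Table 1:
# (organism, n_cdna, n_with_mf_go, go_terms_per_cdna, pct_positive, pct_negative)
TABLE_1 = [
    ("Rat",         1039, 1036, 36.90, 25.7, 74.3),
    ("Fish",        1061, 1044, 32.10, 39.2, 60.8),
    ("Fly",         5840, 5574, 25.47, 23.4, 76.6),
    ("Worm",        4272, 3458, 27.13, 39.5, 60.5),
    ("Plasmodium",   274,  271, 23.67, 28.0, 72.0),
    ("Leishmania",    82,   82, 20.51, 35.1, 64.9),
    ("Yeast",       3356, 2972, 18.60, 23.7, 76.3),
    ("Bacillus",    2729, 2577, 13.63, 35.4, 64.6),
    ("Coxiella",     931,  900, 12.33, 37.0, 63.0),
    ("Shewanella",  2413, 2303, 10.78, 33.0, 67.0),
    ("Vibrio",      1832, 1804, 12.54, 31.9, 68.1),
    ("Arabidopsis", 8807, 8120, 26.66, 30.2, 69.8),
]


def coverage(n_cdna, n_with_mf_go):
    """Fraction of cDNAs whose BLAST hits have GO molecular function terms."""
    return n_with_mf_go / n_cdna


def neg_pos_ratio(pct_positive, pct_negative):
    """Number of negative GO terms per positive GO term."""
    return pct_negative / pct_positive


def summarize(rows):
    """Map organism -> (coverage in percent, negative/positive ratio), rounded."""
    return {
        name: (round(100 * coverage(n, m), 1), round(neg_pos_ratio(pos, neg), 2))
        for name, n, m, _go_per_cdna, pos, neg in rows
    }


if __name__ == "__main__":
    for name, (cov, ratio) in summarize(TABLE_1).items():
        print(f"{name:12s} coverage={cov:5.1f}%  neg/pos={ratio:.2f}")
```

In every organism the negative class is the larger one, so the negative-to-positive ratio is above 1 in each row. That imbalance is worth keeping in mind when reading any per-class accuracy figures.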