Table 1. Training and test datasets: datasets used for training and testing the support vector machines. The columns are: (1) the number of cDNA sequences for training; (2) the number of cDNA sequences with BLAST hits having GO molecular function (MF) terms; (3) the average number of GO molecular function terms per cDNA sequence among the BLAST hits; (4) the class distribution of the GO terms coming from the hits, where a term is positive if it was similar to the original annotation and negative otherwise.

From: Applying Support Vector Machines for Gene ontology based gene function prediction

| Organism | Number of cDNAs | cDNAs with MF GO | Number of GO terms per cDNA | Class distribution: % positive | Class distribution: % negative |
|---|---|---|---|---|---|
| Rat | 1039 | 1036 | 36.90 | 25.7 | 74.3 |
| Fish | 1061 | 1044 | 32.10 | 39.2 | 60.8 |
| Fly | 5840 | 5574 | 25.47 | 23.4 | 76.6 |
| Worm | 4272 | 3458 | 27.13 | 39.5 | 60.5 |
| Plasmodium | 274 | 271 | 23.67 | 28.0 | 72.0 |
| Leishmania | 82 | 82 | 20.51 | 35.1 | 64.9 |
| Yeast | 3356 | 2972 | 18.60 | 23.7 | 76.3 |
| Bacillus | 2729 | 2577 | 13.63 | 35.4 | 64.6 |
| Coxiella | 931 | 900 | 12.33 | 37.0 | 63.0 |
| Shewanella | 2413 | 2303 | 10.78 | 33.0 | 67.0 |
| Vibrio | 1832 | 1804 | 12.54 | 31.9 | 68.1 |
| Arabidopsis | 8807 | 8120 | 26.66 | 30.2 | 69.8 |