Table 1 The cross-validation results yielded by yeast.

From: TransportTP: A two-phase classification approach for membrane transporter prediction and characterization

| Organism | Number of proteins | Predictions by TransportTP | Annotations in TransportDB | Matches | Mismatches | TransportDB unique | Text mining validated | Recall (%) | Precision (%) | Balanced accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| E. coli | 5411 | 589 | 577 | 456 | 6 | 115 | 61 | 79.0 | 77.4 | 78.2 |
| A. thaliana | 26960 | 1073 | 1278 | 996 | 1 | 281 | 38 | 77.9 | 92.8 | 84.7 |
| O. sativa | 56278 | 1230 | 1283 | 1061 | 0 | 222 | 88 | 82.7 | 86.3 | 84.4 |
| C. elegans | 20051 | 906 | 667 | 601 | 1 | 65 | 87 | 90.1 | 66.3 | 76.4 |
| D. melanogaster | 13890 | 663 | 646 | 535 | 0 | 111 | 26 | 82.8 | 80.7 | 81.7 |
| H. sapiens | 37742 | 1272 | 1466 | 1140 | 3 | 323 | 79 | 77.8 | 89.6 | 83.3 |
| Average on model proteomes | | | | | | | | 81.7 | 82.2 | 82.0 |
| P. torridus | 1535 | 165 | 171 | 137 | 1 | 33 | 15 | 80.1 | 83.0 | 81.5 |
| P. profundum | 5489 | 550 | 580 | 445 | 4 | 131 | 35 | 76.7 | 80.9 | 78.8 |
| D. psychrophila | 3234 | 316 | 305 | 242 | 1 | 62 | 38 | 79.3 | 76.6 | 77.9 |
| A. fumigatus | 9923 | 671 | 619 | 563 | 1 | 55 | 50 | 91.0 | 83.9 | 87.3 |
| Average on non-model proteomes | | | | | | | | 81.8 | 81.1 | 81.4 |
| Average on all testing proteomes | | | | | | | | 81.7 | 81.8 | 81.8 |

  1. The yeast proteome was used for training, and the ten non-yeast proteomes were used for testing at an e-value threshold of 0.1. TransportDB was chosen as the benchmark transporter database. The "Matches" column gives the number of proteins both predicted by TransportTP and curated by TransportDB with the same TC family or superfamily (the third taxonomic level). The "Mismatches" column gives the number of proteins identified as transporters by both methods but with conflicting TC classifications. The "TransportDB unique" column gives the number of proteins annotated by TransportDB but absent from the TransportTP predictions. The "Text mining validated" column gives the number of proteins not annotated by TransportDB but predicted by TransportTP and validated by our text mining program using the functional annotations together with the protein sequences.
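
The footnote does not spell out how the three percentage columns are derived from the counts. The following is a minimal sketch under an assumption inferred from the tabulated numbers, not a formula stated in the source: recall is taken as matches divided by TransportDB annotations, precision as matches divided by TransportTP predictions, and the "Balanced accuracy" column as the harmonic mean of the two. The function name `derived_metrics` and the `rows` dictionary are hypothetical, and the "Text mining validated" counts are not used here.

```python
def derived_metrics(predictions, annotations, matches):
    """Return (recall, precision, harmonic mean) in percent, rounded to 0.1.

    Assumed definitions (inferred from the table values, not from the text):
      recall    = matches / annotations in TransportDB
      precision = matches / predictions by TransportTP
    """
    recall = 100.0 * matches / annotations
    precision = 100.0 * matches / predictions
    harmonic = 2 * recall * precision / (recall + precision)
    return round(recall, 1), round(precision, 1), round(harmonic, 1)


# (predictions, annotations, matches) copied from three rows of the table
rows = {
    "E. coli": (589, 577, 456),
    "C. elegans": (906, 667, 601),
    "A. fumigatus": (671, 619, 563),
}

for organism, counts in rows.items():
    print(organism, derived_metrics(*counts))
```

Under these assumed definitions, the three rows above reproduce the tabulated values, for example (79.0, 77.4, 78.2) for E. coli.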