Table 1 Prediction accuracy of O-glycosylation sites based on different encoding schemes^a

From: Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs

| Site | Encoding scheme | Feature selection | Sn (%) | Sp (%) | Ac (%) | MCC |
|---|---|---|---|---|---|---|
| S | Binary^b | No selection | 74.2 ± 1.7 | 81.9 ± 3.0 | 78.0 ± 1.9 | 0.567 ± 0.039 |
| S | Binary^c | No selection | 76.5 ± 3.5 | 74.6 ± 3.6 | 75.6 ± 3.1 | 0.523 ± 0.060 |
| S | CKSAAP | No selection | 77.9 ± 1.7 | 86.5 ± 3.0 | 82.2 ± 1.8 | 0.655 ± 0.037 |
| S | CKSAAP^c | No selection | 79.0 ± 5.2 | 83.0 ± 2.4 | 81.0 ± 2.6 | 0.628 ± 0.050 |
| S | CKSAAP | CC | 80.7 ± 3.3 | 85.6 ± 3.9 | 83.1 ± 2.8 | 0.671 ± 0.055 |
| S | CKSAAP | IE | 82.1 ± 2.3 | 83.9 ± 3.8 | 83.0 ± 2.4 | 0.665 ± 0.048 |
| T | Binary^b | No selection | 74.8 ± 4.1 | 78.3 ± 1.7 | 76.6 ± 2.3 | 0.536 ± 0.045 |
| T | Binary^c | No selection | 77.8 ± 3.4 | 76.6 ± 3.2 | 77.2 ± 2.4 | 0.548 ± 0.048 |
| T | CKSAAP | No selection | 80.4 ± 2.2 | 82.3 ± 2.9 | 81.3 ± 2.3 | 0.631 ± 0.045 |
| T | CKSAAP^c | No selection | 80.3 ± 1.9 | 85.7 ± 1.9 | 83.0 ± 1.8 | 0.666 ± 0.038 |
| T | CKSAAP | CC | 80.3 ± 1.8 | 82.5 ± 2.3 | 81.4 ± 1.3 | 0.632 ± 0.026 |
| T | CKSAAP | IE | 80.8 ± 1.5 | 81.9 ± 3.1 | 81.3 ± 2.2 | 0.631 ± 0.045 |

^a Predictions were made with the SVM-based algorithm using the RBF kernel function. The CC-based feature selection resulted in the highest accuracy, and the corresponding values of Ac and MCC were shown in bold type. Each measurement is reported as the average value ± standard deviation.
^b In this encoding scheme, the window size was optimally set to 41.
^c The method was trained and tested on new negative site data sets where <40% identity was not required between positive and negative sites.