Table 3 The top 20 features selected by correlation coefficient (CC-) and information entropy (IE-) based methods

From: Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs

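| Rank | S site (CC) | S site (IE) | T site (CC) | T site (IE) |
|------|-------------|-------------|-------------|-------------|
| 1 | ST | PXXS | TXXP a,b | TXXP |
| 2 | SXXP | PXP | PT | PT |
| 3 | PXXXXS | PS | AXXXP | TT |
| 4 | PXXS | SXS | PXT | TXXXT |
| 5 | TXXP | SXXP | PP | TXXXXT |
| 6 | PXP | ST | TP | TXT |
| 7 | PXXXXP | SXXXP | AXP | PXT |
| 8 | SXXXP | PXXXP | PXXXXP | TP |
| 9 | SXP | SXXXS | TXXXXP | PP |
| 10 | TXXXP | PXXXXP | TXA | PXP |
| 11 | PS | PXXXXS | PXXXP | TS |
| 12 | SXXXT | SXXXXP | SXXXP | TXXXXP |
| 13 | TP | TS | SXXT | PXXXXP |
| 14 | TXXXXS | SS | TT | SXXT |
| 15 | TS | SP | PXXA | PXXXP |
| 16 | TXXA | SXP | SA | SXXXXT |
| 17 | PXT | PXT | PXA | TXXS |
| 18 | PXXXP | TXXXXS | TXXXP | ST |
| 19 | PP | TXXP | PXP | TXXXP |
| 20 | PXXP | PXXP | AXXXXT | PXXXT |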

a TXXP represents the 2-spaced amino acid pair TP, where X stands for any amino acid. The same notation applies to the other k-spaced amino acid pairs.

b The k-spaced amino acid pairs in bold type are ranked among the top 20 features by both feature selection methods.
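
The notation in footnote a and the overlap described in footnote b can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the `k_max = 4` limit (chosen only because the longest gap shown in the table is four residues), and the peptide fragment are assumptions made for illustration. The two S-site lists are copied from the table above.

```python
from collections import Counter


def parse_feature(label):
    """Split a label such as 'TXXP' into (first residue, second residue, k)."""
    return label[0], label[-1], len(label) - 2


def count_k_spaced_pairs(sequence, k_max=4):
    """Count every k-spaced residue pair (k = 0..k_max) in a sequence.

    k_max = 4 is an assumption based on the longest gap in the table.
    """
    counts = Counter()
    for k in range(k_max + 1):
        for i in range(len(sequence) - k - 1):
            # Residues at positions i and i + k + 1, with k wildcards between them
            label = sequence[i] + "X" * k + sequence[i + k + 1]
            counts[label] += 1
    return counts


# Hypothetical peptide fragment, for illustration only
fragment = "APTSTPPT"
counts = count_k_spaced_pairs(fragment)

# S-site features from the table, in rank order
s_cc = ["ST", "SXXP", "PXXXXS", "PXXS", "TXXP", "PXP", "PXXXXP", "SXXXP",
        "SXP", "TXXXP", "PS", "SXXXT", "TP", "TXXXXS", "TS", "TXXA",
        "PXT", "PXXXP", "PP", "PXXP"]
s_ie = ["PXXS", "PXP", "PS", "SXS", "SXXP", "ST", "SXXXP", "PXXXP",
        "SXXXS", "PXXXXP", "PXXXXS", "SXXXXP", "TS", "SS", "SP", "SXP",
        "PXT", "TXXXXS", "TXXP", "PXXP"]

# Features ranked in the top 20 by both methods (footnote b), in CC order
ie_set = set(s_ie)
shared_s = [feature for feature in s_cc if feature in ie_set]

print(parse_feature("TXXP"))
print(counts["PT"], counts["TXXP"])
print(shared_s)
```

The same overlap check can be repeated for the T-site columns by swapping in those two lists.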