Table 5 Data statistics of the training and testing datasets after removal of homologous sequences using the CD-HIT program

From: iDPGK: characterization and identification of lysine phosphoglycerylation sites based on sequence-based features

| Dataset / sequence identity cut-off | Number of phosphoglycerylation sites | Number of non-phosphoglycerylation sites |
|---|---|---|
| Raw data | 150 | 3997 |
| 90% | 107 | 3031 |
| 80% | 104 | 2610 |
| 70% | 98 | 2319 |
| 60% | 96 | 2040 |
| 50% | 93 | 1845 |
| 40% | 89 | 1318 |
| Training data | 89 | 178 |
| Independent testing data | 37 | 74 |
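As a reading aid, the short Python sketch below uses only the counts in Table 5. It computes what fraction of the raw positive (phosphoglycerylation) and negative (non-phosphoglycerylation) sites remain at each CD-HIT sequence identity cut-off, and the negative-to-positive ratio in the training and independent testing sets. The dictionary and function names are illustrative assumptions, not part of the iDPGK work.

```python
# Counts transcribed from Table 5: (positive sites, negative sites).
counts = {
    "Raw data": (150, 3997),
    "90%": (107, 3031),
    "80%": (104, 2610),
    "70%": (98, 2319),
    "60%": (96, 2040),
    "50%": (93, 1845),
    "40%": (89, 1318),
}

# Final datasets reported in Table 5: (positive sites, negative sites).
splits = {
    "Training data": (89, 178),
    "Independent testing data": (37, 74),
}


def retention(counts, baseline="Raw data"):
    """Fraction of baseline positive/negative sites kept at each cut-off."""
    base_pos, base_neg = counts[baseline]
    return {
        cutoff: (pos / base_pos, neg / base_neg)
        for cutoff, (pos, neg) in counts.items()
        if cutoff != baseline
    }


def neg_to_pos_ratio(splits):
    """Number of negative sites per positive site in each dataset."""
    return {name: neg / pos for name, (pos, neg) in splits.items()}


if __name__ == "__main__":
    for cutoff, (p, n) in retention(counts).items():
        print(f"{cutoff}: {p:.1%} of positive, {n:.1%} of negative sites retained")
    for name, r in neg_to_pos_ratio(splits).items():
        print(f"{name}: {r:.1f} negative sites per positive site")
```

Running it shows, for example, that at the 40% cut-off about 59% of the raw positive sites but only about 33% of the raw negative sites remain, and that both the training and independent testing sets contain two negative sites per positive site.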