Table 1 Data statistics of positive and negative training data.

From: A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs

| Data resource | Residue | Number of O-GlcNAcylated sites (positive data) | Number of non-O-GlcNAcylated sites (negative data) | Number of non-O-GlcNAcylated sites (balanced negative data) |
|---|---|---|---|---|
| dbOGAP | Serine | 250 | 18,570 | - |
| | Threonine | 142 | 11,240 | - |
| OGlycBase | Serine | 24 | 1,013 | - |
| | Threonine | 24 | 694 | - |
| UniProtKB | Serine | 66 | 4,851 | - |
| | Threonine | 51 | 3,255 | - |
| Non-redundant dataset | Serine | 261 | 17,381 | 261 |
| | Threonine | 149 | 10,587 | 149 |
| | Combined | 410 | 27,968 | 410 |
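In the non-redundant dataset, the Combined row is the sum of the Serine and Threonine rows (261 + 149 = 410 positives; 17,381 + 10,587 = 27,968 negatives). The balanced negative set holds exactly as many sites as the positive set for each residue. The minimal sketch below checks that arithmetic and shows one way such a balanced set could be drawn. It assumes random undersampling without replacement, which the table does not state. The helper names and the placeholder site identifiers are hypothetical.

```python
import random

# Counts copied from the "Non-redundant dataset" rows of Table 1.
NON_REDUNDANT = {
    "Serine": {"positive": 261, "negative": 17_381},
    "Threonine": {"positive": 149, "negative": 10_587},
}


def combined_counts(table):
    """Sum positive and negative counts over residues (the 'Combined' row)."""
    pos = sum(row["positive"] for row in table.values())
    neg = sum(row["negative"] for row in table.values())
    return pos, neg


def balanced_negatives(negative_ids, n_positive, seed=0):
    """Hypothetical helper: draw n_positive negatives without replacement.

    Random undersampling is an assumption for illustration; the source
    table does not describe how the balanced negatives were selected.
    """
    rng = random.Random(seed)
    return rng.sample(negative_ids, n_positive)


if __name__ == "__main__":
    pos, neg = combined_counts(NON_REDUNDANT)
    print(pos, neg)  # 410 27968

    # Placeholder identifiers stand in for real sequence windows (hypothetical).
    for residue, row in NON_REDUNDANT.items():
        negatives = [f"{residue}_neg_{i}" for i in range(row["negative"])]
        balanced = balanced_negatives(negatives, row["positive"])
        print(residue, len(balanced))  # Serine 261, then Threonine 149
```

Balancing per residue keeps the Serine/Threonine mix of the negative set identical to that of the positive set. Sampling from the pooled negatives would give the same total of 410 but could change that mix.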