Table 2 Details of crafting different window sizes on the training set

From: Machine learning-based approaches for ubiquitination site prediction in human proteins

Type 1: samples with duplicate sequences and labels. Type 2: samples with duplicate sequences. Remaining: samples after removing types 1 and 2.

| Window size | Type 1: All | Type 1: Positive | Type 1: Negative | Type 2: All | Type 2: Positive | Type 2: Negative | Remaining: All | Remaining: Positive | Remaining: Negative |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 143,420 | 3728 | 139,692 | 154,835 | 12,125 | 142,710 | 84,861 | 13,122 | 71,739 |
| 7 | 6556 | 213 | 6343 | 7150 | 517 | 6633 | 191,493 | 15,066 | 176,427 |
| 9 | 2619 | 134 | 2485 | 2728 | 191 | 2537 | 193,891 | 15,106 | 178,785 |
| 15 | 1582 | 87 | 1495 | 1623 | 109 | 1514 | 194,544 | 15,133 | 179,411 |
| 21 | 1336 | 72 | 1264 | 1361 | 85 | 1276 | 194,699 | 15,141 | 179,558 |
| 27 | 1212 | 60 | 1152 | 1233 | 71 | 1162 | 194,776 | 15,147 | 179,629 |
| 33 | 1152 | 54 | 1098 | 1136 | 59 | 1077 | 194,829 | 15,151 | 179,678 |
| 45 | 1025 | 48 | 977 | 1034 | 53 | 981 | 194,888 | 15,153 | 179,735 |
| 55 | 969 | 44 | 925 | 978 | 49 | 929 | 194,920 | 15,155 | 179,765 |
| 77 | 854 | 37 | 817 | 863 | 42 | 821 | 194,992 | 15,159 | 179,833 |
| 99 | 771 | 34 | 737 | 776 | 39 | 737 | 195,047 | 15,161 | 179,886 |

  1. Two types of duplication appeared in the data. Type 1: identical sequences with identical labels. Type 2: identical sequences with different labels.
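
The two duplication types in the footnote can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `extract_windows` and `split_duplicates`, the `X` padding character, the choice to centre windows on lysine (K) residues, and the filtering policy (keep one copy of each Type 1 group, drop every Type 2 sample) are all assumptions made for illustration, and they may not reproduce the counts in the table.

```python
from collections import Counter, defaultdict


def extract_windows(sequence, positive_sites, window_size, pad="X"):
    """Return one (window, label) pair per lysine (K) in `sequence`.

    `positive_sites` is a set of 0-based indices of ubiquitinated lysines
    (a hypothetical input format). `window_size` is assumed to be odd.
    """
    half = window_size // 2
    # Pad both ends so that windows near the termini keep full length.
    padded = pad * half + sequence + pad * half
    samples = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            window = padded[i:i + window_size]  # centred on position i
            samples.append((window, 1 if i in positive_sites else 0))
    return samples


def split_duplicates(samples):
    """Split samples into Type 1, Type 2, and a filtered remainder.

    Type 1: the same window occurs more than once, always with the same label.
    Type 2: the same window occurs with both labels.
    Assumed policy: keep one copy per Type 1 group, drop all Type 2 samples.
    """
    counts = Counter(samples)
    labels_by_window = defaultdict(set)
    for window, label in samples:
        labels_by_window[window].add(label)

    type1 = [s for s in samples
             if counts[s] > 1 and len(labels_by_window[s[0]]) == 1]
    type2 = [s for s in samples if len(labels_by_window[s[0]]) > 1]
    kept = sorted({s for s in samples if len(labels_by_window[s[0]]) == 1})
    return type1, type2, kept


if __name__ == "__main__":
    # Toy sequence: both lysines give the window "AKA" but with different labels.
    toy = extract_windows("AKAKA", positive_sites={1}, window_size=3)
    type1, type2, kept = split_duplicates(toy)
    print(len(type1), len(type2), len(kept))
```

Short windows carry little context, so many distinct sites collapse onto the same string. This is consistent with the table, where the duplicate counts fall sharply as the window size grows from 5 to 7 and beyond.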