Table 2 Data statistics after using CD-HIT.

From: Characterization and identification of ubiquitin conjugation sites with E3 ligase recognition specificities

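| Sequence identity | Training data set (6259): Positive | Training data set (6259): Negative | Testing data set (35494): Positive | Testing data set (35494): Negative |
|---|---|---|---|---|
| 100% (original) | 23949 | 228441 | 110695 | 1217977 |
| 90% | 21621 | 196808 | 38739 | 325640 |
| 80% | 21165 | 179691 | 36647 | 284713 |
| 70% | 20709 | 165560 | 35165 | 255134 |
| 60% | 18588 | 115296 | 29810 | 162044 |
| 50% | 10216 | 34428 | 14210 | 41700 |
| 40% | 2658 | 5532 | 3267 | 6214 |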
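
As a reading aid, the counts in Table 2 can be turned into two derived quantities per identity threshold: the number of negative sites per positive site, and the fraction of the original (100%) positive sites that remain after filtering. The sketch below is illustrative only. The counts are copied from the table, while the names `TABLE2` and `summarize` and the choice of derived quantities are assumptions made here and do not come from the source article.

```python
# Counts copied from Table 2, keyed by sequence identity threshold (%).
# Tuple order: (train_positive, train_negative, test_positive, test_negative)
TABLE2 = {
    100: (23949, 228441, 110695, 1217977),
    90: (21621, 196808, 38739, 325640),
    80: (21165, 179691, 36647, 284713),
    70: (20709, 165560, 35165, 255134),
    60: (18588, 115296, 29810, 162044),
    50: (10216, 34428, 14210, 41700),
    40: (2658, 5532, 3267, 6214),
}


def summarize(table):
    """Return per-threshold class ratios and retained fractions.

    Hypothetical helper: the derived quantities are not reported in the
    source table; they are computed here from its raw counts.
    """
    base_train_pos, _, base_test_pos, _ = table[100]
    rows = {}
    for identity, (train_pos, train_neg, test_pos, test_neg) in table.items():
        rows[identity] = {
            # Negative sites per positive site in each data set.
            "train_neg_per_pos": train_neg / train_pos,
            "test_neg_per_pos": test_neg / test_pos,
            # Share of the unfiltered (100%) positive sites still present.
            "train_pos_retained": train_pos / base_train_pos,
            "test_pos_retained": test_pos / base_test_pos,
        }
    return rows


if __name__ == "__main__":
    for identity, row in summarize(TABLE2).items():
        print(
            f"{identity:>3}%  "
            f"train neg/pos={row['train_neg_per_pos']:.2f}  "
            f"test neg/pos={row['test_neg_per_pos']:.2f}  "
            f"train pos kept={row['train_pos_retained']:.1%}  "
            f"test pos kept={row['test_pos_retained']:.1%}"
        )
```

Computed this way, the training set goes from roughly 9.5 negatives per positive at 100% identity to roughly 2.1 at 40%, so stricter filtering both shrinks the data and reduces the class imbalance.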