Table 2 Data statistics after using CD-HIT.

From: Characterization and identification of ubiquitin conjugation sites with E3 ligase recognition specificities

Sequence identity    Training data set (6259)    Testing data set (35494)
                     Positive     Negative       Positive     Negative
100% (original)      23949        228441         110695       1217977
90%                  21621        196808         38739        325640
80%                  21165        179691         36647        284713
70%                  20709        165560         35165        255134
60%                  18588        115296         29810        162044
50%                  10216        34428          14210        41700
40%                  2658         5532           3267         6214
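
The arithmetic implied by the table can be sketched in a few lines of Python. This is only an illustrative reading aid, not code from the paper: the counts are transcribed from Table 2, and the names `TABLE2`, `neg_to_pos_ratios`, and `retained_fraction` are hypothetical. It computes, for each sequence identity threshold, the number of negatives per positive and the fraction of each column that remains relative to the original (100%) data.

```python
# Counts transcribed from Table 2, keyed by sequence identity threshold (%).
# Tuple order: (training positive, training negative,
#               testing positive,  testing negative).
# All names below are hypothetical and used for illustration only.
TABLE2 = {
    100: (23949, 228441, 110695, 1217977),
    90: (21621, 196808, 38739, 325640),
    80: (21165, 179691, 36647, 284713),
    70: (20709, 165560, 35165, 255134),
    60: (18588, 115296, 29810, 162044),
    50: (10216, 34428, 14210, 41700),
    40: (2658, 5532, 3267, 6214),
}


def neg_to_pos_ratios(table):
    """Return {identity: (training ratio, testing ratio)} of negatives per positive."""
    return {
        identity: (train_neg / train_pos, test_neg / test_pos)
        for identity, (train_pos, train_neg, test_pos, test_neg) in table.items()
    }


def retained_fraction(table, baseline=100):
    """Return {identity: tuple of fractions} of each column kept relative to the baseline row."""
    base = table[baseline]
    return {
        identity: tuple(count / base_count for count, base_count in zip(counts, base))
        for identity, counts in table.items()
    }


if __name__ == "__main__":
    ratios = neg_to_pos_ratios(TABLE2)
    retained = retained_fraction(TABLE2)
    for identity in sorted(TABLE2, reverse=True):
        train_ratio, test_ratio = ratios[identity]
        kept = ", ".join(f"{fraction:.3f}" for fraction in retained[identity])
        print(f"{identity:>3}%  neg/pos train={train_ratio:.2f} test={test_ratio:.2f}  kept=({kept})")
```

Read this way, the training set goes from roughly 9.5 negatives per positive in the original data to roughly 2.1 at the 40% threshold.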