Sequence identity | Training data set (6259), positive | Training data set (6259), negative | Testing data set (35494), positive | Testing data set (35494), negative |
---|---|---|---|---|
100% (original) | 23949 | 228441 | 110695 | 1217977 |
90% | 21621 | 196808 | 38739 | 325640 |
80% | 21165 | 179691 | 36647 | 284713 |
70% | 20709 | 165560 | 35165 | 255134 |
60% | 18588 | 115296 | 29810 | 162044 |
50% | 10216 | 34428 | 14210 | 41700 |
40% | 2658 | 5532 | 3267 | 6214 |