
Table 11 Multi-genome overlaps

From: Identification of novel DNA repair proteins via primary sequence, secondary structure, and homology

                          Sequence similarity
                          100%            99%             98%
Methodology               Known   Novel   Known   Novel   Known   Novel
P           Cluster 1     1       4       3       25      5       47
            Cluster 2     1       0       6       0       10      1
            Cluster 3     0       0       0       0       0       1
PH          Cluster 1     6       7       10      36      14      56
            Cluster 2     6       0       19      2       25      3
            Cluster 3     0       0       0       5       0       6
BLAST       Cluster 1     6       7       9       35      13      54
            Cluster 2     7       0       20      2       26      3
            Cluster 3     0       0       0       5       0       6
  Sequences were clustered with CD-HIT. Each entry is the number of identical proteins (sequences) found in the known and novel portions of multiple genomes, where "identical" is defined as 100%, 99%, or 98% sequence similarity. The clusters are as follows:
  Cluster 1: Human, Chimpanzee, Monkey
  Cluster 2: Cattle, Dog, Frog, Rat
  Cluster 3: Hedgehog, Armadillo, Opossum, Shrew
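
As a rough illustration of how counts like these could be tallied, the sketch below parses CD-HIT cluster output (the `.clstr` file) and counts, per label, the sequences that fall in clusters spanning more than one genome. It is not the authors' pipeline. The sequence ID layout `genome|label|accession`, the function names, and the example data are all hypothetical, and reading "found in multiple genomes" as "cluster members come from more than one genome" is an assumption. The identity threshold itself would be set when running CD-HIT (its `-c` option, for example `-c 0.99`), not in this code.

```python
import re
from collections import defaultdict

# A CD-HIT .clstr member line looks like: "0\t120aa, >seq_id... *"
MEMBER = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(text):
    """Return a list of clusters, each a list of sequence IDs."""
    clusters = []
    for line in text.splitlines():
        if line.startswith(">Cluster"):
            clusters.append([])  # start a new cluster
        else:
            match = MEMBER.search(line)
            if match and clusters:
                clusters[-1].append(match.group(1))
    return clusters


def count_multi_genome(clusters):
    """Count sequences, by label, in clusters spanning more than one genome."""
    counts = defaultdict(int)
    for members in clusters:
        # Hypothetical ID layout (an assumption): genome|label|accession
        parsed = [member.split("|") for member in members]
        genomes = {fields[0] for fields in parsed}
        if len(genomes) > 1:
            for fields in parsed:
                counts[fields[1]] += 1
    return dict(counts)


# Made-up example data, not taken from the table above.
EXAMPLE = "\n".join([
    ">Cluster 0",
    "0\t120aa, >human|novel|p1... *",
    "1\t120aa, >chimp|novel|p2... at 100.00%",
    ">Cluster 1",
    "0\t95aa, >human|known|p3... *",
    ">Cluster 2",
    "0\t200aa, >human|known|p4... *",
    "1\t200aa, >monkey|known|p5... at 100.00%",
    "2\t200aa, >chimp|novel|p6... at 100.00%",
])

if __name__ == "__main__":
    print(count_multi_genome(parse_clstr(EXAMPLE)))
```

Running the same tally on CD-HIT output produced at each of the three thresholds would give one Known/Novel pair per threshold, matching the column layout of the table.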