
Table 11 Multi-genome overlaps

From: Identification of novel DNA repair proteins via primary sequence, secondary structure, and homology

  

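Column headers give the sequence similarity threshold followed by the protein class (Known or Novel).

| Methodology | Cluster | 100% Known | 100% Novel | 99% Known | 99% Novel | 98% Known | 98% Novel |
|---|---|---|---|---|---|---|---|
| P | Cluster 1 | 1 | 4 | 3 | 25 | 5 | 47 |
| P | Cluster 2 | 1 | 0 | 6 | 0 | 10 | 1 |
| P | Cluster 3 | 0 | 0 | 0 | 0 | 0 | 1 |
| PH | Cluster 1 | 6 | 7 | 10 | 36 | 14 | 56 |
| PH | Cluster 2 | 6 | 0 | 19 | 2 | 25 | 3 |
| PH | Cluster 3 | 0 | 0 | 0 | 5 | 0 | 6 |
| BLAST | Cluster 1 | 6 | 7 | 9 | 35 | 13 | 54 |
| BLAST | Cluster 2 | 7 | 0 | 20 | 2 | 26 | 3 |
| BLAST | Cluster 3 | 0 | 0 | 0 | 5 | 0 | 6 |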
  1. Each entry is the number of identical proteins (sequences) found in the known and novel portions of multiple genomes, as determined with the clustering software CD-HIT, where "identical" is defined as 100%, 99%, or 98% sequence similarity. The genome clusters are as follows:
     Cluster 1: Human, Chimpanzee, Monkey
     Cluster 2: Cattle, Dog, Frog, Rat
     Cluster 3: Hedgehog, Armadillo, Opossum, Shrew
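
As an illustration of how such multi-genome overlaps might be tallied, the following minimal Python sketch parses text in CD-HIT's `.clstr` output format and reports the clusters whose members come from more than one genome. It is not the authors' pipeline. The sequence IDs, the `genome|accession` naming convention, and the helper names (`parse_clstr`, `genome_of`, `multi_genome_clusters`) are assumptions made for this example. The similarity threshold itself would be set when running CD-HIT (for instance via its `-c` identity option at 1.00, 0.99, or 0.98), not in this code.

```python
import re

# A member line in a .clstr file looks like:
#   "1<TAB>120aa, >chimp|seqB... at 99.00%"
# The sequence ID sits between ">" and the first "...".
MEMBER = re.compile(r">(.+?)\.\.\.")


def parse_clstr(text):
    """Return a list of clusters, each a list of sequence IDs."""
    clusters = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append([])  # start a new cluster
        else:
            match = MEMBER.search(line)
            if match and clusters:
                clusters[-1].append(match.group(1))
    return clusters


def genome_of(seq_id):
    """Hypothetical convention: IDs are written as 'genome|accession'."""
    return seq_id.split("|", 1)[0]


def multi_genome_clusters(clusters, genome_of):
    """Return the sorted genome names of every cluster spanning 2+ genomes."""
    shared = []
    for members in clusters:
        genomes = {genome_of(seq_id) for seq_id in members}
        if len(genomes) > 1:
            shared.append(sorted(genomes))
    return shared


# Made-up example data in .clstr layout (not taken from the paper).
EXAMPLE = (
    ">Cluster 0\n"
    "0\t120aa, >human|seqA... *\n"
    "1\t120aa, >chimp|seqB... at 100.00%\n"
    ">Cluster 1\n"
    "0\t95aa, >human|seqC... *\n"
    ">Cluster 2\n"
    "0\t200aa, >monkey|seqD... *\n"
    "1\t199aa, >human|seqE... at 99.50%\n"
    "2\t200aa, >chimp|seqF... at 99.00%\n"
)

if __name__ == "__main__":
    for genomes in multi_genome_clusters(parse_clstr(EXAMPLE), genome_of):
        print(", ".join(genomes))
```

Splitting the result further into "known" and "novel" counts, as in the table, would additionally require a label for each sequence, which this sketch does not model.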