Table 1 Statistics used to detect changes in the rate matrix.

From: Using the nucleotide substitution rate matrix to detect horizontal gene transfer

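| #  | Type    | Sequences | Normalized | Centered | SV measure | Centered SVD method | Rank | %   |
|----|---------|-----------|------------|----------|------------|---------------------|------|-----|
| 1  | SVD     | 2         | No         | Yes      | a          | Correlation         | 44   | 0.4 |
| 2  | SVD     | 2         | No         | Yes      | b          | Correlation         | 40   | 0.5 |
| 3  | SVD     | 2         | No         | Yes      | c          | Correlation         | 12   | 2.5 |
| 4  | SVD     | 2         | No         | Yes      | a          | Covariance          | 41   | 0.5 |
| 5  | SVD     | 2         | No         | Yes      | b          | Covariance          | 42   | 0.5 |
| 6  | SVD     | 2         | No         | Yes      | c          | Covariance          | 37   | 0.6 |
| 7  | SVD     | 2         | No         | No       | a          | N/A                 | 36   | 0.6 |
| 8  | SVD     | 2         | No         | No       | b          | N/A                 | 39   | 0.5 |
| 9  | SVD     | 2         | No         | No       | c          | N/A                 | 35   | 0.8 |
| 10 | Mean(d) | 2         | No         | N/A      | N/A        | N/A                 | 34   | 0.9 |
| 11 | Var(d)  | 2         | No         | N/A      | N/A        | N/A                 | 22   | 1.8 |
| 12 | SVD     | 2         | Yes        | Yes      | a          | Correlation         | 23   | 1.8 |
| 13 | SVD     | 2         | Yes        | Yes      | b          | Correlation         | 43   | 0.5 |
| 14 | SVD     | 2         | Yes        | Yes      | c          | Correlation         | 32   | 1.2 |
| 15 | SVD     | 2         | Yes        | Yes      | a          | Covariance          | 14   | 2.2 |
| 16 | SVD     | 2         | Yes        | Yes      | b          | Covariance          | 45   | 0.3 |
| 17 | SVD     | 2         | Yes        | Yes      | c          | Covariance          | 18   | 2.0 |
| 18 | SVD     | 2         | Yes        | No       | a          | N/A                 | 28   | 1.5 |
| 19 | SVD     | 2         | Yes        | No       | b          | N/A                 | 21   | 1.9 |
| 20 | SVD     | 2         | Yes        | No       | c          | N/A                 | 31   | 1.4 |
| 21 | Mean(d) | 2         | Yes        | N/A      | N/A        | N/A                 | 24   | 1.7 |
| 22 | Var(d)  | 2         | Yes        | N/A      | N/A        | N/A                 | 33   | 1.2 |
| 23 | SVD     | 3         | No         | Yes      | a          | Correlation         | 7    | 3.9 |
| 24 | SVD     | 3         | No         | Yes      | b          | Correlation         | 26   | 1.6 |
| 25 | SVD     | 3         | No         | Yes      | c          | Correlation         | 20   | 1.9 |
| 26 | SVD     | 3         | No         | Yes      | a          | Covariance          | 6    | 3.9 |
| 27 | SVD     | 3         | No         | Yes      | b          | Covariance          | 19   | 1.9 |
| 28 | SVD     | 3         | No         | Yes      | c          | Covariance          | 2    | 6.9 |
| 29 | SVD     | 3         | No         | No       | a          | N/A                 | 8    | 3.8 |
| 30 | SVD     | 3         | No         | No       | b          | N/A                 | 30   | 1.5 |
| 31 | SVD     | 3         | No         | No       | c          | N/A                 | 13   | 2.3 |
| 32 | Mean(d) | 3         | No         | N/A      | N/A        | N/A                 | 3    | 6.8 |
| 33 | Var(d)  | 3         | No         | N/A      | N/A        | N/A                 | 10   | 3.3 |
| 34 | SVD     | 3         | Yes        | Yes      | a          | Correlation         | 5    | 5.1 |
| 35 | SVD     | 3         | Yes        | Yes      | b          | Correlation         | 15   | 2.2 |
| 36 | SVD     | 3         | Yes        | Yes      | c          | Correlation         | 4    | 5.7 |
| 37 | SVD     | 3         | Yes        | Yes      | a          | Covariance          | 1    | 7.7 |
| 38 | SVD     | 3         | Yes        | Yes      | b          | Covariance          | 29   | 1.5 |
| 39 | SVD     | 3         | Yes        | Yes      | c          | Covariance          | 16   | 2.1 |
| 40 | SVD     | 3         | Yes        | No       | a          | N/A                 | 9    | 3.5 |
| 41 | SVD     | 3         | Yes        | No       | b          | N/A                 | 27   | 1.6 |
| 42 | SVD     | 3         | Yes        | No       | c          | N/A                 | 38   | 0.6 |
| 43 | Mean(d) | 3         | Yes        | N/A      | N/A        | N/A                 | 17   | 2.1 |
| 44 | Var(d)  | 3         | Yes        | N/A      | N/A        | N/A                 | 25   | 1.7 |
| 45 | Var(GC) | N/A       | N/A        | N/A      | N/A        | N/A                 | 11   | 3.2 |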

Table of 45 statistics derived from collections of rate matrices. Columns are:

#: number of each method, used in the text for reference.
Type: type of statistic, either SVD, mean distance of each $Q_i$ from the average of all $Q_i$ (Mean(d)), variance of the distances of the $Q_i$ from the average of all $Q_i$ (Var(d)), or variance in GC content (Var(GC)).
Sequences: either 2, for pairwise sequence comparisons that assume a time-reversible substitution model, or 3, for three-way sequence comparisons that do not assume a time-reversible model.
Normalized: No, for unnormalized $Q_i$; Yes, for $Q_i$ normalized to a trace of 1 (eliminating the contribution of time to the inferred matrix); or N/A, where normalization was not applicable.
Centered: Yes, for centered SVD (CSVD); No, for uncentered SVD (USVD); or N/A, where SVD was not used.
SV measure: statistic for characterizing the singular values: a, the ratio of the two largest singular values; b, the ratio of the largest singular value to the sum of all singular values; c, $\sum_i \ln(1 + \sigma_i)$, where the $\sigma_i$ are the singular values (this is the SVD version of the statistic introduced by Weiss et al. [43]); or N/A, where not applicable.
Centered SVD method: Covariance if the covariance matrix was used for CSVD, Correlation if the correlation matrix was used, or N/A if not applicable (i.e., if the technique was not centered SVD).
Rank: rank of the statistic in contributing to overall classification accuracy using random forests [75].
%: percentage contribution of the statistic to overall classification accuracy using random forests.

The statistic used by Devauchelle et al. [42] corresponds to statistic 7; that used by Weiss et al. [43] corresponds to statistic 9.
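
The SV measures a, b, and c and the Mean(d) and Var(d) statistics defined above can be illustrated with a minimal Python sketch. The function names `sv_measures` and `distance_stats` are hypothetical and do not come from the paper. Several details are assumptions, because the table note does not specify them: measure a is taken as largest divided by second largest, the distance d is taken as the Frobenius (entrywise Euclidean) distance, and Var(d) is taken as the population variance. The singular values are assumed to have been computed elsewhere.

```python
import math
from statistics import fmean, pvariance


def sv_measures(singular_values):
    """Return the SV measures (a, b, c) for a list of singular values."""
    s = sorted(singular_values, reverse=True)
    if len(s) < 2 or s[1] == 0:
        raise ValueError("need at least two singular values, second one nonzero")
    a = s[0] / s[1]                       # assumption: largest / second largest
    b = s[0] / sum(s)                     # largest over the sum of all
    c = sum(math.log1p(x) for x in s)     # sum_i ln(1 + sigma_i)
    return a, b, c


def distance_stats(matrices):
    """Return (Mean(d), Var(d)) for equally sized matrices given as nested lists.

    d_i is the distance of matrix Q_i from the entrywise average of all Q_i.
    Assumptions: Frobenius distance and population variance.
    """
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    # Entrywise average of all matrices.
    mean_q = [
        [sum(m[r][c] for m in matrices) / n for c in range(cols)]
        for r in range(rows)
    ]
    # Frobenius distance of each matrix from the average.
    dists = [
        math.sqrt(
            sum((m[r][c] - mean_q[r][c]) ** 2
                for r in range(rows) for c in range(cols))
        )
        for m in matrices
    ]
    return fmean(dists), pvariance(dists)


if __name__ == "__main__":
    # Toy singular values and toy 2x2 matrices, for illustration only.
    print(sv_measures([3.0, 1.0, 0.0]))
    print(distance_stats([[[-1.0, 1.0], [1.0, -1.0]],
                          [[-3.0, 3.0], [3.0, -3.0]]]))
```

The toy inputs are not data from the study; real rate matrices would be 4 × 4 nucleotide substitution matrices, optionally normalized first as described in the Normalized column.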