Figure 2 | BMC Bioinformatics

Figure 2

From: Using the nucleotide substitution rate matrix to detect horizontal gene transfer

Relationship between (a) uncentered SVD and (b) centered SVD. A set of related sequences is used to determine a set of rate matrices, Q_i. If the sequences were generated by mutation with the same rate matrix for all species throughout time, then the rate matrices are all scalar multiples of each other. In this case, the singular-value decomposition (SVD) of the set of Q_i will show that a single vector (corresponding to the time axis) explains most of the variation in the set of rate matrices. Both USVD (the SVD is performed on the Q_i directly) and CSVD (the SVD is performed on the covariance matrix of the Q_i) will find the same dominant axis. If the rate matrix differs between species or varies over time, no single vector will explain most of the variation in the set of rate matrices. In other words, USVD will not find a single best axis. However, for the case of precisely two rate matrices, CSVD, but not USVD, will still find a single, non-time axis that explains much of the variation in the rate matrices (in the example shown, r² = 0.46 for the best-fit line through the mixed points in the right panel). This occurs because a single vector explains much of the variance in the set of rate matrices, but this vector does not correspond to a time axis and hence cannot be found by USVD. Data shown are for a simplified model of rate matrices with a two-character alphabet, a and b, instead of the four-character alphabet used in DNA. In this simplified alphabet, Q has only two non-negative elements, representing r_a→b and r_b→a. These two elements are plotted on the x and y axes of each graph. Data shown are for 16-taxon trees evolving according to the single rate matrix r_a→b = 0.9, r_b→a = 0.1 (blue points), the single rate matrix r_a→b = 0.2, r_b→a = 0.8 (green points), or an equal mixture of both (red points).
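The contrast between the two decompositions can be illustrated with a minimal NumPy sketch on the two-character alphabet. This is a noise-free toy model and not the simulation behind the figure: the function names `usvd` and `csvd`, the sample size, the time range, and especially the way the mixture is modeled (a convex combination of the two rate vectors with random weights) are assumptions made only for illustration.

```python
import numpy as np

def usvd(points):
    """Uncentered SVD: decompose the stacked rate vectors directly.

    Returns the fraction of the total sum of squares captured by the
    leading singular vector, and that vector.
    """
    _, s, vt = np.linalg.svd(points, full_matrices=False)
    return s[0] ** 2 / np.sum(s ** 2), vt[0]

def csvd(points):
    """Centered SVD: decompose the covariance matrix of the rate vectors.

    Returns the fraction of the total variance captured by the leading
    singular vector of the covariance matrix, and that vector.
    """
    cov = np.cov(points, rowvar=False)
    _, s, vt = np.linalg.svd(cov)
    return s[0] / np.sum(s), vt[0]

rng = np.random.default_rng(0)

# Rate vectors (r_a->b, r_b->a) taken from the caption.
q1 = np.array([0.9, 0.1])
q2 = np.array([0.2, 0.8])

# Single rate matrix: every estimate is a scalar multiple of q1
# (assumed time scaling t, no estimation noise).
t = rng.uniform(0.1, 1.0, size=50)
single = t[:, None] * q1

# Mixture (assumed model): convex combinations of q1 and q2, so the
# points fall on a line that does not pass through the origin.
w = rng.uniform(0.0, 1.0, size=50)
mixed = w[:, None] * q1 + (1.0 - w)[:, None] * q2

for name, pts in [("single", single), ("mixed", mixed)]:
    u_frac, u_axis = usvd(pts)
    c_frac, c_axis = csvd(pts)
    print(f"{name}: USVD explains {u_frac:.3f} along {u_axis}, "
          f"CSVD explains {c_frac:.3f} along {c_axis}")
```

Under these assumptions, both decompositions recover the same axis (parallel to `q1`) for the single-rate data, whereas for the mixed data only the centered decomposition isolates one axis, the direction `q1 - q2`, which is not a time axis. Real estimates are noisy, so the explained fractions would be lower than in this idealized sketch.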
