Comparison of methods for calculating conditional expectations of sufficient statistics for continuous time Markov chains
- Paula Tataru^{1}Email author and
- Asger Hobolth^{1}
DOI: 10.1186/1471-2105-12-465
© Tataru and Hobolth; licensee BioMed Central Ltd. 2011
Received: 1 July 2011
Accepted: 5 December 2011
Published: 5 December 2011
Abstract
Background
Continuous time Markov chains (CTMCs) is a widely used model for describing the evolution of DNA sequences on the nucleotide, amino acid or codon level. The sufficient statistics for CTMCs are the time spent in a state and the number of changes between any two states. In applications past evolutionary events (exact times and types of changes) are unaccessible and the past must be inferred from DNA sequence data observed in the present.
Results
We describe and implement three algorithms for computing linear combinations of expected values of the sufficient statistics, conditioned on the end-points of the chain, and compare their performance with respect to accuracy and running time. The first algorithm is based on an eigenvalue decomposition of the rate matrix (EVD), the second on uniformization (UNI), and the third on integrals of matrix exponentials (EXPM). The implementation in R of the algorithms is available at http://www.birc.au.dk/~paula/.
Conclusions
We use two different models to analyze the accuracy and eight experiments to investigate the speed of the three algorithms. We find that they have similar accuracy and that EXPM is the slowest method. Furthermore we find that UNI is usually faster than EVD.
Background
In this paper we consider the problem of calculating the expected time spent in a state and the expected number of jumps between any two states in discretely observed continuous time Markov chains (CTMCs). The case where the CTMC is only recorded at discretely observed time points arises in molecular evolution where DNA sequence data is extracted at present day and past evolutionary events are missing. In this situation, efficient methods for calculating these types of expectations are needed. In particular, two classes of applications can be identified.
The first class of applications is concerned with rate matrix estimation. [1] describes how the expectation-maximization (EM) algorithm can be applied to estimate the rate matrix from DNA sequence data observed in the leaves of an evolutionary tree. The EM algorithm is implemented in the software XRate [2] and has been applied in [3] for estimating empirical codon rate matrices. [1] uses the eigenvalue decomposition of the rate matrix to calculate the expected time spent in a state and the expected number of jumps between states.
The second class of applications is concerned with understanding and testing various aspects of evolutionary trajectories. In [4] it is emphasized that analytical results for jump numbers are superior to simulation approaches and various applications of jump number statistics are provided, including a test for the hypothesis that a trait changed its state no more than once in its evolutionary history and a diagnostic tool to measure discrepancies between the data and the model. [4] assumes that the rate matrix is diagonalizable and that the eigenvalues are real, and applies a spectral representation of the transition probability matrix to obtain the expected number of state changes.
[5] and [6] describe a method, termed substitution mapping, for detecting coevolution of evolutionary traits, and a similar method is described in [7]. The substitution mapping method is based on the expected number of substitutions while [7] base their statistics on the expected time spent in a state. Furthermore [7] describes an application concerned with mapping synonymous and non-synonymous mutations on branches of a phylogenetic tree and employs the expected number of changes between any two states for this purpose. [8] uses the expected number of state changes to calculate certain labeled evolutionary distances. A labeled evolutionary distance could for example be the number of state changes from or to a specific nucleotide. In [9] substitution mapping is invoked for identifying biochemically constrained sites. In [7] and [8] the summary statistics are calculated using the eigenvalue decomposition method suggested by [1]. In [5, 6] and [9] the substitution mapping is achieved using a more direct formula for calculating the number of state changes. In this direct approach an infinite sum must be truncated and it is difficult to control the error associated with the truncation. An alternative is described in [10] where uniformization is applied to obtain the expected number of jumps. [10] uses the expected number of jumps on a branch to detect lineages in a phylogenetic tree that are under selection.
A third algorithm for obtaining the number of changes or time spent in a state is outlined in [11]. The algorithm is based on [12] where a method for calculating integrals of matrix exponentials is described. A natural question arises: which of the three methods (eigenvalue decomposition, uniformization or matrix exponentiation) for calculating conditional expectations of summary statistics for a discretely observed CTMC should be preferred? The aim of this paper is to provide an answer to this question. We describe and compare the three methods. Our implementations in R [13] are available at http://www.birc.au.dk/~paula/. (Furthermore the eigenvalue decomposition and uniformization methods are also available as a C++ class in the bio++ library at http://biopp.univ-montp2.fr/.) The performance and discussion of the algorithms are centered around two applications. The first application is concerned with rate matrix estimation; we estimate the Goldman-Yang codon model [14] using the expectation-maximization algorithm. The second application is based on the labeled distance estimation presented in [8].
We describe, compare and discuss three methods for calculating Σ(C; t). The evaluation of the integrals (4) takes O(n^{3}) time and therefore a naive calculation, assuming that C contains just one entry different from zero has a O(n^{5}) running time. Even worse, if C contains O(n^{2}) entries different from zero, then the naive implementation has a O(n^{7}) running time. For all three methods our implementations of Σ(C; t) run in O(n^{3}) time.
Results
Applications
Application 1: Rate matrix estimation
where π is the stationary distribution, κ is the transition/transversion rate ratio, ω is the non-synonymous/synonymous ratio and α is a scaling factor. The stationary distribution π is determined directly from the data using the codon frequencies. We estimate the remaining parameters θ = (α, κ, ω) using the expectation-maximization (EM) algorithm [16] as described below.
A similar notation applies for L_{s,tv}, L_{ns,ts}, L_{ns},_{tv}, N_{ns} and N, where the last statistic is the sum of substitutions between all states (i, j) that differ at one position and s, ns, ts and tv subscripts stand for synonymous, non-synonymous, transition and transversion.
where a = -L_{ns,tv}L_{ns,ts}N_{s}, b = L_{ns,tv}L_{s,ts}(N_{ns} - N_{tv}) + L_{ns,ts}L_{s,tv}(N_{ns} - N_{ts}) and c = L_{s,tv}L_{s,ts}N_{ns}.
In reality the data y is only available in the leaves and the times and types of substitutions in all sites and all branches of the tree are unaccessible. The EM algorithm is an efficient tool for maximum likelihood estimation in problems where the complete data log likelihood is analytically tractable but full information about the data is missing.
conditional on the data y and the current estimate of the parameters θ_{0} is calculated. In the M-step the parameters are updated by maximizing G(θ; θ_{0},y). The parameters converge to a local maximum of the likelihood for the observed data.
The E-step thus consists of calculating conditional expectations of linear combinations of times such as $E\left[{L}_{\mathsf{\text{s,ts}}}|{t}_{k},a,b\right]$ and substitutions such as $E\left[{N}_{\mathsf{\text{ts}}}|{t}_{k},a,b\right]$ where L_{s,ts} and N_{ts} are given by (8). In our application n = 61 and the first type of statistics $E\left[{L}_{\mathsf{\text{s,ts}}}|t,a,b\right]$ is (up to a factor p_{ ab }(t)) on the form (5) with diagonal entries ${C}_{ii}={\sum}_{j}{\pi}_{j}1\left(\left(i,j\right)\in {\mathcal{L}}_{\mathsf{\text{s,ts}}}\right)$ and all off diagonal entries equal to zero. The second type of statistics $E\left[{N}_{\mathsf{\text{ts}}}|t,a,b\right]$ is also on the form (5) with off-diagonal entries ${C}_{ij}={q}_{ij}1\left(\left(i,j\right)\in {\mathcal{L}}_{\mathsf{\text{ts}}}\right)$ and zeros on the diagonal.
Application 2: Robust distance estimation
The second application is a new approach for estimating labeled evolutionary distance, entitled robust counting and introduced in [8]. The purpose is to calculate a distance that is robust to model misspecification. The method is applied to labeled distances, for example, the synonymous distance between two coding DNA sequences. As it is believed that selection mainly acts at the protein level, synonymous substitutions are neutral and phylogenies built on these type of distances are more likely to reveal the true evolutionary history. The distance is calculated using the mean numbers of labeled substitutions conditioned on pairwise site patterns averaged over the empirical distribution of site patterns observed in the data. In the conventional method the average is done over the theoretical distribution of site patterns. The robustness is therefore achieved through the usage of more information from the data and less from the model.
Let Q be the rate matrix of the assumed model, P(t) = e^{ Qt }, the labeling be given through a set of pairs ℒ and the data be represented by a pairwise alignment y = (y_{1}, y_{2}) of length m. As data only contains information about the product Qt, where t is the time distance between the sequences, we can set t = 1.
Therefore this application requires evaluating a sum on the form (5) with off-diagonal entries ${C}_{ij}={q}_{ij}1\left(\left(i,j\right)\in \mathcal{L}\right)$ and zeros on the diagonal.
Algorithms
The calculation of Σ(C; t) is based on the integrals ${I}_{cd}^{ab}\left(t\right)$. In this section we present three existing methods for obtaining the integrals and extend them to obtain Σ(C; t).
Eigenvalue decomposition (EVD)
Because Λ is diagonal, e^{Λt}is also diagonal with ${\left({e}^{\Lambda t}\right)}_{ii}={e}^{{\lambda}_{i}t}$.
where ○ represents the entry-wise product.
The eigenvalues and eigenvectors might be complex, but they come in complex conjugate pairs and the final result is always real; for more information we refer to the Supplementary Information in [2]. If the CTMC is reversible, the decomposition can be done on a symmetric matrix obtained from Q (e.g. [15]), which is faster and tends to be more robust. Let π be the stationary distribution. Due to reversibility, π_{ a }q_{ ab }= π_{ b }q_{ ba }, which can be written as ΠQ = Q* Π where Π = diag(π). Let S = Π^{1/2}Q Π^{-1/2}.
where S* is the transpose of S. Then S is symmetric. Let Λ, V be its eigenvalues and eigenvectors, respectively. Then V ΛV^{-1} = S = Π^{1/2}Q Π^{-1/2}, which implies Q = (Π^{-1/2}V)Λ(V^{-1}Π^{1/2}) and it follows that Q has the same eigenvalues as S and Π^{-1/2}V for eigenvectors.
The results can be summarized in the following algorithm:
Algorithm 1: EVD
Input: Q, C, t
Output: Σ(C; t)
Step 1: Determine eigenvalues λ_{ i }.
Determine the eigenvectors U_{ i }for Q and compute U^{-1}.
Step 2: Determine matrix J(t) from (17).
Step 3: Determine matrix Σ(C;t) from (18).
Uniformization (UNI)
The uniformization method was first introduced in [19] for computing the matrix exponential P(t) = e^{ Qt }. In [11] it was shown how this method can be used for calculating summary statistics, even for statistics that cannot be written in integral form. Let μ = max_{ i }(q_{ i }) and $R=\frac{1}{\mu}Q+I$, where I is the identity matrix.
where Φ(·) is the cumulative distribution function for the standard normal distribution. Consequently we can approximate the truncation point s(λ) with $\sqrt{\lambda}{\Phi}^{-1}\left(1-b\right)+\lambda $. If b = 10^{-8} we obtain Φ^{-1} (1 - b) = 5.6.
The linear regression provides an excellent fit to the tail of the distribution.
In summary we have the following algorithm:
Algorithm 2: UNI
Input: Q, C, t
Output: Σ(C; t)
Step 1: Determine μ, s(μt) and R.
Step 2: Calculate R^{ m }for 2 ≤ m ≤ s(μt).
Step 3: Calculate $A\left(m\right)={\sum}_{l=0}^{m}{R}^{l}C{R}^{m-l}$ for 0 ≤ m ≤ s(μt).
using the recursion A(m + 1) = A(m)R + R^{m+1}C.
Step 4: Determine Σ(C; t) from (22).
Exponentiation (EXPM)
where ${1}_{\left\{\left(c,d\right)\right\}}$ is a matrix with 1 in entry (c, d) and zero otherwise. We can use this method to determine ${I}_{cd}^{ab}\left(t\right)$ by simply setting $B={1}_{\left\{\left(c,d\right)\right\}}$, construct the auxiliary matrix A, calculate the matrix exponential of At, and finally read off the integral in entry (a, b) in the upper right corner of the matrix exponential.
Therefore by setting B = C in the auxiliary matrix we obtain Σ(C;t).
The EXPM algorithm is as follows:
Algorithm 3: EXPM
Input: Q, C, t
Output: Σ(C; t)
Step 1: Construct $A=\left[\begin{array}{cc}\hfill Q\hfill & \hfill C\hfill \\ \hfill 0\hfill & \hfill Q\hfill \end{array}\right]$.
Step 2: Calculate the matrix exponential e^{ At }.
Step 3: Σ(C; t) is the upper right corner of the matrix exponential.
Testing
We implemented the presented algorithms in R and tested them with respect to accuracy and speed.
Accuracy
From this, using the symbolic operations in Matlab [20], we obtained the final analytic expression for (4). Using this model we compared for all three methods the true value of (5) for various values of t and randomly generated binary matrices C. For each t we generated 5 different matrices C. The average normalized deviation is shown in Figure 2.
In both cases, all methods showed good accuracy as the normalized deviation was no bigger than 3 × 10^{-9}. We also note that EXPM tended to be the most precise while UNI provided the worst approximation. To further investigate the accuracy, we performed calculations on randomly generated reversible rate matrices: we first obtained the stationary distribution from the Dirichlet distribution with shape parameters equal to 1, then all entries q_{ ij }with i ≥ j from the exponential distribution with parameter 1 and finally calculated the remaining entries using the reversibility property. In all the runs the relative difference between EVD, UNI and EXPM was less than 10^{-5}. This indicated that all three methods have a similar performance in a wide range of applications.
Speed
Partition of computation
Assume we need to evaluate Σ(C; t) for a fixed matrix C and multiple time points t ∈ {t_{1},...t_{ k }}. In each iteration of the EM-algorithm in Application 1 we need this type of calculation while in order to calculate the labeled distance in Application 2 just one time point is required. Using EVD (Algorithm 1) we do the eigenvalue decomposition (Step 1) once and then, for each time point t_{ i }, we apply Step 2 and Step 3. The eigenvalue decomposition, achieved through the R function eigen, has a running time of O(n^{3}). In Step 2 we determine J(t) and this takes O(n^{2}) time. Step 3 has a running time of O(n^{3}) due to the matrix multiplications.
If instead we apply UNI (Algorithm 2), we run Steps 1-3 for the largest time point max(t_{ i }) and then, for each time point t_{ i }, we apply Step 4. Steps 1-3 take O (s(μ max (t_{ i })) n^{3}) time, and Step 4 takes O(s(μt_{ i })n^{2}) time for each i ∈ {1,..., k}. Therefore, even though the total time for both methods is O(n^{3}), the addition of one time point contributes with O(n^{3}) for EVD, but only O(s(μt)n^{2}) for UNI. Recall that the constant s(μt) is the truncation point for the infinite sum in the uniformization method.
In the case of EXPM (Algorithm 3) we need to calculate the matrix exponential at every single time point. We used the expm R package [21] with the Higham08 method. This is a Padé approximation combined with an improved scaling and squaring [22] and balancing [23]. The running time is O(n^{3}).
Running time complexity
Method | EVD | UNI | EXPM |
---|---|---|---|
Precomputation | |||
Steps | 1 | 1-3 | none |
Order | O(n^{3}) | O(s(μt)n^{3}) | |
Main Computation | |||
Steps | 2-3 | 4 | 1-3 |
Order | O(n^{3}) | O(s(μt)n^{2}) | O(n^{3}) |
Experiments
We tested the speed of the algorithms in six experiments based on the presented applications and two more experiments using a non-reversible matrix.
GY
Experimental design
GY | ||||||
---|---|---|---|---|---|---|
Experiment | 2 | 3 | ||||
k | t _{ k } | μt _{ k } | s ( μ _{ tk } ) | t _{ k } | μt _{ k } | s ( μ _{ tk } ) |
1 | 0.0017 | 0.0045 | 5 | 0.1 | 0.2668 | 8 |
2 | 0.0032 | 0.0085 | 5 | 0.2 | 0.5337 | 9 |
3 | 0.0046 | 0.0124 | 5 | 0.3 | 0.8005 | 11 |
4 | 0.0061 | 0.0163 | 5 | 0.4 | 1.0674 | 12 |
5 | 0.0076 | 0.0202 | 5 | 0.5 | 1.3342 | 13 |
6 | 0.0090 | 0.0241 | 5 | 0.6 | 1.6010 | 14 |
7 | 0.0105 | 0.0281 | 6 | 0.7 | 1.8679 | 15 |
8 | 0.0120 | 0.0320 | 6 | 0.8 | 2.1347 | 15 |
9 | 0.0135 | 0.0359 | 6 | 0.9 | 2.4015 | 16 |
10 | 0.0150 | 0.0398 | 6 | 1.0 | 2.6684 | 17 |
GTR | ||||||
Experiment | 5 | 6 | ||||
k | t _{ k } | μt _{ k } | s(μt_{ k }) | t _{ k } | μt _{ k } | s(μt_{ k }) |
1 | 0.1760 | 0.2668 | 8 | 0.1 | 0.1516 | 7 |
2 | 0.3520 | 0.5337 | 9 | 0.6 | 0.9098 | 11 |
3 | 0.5280 | 0.8005 | 11 | 1.1 | 1.6680 | 14 |
4 | 0.7039 | 1.0674 | 12 | 1.6 | 2.4262 | 16 |
5 | 0.8798 | 1.3342 | 13 | 2.1 | 3.1844 | 18 |
6 | 1.0558 | 1.6010 | 14 | 2.6 | 3.9426 | 20 |
7 | 1.2318 | 1.8679 | 15 | 3.1 | 4.7008 | 22 |
8 | 1.4077 | 2.1347 | 15 | 3.6 | 5.4590 | 24 |
9 | 1.5837 | 2.4015 | 16 | 4.1 | 6.2172 | 26 |
10 | 1.7597 | 2.6684 | 17 | 4.6 | 6.9754 | 27 |
UNR | ||||||
Experiment | 7 | 8 | ||||
k | t _{ k } | μt _{ k } | s(μ_{ tk }) | t _{ k } | μt _{ k } | s(μ_{ tk }) |
1 | 0.0379 | 0.1516 | 7 | 0.1 | 0.4 | 9 |
2 | 0.2275 | 0.9098 | 11 | 0.6 | 2.4 | 16 |
3 | 0.4170 | 1.6680 | 14 | 1.1 | 4.4 | 21 |
4 | 0.6066 | 2.4262 | 16 | 1.6 | 6.4 | 26 |
5 | 0.7961 | 3.1844 | 18 | 2.1 | 8.4 | 30 |
6 | 0.9857 | 3.9426 | 20 | 2.6 | 10.4 | 34 |
7 | 1.1752 | 4.7008 | 22 | 3.1 | 12.4 | 38 |
8 | 1.3648 | 5.4590 | 24 | 3.6 | 14.4 | 42 |
9 | 1.5543 | 6.2172 | 26 | 4.1 | 16.4 | 45 |
10 | 1.7439 | 6.9754 | 27 | 4.6 | 18.4 | 49 |
GTR
where r = (0.5, 0.3,0.6, 0.2,0.3, 0.2) and π = (0.2, 0.2,0.3, 0.3). For labeling, we considered the jumps to and from nucleotide A, leading to C_{ ij }= q_{ ij }if i or j represents nucleotide A. For each data set, we estimated the GTR parameters as described in [8] and calculated the robust distance. Experiments 5 and 6 used the same GTR matrix and C_{ ij }= q_{ ij }if i or j represents nucleotide A and zero otherwise, and computed the value of Σ(C;t_{ k }) for 10 equidistant sorted time points t_{ k }with 1 ≤ k ≤ 10 (Table 2).
UNR
Discussion
In order to calculate the integral (4) the expressions (26) and (27) are used. It is evident that this procedure is feasible but also requires much bookkeeping.
(state 4 is an absorbing state). If this process starts in state 1 then the first jump is to state 2 and the second is from state 2 to either state 1 or state 3. This feature can be taken into account by having a so-called adaptive uniformized (AU) jump process where the rate for the first jump is 3ν, for the second is μ + 2ν and, assuming μ + ν > 3ν, the rate for the third jump is μ + ν. From the third jump the rate in the AU jump process is μ + 2ν as in the standard uniformized jump process. The AU jump process has a closed-form expression for the jump probabilities (it is a pure birth process) but is of course more complicated than a Poisson jump process. The advantage is that the AU jump process exhibits fewer jumps. This procedure could very well be useful for codon models where the set of states that the process can be in after one or two jumps are limited because only one nucleotide change is allowed in each state change.
In an application concerned with modeling among-site rate variation, [27] applies the uniformization procedure (20) to calculate the transition probabilities instead of the eigenvalue decomposition method (15). [27] shows, in agreement with our results, that uniformization is a faster computational method than eigenvalue decomposition.
where the infinite sum is truncated at k = 10. The problem with this approach is that it is difficult to bound the error introduced by the truncation. In UNI a similar type of calculation applies but the truncation error can be controlled.
Conclusion
Recall that EVD assumes that the rate matrix is diagonalizable and this constraint means that EVD is less general than the other two algorithms. We have shown in the Discussion how a direct calculation of the integral (4) is actually still possible but requires much bookkeeping. On top of being less general, EVD is dependent on the type of the matrix: reversible or non-reversible. We have shown how this discrepancy can make EVD slower than UNI even when the state space has size of only 4.
We found that the presented methods have similar accuracy and EXPM is the most accurate one. With respect to running time, it is not straightforward which method is best. We found that both the eigenvalue decomposition (EVD) and uniformization (UNI) are faster than the matrix exponentiation method (EXPM). The main reason for EVD and UNI being faster is that they can be decomposed into a precomputation and a main computation. The precomputation only depends on the rate matrix for EVD while for UNI it also depends on the largest time point and the matrix C. We also remark that EXPM involves the exponentiation of a matrix double in size. UNI is particularly fast when the product μt is small because in this case only a few terms in the sum (22) are needed.
Declarations
Acknowledgements
We are grateful to Thomas Mailund and Julien Y. Dutheil for very useful discussions on the presentation and implementation of the algorithms. We would also like to thank the anonymous reviewers for constructive comments and suggestions that helped us improve the paper.
Authors’ Affiliations
References
- Holmes I, Rubin GM: An expectation maximization algorithm for training hidden substitution models. J Mol Bio 2002, 317: 753–764. 10.1006/jmbi.2002.5405View Article
- Klosterman PS, Holmes I: XRate: a fast prototyping, training and annotation tool for phylo-grammars. BMC Bioinf 2006, 7: 428. 10.1186/1471-2105-7-428View Article
- Kosiol C, Holmes I, Goldman N: An empirical codon model for protein sequence evolution. Mol Biol Evol 2007, 24: 1464–79. 10.1093/molbev/msm064View ArticlePubMed
- Minin VN, Suchard MA: Counting labeled transitions in continuous-time Markov models of evolution. J Math Biol 2008, 56: 391–412.View ArticlePubMed
- Dutheil J, Pupko T, Jean-Marie A, Galtier N: A Model-Based Approach for Detecting Co-evolving Positions in a Molecule. Mol Biol Evol 2008, 22: 1919–1928.View Article
- Dutheil J, Galtier N: Detecting groups of co-evolving positions in a molecule: a clustering approach. BMC Evol Biol 2007, 7: 242. 10.1186/1471-2148-7-242PubMed CentralView ArticlePubMed
- Minin VN, Suchard MA: Fast, accurate and simulation-free stochastic mapping. Phil Trans R Soc B 2008, 363(1512):3985–3995. 10.1098/rstb.2008.0176PubMed CentralView ArticlePubMed
- O'Brien JD, Minin VN, Suchard MA: Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol 2009, 26: 801–814. 10.1093/molbev/msp003PubMed CentralView ArticlePubMed
- Dutheil J: Detecting site-specific biochemical constraints through substitution mapping. J Mol Evol 2008, 67: 257–65. 10.1007/s00239-008-9139-8View ArticlePubMed
- Siepel A, Pollard KS, Haussler D: New methods for detecting lineage-specific selection. Proceedings of the 10th International Conference on Research in Computational Molecular Biology (RECOMB) 2006, 190–205.View Article
- Hobolth A, Jensen JL: Summary statistics for end-point conditioned continuous-time Markov chains. J Appl Prob 2011, 48: 1–14. 10.1239/jap/1300198132View Article
- Van Loan CF: Computing integrals involving the matrix exponential. IEEE Transactions on Automatic Control 1978, 23: 395–404. 10.1109/TAC.1978.1101743View Article
- R Development Core Team: R: A Language and Environment for Statistical Computing.[http://www.R-project.org] R Foundation for Statistical Computing
- Goldman N, Yang Z: A Codon-based Model of Nucleotide Substitution for Protein-coding DNA Sequences. Mol Biol Evol 1994, 11: 725–736.PubMed
- Hobolth A, Jensen JL: Statistical Inference in Evolutionary Models of DNA Sequences via the EM Algorithm. Stat App Gen Mol Biol 2005, 4: 18.
- Dempster AP, Laird NM, Rubin DB: Maximum Likelihood from Incomplete Data via the EM Algorithm. J R Statist Soc B 1977, 39: 1–38.
- Yap VB, Speed T: Estimating Substitution Matrices. In Statistical Methods in Mol Evolution. Edited by: Nielsen R. Springer; 2005:420–422.
- Felsenstein J: Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 1981, 17: 368–376. 10.1007/BF01734359View ArticlePubMed
- Jensen A: Markov chains as an aid in the study of Markov processes. Skand Aktuarietidskr 1953, 36: 87–91.
- MATLAB R2010a Natick, Massachusetts: The MathWorks Incorporated;
- Goulet V, et al.: expm: Matrix exponential.[http://CRAN.R-project.org/package=expm]
- Higham J: The Scaling and Squaring Method for the Matrix Exponential Revisited. SIAM Review 2003, 51: 747–764.View Article
- Stadelmann M: Matrixfunktionen. Analyse und Implementierung. In Master thesis. ETH Zurich, Mathematics Department; 2009.
- Lemey P, et al.: Molecular footprint of drug-selective pressure in a human immunodeficiency virus transmission chain. J Virol 2005, 79: 11981–11989. 10.1128/JVI.79.18.11981-11989.2005PubMed CentralView ArticlePubMed
- Tamura K, et al.: MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol Biol Evol Advance Access 2011.
- Van Moorsel APA, Sanders WH: Adaptive uniformization. Stochastic Models 1994, 10: 619–647. 10.1080/15326349408807313View Article
- Mateiu L, Rannala B: Inferring complex DNA substitution processes on phylogenies using uniformization and data augmentation. Systematic Biol 2006, 55: 259–269. 10.1080/10635150500541599View Article
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.