Parameterizing sequence alignment with an explicit evolutionary model

Background Inference of sequence homology is inherently an evolutionary question, dependent upon evolutionary divergence. However, the insertion and deletion penalties in the most widely used methods for inferring homology by sequence alignment, including BLAST and profile hidden Markov models (profile HMMs), are not based on any explicitly time-dependent evolutionary model. Using one fixed score system (BLOSUM62 with some gap open/extend costs, for example) corresponds to making an unrealistic assumption that all sequence relationships have diverged by the same time. Adoption of explicit time-dependent evolutionary models for scoring insertions and deletions in sequence alignments has been hindered by algorithmic complexity and technical difficulty. Results We identify and implement several probabilistic evolutionary models compatible with the affine-cost insertion/deletion model used in standard pairwise sequence alignment. Assuming an affine gap cost imposes important restrictions on the realism of the evolutionary models compatible with it, as single insertion events with geometrically distributed lengths do not result in geometrically distributed insert lengths at finite times. Nevertheless, we identify one evolutionary model compatible with symmetric pair HMMs that are the basis for Smith-Waterman pairwise alignment, and two evolutionary models compatible with standard profile-based alignment. We test different aspects of the performance of these “optimized branch length” models, including alignment accuracy and homology coverage (discrimination of residues in a homologous region from nonhomologous flanking residues). We test on benchmarks of both global homologies (full length sequence homologs) and local homologies (homologous subsequences embedded in nonhomologous sequence). Conclusions Contrary to our expectations, we find that for global homologies a single long branch parameterization suffices both for distant and close homologous relationships. In contrast, we do see an advantage in using explicit evolutionary models for local homologies. Optimal branch parameterization reduces a known artifact called “homologous overextension”, in which local alignments erroneously extend through flanking nonhomologous residues. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0832-5) contains supplementary material, which is available to authorized users.


Evolutionary models compatible with standard affine gap cost
Obtaining a probabilistic model of sequence evolution requires finding a time dependent analytic description for the probability distribution of descendant sequences given an ancestral sequence P t (D | A) as a function of a finite number of parameters. The "time" is a continuous variable that reflects in arbitrary units the degree of divergence of a descendant sequence relative to its ancestor (with time zero meaning no divergence, and time infinity meaning evolved sequences are indistinguishable from having been sampled from a time independent stationary distribution of the model). The parameters of the model (usually referred to as rate constants for reasons that will become clear later) are time-independent nonnegative values. A specific continuous-time analytic description of the probability distribution P t (D | A) as a function of its rate parameters is referred to as an evolutionary model, and because it describes the behavior for any arbitrary evolutionary distance (branch length), we refer to it as the macroscopic evolutionary model. The joint probability of an ancestral and descendant sequence P t (A, D) is simply the product of the conditional distribution P t (D | A) times the proposed distribution of ancestral sequences P (A). In the case of pair HMMs, we assume for P (A) an arbitrary time-independent geometric length distribution of parameter p, as is usually the constraint imposed by pair HMMs. For profile HMMs, the profile consensus plays the role of an ancestral sequence of fixed length, and we calculate the conditional probability P t (D | A) of the (descendant) sequence given the profile (ancestor).
In order to obtain the macroscopic model, we propose for P t (D | A) a suite of evolutionary events allowed at infinitesimal times. The parameters of these instantaneous events that define the behavior at very small times are referred to as the rate constants. We refer to the set of events allowed at infinitesimal time as the microscopic component of the evolutionary model.
Traditionally, the microscopic model is proposed first, from which the macroscopic model follows by solving the corresponding differential equations. Here, we take the opposite approach, as we try to identify the most realistic microscopic models compatible with the macroscopic architecture of a standard affine pair or profile HMM. Unfortunately, it is not always possible to associate a reasonable set of microscopic events with a desired macroscopic architecture.
An immediate consequence of requiring a map to a standard profile HMM is that we should be able to describe separately the fate of each ancestral residue. Profiles force a "memory" of ancestral residues, because profile match states are ancestral residues. For example, M → (I) n → D → (I) m → M is two inserts, not one. The parameters controlling the (I) n insert are different from those controlling the (I) m insert, regardless of whether the intermediate ancestral position is present or not. Thus, we can break down the problem by expressing the probability P t (D | A) as a product of each individual ancestral residue where {d} i is the collection of residues in the descendant sequence that are associated to an given ancestral residue a i .
Another consequence of this ancestral memory is that the fate of an insert becomes independent of whether the associated ancestral position is alive or dead at a given time. Thus, we can write the probability of a given ancestral residue a as a function of a handful of elementary probability functions: γ(t) (or γ t for short) is the probability that an ancestral residue dies, and P n (t) (or P t n ) is the probability of having an insert of length n after an ancestral residue regarding of whether the ancestral residue is dead or alive.
For a macroscopic evolutionary model with ancestral memory, the conditional probability that an ancestral residue a has survived at time t (and mutated to residue d), while producing an associated insert of length n ≥ 0 is given by Here, P t (d | a) corresponds to any arbitrary substitution matrix. q I is the emission distribution for inserted residues, which we assume to be time independent. The distribution q I does not need to be same as that of ancestral residues (π) , nor do they need to match the saturation probabilities of the substitution matrix.
Consequently, the conditional probability that an ancestral residue has died at time t leaving an insert of length n ≥ 0 is given by This macroscopic model admits a generalization of γ(t) in which we can distinguish whether the ancestral residue occurs after another surviving residue (which we designate with M), after an ancestral residue that has disappeared (designated with D), after some residues have been inserted (I) or before any ancestral residue (B). All results presented here for γ(t) generalize to γ {B,M,D,I} (t) as we will point out in different places. This generalization corresponds to transition probabilities from M (match), D (delete), I (insert) and B (begin) states of a profile or pair HMM.
In the type of Markov models that are typically used to describe sequence homology, the probability of an insert P n (t) admits only one of two forms: linear or affine. A linear gap cost assumption corresponds to a particular form for P n (t) given by where β t is a probability function, that corresponds to the cost of any inserted residue.
An affine gap cost assumption corresponds to the particular form for P t n , where β t and η t are both probability functions. The probability of the first residue in an insert given by β t (1 − η t ) is different from that of any other subsequent inserted residues, which is given by η t . This is the case of interest to us because it has been adopted by most popular sequence alignment tools. An affine gap cost model fits under a three state HMM (see Figure 2). A linear model is a particular case of the affine model for β t = η t , and it can be described minimally with a one state HMM.
Next we investigate plausible microscopic evolutionary models and whether their macroscopic solution can fit into a standard (linear or affine) gap cost.

The microscopic Geometric (GM) model
We thought at first that a microscopic model that would allow single insertion and deletion events of geometrically distributed residues would result in a macroscopic model with geometrically distributed inserts (hence an affine cost). To that effect, we proposed the Geometric (GM) model which allows an arbitrary number of residues to appear or disappear in one single event according to geometric distributions.
The evolutionary events allowed by the microscopic GM model are, infinitesimal time events for inserted residues Addition of x residues into an existing insert (5) Deletion of x residues from an existing insert provided that the insert still exists after the deletion The rates µ, λ, µ I , λ I are positive constants, and s I , s D , v D , v I are constant Bernoulli probability parameters.
The microscopic GM model distinguishes whether the added residues "start" a new insert (with rate λ) or simply expand an existing one (with rate λ I ). It also distinguishes whether we are shrinking an insert (with rate µ I ) or removing an insert all together (with rate µ). Notice that residues inserted in one single event do not have any memory of how they were created, and can be deleted at a later time independently from each other (that is, this is not a fragment model). The only memory that an inserted residue retains is that of the ancestral position it belongs to (the ancestral memory effect).
Regarding the fate of ancestral residues, the microscopic GM model is the simplest possible one, as we suspect it is the only one compatible with standard pair and profile HMMs, where ancestral residues are deleted individually. Thus, we propose that at infinitesimal times ancestral residues can be deleted one at a time with positive rate µ A , infinitesimal time events for ancestral residues µ A δt Deletion of an ancestral residue (6) Affine solution for deletions The differential equation for the elementary probability function γ(t) which corresponds to the probability at time t that an ancestral residue has died is given by Fate of ancestral residues differential equatioṅ with the initialization condition That is, an alive residue at macroscopic time t, which has probability (1 − γ t ), can be deleted after an additional infinitesimal time δt with probability µ A δt.
The solution for the probability controlling the fate of an ancestral residue is given by Fate of ancestral residues macroscopic solution In the early work of Bishop and Friday [14], 1 − γ t = e −µt was coined the "DNA reliability" function, and µ was referred to as the "chance failure" rate. This function also appears in the TKF91 and TKF92 models.
A generalization can be added relative to Bishop's DNA reliability still consistent with a standard HMM by considering that ancestral residues could be deleted with different rates depending on their context. For the general solution, we only need to introduce different rates µ X A depending on whether the residue being deleted appears after a preceding "X" state which could be: a begin "B", Match "M", Delete "D" or Insert "I" state. This generalization allows us for instance to choose µ M A < µ D A , then the cost of initiating an ancestral deletion is higher than that of extending one, thus making the model affine for ancestral residues. We anticipate that more realistic treatment of ancestral deletions will fall outside the realm of standard HMM formulations.

Geometrically distributed insertions do not produce geometrically distributed inserts
The proof for this statement is given later in the Section "The General (GM) model does not have an affine macroscopic solution". In that section, we first obtain the differential equations that a finite-time solution of the GM model should satisfy, and then we show that a solution with a geometric form cannot satisfy those differential equations. In fact, we do not know if a closed-form analytic (non-geometric) solution exists for the GM model.
Here, we test by numerical integration the macroscopic of the GM model, and we present examples of its non-geometric macroscopic behavior. Results are provided in Figure 1. Figure 1A shows that for arbitrary values of the parameters geometrically distributed insertions do not produce geometrically distributed macroscopic inserts. Thus, the macroscopic version of the GM model does not correspond to a standard three-state HMM generally used in sequence alignment.
This result holds even for the simplified case (described in Figure 1B) where the microscopic parameters that control the addition/removal of whole inserts are the same as those controlling the addition/removal of residues to an existing inserts, that is, The result also holds when we give up geometric distributions for insertions: s I = s D = v I = v D = 0 ( Figure 1C). The simple generalization that the rate of adding/removing the first/last residue that creates/destroys an insert is different from that of adding/removing one residue to an existing insert (λ = λ I and µ = µ I ) does not produced geometrically distributed inserts.
Only when both simplifications are taken together ( Figure 1D), do we obtain geometrically distributed inserts. The model under those simplifications can be solved analytically, but it is linear (not affine) for insertions, as we describe next.

The AALI model: a particular solution affine for deletions
The AALI model is defined by the restriction that inserted residues are instantaneously added and removed one at the time, that is as well as by the condition that the insertion/deletion rates are identical λ I = λ, µ I = µ.
These differential equations consist of two types of terms: the left-hand side (positive) terms correspond to actual changes in the number of inserted residues from something else to n residues. The right-hand side (negative) terms correspond to changes that keep the number of inserted residues to be n. This is like how in a substitution rate matrix, the off-diagonal terms correspond to actual changes from one residue to a different one, while the negative terms correspond to the rate at which a residue remains unchanged. These terms play the role of an infinite-dimensioned rate matrix for the changes in the length of the inserted residues associated with a given ancestral residue.
The two off-diagonal terms that contribute after an infinitesimal δt to produce the end result of an insert of length n are: (a) An insert of length n + 1 could have lost one residue in one event with rate µ. This could have happened for any of the residues in the n + 1 long insert.
(b) An insert of length n − 1 could have added one residue in one event with rate λ. This could have happened at n − 1 + 1 positions (including before the first residue in the insert) in the n − 1 long insert.
Because ∞ n=0 P n (t) = 1 for all times, then it follows that ∞ n=0Ṗ n (t) = 0 also for all times. That is, as in a substitution rate matrix all changes for a given residue add up to zero, here, all changes for a given insert size also sum to zero.
To solve the AALI model, we propose a linear ansatz of the form, Using the following equalities (derived from the above form), the differential equation becomes AALI model differential equations for linear ansatż with the initialization condition β LI (t = 0) = 0.
This equation can be rewritten aṡ Because the previous equation has to be true for all values of n, it requireṡ The solution for this equation for the above initialization condition is The macroscopic AALI model where A state machine representation of the AALI model is given in Figure 2. The AALI model is linear for insertions, but (technically) affine for deletions in the sense that the fate of ancestral residues is different depending on which state we were at previous to that ancestor.

The Linear (LI) model
When all rates of deleting ancestral residues are identical (µ , we obtain the linear (LI) model. The LI model fits a standard one-state HMM ( Figure S1). The LI model admits a particular solution that is reversible, the Linear Reversible (LR) model. We discuss this simpler model (and its connection to TKF91) next.

The Linear Reversible (LR) model: a particular case of the LI model
The LI model is in general not reversible for arbitrary values of the parameters. Reversibility is an unrealistic but convenient property of evolutionary models because then phylogenetic trees do not need to be rooted, and in algorithms that establish a comparison between two extant sequences, either one of the sequences can be interpreted as the ancestor of the other one, which simplifies some algorithms. There is one particular combination of parameters that produce a reversible model (the LR model), which we derive in this section.
Starting from the LI model, reversibility requires that for the joint pair HMM in Figure S1(B), the role of the insert state "I" and the delete state "D" are interchangeable. That is, for the LI model to be reversible the transition probabilities of the joint pair HMM have to satisfy the conditions, This condition is satisfied (assuming q I = π) by imposing the constraint, That is, under a reversible model, inserting a residue, P t − a = β t LR π(a), is equivalent to creating and deleting an ancestral residue, P t One can see using Eqs. (16,18), that this condition is satisfied for all divergence times if we select rate parameters satisfying, and we assume that the geometric probability to generate ancestral residues instead of being a free parameter as in the LR model is given by Since the equivalence needs to be true for all values of t, in particular for t = ∞, then we have which determines that the reversible model needs to satisfy λ < µ, and the reversible stationary distribution should be Then, the reversibility condition in Eq. (20) becomes That is, which is true for all times under the condition µ − λ = µ A .
Thus, the LR model is given by the particular values The LR model where The linear state machine representation of the more general LI model (and its particular case the LR model) is given in Figure S1.
Comparison of the LR model to the TKF91 model TKF91 is a necessary reference to any probabilistic model of indel evolution [15]. The LR model is the natural model to compare to TKF91, since they are both reversible. Before we do such a comparison, it is convenient to re-tell TKF91 using a formalism as close as possible to the one we have used to describe the LR model.

The microscopic TKF91 model
The TKF91 microscopic model is defined by, Under TKF91, ancestral residues can be deleted instantaneously and independently one at a time. Any inserted residue can also disappear instantaneously with the same rate as ancestral residues. This is one difference with the LR microscopic model which assumes that both deletion rates could be different. In non-position-specific models, TKF91's assumption of one single deletion rate µ makes sense since all residues are equivalent. In position-specific profile models, all deletion rates are going to be different anyway, so it is natural to allow the possibility that the deletion rates of ancestral versus inserted residues could be different as well.
Under TKF91's microscopic model, new insertions of one single residue can occur instantaneously associated either with an existing insertion or with a surviving ancestral residue (plus one more case before any ancestral residue). That is, if an ancestral residue is already dead at time t, no residue can appear associated with it (to its right by convention). Assuming that new residues can only appear associated to currently surviving residues is a reasonable assumption, and a crucial point that differentiates our LR model from TKF91. The LR model makes the assumption that an insert can occur after any ancestral position, regardless or whether the corresponding ancestral residue is still alive or not. This assumption is reasonable within position-specific profile models, where the ancestral sequence is represented by the model consensus (collection of Match states) and those are always present. TKF91's more realistic infinitesimal-time set of events is in turn responsible for the fact that there is a fundamental distinction between ancestral residues that die without any associated insert, versus those that die leaving at least one inserted residue (here we name them the "d 0 " and "d 1 " ancestral residues respectively). A d 1 ancestral residue can become a d 0 ancestor, but not the other way around. This distinction between dead ancestral residues is responsible for TKF91 not being exactly a linear gap cost model, which we discuss below.
All differences between our LR model and TKF91 can be traced back to the differences in their microscopic models.

The macroscopic TKF91 model
The macroscopic TKF91 model was originally defined by these probability functions [15]: p t n The probability that an ancestral residue survives with n − 1 inserted residues associated to it.
q t n The probability of an ancestral residue disappearing with n insertions associated to it (named p n in the original TKF91 paper [15]).
r t n The probability of n residues being generated before any ancestral residue (named p n in the original paper).
These probabilities can be all expressed in terms of four elementary probability functions p t 1 The probability of a surviving ancestral residue.
q t 0 The probability of an ancestral residue that disappears with no associated insertion.
The probability of an ancestral deletion accompanied by one residue insertion.
[β TKF t ] The probability of any additional insertion. That is, the probability of any inserted residue that is not associated with opening an insert after an ancestral deletion.
Not all four elementary probabilities are independent variables (in particular q t 0 ends up being proportional to β TKF t ), but it is convenient to express them separately. The relationships that TKF91 establishes are The time dependencies of the elementary functions are TKF91 also introduces a geometric Bernoulli parameter p to generate ancestral residues, and a residue emission distribution π(a) which is the same for ancestral residues and inserted residues. This residue distribution is usually assumed to be the equilibrium distribution of the substitution matrix used P t (a | b) which we leave unidentified as it is a constant variable in the comparison of the evolutionary models for insertions/deletions. Reversibility imposes that p = λ/µ. We introduce the notation p TKF (a) = λ µ π(a) as the probability of generating ancestral residue 'a'.

TKF91 is not a linear gap cost model
The TKF91 model cannot be parameterized as a one-state linear HMM. The reason is that one needs to distinguish whether a deleted ancestral residue is followed by inserted residues (a D1 deletion) or not (a D0 deletion). The minimal HMM compatible with the TKF91 model has a S 1 state which allows insertions, and a S 0 state which forbids insertions. We represent simultaneously the pair and conditional versions of the HMM. In Figure S2(A), we present a minimal two-state HMM to represent the TKF91 model.
A point of confusion about TKF91 not being represented by a one-state HMM has arisen from the fact that there is a dynamic programming algorithm for TKF91 that does have a one-state Forward recursion representation [16]. However, it is important to notice that this one-state Forward recursion is not a one-state HMM. The transition parameters of this one-state recursion are not probabilities and they do not normalize. For instance, it is not possible to do a Viterbi optimal alignment algorithm or to sample evolutionary histories from the one-state recursion. The algorithm depends on a trick of rearranging dynamic programming cells, only valid for the Forward calculation, but not for Viterbi or sampling. We present such one state recursion in Figure S2(B), and provide its derivation next.
From the TKF91 two-state HMM one can write down the dynamic programming (DP) recursion for calculating the total probability of two sequences being related by any evolutionary history, the so called Forward algorithm. The naive Forward recursion uses two DP matrices, one for each state of the HMM in Figure S2(A), as follows The above recursion can be re-written in terms of just one DP matrix [17].
Here we obtain such one-state recursion after introducingŜ which results in, Introducing X(i, j) = S 1 (i, j) +Ŝ 0 (i, j), the recursions forŜ 0 and X arê Notice that one can rewrite S 1 (i, j − 1) using the recursion forŜ 0 as . Finally, the recursion for X has the following form just depending on X itself, Notice that the X(i−1, j −1) to X(i, j) term in this recursion does not correspond to a match between an ancestral and a descendant position. In addition to a match, the term also includes an ancestral deletion followed by an insertion and also subtracts another term that does not correspond to a single evolutionary event.

Reversibility of TKF91 requires rearrangements in the alignments
Reversibility implies that in a given pairwise alignment of two extant sequences, there is a symmetry such that either extant sequence can take the ancestral or descendant sequence role resulting exactly with the same calculation. That is indeed the case of our LR model, but that simple inversion does not produce identical results for TKF91. For the TKF91 model, in addition to the inversion of sequences, one needs to alter the order of inserted residues precisely so that a descendant residue is always to the right of an ancestral one so that a d 1 deletion does not become a d 0 deletion in the reversed case. That is, The probabilities of these alignments according to TKF91 are (omitting residue emissions) It is easy to see that the original and reversed+rearranged alignments have the same probability by noticing that p TKF γ TKF 0t = β TKF t . There is not any other equivalence that would render the expression for the original alignment equal to that of the "reversed" alignment for all values of t. This rearrangement is "evolutionarily" reasonable, but technically requires doctoring alignments so that such requirement is satisfied. For a pairwise alignment, it can be considered a convention to always rearrange to a delete/insert order instead of an insert/delete order. The problem comes with multiple alignment. Our linear LR model avoids such complications all together by avoiding the distinction between d 0 and d 1 deletions.
For the LR model all three alignments have the same probability given by This happens because the LR model satisfies the reversibility condition for a one-state joint pair HMM, that inserting a residue − a is equivalent to creating and deleting an ancestral residue a − , Affine evolutionary models using fragmentsà la TKF92 The AALI model imposes linear insert cost (bn) for an insert of length n. However, our aim is to allow affine cost (a+bn), such that the cost of adding a new insert (a+b) is usually much larger than the cost of adding one residue to an existing insert (b). A BLAST comparison using the typical ( There is a simple way of converting a macroscopic linear model into an affine model. This method was introduced with the conversion of TKF91 into TKF92 [19], and we give an interpretation of it in Figure 3A. It requires modifying a "linear" self-loop state (where the probability of entering the loop is the same as that of extending the loop) by adding another loop inside. One can see that the two loops can be merged together into a one self-looping state similar to the original, but now the loop extension probability is different from that of entering the loop.
For this state-machine manipulation to be compatible with a microscopic evolutionary model, one needs to invoke the concept of "fragments", such that all residues emitted between the "b" and "e" states in Figure 3A form an indivisible unit that can appears and disappears infinitesimally as a block of residues, but that cannot be broken apart by any subsequent evolutionary events. Also, the r parameter of the internal loop has to be time independent.
The TKF92 model [19] is such an "affine fragment" model, created by adding fragments to the TKF91 model [15]. The TKF92 model adds fragments not just to the Insert state, but also to the Match and Delete states [20]. Similarly, affine models can be created from the AALI model by adding fragments. That is the AFG model, described in Figure 5.

The affine fragment reversible (AFR) model compatible with sequence/sequence comparison algorithms
A particular case of the AFG model allows a symmetric description of insertions and deletions. It requires using (in Figure 3B) the linear reversible LR model as the underlying model, and adding fragments using the same geometric parameters for insertions and deletions, r I = r D = r X . This model is named the affine fragment reversible (AFR) model, and it is described in Figure 4.
The insertion/deletion symmetry of the AFR model is due to the following property of the LR model, In detail, AFR affine reversible model compatible with Smith-Waterman In a more compact form, and using X ={I, D} After introducing β LR ∞ = λ λ+µ A , we have AFR affine reversible model compatible with Smith-Waterman The state-machine representation for the AFR model is given in Figure 4.

The affine insert fragment (AIF) model compatible with standard profile HMMs
The microscopic (evolutionary) interpretation of fragments requires that the indivisible fragments can appear and disappear in one single event. Inserted residues are amenable to such evolutionary events. The AIF model is a fragment model that adds fragments to inserted residues only. The AIF model is the only affine-fragment evolutionary model compatible with profile HMMs, since in profile HMMs only the state for inserted residues is self-looping.
Explicitly, the AIF model is given by AIF affine model compatible with a profile HMM The state-machine representation for the AIF model is given in Figure 5.

The General (GM) model does not have an affine macroscopic solution
The differential equations for the GM model The differential equation for the generation of inserts relative to the ancestral sequence we have, General Model (GM) differential equationṡ [c] + µ I (n + 1) with the initialization conditions P 0 (t = 0) = 1, P n (t = 0) = 0, for n > 0.
These differential equations consist of two types of terms: the left-hand side (positive) terms corresponds to actual changes in the number of inserted residues from something else to n residues. The right-hand side (negative terms) corresponds to changes that keep the number of inserted residues to be n. In the same way that in a substitution rate matrix, the off-diagonal terms corresponds to actual changes from one residue to a different one, while the negative diagonal terms correspond to the rate at which a residue remains itself. These terms play the role of an infinite-dimensioned rate matrix for the changes in the length of the inserted residues associated with a given ancestral residue.
The explanation of the different off-diagonal terms that could contribute after an infinitesimal δt time to produce the end result of an insert of length n is as follows: [b]: The whole n-length insert could have appeared infinitesimally with rate λ (1 − s I )s I n−1 .
[c]: An insert of length n + m could have lost instantaneously m consecutive residues with rate µ I (1 − v D )v D m−1 . This could have happened at n + 1 position in the n + m long insert.
[d]: An insert of length n − m (with n − m ≥ 1, otherwise it would not be a pre-existing insert) could have added instantaneously m consecutive residues with rate λ I (1 − v I )v I m−1 . This could have happened at n − m + 1 position (including before the first residue in the insert) in the n − m long insert.
Because ∞ n=0 P n (t) = 1 for all times, then it follows that ∞ n=0Ṗ n (t) = 0, also for all times. That is, as in a substitution rate matrix all changes for a given residue add up to zero, here, all changes for a given insert size, also sum to zero. Next, we show explicitly how that is the case. We also show a matrix formulation of the general solution for the GM model.
The condition ∞ n=0 P n (t) = 1 for all times, requires that ∞ n=0Ṗ n (t) = 0, also for all times. One can see by inspection that this condition on the time-derivatives is satisfied by the above differential equations. Spelling out all the terms for a few of the cases, For clarity, terms that cancel out are displayed in the same line (all terms in line 1 cancel out, all terms in line 2 cancel out...).
The differential equations above can be written in a matrix forṁ where the infinite-dimensioned matrix M is given by, Notice that the sum of the terms in each column is zero.
The differential equation in (52) is conceptually the same differential equation for substitution of residues, but infinite-dimensioned in this case. One can see that the general solution is Using the initialization conditions P t=0 0 = 1 and P t=0 n≥1 = 0, then one has P t n = e tM n0 .
We have transformed the problem of solving the differential equations into exponentiating a matrix.
An affine solution to the GM model?
An affine macroscopic solution would take the form Notice that ∞ n=1 P AFF n (t) = β AFF t = 1 − P AFF 0 (t) as it should be. In order to see if this ansatz form is compatible with the differential equations Eqs. (50,51), consider the particular simpler case in which inserts as well as inserted residues into inserts are added instantaneously one at the time, that is, The differential equations for the GM model in Eqs. (50,51) simplify to Particular model: λ = λ I , µ = µ I , s I = v I = s D = r D = 0 with the initialization conditions P 0 (t = 0) = 1, P n (t = 0) = 0, for n > 0.
Using the following equalities (which follow from the affine ansatz form), we can rewrite the above equations under the affine ansatz as, Affine ansatz for particular case: λ = λ I , µ = µ I , s with the initialization conditions Introducing Eq. (64) into Eq. (65) and using the relationship the system reduces to two differential equationṡ For Eq. (67) to be consistent, and since the solution has to be valid for all values on n, the terms in n and the terms independent of n should give the same results. Unfortunately that is not the case. The terms in n produce a differential equation for η t that is independent of µ and λ, while the equation that comes out from the terms independent of n (as well as the first equation from β AFF t ) both require a dependency of η t on the rates µ and λ for adding/removing whole inserts, There is a simplified model that produces a macroscopic affine model. Consider the special conditions: which correspond to the rather unrealistic assumption that inserts can only be created as a whole, and cannot be deleted. The differential equations in this case become Particular GM affine case µ = λ I = µ I = 0 with the initialization conditions P n (t = 0) = 0, for n ≥ 1.
There is a simple affine solution given by Particular GM affine case µ = λ I = µ I = 0 For other arbitrary values of parameters of the GM model, we have tested by simulation of the microscopic model that the corresponding macroscopic model produce inserts that do not follow a geometric distribution (Figure 1).
An alternative geometric affine (AGA) evolutionary model compatible with profile HMMs We have already described the AIF fragment model as an affine evolutionary model compatible with profile HMMs. Here, we introduce another model, the alternative geometric affine (AGA) evolutionary model, also compatible with profile HMMs. The AIF model is a fragment model. The AGA model uses a different approach, not necessarily more realistic in its microscopic description, in which inserts (the whole collection of inserted residues between two ancestral positions) can appear or disappear as a whole in one single event, but they cannot grow or shrink. In the AIF model, a given insert can grow in one single event by adding a new fragment or shrink by removing an existing fragment. The state machine representation of the macroscopic versions of both models is given in Figure 5.

The AGA microscopic model
The AGA microscopic model allows for an insert of length n to appear with rate (1 − s I )(s I ) n−1 λ in between two ancestral positions, and an insert can be deleted in one single event with rate µ. The single events associated to the fate of an ancestral residue is the same as for the GM  The AGA macroscopic model The differential equations resulting from the above microscopic model are, Alternative Geometric Affine (AGA) model differential equationṡ with the initialization conditions P 0 (t = 0) = 1, P n (t = 0) = 0, for n ≥ 1.
This is a very simple evolutionary process in which the interpretation of the off-diagonal (positive) terms is as follows, (a) An empty insert (P t 0 ) could be the result of an insert of arbitrary length (m) disappearing in one single event with rate µ.
(b) A whole n-length insert (P t n≥1 ) could have appeared in a single event with rate λ(1 − s I )(s I ) n−1 .
The diagonal (negative) terms are such that the condition ∞ n=0Ṗ n (t) = 0 is satisfied at all times. The solution to these equations is given by The AGA model compatible with a profile HMM This model implies that for an ancestral sequence of length l, the expected number of inserts is (l + 1) β t , and the expected length of an insert is 1/(1 − s I ).  Figure S2. (A) Minimal (two-state) HMM representation of the TKF91 evolutionary model. The TKF91 model cannot parameterize a one-state linear HMM. The reason is that it needs to distinguish whether a deleted ancestral residue is followed by inserted residues (a D1 deletion) or not (a D0 deletion). The minimal HMM compatible with the TKF91 model has an S 1 state that allows subsequent insertions, and a S 0 state that forbids subsequent insertions. As in the previous cases, we represent simultaneously the pair and conditional versions of the HMM. We provide the conditions for the transition probabilities under which this two-state HMM is normalized (both in its pair or conditional version). Those normalization conditions are satisfied by the TKF91 macroscopic parameterization given in the figure. (B) Non-probabilistic one-state recursion for the TKF91 Forward algorithm. From the TKF91 two-state HMM, one can write down a dynamic programming (DP) recursion for calculating the total probability of two sequences being related by any evolutionary history, the so-called Forward algorithm. The naive Forward recursion uses two DP matrices, one for each state of the HMM. The Forward recursion for the TKF91 model can re-written in terms of just one DP matrix [17]. This figure describes the one-state Forward DP recursion for TKF91. It is important to notice that this recursion is not a one-state HMM. The transition parameters of this recursion are not probabilities and they do not normalize. The X x i y j transition is not a match between positions x i and y j .