Two central problems in computational biology are the determination of the alignment and phylogeny of a set of biological sequences. Current methods first align the sequences, and then infer the phylogeny given this fixed alignment. Several software packages are available that deal with one or both of these sub-problems. For example, ClustalW [1] and T-Coffee [2] are popular sequence alignment packages, while MrBayes [3], PAUP* [4] and Phylip [5] all provide phylogenetic reconstruction and inference. Despite working very well in practice, these methods share some problems. First, the separation into a multiple-alignment step and a phylogenetic inference step, is fundamentally flawed. The two inference problems are mutually dependent, and alignments and phylogeny should ideally be co-estimated, a point first made by Sankoff, Morel and Cedergren [6]. Indeed, a proper weighting of mutation events in multiple sequences requires a tree, which in turn can only be determined if a multiple alignment is available. For instance, ClustalW and T-Coffee compute their alignments based on a neighbour-joining guide tree, biasing subsequent phylogenetic estimates based on the resulting alignment. Moreover, fixing the alignment after the first step ignores the residual uncertainty in the alignment, resulting in an overconfident phylogenetic estimate.

This leads on to the second issue, which is that heuristic methods are used to deal with insertions and deletions (indels), and sometimes also substitutions. This lack of a proper statistical framework makes it very difficult to accurately assess the reliability of the alignment estimate, and the phylogeny depending on it.

The relevance of statistical approaches to evolutionary inference has long been recognised. Time-continuous Markov models for substitution processes were introduced more than three decades ago [7]. Inference methods based on these have been considerably improved since then [8], and now have all but replaced older parsimony methods for phylogeny reconstruction. With alignments, progress towards statistically grounded methods has been slower. The idea to investigate insertions and deletions in a statistical framework was first considered by Bishop and Thompson [9]. The first evolutionary model, termed the TKF91 model, and corresponding statistical tools for pairwise sequence alignment were published by Thorne, Kishino and Felsenstein [10]. Its extension to multiple sequences related by a tree has been intensively investigated in the last few years [11–17], and has recently also been extended to RNA gene evolution [18]. Current methods for statistical multiple alignment often computationally demanding, and full maximum likelihood approaches are limited to small trees. Markov chain Monte Carlo techniques can extend these methods to practical problem sizes.

Statistical modelling and MCMC approaches have a long history in population genetic analysis. In particular, coalescent approaches to genealogical inference have been very successful, both in maximum likelihood [19, 20] and Bayesian MCMC frameworks [21, 22]. The MCMC approach is especially promising, as it allows the analysis of large data sets, as well as nontrivial model extensions, see e.g. [23]. Since divergence times in population genetics are small, alignment is generally straightforward, and genealogical inference from a fixed alignment is well-understood [20, 24–26]. However, these approaches have difficulty dealing with indels when sequences are hard to align. Indel events are generally treated as missing data [27], which renders them phylogenetically uninformative. This is unfortunate as indel events can be highly informative of the phylogeny, because of their relative rarity compared to substitution events. Statistical models of alignment and phylogeny often refer to missing data. Not all of these can be integrated out analytically (e.g. tree topology), and these are dealt with using Monte Carlo methods. The efficiency of such approaches depend to a great extent on the choice of missing data. In previous approaches to statistical alignment, the sampled missing data were either unobserved sequences at internal nodes [28], or both internal sequences and alignments between nodes [13], or dealt exclusively with pairwise alignments [29, 30]. In all cases the underlying tree was fixed. In [31] we published an efficient algorithm for computing the likelihood of a multiple sequence alignment under the TKF91 model, given a fixed underlying tree. The method analytically sums out all missing data (pertaining to the evolutionary history that generated the alignment), eliminating the need for any data augmentation of the tree. This methodology is referred to in the MCMC literature as *Rao-Blackwellization* [32]. As a result, we can treat indels in a statistically consistent manner with no more than a constant multiplicative cost over existing methods that ignore indels.

The only missing ingredient for a full co-estimation procedure is an alignment sampler. Unfortunately, there exists no Gibbs alignment sampler that corresponds to the analytic algorithm referred to above. In this paper we introduce a partial importance sampler to resample alignments, based on a proposal mechanism built on a partial score-based alignment procedure. This type of sampler supports the data format we need for efficient likelihood calculations, while still achieving good mixing in reasonable running time (see Results).

We implemented the likelihood calculator and the alignment sampler in Java, and interfaced them with an existing MCMC kernel for phylogenetics and population genetics [22]. We demonstrate the practicality of our approach on an analysis of 10 globin sequences.