One of the most fundamental problems of molecular biology is to annotate all functional elements in the genome. Because of evolutionary pressure, many of the functional elements are believed to be located within regions that evolve more slowly than the overall genome [1, 2]. Regions that evolve more slowly are called evolutionarily conserved elements. Conservation analysis by comparing genomes of related species is a powerful approach for identifying functional elements like protein/RNA coding regions and transcriptional regulatory elements [3–8].

In comparative genomics, one often starts with an alignment of multiple orthologous sequences from several species. The power of a conservation analysis is usually measured by its sensitivity and specificity of detecting conserved elements from given alignments. Several papers have been published recently dealing with the power evaluation of phylogenetic models in comparative genomics studies [9, 10]. Both Eddy [9] and McAuliffe et al. [10] evaluated the power of such studies assuming a symmetric star topology and the Jukes-Cantor nucleotide substitution model, whose simple structure makes it possible to evaluate the power of the method analytically. McAuliffe et al. considered the hypothesis testing problem for the conservation of a single nucleotide site, whereas Eddy considered the classification problem for whether a whole DNA block is conserved or not. Both papers reported the theoretical power mainly as a function of the number of comparative genomes and the branch length from the center of the star topology to each species. However, the approaches they employed are not able to accommodate the substitution rate variation and correlations among nearby nucleotides along a genomic sequence. More realistic models are needed to account for the spatial rate variation and correlation in order to more accurately reflect the nature of real data [11]. Various models have been proposed for this task, including the hidden Markov model [12, 13], Ising chains [14, 15] and Markov random fields [16, 17]. In particular, Felsenstein and Churchill [13] used HMMs to model the substitution rate correlation along the genome. Since then, this model has been adapted to many evolution related problems [7, 18–20].

In this paper we evaluate the power of the phylogenetic hidden Markov model (phylo-HMM), which models the substitution rate variation using an HMM. As introduced collectively in ref. [7, 12, 13] to improve phylogenetic modeling, Phylo-HMM is a generative probability model for aligned multiple orthologous sequences. It models the molecular evolution in both the space dimension along the genome and the time dimension along branches of the phylogenetic tree. Along the genome of the common ancestor, an HMM is used to describe the change from one site to the next. Along each branch of the phylogenetic tree, a continuous-time Markov process is used to model the evolutionary process. Siepel et al. [7] implemented a two-state phylo-HMM to perform a genome-wide detection of evolutionarily conserved elements. They showed that this model can yield biologically meaningful results, but did not evaluate the statistical power of the method. In their study, the transition matrix of the HMM in the phylo-HMM model is assumed to be known and unchanged along the genome. This is equivalent to assuming that the expected length and coverage (defined as the percentage) of conserved elements are known and that the conservation ratio is homogeneous along the genome, which is unrealistic [21].

The goal of this paper is to systematically assess the ability (or power) of the phylo-HMM to identify conserved genomic regions. Because this model is too complex for a theoretical analysis and because the availability of real data with experimentally verified conserved elements is still very limited, we adopt a simulation approach. Given that different regions along the genome vary in terms of their neutral substitution rates, the expected lengths and coverage of the conserved elements [21], we used sequence data from promoter regions of four selected species (human, mouse, rat, and dog) to first estimate the set of key parameters of the phylo-HMM. We call this set of estimated parameter values the *baseline*. We simulated sequences by varying one or a subset of parameters around the *baseline*. Using these simulated sequences, we explore several fundamental questions in comparative genomics with respect to this model, including 1) does the topology of the phylogenetic tree critically impact statistical power?, 2) is the accuracy of the predictions significantly reduced if we simplify the nucleotide substitution model?, 3) is it always beneficial to select more distantly related species for this analysis, thereby increasing the branch lengths of the phylogenetic tree?, 4) how many comparative genomes do we need in order to detect conserved elements of a certain length with satisfactory power?, and 5) is spatial variation of the conservation ratio important? Assuming that the underlying phylo-HMM model is a reasonable approximation of reality, our results reveal a number of insights. First, we demonstrate that the most important factors of a comparative genomics analysis appear to include the number of genomes used and the evolutionary distance between pairs of species, consistent with results for simpler models reported by others. Further, we show that the conservation ratio of conserved elements and the expected length of the conserved elements are also major factors. Therefore, our results show that it is important to accurately characterize the inter-species (spatial) variation in the phylo-HMM model. In contrast, the influence of the topology and the nucleotide substitution model are relatively minor.