- Methodology article
- Open Access
A Bayesian model for gene family evolution
- Liang Liu^{1} (corresponding author),
- Lili Yu^{2},
- Venugopal Kalavacharla^{3} and
- Zhanji Liu^{3}
https://doi.org/10.1186/1471-2105-12-426
© Liu et al; licensee BioMed Central Ltd. 2011
- Received: 7 June 2011
- Accepted: 1 November 2011
- Published: 1 November 2011
Abstract
Background
A birth and death process is frequently used for modeling the size of a gene family that may vary along the branches of a phylogenetic tree. Under the birth and death model, maximum likelihood methods have been developed to estimate the birth and death rate and the sizes of ancient gene families (numbers of gene copies at the internodes of the phylogenetic tree). This paper aims to provide a Bayesian approach for estimating parameters in the birth and death model.
Results
We develop a Bayesian approach for estimating the birth and death rate and other parameters in the birth and death model. In addition, a Bayesian hypothesis test is developed to identify the gene families that are unlikely under the birth and death process. Simulation results suggest that the Bayesian estimate is more accurate than the maximum likelihood estimate of the birth and death rate. The Bayesian approach was applied to a real dataset of 3517 gene families across genomes of five yeast species. The results indicate that the Bayesian model assuming a constant birth and death rate among branches of the phylogenetic tree cannot adequately explain the observed pattern of the sizes of gene families across species. The yeast dataset was thus analyzed with a Bayesian heterogeneous rate model that allows the birth and death rate to vary among the branches of the tree. The unlikely gene families identified by the Bayesian heterogeneous rate model are different from those given by the maximum likelihood method.
Conclusions
Compared to the maximum likelihood method, the Bayesian approach can produce more accurate estimates of the parameters in the birth and death model. In addition, the Bayesian hypothesis test is able to identify unlikely gene families based on Bayesian posterior p-values. As a powerful statistical technique, the Bayesian approach can effectively extract information from gene family data and thereby provide useful information regarding the evolutionary process of gene families across genomes.
Keywords
- Gene Family
- Posterior Probability Distribution
- Probabilistic Graphical Model
- Death Model
- MCMC Algorithm
Background
A gene family is a group of genes with similar sequences and biochemical functions [1–3]. Investigation of the evolution of gene families provides valuable information regarding the evolutionary forces that may have shaped the genomes of species [4–6]. Advancing biotechnology provides a vast amount of data for the studies of gene family evolution. Meanwhile, probabilistic models, describing the evolutionary process of gene families along a phylogenetic tree, significantly facilitate the analyses of gene family data [7–12]. The size of a gene family may expand or contract over time due to gene duplication and loss [8, 10, 13–15]. The birth and death (BD) model [16–18], which assumes that the size of a gene family follows a birth and death process [8, 19–21], is one of the most frequently used models for gene family evolution [7, 8, 22, 23]. Given the phylogenetic tree, the probability distribution of the size of a gene family has been derived under a probabilistic graphical model (PGM) [24]. Parameters in the PGM include the birth and death rate λ and the counts of gene copies (i.e., the sizes of ancient gene families) at the internal nodes of the phylogenetic tree. The PGM assumes that the phylogenetic tree is given [5, 8, 25], though the tree is often estimated from other sources of data. The PGM provides a probabilistic judgment of the hypothesis that different evolutionary forces may have acted on particular gene families or particular lineages of the phylogenetic tree [8]. The PGM can be used to simulate gene family data to evaluate the performance of various computational methods for gene family evolution, including comparative phylogenetic methods [26] that estimate gene duplication and loss events by mapping gene trees onto the species tree [27]. In contrast to comparative phylogenetic methods, the maximum likelihood (ML) method [8] under the PGM is able to estimate the birth and death rate λ.
In this study, we develop a Bayesian approach for estimating the birth and death rate λ and the sizes of ancestral gene families at the internal nodes of the phylogenetic tree. Moreover, a Bayesian hypothesis test [28] is developed to identify the gene families that are highly unlikely under the birth and death model. Our major goal is to provide a Bayesian alternative to the ML method for estimating parameters in the birth and death model [8]. Although simulation results suggest that the Bayesian estimates of the model parameters are more accurate than the maximum likelihood estimates, this does not necessarily imply that the Bayesian method developed in this paper outperforms the ML method in general. In fact, both methods are useful for making inferences on the evolution of gene families.
Methods
A Bayesian model for gene family evolution
Let X = {X_{ ij }, i = 1,...,I and j = 1,...,J} denote gene family data, where X_{ ij } is the size (the number of gene copies) of gene family i for species j, I is the total number of gene families in the data, and J is the number of species. The Bayesian model has the following parameters: ψ, the phylogenetic tree; θ_{ ik }, the size of gene family i at internal node k; and λ, the birth and death rate parameter. We assume that the topology and branch lengths (in millions of years) of the phylogenetic tree are known. The Bayesian model consists of two major components [29]: the prior distribution of the model parameters {λ,θ,ψ} and the likelihood function f(X|λ,θ,ψ), i.e., the probability distribution of gene family data X given parameters {λ,θ,ψ}. As the phylogenetic tree is known, the prior distribution of ψ is trivial, i.e., the phylogenetic tree with branch lengths is fixed with probability 1. Given the tree ψ, we assume that the prior distribution f(λ|ψ) of the birth and death rate λ is uniform on (0, 1/max(t)), where max(t) is the largest branch length in the tree (see below for the restricted parameter space of λ). We also assume that there is no prior knowledge about θ (the counts of gene copies at the internal nodes of the tree), i.e., the prior f(θ|λ,ψ) of θ is a discrete uniform distribution.
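As a minimal sketch of the ingredients above, the following Python code evaluates one factor of the likelihood: the probability that a family of size s at the start of a branch has size c after time t. It assumes the equal-rate linear birth and death transition probability usually attributed to Bailey (1964), which is not reproduced in this text. The function names `bd_transition_prob` and `lambda_prior_upper_bound` are hypothetical and are not taken from the authors' C implementation. Under this assumed formula, λt < 1 keeps the term 1 − 2α positive, which is one possible reading of the upper bound 1/max(t) on the prior of λ.

```python
from math import comb


def bd_transition_prob(s: int, c: int, lam: float, t: float) -> float:
    """P(size c after time t | size s at time 0).

    ASSUMPTION: equal birth and death rates (both lam) in a linear
    birth-death process; this is a sketch, not the paper's own code.
    """
    if s == 0:
        # An extinct family stays extinct under this model.
        return 1.0 if c == 0 else 0.0
    alpha = lam * t / (1.0 + lam * t)
    total = 0.0
    for j in range(min(s, c) + 1):
        total += (
            comb(s, j)
            * comb(s + c - j - 1, s - 1)
            * alpha ** (s + c - 2 * j)
            * (1.0 - 2.0 * alpha) ** j
        )
    return total


def lambda_prior_upper_bound(branch_lengths) -> float:
    """Upper limit of the uniform prior on lam: 1 / max(t)."""
    return 1.0 / max(branch_lengths)


if __name__ == "__main__":
    # Hypothetical example: a family of size 3 on a branch of length 12.
    print(bd_transition_prob(3, 3, 0.001, 12.0))
    print(lambda_prior_upper_bound([12, 22, 27, 32, 10, 5]))
```

The likelihood of one gene family on a fixed tree would then be a product of such branch-wise terms over all branches, given the internal node sizes θ.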
Bayesian estimation of model parameters
After the burn-in period, the Metropolis-Hastings algorithm converges to the posterior probability distribution f(λ,θ|X,ψ). The convergence rate of the Metropolis-Hastings algorithm depends largely on the starting values of λ and θ. It follows from (1) that, given the initial size s and time t, the mean and variance of the final size c are (Bailey 1964): $E(c)=s$ and $\mathrm{Var}(c)=2s\lambda t$.
If the assignment of the root for gene family i is unknown, θ_{ i }^{*} in (6) is replaced by its consistent estimate $\widehat{\theta}_{i}^{*}$. When λ is constant among all branches of the tree, it follows that the average rate, i.e., $\widehat{\lambda}=\frac{1}{J}\sum_{j=1}^{J}\widehat{\lambda}_{j}$, is a consistent estimate of λ. We use these consistent estimates as the starting values of λ and θ to improve the convergence rate of the Metropolis-Hastings algorithm. Convergence of the Metropolis-Hastings algorithm may be assessed by comparing the results from two or more independent runs [33, 34]. Running multiple chains, however, dramatically increases the computational time. More commonly, convergence of the algorithm is evaluated by examining the log likelihood values for a single run [33].
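The idea of moment-based starting values can be sketched as follows. Equations (5) to (7) are not reproduced in this text, so the estimator below is an assumption: it uses the fact that, for an equal-rate linear birth and death process, a family of root size s has variance 2sλt after time t, and it pools the squared deviations over families for each lineage. The function name `moment_start_lambda` and the argument `lineage_time` (taken here to be the time separating the root from species j) are hypothetical.

```python
def moment_start_lambda(X, theta_root, lineage_time):
    """Method-of-moments starting value for lam.

    X            : list of rows, X[i][j] = size of family i in species j
    theta_root   : list, assumed root size of each family
    lineage_time : list, assumed root-to-tip time for each species

    ASSUMPTION: Var(X_ij) = 2 * theta_i * lam * t_j (equal-rate model).
    Returns (average rate, list of per-lineage rates).
    """
    total_root = sum(theta_root)
    per_lineage = []
    for j, t_j in enumerate(lineage_time):
        # Pooled squared deviation from the root size on lineage j.
        sq_dev = sum((row[j] - s) ** 2 for row, s in zip(X, theta_root))
        per_lineage.append(sq_dev / (2.0 * t_j * total_root))
    average = sum(per_lineage) / len(per_lineage)
    return average, per_lineage


if __name__ == "__main__":
    # Hypothetical toy data: two families, two species.
    X = [[2, 1], [3, 3]]
    print(moment_start_lambda(X, theta_root=[2, 3], lineage_time=[10.0, 20.0]))
```

Per-lineage rates that differ markedly from one another hint that a single constant rate may not fit the data, which is how such estimates are used informally in the real data analysis.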
Posterior Predictive P-value for detecting unlikely gene families
Some gene families may have significantly higher or lower birth and death rates than other families in the dataset. These gene families are highly unlikely to be observed under the BD model that assumes a constant birth and death rate among all gene families. The classical p-value for detecting unlikely gene families depends on the assignment of the tree root [8]. Because the size of a gene family at the root of the tree is unknown in most practical situations, the classical p-value cannot be directly calculated. This is generally called the "nuisance parameter problem" (the nuisance parameter is the assignment of the root) [28, 35]. To overcome this problem, Hahn et al. [8] proposed computing the maximum conditional p-value among all possible assignments of the root. Although Hahn et al. [8] have demonstrated that the maximum conditional p-value can be used to detect unlikely gene families, it should be noted that the maximum conditional p-value is no longer the tail-area probability intended in classical approaches [28].
A random gene family X_{ i }^{ * } is generated from the BD model at each cycle of the MCMC algorithm. The posterior predictive p-value (PPP) of gene family X_{ i } is estimated by the proportion of cycles at which the likelihood score f(X_{ i }^{ * }|λ,θ,ψ) is less than f(X_{ i }|λ,θ,ψ) [28]. Under the null hypothesis, the PPP is expected to be near 0.5 [28]. Extreme PPPs (close to 0 or 1) imply that gene family X_{ i } is highly unlikely to be observed under the BD model. Moreover, a gene family with a slow birth and death rate tends to have a higher likelihood score than a gene family with a fast rate. Thus a small PPP (close to 0) indicates that the birth and death rate of the gene family is significantly greater than those of other gene families. A large PPP (close to 1) implies that the birth and death rate of the gene family is significantly less than the rates of other gene families.
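The estimate described above reduces to a simple proportion once the likelihood scores have been recorded at each MCMC cycle. The sketch below assumes that two equal-length lists of log-likelihood scores are available, one for the observed family and one for the replicated family drawn at the same cycle. The function names and the default cutoff are illustrative choices, not part of the authors' software.

```python
def posterior_predictive_pvalue(obs_loglik, rep_loglik):
    """Proportion of cycles where the replicate scores below the observed family.

    obs_loglik[k] : log f(X_i  | lam, theta, psi) at cycle k
    rep_loglik[k] : log f(X_i* | lam, theta, psi) at cycle k
    Comparing log-likelihoods is equivalent to comparing likelihoods.
    """
    if not obs_loglik or len(obs_loglik) != len(rep_loglik):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(1 for o, r in zip(obs_loglik, rep_loglik) if r < o)
    return hits / len(obs_loglik)


def is_unlikely(ppp, cutoff=0.1, two_sided=True):
    """Flag a family whose PPP falls in the lower (or either) tail."""
    if ppp < cutoff:
        return True
    return two_sided and ppp > 1.0 - cutoff


if __name__ == "__main__":
    # Hypothetical scores from four MCMC cycles.
    ppp = posterior_predictive_pvalue([-5, -5, -5, -5], [-6, -4, -7, -5])
    print(ppp, is_unlikely(ppp))
```

A one-sided test corresponds to `two_sided=False` and can only flag families with a fast birth and death rate (small PPP).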
Testing homogeneous birth and death rates among branches of the tree
The hypothesis of homogeneous birth and death rates among branches of the tree can be tested under the maximum likelihood framework [1, 27, 36]. Under the Bayesian framework, the evidence for supporting the null hypothesis (H_{0}) against the alternative hypothesis (H_{1}) is evaluated by the Bayes factor [37], $BF=\frac{f(X|H_{1})}{f(X|H_{0})}$, where f(X|H_{0}) is the marginal likelihood under the null hypothesis (homogeneous rates) and f(X|H_{1}) is the marginal likelihood under the alternative hypothesis (heterogeneous rates). In general, ln(BF) > 10 [38] is interpreted as strong evidence for supporting the alternative hypothesis (heterogeneous rates).
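In practice the marginal likelihoods are extremely small numbers, so the comparison is usually carried out on the log scale. The following sketch assumes that the two log marginal likelihoods have already been estimated by some method (the text does not specify which); the numeric values in the example are hypothetical.

```python
def log_bayes_factor(log_marglik_h1: float, log_marglik_h0: float) -> float:
    """ln(BF) = ln f(X|H1) - ln f(X|H0), computed without exponentiating."""
    return log_marglik_h1 - log_marglik_h0


def favors_heterogeneous(ln_bf: float, threshold: float = 10.0) -> bool:
    """Apply the ln(BF) > 10 rule of thumb quoted in the text."""
    return ln_bf > threshold


if __name__ == "__main__":
    # Hypothetical log marginal likelihoods for the two models.
    ln_bf = log_bayes_factor(-1000.0, -1015.0)
    print(ln_bf, favors_heterogeneous(ln_bf))
```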
Results
Simulation
The estimation error of the proportions of gene families that showed expansions, contractions, and no change.
# of gene families | Bayesian (λ = 0.001) | CAFE (λ = 0.001) | Bayesian (λ = 0.005) | CAFE (λ = 0.005) | Bayesian (λ = 0.01) | CAFE (λ = 0.01) |
---|---|---|---|---|---|---|
20 | 0.07 | 0.138 | 0.088 | 0.184 | 0.089 | 0.214 |
40 | 0.048 | 0.105 | 0.062 | 0.148 | 0.063 | 0.179 |
60 | 0.032 | 0.089 | 0.051 | 0.134 | 0.052 | 0.170 |
80 | 0.032 | 0.084 | 0.045 | 0.130 | 0.045 | 0.170 |
100 | 0.03 | 0.081 | 0.039 | 0.126 | 0.032 | 0.164 |
Additional simulations were carried out to compare the performance of the hypothesis tests based on the Bayesian p-value and the maximum conditional p-value. A total of 9 gene families were simulated using the phylogenetic tree in Figure 2 with λ = 0.001. Another gene family was generated from the same phylogenetic tree with a higher birth and death rate λ = 0.005 and treated as the unlikely gene family. This represents the scenario in which the unlikely gene family has a faster birth and death rate than the other gene families. We also considered the scenario in which the unlikely gene family has a slower birth and death rate than the other gene families. In this case, the unlikely gene family was generated with a birth and death rate λ = 0.001, while the other gene families were generated with λ = 0.005. The simulated gene families were analyzed by both the Bayesian method and the ML method (as implemented in CAFE) to identify unlikely gene families. We carried out two Bayesian hypothesis tests. The one-sided Bayesian hypothesis test identified an unlikely gene family if PPP < 0.1, while the two-sided Bayesian hypothesis test identified an unlikely gene family if PPP < 0.1 or PPP > 0.9. Because a small PPP is associated with unlikely gene families that have a fast birth and death rate, we expect the one-sided Bayesian test (PPP < 0.1) to identify unlikely gene families with a high birth and death rate (the first scenario described above). However, the one-sided Bayesian test is incapable of identifying unlikely gene families with a slow birth and death rate (the second scenario). In contrast, the two-sided Bayesian hypothesis test works for both scenarios. The type I error was set to 0.05 for both the Bayesian and classical hypothesis tests. The simulations were repeated 100 times, and we calculated the proportion of trials in which the true unlikely gene family was identified.
Finally, we increased the number of simulated gene families from 10 to 20 (including one unlikely gene family) to investigate the effect of the sample size (the number of gene families) on the performance of the Bayesian and classical hypothesis tests.
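Simulating gene family sizes along a branch, as in the experiments described above, can be sketched with a standard Gillespie-style simulation. This is an illustrative sketch that assumes equal per-copy birth and death rates; it is not the simulator used in the paper, and the function name `simulate_branch` and the example values are hypothetical.

```python
import random


def simulate_branch(s: int, lam: float, t: float, rng: random.Random) -> int:
    """Simulate the family size after time t, starting from size s.

    ASSUMPTION: each gene copy duplicates at rate lam and is lost at
    rate lam, so the total event rate is 2 * lam * size.
    """
    size, clock = s, 0.0
    while size > 0 and lam > 0:
        # Waiting time to the next birth or death event.
        clock += rng.expovariate(2.0 * lam * size)
        if clock > t:
            break
        # Births and deaths are equally likely under equal rates.
        size += 1 if rng.random() < 0.5 else -1
    return size


if __name__ == "__main__":
    rng = random.Random(2011)
    # Hypothetical setting: nine "typical" families and one fast-evolving family.
    typical = [simulate_branch(5, 0.001, 20.0, rng) for _ in range(9)]
    fast = simulate_branch(5, 0.005, 20.0, rng)
    print(typical, fast)
```

Simulating a full tree would apply this function branch by branch, passing the simulated size at each internal node to its two descendant branches.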
Overall, the hypothesis tests based on the Bayesian (one-sided and two-sided) and maximum conditional p-values perform almost equally well in identifying the unlikely gene families with a fast birth and death rate (Figure 3d). However, CAFE and the one-sided Bayesian hypothesis test perform poorly in detecting unlikely gene families with a slow birth and death rate (Figure 3e). In contrast, the two-sided Bayesian hypothesis test, as we expected, is capable of identifying gene families with a slow birth and death rate, though the discovery rate is rather low (Figure 3e).
Real data analysis
The Bayesian model was applied to a gene family dataset generated from five Saccharomyces genomes (S. bayanus, S. kudriavzevii, S. mikatae, S. paradoxus, and S. cerevisiae). The dataset contains 3517 gene families. The phylogenetic tree was given by Hahn et al. [8]. The MCMC algorithm ran for 10,000,000 generations. The log-likelihood score reached stationarity at the 5,000,000th generation. With the assumption of a constant birth and death rate along the lineages of the phylogenetic tree, the Bayesian analysis of the yeast dataset estimated the birth and death rate as $\widehat{\lambda}=0.00213$, which is close to the maximum likelihood estimate $\widehat{\lambda}=0.0023$ in the previous study [8]. However, the consistent unbiased estimates (defined in equation (7)) of the birth and death rates along the lineages leading to the five extant species are 0.004, 0.0046, 0.0028, 0.0025, and 0.0038, respectively, indicating that the homogeneous rate model may not adequately explain the yeast dataset. The Bayesian model selection analysis described in the previous section confirmed that the BF (> 100) strongly favors the heterogeneous rate model. Thus the analysis of the yeast dataset is based on the Bayesian heterogeneous rate model.
Unlikely gene families were identified on the basis of their PPP values under the Bayesian heterogeneous rate model. A gene family is identified as an unlikely family if PPP < 0.01 or PPP > 0.99 (the corresponding type I error is < 0.005). A large PPP (> 0.99) suggests that the birth and death rates of the unlikely gene families on some branches of the phylogenetic tree are significantly smaller than those of other gene families. A small PPP (< 0.01) suggests that the birth and death rates of the unlikely gene families on some branches are significantly larger than those of other gene families. The two-sided Bayesian hypothesis test suggests that 2263 gene families have PPP values > 0.99. This is not surprising, because none of these gene families changes in size across the five yeast species, a pattern that is extremely unlikely to be observed under the BD model. This result suggests that the yeast dataset may reflect two different evolutionary patterns. A majority of gene families (2263) show no change in size across the five Saccharomyces species, suggesting a very slow birth and death rate (close to 0), while the sizes of the remaining 1254 gene families differ across species, suggesting a relatively fast birth and death rate. It would be more appropriate to analyze the two groups of gene families separately. It is, however, unnecessary to analyze the 2263 gene families with no change in size, because these gene families clearly support a very slow birth and death rate λ.
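Separating the families with no change in size from the rest is a simple filtering step. The sketch below assumes the data are held as a mapping from a family identifier to the list of sizes in the five species; the function name and the identifier `const_example` are hypothetical, while the sizes for family 397 are those listed in the table of unlikely gene families.

```python
def split_by_size_change(families):
    """Split families into (constant, variable) groups.

    families : dict mapping family id -> list of sizes across species
    A family is 'constant' if its size is identical in every species.
    """
    constant, variable = {}, {}
    for fid, sizes in families.items():
        target = constant if len(set(sizes)) == 1 else variable
        target[fid] = sizes
    return constant, variable


if __name__ == "__main__":
    families = {
        "const_example": [1, 1, 1, 1, 1],  # hypothetical unchanged family
        "397": [1, 1, 2, 1, 5],            # sizes of family 397
    }
    constant, variable = split_by_size_change(families)
    print(len(constant), len(variable))
```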
The Bayesian estimates of the numbers of gene families in the reduced yeast dataset (1254 gene families) that showed expansions, no change, or contractions on the eight branches of the phylogenetic tree in Figure 4.
Branch number | Expansions | No change | Contractions |
---|---|---|---|
1 (t = 12) | 84 | 1120 | 50 |
2 (t = 12) | 48 | 1129 | 77 |
3 (t = 22) | 616 | 510 | 128 |
4 (t = 27) | 496 | 635 | 123 |
5 (t = 32) | 51 | 1107 | 96 |
6 (t = 10) | 36 | 1126 | 92 |
7 (t = 5) | 3 | 1146 | 5 |
8 (t = 5) | 50 | 1134 | 70 |
The most unlikely gene families identified by the Bayesian hypothesis test.
Family ID | Gene family | PPP |
---|---|---|
3 | (2 (8 (15 (34 83)))) | 0.000 |
18 | (17 (14 (15 (1 5)))) | 0.000 |
28 | (1 (3 (3 (2 34)))) | 0.000 |
13 | (7 (16 (7 (20 17)))) | 0.002 |
34 | (5 (11 (14 (4 2)))) | 0.003 |
6 | (15 (33 (24 (30 31)))) | 0.004 |
397 | (1 (1 (2 (1 5)))) | 0.006 |
77 | (2 (5 (4 (7 4)))) | 0.019 |
256 | (1 (2 (7 (1 1)))) | 0.019 |
89 | (2 (9 (4 (2 2)))) | 0.021 |
262 | (1 (4 (4 (1 1)))) | 0.025 |
Discussion
Simulation results suggest that the maximum likelihood method tends to underestimate the birth and death rate, while the Bayesian approach is able to produce more accurate estimates of the birth and death rate and other parameters in the BD model. It is not intended in this paper, however, to claim that the Bayesian method is, in general, superior to the maximum likelihood method in estimating model parameters. There might be some cases for which the maximum likelihood method outperforms the Bayesian method and provides more accurate estimates of parameters in the BD model. It demands an extensive simulation study and a sufficient number of empirical data analyses to get a clear picture of how the two methods perform in various situations, which is certainly beyond the scope of this paper.
Recently, Cohen and Pupko [18] developed several probabilistic-evolutionary models for analyzing gene family data. These models assume that the evolution of gene family content follows a continuous-time two-state Markov process. The models, coupled with stochastic mapping, are able to identify horizontal gene transfer events on the lineages of the phylogenetic tree [18]. These models allow the gain and loss rates to vary across gene families [18, 40]. Similarly, the Bayesian model developed in this paper can be extended to handle variable rates over gene families by assuming a probability distribution for the gene-family-specific rates.
Choosing the appropriate prior distribution for model parameters is always challenging in Bayesian analyses. A non-informative prior is desirable if there is no prior knowledge about the probability distribution of parameters, but it is often difficult to find a non-informative prior for model parameters. It is reasonable to specify a flat prior (uniform distribution, see the section "A Bayesian model for gene family evolution") for parameters λ and θ if there is no prior information available for λ and θ. Alternatively, an informative prior may be used in the Bayesian analysis of gene family data. Nevertheless, concerns about the choice of prior distribution will be greatly alleviated when gene family data, especially those from genomic studies, have a large sample size (for example, the yeast dataset analyzed in this paper involves 3517 gene families).
The computational time (in seconds) for running the Bayesian analysis (10,000 iterations) on a Lenovo T61 notebook (Intel Core 2 Duo CPU, 2.4 GHz, 2.48 GB of RAM).
number of gene families | 5 species | 10 species | 20 species |
---|---|---|---|
10 | 11 | 22 | 42 |
20 | 20 | 38 | 52 |
40 | 40 | 64 | 104 |
The Bayesian p-value appears to be useful in identifying unlikely gene families. It should be noted, however, that neither the classical p-value nor the Bayesian p-value represents the probability that the null hypothesis is true. Thus they do not provide direct evidence for accepting or rejecting the null hypothesis. The Bayesian p-value can be interpreted as a measure of discrepancy between the observed data and those expected from the assumed probabilistic model under the null hypothesis. Gene families with small (typically < 0.05) or large (> 0.95) Bayesian p-values can be regarded as outliers (or unlikely gene families), which are unlikely to be observed under the null hypothesis. The Bayesian p-value provides a general way to handle the problem of nuisance parameters [28]. Regardless of the type of p-values (the Bayesian p-value or the maximum conditional p-value) in use, the hypothesis test for unlikely gene families does not appear to have much power when the unlikely gene family has a slow birth and death rate (Figure 3e).
Conclusions
Accurately estimating the birth and death rate as well as the numbers of gene copies at the internal nodes of the phylogenetic tree is the major goal of the statistical analyses of gene family data. In this paper, we develop a Bayesian approach for estimating these parameters from gene family data. The results of the simulation study and the empirical data analysis suggest that the Bayesian method can accurately estimate the parameters in the BD model. The source code for implementing the Bayesian analysis is written in C and available at http://code.google.com/p/begfe.
Declarations
Acknowledgements
We thank Scott Edwards, Matthew Rasmussen, and David Liberles for helpful discussion on the first draft of the manuscript. We thank Matthew Hahn for sharing the yeast gene family data. This research was supported by LL's startup grant.
References
- Demuth JP, Hahn MW: The life and death of gene families. BioEssays: news and reviews in molecular, cellular and developmental biology 2009, 31(1):29–39. 10.1002/bies.080085
- Ohta T: Evolution of gene families. Gene 2000, 259:45–52. 10.1016/S0378-1119(00)00428-5
- Ridley M: Evolution. Hoboken: Wiley-Blackwell; 2003.
- Holmes RS, Lim HA: Gene Families: Structure, Function, Genetics and Evolution. Singapore: World Scientific Publisher; 1996.
- Hahn MW, Demuth JP, Han SG: Accelerated rate of gene gain and loss in primates. Genetics 2007, 177(3):1941–9. 10.1534/genetics.107.080077
- Basten CJ, Ohta T: Simulation study of a multigene family, with special reference to the evolution of compensatory advantageous mutations. Genetics 1992, 132(1):247–52.
- Karev GP, Wolf YI, Berezovskaya FS, Koonin EV: Gene family evolution: an in-depth theoretical and simulation analysis of non-linear birth-death-innovation models. BMC Evol Biol 2004, 4:32. 10.1186/1471-2148-4-32
- Hahn MW, De Bie T, Stajich JE, Nguyen C, Cristianini N: Estimating the tempo and mode of gene family evolution from comparative genomic data. Genome Res 2005, 15(8):1153–60. 10.1101/gr.3567505
- Yanai I, Camacho CJ, DeLisi C: Predictions of gene family distributions in microbial genomes: evolution by gene duplication and modification. Phys Rev Lett 2000, 85(12):2641–4. 10.1103/PhysRevLett.85.2641
- Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science 2000, 290(5494):1151–5. 10.1126/science.290.5494.1151
- Ohta T: An Extension of a Model for the Evolution of Multigene Families by Unequal Crossing over. Genetics 1979, 91(3):591–607.
- Arvestad L, Berglund AC, Lagergren J, Sennblad B: Bayesian gene/species tree reconciliation and orthology analysis using MCMC. Bioinformatics 2003, 19(Suppl 1):i7–15. 10.1093/bioinformatics/btg1000
- Nei M, Rooney AP: Concerted and Birth-and-Death Evolution in Multigene Families. Ann Rev Genet 2005, 39:121–152. 10.1146/annurev.genet.39.073003.112240
- Ohta T: Simulating evolution by gene duplication. Genetics 1987, 115(1):207–13.
- Fortna A, Kim Y, MacLaren E, Marshall K, Hahn G, Meltesen L, Brenton M, Hink R, Burgers S, Hernandez-Boussard T, Karimpour-Fard A, Glueck D, McGavran L, Berry R, Pollack J, Sikela JM: Lineage-specific gene duplication and loss in human and great ape evolution. PLoS Biol 2004, 2(7):E207. 10.1371/journal.pbio.0020207
- Csuros M: Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics 2010, 26(15):1910–2. 10.1093/bioinformatics/btq315
- Csuros M, Miklos I: Streamlining and large ancestral genomes in Archaea inferred with a phylogenetic birth-and-death model. Molecular Biology and Evolution 2009, 26(9):2087–95. 10.1093/molbev/msp123
- Cohen O, Pupko T: Inference and characterization of horizontally transferred gene families using stochastic mapping. Molecular Biology and Evolution 2010, 27(3):703–13. 10.1093/molbev/msp240
- Feller W: An introduction to probability theory and its applications. New York: John Wiley & Sons; 1968.
- Cohen O, Ashkenazy H, Belinky F, Huchon D, Pupko T: GLOOME: gain loss mapping engine. Bioinformatics 2010, 26(22):2914–5. 10.1093/bioinformatics/btq549
- Iwasaki W, Takagi T: Reconstruction of highly heterogeneous gene-content evolution across the three domains of life. Bioinformatics 2007, 23(13):i230–9. 10.1093/bioinformatics/btm165
- Huynen MA, van Nimwegen E: The frequency distribution of gene family sizes in complete genomes. Mol Biol Evol 1998, 15(5):583–9. 10.1093/oxfordjournals.molbev.a025959
- Hahn MW, Han MV, Han SG: Gene family evolution across 12 Drosophila genomes. PLoS Genet 2007, 3(11):e197. 10.1371/journal.pgen.0030197
- Lauritzen SL: Graphical models. Oxford, UK: Clarendon Press; 2001.
- Demuth JP, De Bie T, Stajich JE, Cristianini N, Hahn MW: The evolution of mammalian gene families. PLoS One 2006, 1:e85. 10.1371/journal.pone.0000085
- Page RD: GeneTree: comparing gene and species phylogenies using reconciled trees. Bioinformatics 1998, 14(9):819–20. 10.1093/bioinformatics/14.9.819
- Hahn MW: Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution. Genome Biol 2007, 8(7):R141. 10.1186/gb-2007-8-7-r141
- Meng X: Posterior predictive p-value. Ann Statist 1994, 22:1142–1160. 10.1214/aos/1176325622
- Gelman A, Carlin JB, Stern HS, Rubin DB: Bayesian data analysis. 2nd edition. New York: Chapman and Hall/CRC; 2003.
- Hastings WK: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970, 57:97–109. 10.1093/biomet/57.1.97
- Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E: Equations of State Calculations by Fast Computing Machines. J Chem Phys 1953, 21:1087–1092.
- Barber MN, Ninham BW: Random and Restricted Walks: Theory and Applications. New York: Gordon and Breach Publisher; 1970.
- Cowles MK, Carlin BP: Markov Chain Monte Carlo convergence diagnostics. JASA 1996, 91:883–904. 10.1080/01621459.1996.10476956
- Johnson VE: Studying convergence of Markov Chain Monte Carlo algorithms using coupled sample paths. JASA 1996, 91:154–166. 10.1080/01621459.1996.10476672
- Wallace DL: The Behrens-Fisher and Fieller-Creasy problems. In RA Fisher: An appreciation. Edited by: Fienberg SF, Hinkley DV. Springer: New York; 1980:119–147.
- Novozhilov AS, Karev GP, Koonin EV: Biological applications of the theory of birth-and-death processes. Briefings in Bioinformatics 2006, 7(1):70–85. 10.1093/bib/bbk006
- Bernardo J, Smith AFM: Bayesian Theory. New York: John Wiley; 1994.
- Jeffreys H: The Theory of Probability. UK: Oxford Publisher; 1961.
- De Bie T, Cristianini N, Demuth JP, Hahn MW: CAFE: a computational tool for the study of gene family evolution. Bioinformatics 2006, 22(10):1269–71. 10.1093/bioinformatics/btl097
- Cohen O, Rubinstein ND, Stern A, Gophna U, Pupko T: A likelihood framework to analyse phyletic patterns. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 2008, 363(1512):3903–11. 10.1098/rstb.2008.0177
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.