GENOMEPOP: A program to simulate genomes in populations

Abstract

Background

There are several situations in population biology research where simulating DNA sequences is useful. Simulation of biological populations under different evolutionary genetic models can be undertaken using backward or forward strategies. Backward simulations, also called coalescent-based simulations, are computationally efficient. The reason is that they are based on the history of lineages with surviving offspring in the current population. On the contrary, forward simulations are less efficient because the entire population is simulated from past to present. However, the coalescent framework imposes some limitations that forward simulation does not. Hence, there is an increasing interest in forward population genetic simulation and efficient new tools have been developed recently. Software tools that allow efficient simulation of large DNA fragments under complex evolutionary models will be very helpful when trying to better understand the trace left on the DNA by the different interacting evolutionary forces. Here I will introduce GenomePop, a forward simulation program that fulfills the above requirements. The use of the program is demonstrated by studying the impact of intracodon recombination on global and site-specific dN/dS estimation.

Results

I have developed algorithms and written software to efficiently simulate, forward in time, different Markovian nucleotide or codon models of DNA mutation. Such models can be combined with recombination, at inter and intra codon levels, fitness-based selection and complex demographic scenarios.

Conclusion

GenomePop has many interesting characteristics for simulating SNPs or DNA sequences under complex evolutionary and demographic models. These features make it unique with respect to other simulation tools. Namely, the possibility of forward simulation under General Time Reversible (GTR) mutation or GTR×MG94 codon models with intra-codon recombination, arbitrary, user-defined, migration patterns, diploid or haploid models, constant or variable population sizes, etc. It also allows simulation of fitness-based selection under different distributions of mutational effects. Under the 2-allele model it allows the simulation of recombination hot-spots, the definition of different frequencies in different populations, etc. GenomePop can also manage large DNA fragments. In addition, it has a scaling option to save computation time when simulating large sequences and population sizes under complex demographic and evolutionary situations. These and many other features are detailed in its web page [1].

Background

There are several situations in population biology research where simulation of DNA sequences is useful. Simulations have been used for hypothesis testing [2–4], to study the impact of differing demographic scenarios on patterns of human diversity [5], or to simulate the evolution of complex diseases in human populations [6, 7]. In addition, population simulation of genetic datasets is also used to estimate population parameters [8–10].

One of the most exciting research areas in the current context of population genetics is the HapMap project. Knowledge about patterns of linkage disequilibrium (LD) in humans is very important from a genomic point of view. The existence of linkage or haplotype blocks [11] or, at least, networks of SNPs in high LD [12], will facilitate the assembly of human genome haplotype maps [13–15] that will enormously improve, among other things, the efficiency of disease gene mapping. It seems that these blocks are mainly defined by recombination hot spots [16, 17], but haplotype blocks can also be generated by genetic drift in regions of uniform recombination if rates are low enough [18]. We now have growing empirical knowledge about haplotype block and tagSNP diversity, but less is known about the effect of population demographic history. Though important work has been undertaken in the application of population genetics to LD mapping [19–22] and its relevance to human populations [23–25], we still have an incomplete understanding of how genetic drift, mutation, recombination and migration jointly affect LD and tagSNP patterns, although it is known that they do [26]. Moreover, recombination is an important evolutionary process for understanding how genetic diversity is generated and maintained in populations. Jointly with positive selection, recombination allows for very high rates of evolution [27]. However, the impact of recombination is dependent on other forces, such as selection and demography. Developing tools that allow simultaneous simulation of natural selection, recombination and complex demographic patterns will be of great help in trying to better understand the trace left on the DNA by the different interacting evolutionary forces.

Simulation of biological populations under different evolutionary genetic models can be done following backward or forward strategies. Backward simulations, also called coalescent-based simulations, are computationally very efficient because they are based on the history of lineages with surviving offspring in the current population and ignore all individuals that are not ancestral to the present-day population [20]. Hence, the coalescent is a sample-based theory relevant to the study of population samples and DNA sequence data. From its beginnings, the basic coalescent has been extended in several useful ways, for example to include structured population models [28–32], changing population size [33–35], recombination [36, 37] and selection [38–43].

On the contrary, forward simulations are less efficient because the entire population is simulated from past to present. However, the coalescent framework imposes some limitations that forward simulation does not. The first of these is the same feature that causes its efficiency, namely, the coalescent does not keep track of the complete ancestral information, i.e. it only takes into account ancestries that survived to form the present-day sample. Thus, if the interest is focused on the evolutionary process itself, rather than on its outcome, forward simulations should be preferred [44]. Second, coalescent simulations are complicated even by simple genetic forces such as selection, and although different evolutionary scenarios have been incorporated (see above), it is still difficult to implement models incorporating complex evolutionary situations with selection, variable population size, recombination, complex mating schemes, and so on. In fact, we can only simulate limited forms of recombination and selection under the coalescent. It is known that recombination has a major impact on the detection of positive natural selection [45, 46]. Shriner et al. [46] studied the impact of recombination under a neutral model. Anisimova et al. [45] studied the recombination effect under a coalescent codon-based model, i.e. the unit of change was the codon instead of the nucleotide. In the latter case, recombination was not simulated at the intracodon level. Therefore, the importance of intracodon recombination under a given codon-based model remains unknown. Moreover, coalescent methods cannot yet simulate realistic samples of complex human diseases [6]. Indeed, when simulating non-neutral scenarios and/or complex models under the coalescent, much of its computational efficiency is lost (however, see recent work by Marjoram [47] and Liang [48]). Furthermore, the coalescent model is based on specific limiting values and relationships between some important parameters [49].
Hence, there is increasing interest in forward population genetic simulation and new efficient tools have been recently developed [50–52]. Therefore, a program that allows the simulation, forward in time, of different Markovian nucleotide or codon models of DNA mutation combined with recombination, at inter and intra codon levels, fitness-based selection and complex demographic scenarios, will be of great interest. Here I will introduce the program GenomePop that fulfills the mentioned requirements.

Implementation

GenomePop uses a simple and efficient algorithm to perform forward simulation of populations and/or genomes. The basic idea is to represent an individual by the differences (mutations) between this individual and a reference or consensus genotype. Thus, each individual is no longer represented by its complete sequence or genotype but by the mutations it carries with respect to the consensus. A more detailed explanation of the algorithm is provided at the program web page. Taking advantage of the efficiency of this approach, GenomePop can simulate, forward in time, DNA sequences under specific Markov models. The program allows the simulation of recombination under both nucleotide and codon models of evolution, providing a way to simulate recombination at inter and intracodon levels under codon models. It also permits arbitrary migration models, simulation of SNPs, recombination hot-spots, fitness-based selection and many other features that are detailed in the program web page. GenomePop offers different output formats, such as GenePop for SNPs and Phylip or Nexus for DNA sequences.
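The difference-based representation can be illustrated with a minimal Python sketch. This is not GenomePop's C++ code: the function names and the choice of a dictionary mapping site index to allele are assumptions made only for illustration.

```python
from typing import Dict


def mutate(mutations: Dict[int, str], consensus: str, pos: int, allele: str) -> Dict[int, str]:
    """Return a new difference map after setting site `pos` to `allele`.

    Only sites that differ from the consensus are stored, so a mutation
    back to the consensus state removes the entry instead of adding one.
    """
    updated = dict(mutations)
    if allele == consensus[pos]:
        updated.pop(pos, None)
    else:
        updated[pos] = allele
    return updated


def full_sequence(consensus: str, mutations: Dict[int, str]) -> str:
    """Rebuild the complete sequence of an individual (e.g. for output)."""
    seq = list(consensus)
    for pos, allele in mutations.items():
        seq[pos] = allele
    return "".join(seq)


consensus = "ACGTACGT"          # hypothetical reference genotype
individual = mutate({}, consensus, 2, "T")
```

Under this sketch the memory cost of an individual grows with the number of mutations it carries and not with the sequence length, which is one plausible reason why such a representation suits large DNA fragments.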

Markov models of DNA mutation

Markov processes are used in molecular evolution to describe the change between nucleotides, amino acids or codons over evolutionary time. Usually, time is measured as the number of substitutions because molecular sequence data do not allow the separate estimation of the rate and the time, but only of their product [53]. In the context of forward simulation we are not interested in the transition after an arbitrary time t (branch length) but just in the transition from one nucleotide or codon to another, given that a mutation occurs. An advantage of this approach is that we need to compute the transition matrix just once, at the beginning of the evolutionary process. Consider a given instantaneous substitution rate matrix Q, which allows for a complete definition of any Markovian substitution model [53]. The matrix $M = -qQ + I$ is the conditional transition matrix to go from i to j provided that a substitution occurs, where $q = \mathrm{diag}(1/q_i)$, with $q_i$ the i-th diagonal element of Q, and I is the identity matrix [54]. With this definition the diagonal of M is zero and each of its rows sums to one. Then, given an instantaneous substitution matrix Q, estimated for example using the PAUP [55] or Hyphy [56] programs, we can obtain the corresponding transition matrix M that can be used to produce the necessary mutation process in a forward-in-time evolutionary model.
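The conversion from Q to M can be sketched in a few lines of Python. The function names are hypothetical, and the code assumes the usual convention that each diagonal element of Q equals minus the sum of the off-diagonal rates in its row.

```python
import random


def conditional_transition_matrix(Q):
    """Compute M = -qQ + I with q = diag(1/Q[i][i]).

    Entry M[i][j] is the probability of moving from state i to state j
    given that a substitution occurs, so M[i][i] is zero.
    """
    n = len(Q)
    M = []
    for i in range(n):
        if Q[i][i] >= 0:
            raise ValueError("diagonal elements of Q must be negative")
        row = [0.0 if j == i else -Q[i][j] / Q[i][i] for j in range(n)]
        M.append(row)
    return M


def sample_mutation(M, state, rng):
    """Draw the new state for a site currently in `state`."""
    return rng.choices(range(len(M)), weights=M[state])[0]


# Jukes-Cantor rate matrix (all off-diagonal rates equal), states A, C, G, T.
Q_jc = [[-3.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
M_jc = conditional_transition_matrix(Q_jc)
new_state = sample_mutation(M_jc, 0, random.Random(1))
```

Because M is computed once, each mutation event during the simulation only costs one weighted draw from the row of the current state.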

Biological models

There are two basic biological models implemented in GenomePop, namely "viral" and "non-viral". The only difference between them is that in the viral model the initial sequences are different in each population, as different viruses infect different individuals. Thus, the user can define a viral model indicating the percentage of sequence identity (0–100) between the sequences of the distinct populations. By default the sequence identity is zero, i.e. the sequences of each population are set at random. In the non-viral model the initial sequence is the same for every population (identity of 100%).

DNA models, recombination and selection

There are different DNA models implemented in GenomePop (Table 1). In any of them, the user can decide whether or not to allow recurrent mutation, i.e. multiple hits at the same site. Models can be haploid or diploid. Population size can be constant or variable. In the four-allele models, the sequences can be generated by the program or provided by the user. In the case of the 2-allele model (SNPs) just one or several chromosomes can be considered. In this same model, recombination can be constant or a hot spot recombination model can be defined. In the latter, the recombination rate r is per haploid region and generation. If no hot spots are defined, the expected number of recombination events between any two sites i and j will be $2rd_{ij}/(L-1)$, where $d_{ij}$ is the implied region length and L is the chromosome length. Between the two chromosome extremes 0 and L-1 we have $d_{ij} = L-1$, so the expected number of recombination events will be $2rd_{ij}/(L-1) = 2r$. In GenomePop, the effect of natural selection can be modelled in two different ways: 1) by its effects on the dN/dS ratio, i.e. by defining a codon model, and 2) via the fitness effect of mutation on specific loci. The user can run either of the two models. The codon model option runs a MG94 codon model [57] with a given dN/dS combined with any defined nucleotide model. This model of codon evolution is implemented through the instantaneous rate matrix to go from codon i to j, that is, $Q_{ij} = \theta_{mn} k \pi_n$, where $\theta_{mn}$ accounts for biased nucleotide m to n substitutions; k = 1 or ω for synonymous or nonsynonymous mutation rates, respectively; and $\pi_n$ is the equilibrium frequency of the target nucleotide. This corresponds to the MG94 model [57] with the restriction of α = 1. Nucleotide equilibrium frequencies are used instead of codon frequencies. To simulate a given dN/dS we simply set ω = dN/dS.
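The codon rate $Q_{ij} = \theta_{mn} k \pi_n$ can be sketched as follows. This is an illustration and not GenomePop's implementation; the function name is hypothetical. Two details are assumptions not stated in the text: changes involving more than one nucleotide get rate zero (as is usual for MG94-type models), and changes to or from stop codons are disallowed. The standard genetic code is used.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> amino acid ('*' marks a stop codon).
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_rate(ci, cj, theta, pi, omega):
    """Rate from codon ci to codon cj: theta[(m, n)] * k * pi[n].

    k is 1 for synonymous changes and omega (dN/dS) for nonsynonymous ones.
    """
    diffs = [(m, n) for m, n in zip(ci, cj) if m != n]
    if len(diffs) != 1:
        return 0.0  # assumption: only single-nucleotide changes
    if CODE[ci] == "*" or CODE[cj] == "*":
        return 0.0  # assumption: stop codons are not allowed
    m, n = diffs[0]
    k = 1.0 if CODE[ci] == CODE[cj] else omega
    return theta[(m, n)] * k * pi[n]


# MG94 x JC example: unbiased substitutions and equal base frequencies.
theta_jc = {(m, n): 1.0 for m in BASES for n in BASES if m != n}
pi_jc = {b: 0.25 for b in BASES}
syn = codon_rate("TTT", "TTC", theta_jc, pi_jc, omega=2.5)     # Phe -> Phe
nonsyn = codon_rate("TTT", "TTA", theta_jc, pi_jc, omega=2.5)  # Phe -> Leu
```

With these inputs the nonsynonymous rate is ω times the synonymous one, which is how setting ω = dN/dS produces the desired ratio.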
Alternatively, the user can set the codon model option to false (default option) and define specific sites under directional selection with a given selective coefficient, which will apply when a mutation occurs at such a site. The user can also force all sites to undergo selection. The selection coefficient, s, can be constant or sampled from a gamma distribution with user-defined shape parameter β and scale parameter β/s. The β parameter allows for modelling of the distribution of fitness effects, e.g. a low value of β (0.1) will sample many mutations with low effect and few with high effect. A β parameter of 1 corresponds to the exponential distribution. If we set β to 0 then a constant effect model is applied. Moreover, GenomePop permits the combination of both kinds of models of selection, codon and fitness-based, though the biological meaning of such a mixture is not clear.

Table 1 GenomePop DNA models

Migration models

Two basic migration schemes, island model and one-dimensional stepping stone, are pre-defined in GenomePop. However, the user can define any migration model of interest (Figure 1). To do this, set the flow model to 'user' in the standard input file and then introduce a scheme similar to that of Figure 1 in a file called MigrationModel.txt. In this file, the lines beginning with '#' are comments. To indicate how individuals will migrate from a given population, begin the line with the word "pop". The order of appearance of each population in the file corresponds to its index, i.e. the first population that appears is population number one, etc. The number below "pop" refers to the migration level, i.e. the number of different migration rates defined from this population. The next line should begin with a migration rate (between 0 and 1) followed, in the same line, by the target population(s). There should be as many of these lines as the migration level indicates, i.e. if the migration level is 2 there should be two lines beginning with a migration rate. A more detailed explanation and specific examples are given in the program web page.
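To make the layout just described concrete, here is a small Python sketch that reads a file of that shape. It follows only the prose description above: the sample text is made up for illustration and is not the content of Figure 1, the parser name is hypothetical, and the assumption that target populations are written as integer indices should be checked against the program web page.

```python
# Hypothetical file content written from the prose description only.
EXAMPLE = """\
# two populations exchanging migrants at different rates
pop 1
1
0.01 2
pop 2
1
0.02 1
"""


def parse_migration_model(text):
    """Return {population index: [(rate, [target populations]), ...]}."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    model = {}
    i = 0
    while i < len(lines):
        if not lines[i].lower().startswith("pop"):
            raise ValueError("expected a line beginning with 'pop': " + lines[i])
        index = len(model) + 1          # populations are numbered by order of appearance
        level = int(lines[i + 1].split()[0])
        routes = []
        for ln in lines[i + 2 : i + 2 + level]:
            fields = ln.split()
            rate = float(fields[0])
            if not 0.0 <= rate <= 1.0:
                raise ValueError("migration rate must lie between 0 and 1")
            routes.append((rate, [int(t) for t in fields[1:]]))
        if len(routes) != level:
            raise ValueError("fewer rate lines than the migration level")
        model[index] = routes
        i += 2 + level
    return model


model = parse_migration_model(EXAMPLE)
```

Keeping one list of (rate, targets) pairs per population mirrors the "migration level" idea: the length of each list is the number of distinct rates leaving that population.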

Figure 1. Example of a user-defined migration model.

Scaling

Clearly, the more complex the model defined, the slower the simulation. To avoid high computation times, GenomePop incorporates a scaling option based on the fact that, under neutral models, we can scale the population size N and the time t, provided that the mutation (μ), migration (m) and recombination (r) rates are corrected so that the corresponding compound products Nμ, Nr, Nm, etc., remain constant.
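The arithmetic behind the scaling can be sketched as follows. The function name and the numeric values are hypothetical and serve only to show that the compound products are preserved; this is not GenomePop's own code.

```python
def scale_parameters(N, generations, mu, m, r, factor):
    """Divide N and time by `factor` and multiply the rates by it,
    so that N*mu, N*m and N*r keep their original values."""
    return {
        "N": N / factor,
        "generations": generations / factor,
        "mu": mu * factor,
        "m": m * factor,
        "r": r * factor,
    }


# Hypothetical unscaled setting and a scaling factor of 10.
original = {"N": 1000, "generations": 20000, "mu": 1e-6, "m": 1e-3, "r": 1e-5}
scaled = scale_parameters(factor=10, **original)
```

In this example the simulated population is ten times smaller and runs for ten times fewer generations, while each per-generation rate is ten times larger.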

Results

Input file

The input file should be called GenomePopInput.txt. In this file, lines beginning with '#' are comments and will be ignored. In Figure 2 we can see an example of an input file. Note that the input is flexible, i.e. the minimum input for GenomePop to work appropriately corresponds to the first line and the values below it. This line must begin with the identifier 'chromsize' and the line below with the corresponding desired values. Note that, in lines with identifiers, only the first word matters for the program.

Figure 2. Input file to generate 100 datasets under a GTR model.

Thus, the input in Figure 2 generates 100 datasets under a GTR model with substitution rates typical for HIV [58]. Both recurrent and retromutation are allowed. The system will evolve 1 chromosome of 1 Kb under the given model over 20,000 generations. As can be seen in Figure 2, a scaling of 10 was used, which implies that both population size and the number of generations were divided by 10 and the mutation rate was multiplied by the same factor. A more exhaustive explanation of the input facilities of GenomePop is provided at the program web page.

Example and validation of the Markov mutation method

For each dataset obtained from the input in Figure 2, the best-fit model of nucleotide substitution under the Akaike information criterion (AIC) was estimated with Modeltest v3.6 [59], using maximum likelihood (ML) estimates from PAUP* [55]. The percentage of correct model estimation (GTR) was 97%, although some datasets, about 29%, were also assigned invariable sites or rate heterogeneity among sites. The substitution pattern and equilibrium frequencies were correctly estimated.

Examples and validation of other general features

As GenomePop has many different features and models it is difficult to validate every possibility or circumstance. However, a strong effort has been made to validate the program as thoroughly as possible. For example, both unscaled and scaled simulations were performed under a Jukes-Cantor model with diversity θ = 4Nμ = 0.004 over $10^4$ generations and then θ was estimated using the finite-sites correction of Watterson's θ [60]. The accuracy was quite good, with estimates of 0.0043 ± 0.00015 and 0.0037 ± 0.00016 for the unscaled and scaled cases, respectively. Recombination was also tested by evolving datasets for 6N generations under a Jukes-Cantor 4-allele model with different values for the parameter ρ = 4NrL, where N is population size, r is recombination rate per site and L is the DNA sequence length (the corresponding parameter in GenomePop is 'Rec' = r × L). Namely, we ran cases with ρ equal to 0, 50 and 100. Recombination was then accurately estimated using the program Kpairwise [58]. GenomePop also allows the study of 2-allele SNPs at different frequencies in different populations. In Figure 3 we define a 2-allele model (JC2) with different initial composition at each population (viral model) and 10 independent SNPs (recombination 'Rec' = 10 × 0.5 = 5). The populations have different sizes (100 and 120) and migration occurs under the island model. Note that when defining different population sizes, the original population size provided in the 'chromsize' line under the 'popsizeKmax' identifier is overwritten.

Figure 3. Input file to generate 10 independent SNPs at different frequencies in different populations.

We ran this example over 200 generations and then analyzed the output with the GenePop 4.0 program [61]. As expected, the SNPs were detected as independent. We then changed the value of recombination to 0 ('Rec' = 0), and GenePop 4.0 reported the 10 SNPs as linked, as expected. Note the many possibilities that the program provides in the context of studying SNPs under complex evolutionary situations. We can define any number of populations under any user-defined migration model. We can set any number of SNPs with the desired linkage relationships. The SNPs can be set at distinct initial frequencies in the different populations, for example, 'SNPfreqs' at 1.0 and 0.0 defines the first population with allele 1 fixed and the second with allele 2 fixed.

Impact of recombination on estimation of positive selection

We performed a simple experiment to test the impact of recombination on dN/dS estimation. We ran 50 replicates, with and without population recombination per gene, 4Nr = 40 and 0, respectively. The runs were performed under a MG94 × JC model, both with dN/dS = 1 and dN/dS = 2.5, evolving 333 codons for 10N generations with an effective population size of $N = 10^3$ to get samples of 20 sequences. The dN/dS ratio was estimated with the FEL (Fixed Effects Likelihood) model of Hyphy [62], which computes global and site-by-site dN/dS ratios. A p value of 0.1 was used to infer sites under positive selection. As can be seen in Table 2, a dN/dS of 2.5 leads to the detection of some sites under positive selection (1 or 2, not shown) in only 30% of the replicates (NSS = 0.3 in Table 2). Furthermore, in the strictly neutral case (dN/dS = 1), one positively selected site was assigned in 10% of the replicates, as expected given the p value used. If we correct for this 10% of false positive tests, then positively selected sites were detected in only 20% of the replicates under a dN/dS value of 2.5 and no recombination. This is in agreement with the conservative nature of the FEL method [62]. Also noteworthy is that recombination had no impact on global dN/dS estimation but had important effects on the number of sites detected under positive selection, as is evident upon inspecting Table 2. It also seems that the effect of intracodon recombination is negligible. Interestingly, it appears that the effect of recombination is somewhat higher under non-neutral dN/dS than in the neutral case. The impact of recombination on positive selection detection has already been studied [45, 46]. However, as far as we know, the comparison of the impact of recombination under neutral or positive dN/dS jointly with the effect of intracodon recombination has never been studied before. The significance of this effect should be studied with more replicates and cases, which is outside the scope of the present work.

Table 2 Impact of recombination on dN/dS estimation under a Jukes Cantor model.

Conclusion

GenomePop has interesting characteristics for simulating SNPs or DNA sequences under complex models of evolution and demography. These features make it unique with respect to other simulation tools. Namely, the possibility of forward simulation under GTR mutation or GTR × MG94 codon models with intra-codon recombination, simulation of any user-defined migration pattern, diploid or haploid models, constant or variable population sizes, fitness-based selection, etc. Under the 2-allele model it allows the simulation of recombination hot-spots, the definition of different frequencies in different populations, etc. GenomePop can also manage large DNA fragments and has a scaling option to save computation time when simulating large sequences or population sizes under complex demographic and evolutionary situations. It has many other features that are detailed in the web page [1].

Availability and requirements

Project name: GenomePop v. 1.0

Project home page: http://webs.uvigo.es/acraaj/GenomePop.htm

Operating system(s): Windows and Linux (the source will be provided to compile for Mac)

Programming language: C++

License: GNU GPL.

References

  1. Carvajal-Rodríguez A: GenomePop: software to simulate the evolution of genomes and populations. [http://webs.uvigo.es/acraaj/GenomePop.htm]

  2. Liu Y, Nickle DC, Shriner D, Jensen MA, Gerald H, Learn J, Mittler JE, Mullins JI: Molecular clock-like evolution of human immunodeficiency virus type 1. Virology 2004, 329: 101–108. 10.1016/j.virol.2004.08.014

  3. Liu Y, Mullins JI, Mittler JE: Waiting times for the appearance of cytotoxic T-lymphocyte escape mutants in chronic HIV-1 infection. Virology 2006, 347(1):140–146. 10.1016/j.virol.2005.11.036

  4. Caballero A, Cusi E, Garcia C, Garcia-Dorado A: Accumulation of deleterious mutations: Additional Drosophila melanogaster estimates and a simulation of the effects of selection. Evolution 2002, 56(6):1150–1159.

  5. Carvajal-Rodriguez A, Rolan-Alvarez E, Caballero A: Quantitative variation as a tool for detecting human-induced impacts on genetic diversity. Biological Conservation 2005, 124(1):1–13. 10.1016/j.biocon.2004.12.008

  6. Peng B, Amos CI, Kimmel M: Forward-Time Simulations of Human Populations with Complex Diseases. PLoS Genet 2007, 3(3):e47. 10.1371/journal.pgen.0030047

  7. Peng B, Kimmel M: Simulations provide support for the common disease-common variant hypothesis. Genetics 2007, 175(2):763–776. 10.1534/genetics.106.058164

  8. Keightley PD: Inference of genome-wide mutation rates and distributions of mutation effects for fitness traits: a simulation study. Genetics 1998, 150(3):1283–1293.

  9. Wakeley J: Nonequilibrium migration in human history. Genetics 1999, 153(4):1863–1871.

  10. Wakeley J: The coalescent in an island model of population subdivision with variation among demes. Theor Popul Biol 2001, 59(2):133–144. 10.1006/tpbi.2000.1495

  11. Goldstein DB: Islands of linkage disequilibrium. Nat Genet 2001, 29: 109–111. 10.1038/ng1001-109

  12. Nothnagel M, Rohde K: The effect of single-nucleotide polymorphism marker selection on patterns of haplotype blocks and haplotype frequency estimates. Am J Hum Genet 2005, 77(6):988–998. 10.1086/498175

  13. International-HapMap-Consortium: The International HapMap Project. Nature 2003, 426(6968):789–796. 10.1038/nature02168

  14. International-HapMap-Consortium: A haplotype map of the human genome. Nature 2005, 437(7063):1299–1320. 10.1038/nature04226

  15. International-HapMap-Consortium: A second generation human haplotype map of over 3.1 million SNPs. Nature 2007, 449(7164):851–861. 10.1038/nature06258

  16. Jeffreys AJ, Holloway JK, Kauppi L, May CA, Neumann R, Slingsby MT, Webb AJ: Meiotic recombination hot spots and human DNA diversity. Philos Trans R Soc Lond B Biol Sci 2004, 359(1441):141–152. 10.1098/rstb.2003.1372

  17. Greenawalt DM, Cui X, Wu Y, Lin Y, Wang HY, Luo M, Tereshchenko IV, Hu G, Li JY, Chu Y, Azaro MA, Decoste CJ, Chimge NO, Gao R, Shen L, Shih WJ, Lange K, Li H: Strong correlation between meiotic crossovers and haplotype structure in a 2.5-Mb region on the long arm of chromosome 21. Genome Res 2006, 16(2):208–214. 10.1101/gr.4641706

  18. Liu N, Sawyer SL, Mukherjee N, Pakstis AJ, Kidd JR, Kidd KK, Brookes AJ, Zhao H: Haplotype block structures show significant variation among populations. Genet Epidemiol 2004, 27(4):385–400. 10.1002/gepi.20026

  19. Nordborg M, Tavare S: Linkage disequilibrium: what history has to tell us. Trends Genet 2002, 18(2):83–90. 10.1016/S0168-9525(02)02557-X

  20. Rosenberg NA, Nordborg M: Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat Rev Genet 2002, 3(5):380–390. 10.1038/nrg795

  21. Stumpf MPH, McVean GAT: Estimating recombination rates from population-genetic data. Nature Reviews Genetics 2003, 4: 959–968. 10.1038/nrg1227

  22. Hein J, Wiuf C, Schierup MH: Gene genealogies, variation and evolution: a primer in coalescent theory. Oxford, Oxford University Press; 2005:XIII, 276 s.

  23. Kruglyak L: Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet 1999, 22: 139–144. 10.1038/9642

  24. Pritchard JK, Przeworski M: Linkage disequilibrium in humans: models and data. Am J Hum Genet 2001, 69(1):1–14. 10.1086/321275

  25. McVean GA, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P: The fine-scale structure of recombination rate variation in the human genome. Science 2004, 304(5670):581–584. 10.1126/science.1092500

  26. Gu S, Pakstis AJ, Li H, Speed WC, Kidd JR, Kidd KK: Significant variation in haplotype block structure but conservation in tagSNP patterns among global populations. Eur J Hum Genet 2007, 15(3):302–312. 10.1038/sj.ejhg.5201751

  27. Marais G, Charlesworth B: Genome evolution: recombination speeds up adaptive evolution. Curr Biol 2003, 13(2):R68–70. 10.1016/S0960-9822(02)01432-X

  28. Bahlo M, Griffiths RC: Coalescence time for two genes from a subdivided population. J Math Biol 2001, 43(5):397–410. 10.1007/s002850100104

  29. Bahlo M, Griffiths RC: Inference from gene trees in a subdivided population. Theor Popul Biol 2000, 57(2):79–95. 10.1006/tpbi.1999.1447

  30. Beerli P, Felsenstein J: Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proceedings of the National Academy of Sciences, USA 2001, 98(8):4563–4568. 10.1073/pnas.081068098

  31. Notohara M: The coalescent and the genealogical process in geographically structured population. J Math Biol 1990, 29: 59–75. 10.1007/BF00173909

  32. Wilkinson-Herbots HM: Genealogy and subpopulation differentiation under various models of population structure. J Math Biol 1998, 37(6):535–585. 10.1007/s002850050140

  33. Griffiths RC, Tavare S: Sampling theory for neutral alleles in a varying environment. Philosophical Transactions of the Royal Society of London, Series B 1994, 344: 403–410. 10.1098/rstb.1994.0079

  34. Mohle M, Sagitov S: A classification of coalescent processes for haploid exchangeable population models. Annals of Probability 2001, 29(4):1547–1562. 10.1214/aop/1015345761

  35. Tajima F: The effect of change in population size on DNA polymorphism. Genetics 1989, 123: 597–601.

  36. Hey J, Wakeley J: A coalescent estimator of the population recombination rate. Genetics 1997, 145: 833–846.

  37. Hudson RR, Kaplan NL: The coalescent process in models with selection and recombination. Genetics 1988, 120: 831–840.

  38. Kaplan NL, Darden T, Hudson RR: The coalescent process in models with selection. Genetics 1988, 120: 819–829.

  39. Krone SM, Neuhauser C: Ancestral processes with selection. Theor Popul Biol 1997, 51(3):210–237. 10.1006/tpbi.1997.1299

  40. Neuhauser C, Krone SM: The genealogy of samples in models with selection. Genetics 1997, 145: 519–534.

  41. Donnelly P, Nordborg M, Joyce P: Likelihoods and simulation methods for a class of nonneutral population genetics models. Genetics 2001, 159(2):853–867.

  42. Barton NH, Etheridge AM, Sturm AK: Coalescence in a random background. Annals of Applied Probability 2004, 14(2):754–785. 10.1214/105051604000000099

  43. Fearnhead P: Perfect simulation from nonneutral population genetic models: Variable population size and population subdivision. Genetics 2006, 174(3):1397–1406. 10.1534/genetics.106.060681

  44. Calafell F, Grigorenko EL, Chikanian AA, Kidd KK: Haplotype evolution and linkage disequilibrium: A simulation study. Hum Hered 2001, 51(1–2):85–96. 10.1159/000022963

  45. Anisimova M, Nielsen R, Yang Z: Effect of Recombination on the Accuracy of the Likelihood Method for Detecting Positive Selection at Amino Acid Sites. Genetics 2003, 164(3):1229–1236.

  46. Shriner D, Nickle DC, Jensen MA, Mullins JI: Potential impact of recombination on sitewise approaches for detecting positive natural selection. Genet Res 2003, 81: 115–121. 10.1017/S0016672303006128

  47. Marjoram P, Wall JD: Fast "coalescent" simulation. BMC Genet 2006, 7: 16. 10.1186/1471-2156-7-16

  48. Liang L, Zollner S, Abecasis GR: GENOME: a rapid coalescent-based whole genome simulator. Bioinformatics 2007, 23(12):1565–1567. 10.1093/bioinformatics/btm138

  49. Wakeley J: The limits of theoretical population genetics. Genetics 2005, 169(1):1–7.

  50. Balloux F: EASYPOP (version 1.7): a computer program for population genetics simulations. J Hered 2001, 92(3):301–302. 10.1093/jhered/92.3.301

  51. Peng B, Kimmel M: simuPOP: a forward-time population genetics simulation environment. Bioinformatics 2005, 21(18):3686–3687. 10.1093/bioinformatics/bti584

  52. Guillaume F, Rougemont J: Nemo: an evolutionary and population genetics programming framework. Bioinformatics 2006, 22(20):2556–2557. 10.1093/bioinformatics/btl415

  53. Yang Z: Adaptive Molecular Evolution. In Handbook of Statistical Genetics. Edited by Balding D, Bishop M, Cannings C. John Wiley and Sons Ltd.; 2003.

  54. Karlin S, Taylor HM: A second course in stochastic processes. New York, Academic Press; 1981:XVIII, 542 pp.

  55. Swofford DL: PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). 4th edition. Sunderland, Massachusetts, Sinauer Associates; 2002.

  56. Kosakovsky Pond SL, Frost SDW, Muse SV: HyPhy: hypothesis testing using phylogenies. Bioinformatics 2005, 21(5):676–679. 10.1093/bioinformatics/bti079

  57. Muse SV, Gaut BS: A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol 1994, 11(5):715–724.

  58. Carvajal-Rodriguez A, Crandall KA, Posada D: Recombination Estimation under Complex Evolutionary Models with the Coalescent Composite Likelihood Method. Mol Biol Evol 2006, 23(4):817–827. 10.1093/molbev/msj102

  59. Posada D, Crandall KA: Modeltest: testing the model of DNA substitution. Bioinformatics 1998, 14(9):817–818. 10.1093/bioinformatics/14.9.817

  60. McVean GAT, Awadalla P, Fearnhead P: A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 2002, 160: 1231–1241.

  61. Raymond M, Rousset F: GENEPOP (version 1.2): population genetics software for exact tests and ecumenicism. J Hered 1995, 86: 248–249.

  62. Kosakovsky Pond SL, Frost SD: Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol Biol Evol 2005, 22(5):1208–1222. 10.1093/molbev/msi105

  63. Rodríguez F, Oliver JF, Marín A, Medina JR: The general stochastic model of nucleotide substitution. J Theor Biol 1990, 142: 485–501. 10.1016/S0022-5193(05)80104-3

Acknowledgements

I am grateful to A. Caballero, H. Quesada, S.T. Rodríguez-Ramilo and two anonymous reviewers for discussion and comments on the manuscript. I also want to thank Sergei L Kosakovsky Pond for his help with HyPhy. This work was supported by grant CPE03-004-C2 from Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA) and from Dirección Xeral de Investigación e Desenvolvemento from Xunta de Galicia. AC-R is currently funded by an Isidro Parga Pondal research fellowship from Xunta de Galicia (Spain).

Author information

Corresponding author

Correspondence to Antonio Carvajal-Rodríguez.

Additional information

Authors' contributions

AC-R had the original idea for the work, designed and implemented the algorithms and wrote the manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Carvajal-Rodríguez, A. GENOMEPOP: A program to simulate genomes in populations. BMC Bioinformatics 9, 223 (2008). https://doi.org/10.1186/1471-2105-9-223

Keywords