Critical assessment of coalescent simulators in modeling recombination hotspots in genomic sequences
 Tao Yang^{1},
 Hong-Wen Deng^{1} and
 Tianhua Niu^{1}
DOI: 10.1186/1471-2105-15-3
© Yang et al.; licensee BioMed Central Ltd. 2014
Received: 9 September 2013
Accepted: 30 December 2013
Published: 3 January 2014
Abstract
Background
Coalescent simulation is pivotal for understanding population evolutionary models and demographic histories, as well as for developing novel analytical methods for genetic association studies of DNA sequence data. A plethora of coalescent simulators have been developed, but selecting the most appropriate program remains challenging.
Results
We extensively compared the performance of five widely used coalescent simulators (Hudson’s ms, msHOT, MaCS, Simcoal2, and fastsimcoal) to provide a practical guide considering three crucial factors: 1) speed, 2) scalability and 3) recombination hotspot position and intensity accuracy. Although ms is a popular standard coalescent simulator, it cannot simulate sequences with recombination hotspots. An extended program, msHOT, compensates for this deficiency by incorporating recombination hotspots and gene conversion events at arbitrarily chosen locations and intensities, but remains limited in simulating long stretches of DNA sequence. Simcoal2, based on a discrete generation-by-generation approach, can simulate more complex demographic scenarios, but runs comparatively slowly. MaCS and fastsimcoal, both built on fast, modified sequential Markov coalescent algorithms that approximate the standard coalescent, are much more efficient while keeping the salient features of msHOT and Simcoal2, respectively. Our simulations demonstrate that they are advantageous over the other programs for a spectrum of evolutionary models. To validate recombination hotspots, the LDhat 2.2 rhomap package, sequenceLDhot and Haploview were compared for hotspot detection, and sequenceLDhot exhibited the best performance based on both real and simulated data.
Conclusions
While ms remains an excellent choice for general coalescent simulations of DNA sequences, MaCS and fastsimcoal are much more scalable and flexible in simulating a variety of demographic events under different recombination hotspot models. Furthermore, sequenceLDhot appears to give the best performance in detecting and validating crossover hotspots.
Keywords
Coalescent, Population genetics, Linkage disequilibrium, Recombination, Single nucleotide polymorphism
Background
Coalescent simulation is a very useful tool in population genetics with a rich variety of applications, particularly to evaluate and compare the performance of various statistical methods in rare variant analysis [1–3], to estimate parameters for different population histories [4, 5] and to infer phylogenetic trees [6]. In simulating DNA sequence data for studying human complex diseases, recombination hotspots, defined as genomic intervals with local recombination rates increased relative to that of the surrounding DNA region, need to be taken into account given their ubiquity in the human genome [7]. Further, population geneticists are interested in examining haplotype block patterns delineated by recombination hotspots along chromosomes when fine-mapping locations of disease loci. Over the past decade, a plethora of coalescent simulators have been developed, e.g. SelSim [8], CoaSim [9], FastCoal [10], Mlcoalsim [11], and RECOAL [12], to name a few. Of these, the five most representative and widely used coalescent simulators are Hudson’s ms [13], msHOT [14], Markovian Coalescent Simulator (MaCS) [15], Simcoal2 [16], and fastsimcoal [17].
Although a set of DNA sequences with a predetermined number of recombination hotspots at user-defined positions can be generated by msHOT, MaCS, Simcoal2, and fastsimcoal, there appears to be a lack of rigorous validation of hotspot detection software tools in the current literature. Popular computer programs for discovering recombination hotspots include LDMAP [24], LDhot [7, 25], the LDhat 2.2 rhomap package [26], Hotspotter [27], and sequenceLDhot [28]. The widely used Haploview program [29, 30] is also an appealing tool for visually localizing recombination hotspots given the delineated LD block structure. Based on popularity, the LDhat 2.2 rhomap, sequenceLDhot, and Haploview programs were chosen for detecting recombination hotspots in both real data and simulated data generated by the coalescent process. Because sequenceLDhot gave the best performance in inferring locations of crossover hotspots, recombination hotspots simulated by msHOT, MaCS, and fastsimcoal were subjected to validation by sequenceLDhot.
Methods
Coalescent process
The coalescent process was initially derived as an approximation of the neutral Wright-Fisher model. This approximation works well when sample sizes are small relative to the population size. Mutations are assumed to be Poisson distributed along each branch given the mutation rate and branch length. Normally, an infinite-sites model [31] is assumed, which means no recurrent mutations occur. Each recombination event breaks the sequence into several segments, and each segment is modeled by a genealogy tree. Simulation of recombination hotspots is realized by changing the rates at which these recombination events occur. The process that includes both mutation and recombination events is illustrated by ancestral recombination graphs (ARGs) (Figure 2). The sequential Markov coalescent (SMC) is an approximating algorithm for simulating a series of trees that differ from each other by a single recombination event, starting from the left end and moving to the right end of the DNA sequence.
Wright-Fisher model
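The mutation step described above can be sketched in a few lines. The following is a minimal illustration, not the code of any simulator compared here: it draws coalescent waiting times for a sample of n sequences, sums the total branch length, and places a Poisson number of infinite-sites mutations on the tree. Time is assumed to be measured in units of 2N generations and theta is assumed to be the scaled mutation rate 4Nμ for the whole locus; the function names are hypothetical.

```python
import random


def coalescent_times(n, rng):
    """Waiting time while k lineages remain (k = n, ..., 2), in units of 2N generations."""
    return {k: rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1)}


def total_branch_length(times):
    """Each interval with k lineages contributes k branches of that length."""
    return sum(k * t for k, t in times.items())


def poisson(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals that fall before `mean`."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def segregating_sites(n, theta, rng):
    """Number of mutations on the tree; under infinite sites each is a new site."""
    length = total_branch_length(coalescent_times(n, rng))
    return poisson(theta / 2 * length, rng)


rng = random.Random(2014)
print(segregating_sites(n=10, theta=5.0, rng=rng))
```

Under these assumptions the expected number of segregating sites is theta multiplied by the sum of 1/i for i = 1, …, n − 1, which gives a simple sanity check for any simulator output.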
where y_{ T } denotes the realized value of the random variable Y_{ T }.
Based on this model, the trajectory of the coalescent process (Figure 1) tracing from the current generation backwards in time to the generation where A is coalesced could be modeled. The current generation could then be simulated through this trajectory given a random seed.
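As a rough illustration of tracing a sample backwards in time under the neutral Wright-Fisher model, the sketch below lets every sampled lineage pick its parent uniformly at random in the previous generation and counts generations until a single ancestor remains. It assumes a haploid population of constant size and no recombination; the function name and parameter values are hypothetical.

```python
import random


def generations_to_mrca(sample_size, population_size, rng):
    """Trace lineages backwards until they share one ancestor (the MRCA)."""
    lineages = sample_size
    generations = 0
    while lineages > 1:
        # Each lineage chooses a parent at random; lineages choosing
        # the same parent coalesce in this generation.
        parents = {rng.randrange(population_size) for _ in range(lineages)}
        lineages = len(parents)
        generations += 1
    return generations


rng = random.Random(42)
print(generations_to_mrca(sample_size=5, population_size=100, rng=rng))
```

When the sample is small relative to the population, the average of many such runs should be close to the coalescent expectation of 2M(1 − 1/n) generations for a haploid population of size M.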
ARG
Figure 2 illustrates the simplest ARG [32]. Take the third 4-letter sequence “TCCT” as an example. The common ancestral sequence evolves into two branches. On these branches, mutations have taken place at loci 1 and 3, respectively, giving rise to “TCGT” and “ACCT”. Next, a recombination event arises between these two sequences between loci 2 and 3, and “TCCT” is produced. In the standard coalescent, a full ARG delineating all past coalescent and recombination events is constructed, with simulated samples corresponding to the edges of the graph.
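The example can be replayed with two small string operations. This is only a worked illustration of the text; the ancestral sequence “ACGT” is an assumption inferred from the two mutated sequences (it is not stated explicitly above), and the helper names are hypothetical.

```python
def mutate(sequence, locus, allele):
    """Replace the allele at a 1-based locus."""
    index = locus - 1
    return sequence[:index] + allele + sequence[index + 1:]


def recombine(left_parent, right_parent, breakpoint):
    """Take loci 1..breakpoint from one parent and the rest from the other."""
    return left_parent[:breakpoint] + right_parent[breakpoint:]


ancestor = "ACGT"                      # assumed common ancestral sequence
branch_1 = mutate(ancestor, 1, "T")    # mutation at locus 1
branch_2 = mutate(ancestor, 3, "C")    # mutation at locus 3
recombinant = recombine(branch_1, branch_2, breakpoint=2)
print(branch_1, branch_2, recombinant)
```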
SMC approach and its two variants (SMC’ and MaCS)
Detection of recombination hotspots
The LDhat 2.2 rhomap package [26], sequenceLDhot [28], and the Haploview program [29, 30] were each applied to detect recombination hotspots in the simulated sequences. LDhat is a popular software package for estimating recombination rates, developed on the basis of a Bayesian reversible-jump Markov Chain Monte Carlo (MCMC) algorithm, and rhomap is a new method incorporated into LDhat 2.2 that specializes in fitting a crossover hotspot model. Another widely used computer program for detecting recombination hotspots, sequenceLDhot, uses the approximate marginal likelihood method of [33] to estimate a likelihood ratio (LR) statistic that reveals a crossover hotspot. Further, Haploview was used to obtain a visualization of the LD block structure, which can reflect varying recombination rates along a contiguous DNA sequence.
Real data set
The 216-kb human leukocyte antigen (HLA) class II region is a well-studied region where recombination hotspots have been identified with sperm typing technology [34–36]. The original data set analyzed in the current study (http://www.le.ac.uk/ge/ajj/HLA/Genotype.html) contains genotype data for 50 unrelated UK Caucasians for 296 markers [i.e., 264 single nucleotide polymorphisms (SNPs) and 22 1–11-bp insertion/deletion polymorphisms] [35]. A subset of 263 SNPs without missing data was selected for recombination hotspot detection. Multilocus haplotypes are required as input for hotspot detection programs such as sequenceLDhot, and they provide crucial phase information that is important for understanding haplotype structure [37, 38]. Therefore, haplotypes across the 263 SNPs were reconstructed by the PHASE v2.1 program, a haplotype inference method based on (i) coalescent theory using a variant of canonical Gibbs sampling [33, 39], (ii) an LD decay model [40], and (iii) the partition-ligation algorithm [41]. In total, 100 haplotypes for the 216-kb region were statistically inferred.
Simulation data sets
To assess running efficiency with and without recombination hotspots, we simulated sets of DNA sequences (i.e., haplotypes) under 0-, 2-, and 5-hotspot models using each of the five simulators. All three hotspot models were simulated for sequence lengths of 1 Mb and 5 Mb. For each hotspot model and each sequence length, we simulated sample sizes (i.e., numbers of sequences) of 100, 500, 1,000, and 10,000. We implemented a symmetric two-island model with a total effective population size of 10,000. The recombination rate and mutation rate were both assumed to be 1.0 × 10^{-8} per site. Fifty replicates were run and the running times were recorded. All simulations were run on a platform with Linux OS, a 2.0-GHz CPU, and 1 TB of RAM.
To validate the recombination hotspot position and intensity accuracy, we simulated data sets under 2- and 5-hotspot models along a 200-kb DNA sequence. The recombination hotspot intensities were set to 100 times the background recombination rate.
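For simulators such as ms that take scaled parameters, the per-site rates above have to be converted into θ = 4Nμ and ρ = 4Nr for the whole locus. The sketch below does that arithmetic under several assumptions that are not stated in the text: the per-site rates are taken as 1.0 × 10^{-8} per generation, the total effective population size of 10,000 is used as the reference size, the sequence length is used as the number of sites, and the two-island structure is left out. The `-t` and `-r` options follow Hudson’s ms documentation; the helper names are hypothetical.

```python
N_E = 10_000          # reference effective population size (assumption)
RATE = 1.0e-8         # mutation and recombination rate per site per generation


def scaled_rate(effective_size, per_site_rate, sequence_length_bp):
    """Scaled rate 4 * N * rate * L for the whole locus."""
    return 4 * effective_size * per_site_rate * sequence_length_bp


def ms_arguments(sample_size, replicates, sequence_length_bp):
    """Argument list for an ms run without population structure."""
    theta = scaled_rate(N_E, RATE, sequence_length_bp)
    rho = scaled_rate(N_E, RATE, sequence_length_bp)
    return ["ms", str(sample_size), str(replicates),
            "-t", f"{theta:g}",
            "-r", f"{rho:g}", str(sequence_length_bp)]


print(" ".join(ms_arguments(100, 50, 1_000_000)))
print(" ".join(ms_arguments(100, 50, 5_000_000)))
```

With these values θ and ρ are both 400 for the 1-Mb region and 2,000 for the 5-Mb region, which helps explain why run times grow so quickly with sequence length.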
Results
Features comparison
Table 1 Feature comparisons of five widely used coalescent simulators
Category  ms  msHOT  MaCS  Simcoal2  Fastsimcoal 

Hotspot  No  Yes  Yes  Yes  Yes 
Gene Conversion  Yes  Yes  Yes  No  No 
Ascertainment  No  No  Yes  Yes  Yes 
Algorithm  SC^{†}  SC  SMC’^{#}  GenByGen*  SMC’ 
Admixture  Yes  Yes  Yes  Yes  Yes 
Multiple event/Gen  No  No  No  Yes  No 
Migration  Yes  Yes  Yes  Yes  Yes 
Population structure  Symmetric  Symmetric  Symmetric  Arbitrary  Arbitrary 
Different data types  No  No  No  Yes  Yes 
Arbitrary pattern of recombination  No  Yes  Yes  Yes  Yes 
Computation speed  Moderate  Moderate  Fast  Slow  Fast 
Sampling simulation parameters  No  No  No  No  Yes 
Publication Year  2002  2007  2009  2004  2011 
# of Citations**  1,300  52  82  185  16 
Running efficiency
Table 2 Average (50 replicates) execution time (standard deviation) of simulating 1-Mb sequence data with a prespecified number of recombination hotspots (mm:ss)*
Number of hotspots  N  ms  msHOT  MaCS  Simcoal2^{*}  Fastsimcoal 

0  100  0:01 (2×10^{-3})  0:02 (0:01)  0:01 (0:00)  1:51 (0:03)  <0:01 (4×10^{-4}) 
0  500  0:01 (2×10^{-3})  0:04 (0:01)  0:06 (0:01)  2:19 (0:04)  0:03 (3×10^{-3}) 
0  1,000  0:02 (5×10^{-3})  0:06 (0:01)  0:23 (0:02)  2:34 (0:08)  0:08 (0:01) 
0  10,000  -  -  5:42 (0:13)  3:51 (0:07)  2:18 (0:03) 
2  100  -  1:28 (0:02)  0:03 (0:01)  1:51 (0:06)  <0:01 (3×10^{-4}) 
2  500  -  1:43 (0:05)  0:10 (0:01)  2:45 (0:08)  0:03 (0:01) 
2  1,000  -  1:51 (0:05)  0:25 (0:03)  2:50 (0:09)  0:08 (0:01) 
2  10,000  -  2:39 (0:07)  5:43 (0:10)  4:27 (0:11)  2:22 (0:03) 
5  100  -  2:48 (0:05)  0:02 (0:01)  2:30 (0:06)  <0:01 (9×10^{-4}) 
5  500  -  3:14 (0:07)  0:10 (0:02)  2:41 (0:05)  0:03 (5×10^{-3}) 
5  1,000  -  3:33 (0:06)  0:23 (0:02)  3:24 (0:07)  0:09 (0:02) 
5  10,000  -  4:27 (0:11)  5:36 (0:15)  4:10 (0:09)  2:32 (0:05) 
Table 3 Average (50 replicates) execution time (standard deviation) of simulating 5-Mb sequence data with recombination hotspots (hh:mm:ss)*
Number of hotspots  N  ms  msHOT  MaCS  Simcoal2^{*}  Fastsimcoal 

0  100  1:48 (0:03)  1:49 (0:03)  0:17 (0:02)  1:08:23 (4:22)  0:03 (4×10^{-3}) 
0  500  -  -  0:48 (0:03)  1:17:02 (4:41)  0:16 (0:02) 
0  1,000  -  -  1:39 (0:07)  1:20:57 (7:23)  0:40 (0:02) 
0  10,000  -  -  29:45 (1:22)  1:55:01 (8:21)  12:36 (0:14) 
2  100  -  1:36:18 (8:41)  0:17 (0:03)  1:16:16 (4:30)  0:03 (4×10^{-3}) 
2  500  -  1:54:09 (6:20)  0:51 (0:08)  1:17:29 (6:23)  0:17 (0:03) 
2  1,000  -  1:59:32 (7:30)  1:32 (0:10)  1:25:50 (7:31)  0:40 (0:03) 
2  10,000  -  2:02:28 (10:52)  30:02 (3:04)  2:08:31 (7:24)  13:01 (0:10) 
5  100  -  3:08:45 (13:11)  0:19 (0:3)  1:09:38 (7:10)  0:04 (0:01) 
5  500  -  3:23:29 (16:08)  0:49 (0:07)  1:10:59 (10:03)  0:17 (0:02) 
5  1,000  -  3:31:39 (17:14)  1:34 (0:12)  1:24:10 (9:37)  0:41 (0:05) 
5  10,000  -  4:12:07 (20:06)  36:50 (5:47)  1:57:03 (14:20)  13:17 (0:12) 
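To make the run-time differences in the 5-Mb comparison easier to read, the mean times can be converted to seconds and expressed relative to the fastest program. The short sketch below does this for the 5-hotspot, N = 10,000 row only; the values are copied from that row and the helper names are hypothetical.

```python
def to_seconds(clock):
    """Convert 'hh:mm:ss' or 'mm:ss' into seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Mean run times for 5 hotspots, N = 10,000, 5-Mb sequence.
mean_times = {
    "msHOT": "4:12:07",
    "MaCS": "36:50",
    "Simcoal2": "1:57:03",
    "fastsimcoal": "13:17",
}


def slowdown_relative_to(reference, times):
    """Ratio of each program's mean run time to the reference program's."""
    base = to_seconds(times[reference])
    return {name: to_seconds(t) / base for name, t in times.items()}


ratios = slowdown_relative_to("fastsimcoal", mean_times)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.1f}x")
```

For this row, msHOT takes roughly 19 times as long as fastsimcoal, Simcoal2 roughly 9 times, and MaCS a little under 3 times.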
Validation of recombination hotspots
Table 4 Validation results by sequenceLDhot for 2- and 5-hotspot models (20 replicates each) (genomic sequence length = 0.2 Mb)
Number of hotspots  Simulator  # Detected peaks/Total # simulated peaks  Mean (standard deviation) of LR  Mean shifting (kb)^{*}  # Significant shiftings^{†}/# Detected peaks 

2  msHOT  39/40  45.83 (18.35)  26  2/39 
2  MaCS  38/40  28.73 (23.36)  86.65  4/38 
2  fastsimcoal  36/40  41.16 (20.75)  10.8  2/36 
5  msHOT  91/100  47.49 (28.87)  981.33  13/91 
5  MaCS  93/100  43.32 (25.48)  57  11/93 
5  fastsimcoal  94/100  48.78 (22.06)  35  14/94 
Discussion
Advances in next-generation sequencing technologies have resulted in a dramatic increase in whole-genome sequence data. There is an urgent need to develop novel methods for analyzing such huge amounts of data, and computer simulation of genome-wide data is equally crucial. The coalescent model has been the most attractive model in population genetics, and is widely recognized as the cornerstone of statistical analysis of DNA sequences [44]. The quintessential feature of the coalescent is to start with the current sample of DNA sequences and then trace backward in time to identify past events since their MRCA [44, 45]. The standard coalescent provides an accurate characterization of genealogies of haploid individuals in a population of constant size, and can incorporate recombination [13, 45, 46]. However, the original standard coalescent has several restrictive features based on the neutral theory that limit its application to real-world DNA sequences, and it has been extended to handle selection [47–49], gene conversion [14, 50, 51], and migration [52–55]. As indicated by [53], the generating models based on coalescent theory should resemble real data as much as possible. Therefore, even the exact standard coalescent might not be the “best” model that could generate simulated data “most” similar to present-day DNA sequences. Nevertheless, the coalescent model based on the Wright-Fisher model is a theoretically convenient and reasonable approximation to real-world scenarios. By incorporating prior information based on coalescent theory, PHASE v2.1 [33, 39, 40] significantly improved phasing accuracy for both real and simulated data sets. Therefore, the standard coalescent remains a widely applied tool in modeling real-world DNA sequences.
However, the standard coalescent implemented with full ARGs incurs a high computational cost for a relatively long DNA sequence (e.g., > 5 Mb), making it difficult to simulate DNA sequences at the genome scale for a large sample size (e.g., > 500). To overcome this obstacle, SMC, an approximation to the standard coalescent, has been developed. It simulates the DNA sequence from left to right, scales linearly with sequence length, and has the remarkable advantage of being much faster and more extensible than the standard coalescent algorithm [15]. Based on variants of SMC, both MaCS and fastsimcoal can generate LD patterns of DNA sequences very close to those generated under a classical ARG model, but much more swiftly [17].
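The left-to-right idea behind SMC is easiest to see for a sample of two sequences, where each local tree is fully described by its height. The sketch below is a simplified illustration of that special case and not the algorithm implemented in MaCS or fastsimcoal. It assumes time in units of 2N generations (so two lineages coalesce at rate 1), a scaled per-site recombination rate rho_bp = 4Nr, and a continuous sequence coordinate; the function name is hypothetical.

```python
import random


def smc_two_samples(seq_len_bp, rho_bp, rng):
    """Return (start_position, tree_height) segments from left to right."""
    segments = []
    position = 0.0
    height = rng.expovariate(1.0)   # height of the leftmost two-sample tree
    while position < seq_len_bp:
        segments.append((position, height))
        # Total branch length is 2 * height, so breakpoints arrive along
        # the sequence at rate (rho_bp / 2) * 2 * height per base pair.
        position += rng.expovariate(rho_bp * height)
        # The recombination point is uniform on the tree; the detached
        # lineage then re-coalesces with the other lineage at rate 1.
        break_time = rng.uniform(0.0, height)
        height = break_time + rng.expovariate(1.0)
    return segments


rng = random.Random(3)
segments = smc_two_samples(seq_len_bp=200_000, rho_bp=4e-4, rng=rng)
print(len(segments), segments[:3])
```

Because each step depends only on the current tree, the work grows with the number of breakpoints along the sequence rather than with the size of a full ARG, which is the source of the linear scaling noted above.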
Based on our extensive simulations with five widely used programs, Hudson’s ms is a great choice for simulating up to a few hundred samples (sequences) with sequence lengths spanning several Mb, owing to its flexibility in handling historical events and its robust modeling. For simulating sequences of tens or hundreds of Mb, or a large number of samples, ms is no longer suitable. In our simulations, ms could not handle 10,000 samples for a 1-Mb sequence or 500 samples for a 5-Mb sequence. msHOT [14] extends the basic algorithm of ms (which generates ARGs for a sample of chromosomes based on coalescent theory) by adding both recombination hotspot and gene conversion hotspot models to the implementation of the standard coalescent. In the simplest scenario, a 0-hotspot model with no gene conversion, the implementation of msHOT appears to be the same as that of ms (because there is no need to invoke the complex crossover and gene conversion hotspot models). However, when there is at least one recombination hotspot in the simulated DNA sequences, the msHOT algorithm must differ from that of ms to account for the hotspot(s). As stated in [14], msHOT allows the user to insert any prespecified non-overlapping crossover hotspots and non-overlapping gene conversion hotspots into the genetic region by specifying the location and intensity of each. Specifically, incorporating R recombination hotspots requires the user to specify a left endpoint (a_{h}), right endpoint (b_{h}), and intensity (λ_{h}) for each hotspot h, where h = 1, …, R. Inside a given hotspot h, the probability of a recombination occurring between two adjacent base pairs in a single transmission from parent to offspring is λ_{h}r_{bp}. Outside the recombination hotspot(s), this probability is the recombination probability per base pair, r_{bp}.
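The hotspot model just described amounts to a piecewise-constant recombination map. The sketch below is an illustration of that map, not code from msHOT. The endpoint convention (a hotspot covers positions from its left endpoint up to, but not including, its right endpoint), the hotspot positions and the 2-kb widths are all assumptions chosen for the example, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hotspot:
    left: int         # left endpoint a_h (bp)
    right: int        # right endpoint b_h (bp)
    intensity: float  # lambda_h, multiplier on the background rate


def recombination_probability(position, hotspots, r_bp):
    """Per-generation crossover probability between position and position + 1."""
    for hotspot in hotspots:
        if hotspot.left <= position < hotspot.right:
            return hotspot.intensity * r_bp
    return r_bp


def expected_crossovers(length_bp, hotspots, r_bp):
    """Sum of the map over all adjacent pairs, assuming non-overlapping hotspots."""
    extra = sum((h.right - h.left) * (h.intensity - 1) for h in hotspots)
    return r_bp * ((length_bp - 1) + extra)


# Two hypothetical 2-kb hotspots, 100 times the background, in a 200-kb region.
hotspots = [Hotspot(50_000, 52_000, 100.0), Hotspot(150_000, 152_000, 100.0)]
print(recombination_probability(51_000, hotspots, r_bp=1e-8))
print(expected_crossovers(200_000, hotspots, r_bp=1e-8))
```

In this example the two hotspots cover only 2% of the region but account for about two thirds of the expected crossovers, which is why hotspot positions leave such a clear footprint in LD patterns.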
This is why msHOT runs much more slowly for the 2- and 5-hotspot models than for the 0-hotspot model: it must account for the extra complexity introduced by the user-defined recombination hotspots. In addition, in the absence of a recombination hotspot, msHOT, just like ms, could not handle a sample size of 10,000 sequences for simulating 1-Mb DNA sequence data (Table 2) or sample sizes of 500, 1,000, and 10,000 for simulating 5-Mb DNA sequence data (Table 3). However, in the presence of at least one recombination hotspot, msHOT follows a very different implementation from ms, one that includes a more complex hotspot model, and could handle such large sample sizes. By contrast, MaCS, which is based on a modified SMC algorithm, could complete coalescent simulations for many more samples of much greater length, while accurately approximating the results simulated by the standard coalescent (i.e., ms) and maintaining its flexibility. Theoretical interpretations of the empirical observations in Table 3 are as follows. When simulating 2- and 5-hotspot models for relatively long DNA sequences (i.e., 5 Mb), msHOT (built on an extended algorithm based on ms) and Simcoal2 [built on a discrete generation-by-generation approach (rather than a continuous-time approximation)] are understandably much slower than MaCS and fastsimcoal because of critical algorithmic differences: MaCS takes a faster modified SMC approach, and fastsimcoal takes a computationally more efficient SMC’ approach. As indicated in the Background section, the SMC method of [22] and the SMC’ method of [10] are both approximations to the standard coalescent algorithm, which have the advantage of being much faster. Specifically, MaCS [15] is a generalized SMC that is equivalent to SMC when the “history” parameter h = 1 bp, but becomes a closer approximation to ms than SMC as h increases (such that more information about adjacent genealogies is stored).
Generally speaking, MaCS produces simulated data that are virtually identical to data simulated under the standard coalescent, but in much less time and using much less memory [15]. Similar to MaCS, fastsimcoal is based on a continuous-time SMC’ approach (applying ABC), which is much faster than msHOT or the discrete-generation coalescent approach of Simcoal2, and which also gives excellent approximations to the standard coalescent [17].
Based on the crossover hotspot validation results (Table 4 and Figure 7), recombination hotspots simulated by MaCS did not appear to be as accurate as those of msHOT and fastsimcoal for a 2-hotspot model. For a 5-hotspot model, MaCS outperformed msHOT, but fastsimcoal was the most accurate simulator. When there is a demand for simulating DNA sequences under a variety of population genetic models [especially in the presence of crossover hotspot(s)], fastsimcoal appears to be the best choice. Different data types [DNA, SNP, simple tandem repeat (STR)] and sequences of different structures can be simulated by fastsimcoal, in addition to its advantages in terms of efficiency, accuracy, and the capability of generating any user-defined pattern of recombination hotspots. From a practical standpoint, when a set of recombination hotspots needs to be simulated, msHOT, MaCS, and fastsimcoal are all applicable, but msHOT ran much more slowly than MaCS and fastsimcoal, and had the lowest accuracy based on the validation results of sequenceLDhot.
Conclusions
While Hudson’s ms remains an excellent choice for simulating relatively short DNA sequences (shorter than several Mb) under general scenarios, MaCS and fastsimcoal are much more scalable and flexible in simulating many different demographic histories and diverse DNA sequence structures (e.g., SNPs and STRs). Based on both running time and hotspot validation comparisons, fastsimcoal is shown to be the fastest and most reliable and consistent coalescent simulator, especially when the number of hotspots is large. MaCS is the runner-up, with a lower speed and slightly lower accuracy. Based on our extensive simulation evaluation and comparison results, caution should be taken in applying these widely used coalescent simulators: sequence data simulated by a given program should be checked and validated (e.g., the positions and intensities of recombination hotspots) to guard against any discrepancy between the intended objective and the actual simulation results. Further, for detecting and validating recombination hotspots, among the three widely used computer programs, sequenceLDhot appears to be the best choice, being fast, robust and accurate. In real-world DNA sequence data, a variety of factors, such as GC content, local LD block structure, and DNA elements that act as enhancers or inhibitors of recombination [56–58], could affect the intensities and locations of recombination hotspots along a given chromosome. For example, recombination hotspots correlate positively with GC content [59, 60]. Further, certain DNA motifs are enriched in crossover hotspots, among which CCTCCCT and CCCCACCCC are the most prominent [61]. Therefore, in real-world scenarios, these factors should be taken into consideration in identifying genuine crossover hotspots.
Abbreviations
ABC: Approximate Bayesian computation
ARG: Ancestral recombination graph
LD: Linkage disequilibrium
LR: Likelihood ratio
HLA: Human leukocyte antigen
MCMC: Markov Chain Monte Carlo
MRCA: Most recent common ancestor
SMC: Sequential Markov coalescent
SNP: Single nucleotide polymorphism
STR: Simple tandem repeat.
Declarations
Acknowledgements
We are very grateful to the Editor and two anonymous reviewers for their constructive comments, which have improved the manuscript. Dr. Deng was supported in part by R01AR050496, R21AG027110, R01AG026564, R21AA015973, R01AR057049, and R03TW008221 from NIH and Specialized Center of Research Grant P50 AR055081 funded jointly by the NIAMS and the Office of Research on Women’s Health. Dr. Niu was supported in part by a start-up fund of the Center for Bioinformatics and Genomics, Tulane University.
Authors’ Affiliations
References
 1. Brisbin A, Jenkins GD, Ellsworth KA, Wang L, Fridley BL: Localization of association signal from risk and protective variants in sequencing studies. Front Genet. 2012, 3: 173.
 2. Kinnamon DD, Hershberger RE, Martin ER: Reconsidering association testing methods using single-variant test statistics as alternatives to pooling tests for sequence data with rare variants. PLoS One. 2012, 7 (2): e30238. 10.1371/journal.pone.0030238.
 3. Morris AP, Zeggini E: An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet Epidemiol. 2010, 34 (2): 188-193. 10.1002/gepi.20450.
 4. Weiss G, von Haeseler A: Inference of population history using a likelihood approach. Genetics. 1998, 149 (3): 1539-1546.
 5. Burgess R, Yang Z: Estimation of hominoid ancestral population sizes under Bayesian coalescent models incorporating mutation rate variation and sequencing errors. Mol Biol Evol. 2008, 25 (9): 1979-1994. 10.1093/molbev/msn148.
 6. Liu L, Yu LL, Kubatko L, Pearl DK, Edwards SV: Coalescent methods for estimating phylogenetic trees. Mol Phylogenet Evol. 2009, 53 (1): 320-328. 10.1016/j.ympev.2009.05.033.
 7. McVean GAT, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P: The fine-scale structure of recombination rate variation in the human genome. Science. 2004, 304 (5670): 581-584. 10.1126/science.1092500.
 8. Spencer CCA, Coop G: SelSim: a program to simulate population genetic data with natural selection and recombination. Bioinformatics. 2004, 20 (18): 3673-3675. 10.1093/bioinformatics/bth417.
 9. Mailund T, Schierup MH, Pedersen CNS, Mechlenborg PJM, Madsen JN, Schauser L: CoaSim: a flexible environment for simulating genetic data under coalescent models. BMC Bioinforma. 2005, 6: 252. 10.1186/1471-2105-6-252.
 10. Marjoram P, Wall JD: Fast “coalescent” simulation. BMC Genet. 2006, 7: 16. 10.1186/1471-2156-7-16.
 11. Ramos-Onsins SE, Mitchell-Olds T: Mlcoalsim: multilocus coalescent simulations. Evol Bioinform. 2007, 3: 41-44.
 12. Kang CJ, Marjoram P: Exact coalescent simulation of new haplotype data from existing reference haplotypes. Bioinformatics. 2012, 28 (6): 838-844. 10.1093/bioinformatics/bts033.
 13. Hudson RR: Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics. 2002, 18 (2): 337-338. 10.1093/bioinformatics/18.2.337.
 14. Hellenthal G, Stephens M: MsHOT: modifying Hudson’s ms simulator to incorporate crossover and gene conversion hotspots. Bioinformatics. 2007, 23 (4): 520-521. 10.1093/bioinformatics/btl622.
 15. Chen GK, Marjoram P, Wall JD: Fast and flexible simulation of DNA sequence data. Genome Res. 2009, 19 (1): 136-142.
 16. Laval G, Excoffier L: SIMCOAL 2.0: a program to simulate genomic diversity over large recombining regions in a subdivided population with a complex history. Bioinformatics. 2004, 15: 2485-2487.
 17. Excoffier L, Foll M: fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics. 2011, 27 (9): 1332-1334. 10.1093/bioinformatics/btr124.
 18. Kingman JFC: The coalescent. Stoch Process Appl. 1982, 13 (3): 14
 19. Kingman JFC: On the genealogy of large populations. J Appl Probab. 1982, 19: 17
 20. Griffiths RC, Marjoram P: Ancestral inference from samples of DNA sequences with recombination. J Comput Biol. 1996, 3 (4): 479-502. 10.1089/cmb.1996.3.479.
 21. Wiuf C, Hein J: Recombination as a point process along sequences. Theor Popul Biol. 1999, 55 (3): 248-259. 10.1006/tpbi.1998.1403.
 22. McVean GAT, Cardin NJ: Approximating the coalescent with recombination. Philos T Roy Soc B. 2005, 360 (1459): 1387-1393. 10.1098/rstb.2005.1673.
 23. Eriksson A, Mahjani B, Mehlig B: Sequential Markov coalescent algorithms for population models with demographic structure. Theor Popul Biol. 2009, 76 (2): 84-91. 10.1016/j.tpb.2009.05.002.
 24. Maniatis N, Collins A, Xu CF, McCarthy LC, Hewett DR, Tapper W, Ennis S, Ke X, Morton NE: The first linkage disequilibrium (LD) maps: delineation of hot and cold blocks by diplotype analysis. Proc Natl Acad Sci U S A. 2002, 99 (4): 2228-2233. 10.1073/pnas.042680999.
 25. Winckler W, Myers SR, Richter DJ, Onofrio RC, McDonald GJ, Bontrop RE, McVean GAT, Gabriel SB, Reich D, Donnelly P, et al: Comparison of fine-scale recombination rates in humans and chimpanzees. Science. 2005, 308 (5718): 107-111. 10.1126/science.1105322.
 26. Auton A, McVean G: Recombination rate estimation in the presence of hotspots. Genome Res. 2007, 17 (8): 1219-1227. 10.1101/gr.6386707.
 27. Li N, Stephens M: Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics. 2003, 165 (4): 2213-2233.
 28. Fearnhead P: SequenceLDhot: detecting recombination hotspots. Bioinformatics. 2006, 22 (24): 3061-3066. 10.1093/bioinformatics/btl540.
 29. Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005, 21 (2): 263-265. 10.1093/bioinformatics/bth457.
 30. Barrett JC: Haploview: visualization and analysis of SNP genotype data. Cold Spring Harb Protoc. 2009, 2009 (10): pdb ip71
 31. Kimura M: The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics. 1969, 61 (4): 893-903.
 32. Song YS, Hein J: Constructing minimal ancestral recombination graphs. J Comput Biol. 2005, 12 (2): 147-169. 10.1089/cmb.2005.12.147.
 33. Stephens M, Donnelly P: A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet. 2003, 73 (5): 1162-1169. 10.1086/379378.
 34. Jeffreys AJ, Ritchie A, Neumann R: High resolution analysis of haplotype diversity and meiotic crossover in the human TAP2 recombination hotspot. Hum Mol Genet. 2000, 9 (5): 725-733. 10.1093/hmg/9.5.725.
 35. Jeffreys AJ, Kauppi L, Neumann R: Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat Genet. 2001, 29 (2): 217-222. 10.1038/ng1001-217.
 36. Jeffreys AJ, Neumann R: Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat Genet. 2002, 31 (3): 267-271. 10.1038/ng910.
 37. Niu T: Algorithms for inferring haplotypes. Genet Epidemiol. 2004, 27 (4): 334-347. 10.1002/gepi.20024.
 38. Zhang Y, Niu T: Haplotype Structure. Handbook on Analyzing Human Genetic Data: Computational Approaches and Software. Edited by: Lin S, Zhao H. 2010, Springer-Verlag, 25-80.
 39. Stephens M, Smith NJ, Donnelly P: A new statistical method for haplotype reconstruction from population data. Am J Hum Genet. 2001, 68 (4): 978-989. 10.1086/319501.
 40. Stephens M, Scheet P: Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation. Am J Hum Genet. 2005, 76 (3): 449-462. 10.1086/428594.
 41. Niu T, Qin ZS, Xu X, Liu JS: Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am J Hum Genet. 2002, 70 (1): 157-169. 10.1086/338446.
 42. Beaumont MA, Zhang WY, Balding DJ: Approximate Bayesian computation in population genetics. Genetics. 2002, 162 (4): 2025-2035.
 43. Jeffreys AJ, Kauppi L, Neumann R: Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat Genet. 2001, 29 (2): 217-222. 10.1038/ng1001-217.
 44. Fu YX, Li WH: Coalescing into the 21st century: an overview and prospects of coalescent theory. Theor Popul Biol. 1999, 56 (1): 1-10. 10.1006/tpbi.1999.1421.
 45. Rosenberg NA, Nordborg M: Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat Rev Genet. 2002, 3 (5): 380-390. 10.1038/nrg795.
 46. Hudson RR: Properties of a neutral allele model with intragenic recombination. Theor Popul Biol. 1983, 23 (2): 183-201. 10.1016/0040-5809(83)90013-8.
 47. Krone SM, Neuhauser C: Ancestral processes with selection. Theor Popul Biol. 1997, 51 (3): 210-237. 10.1006/tpbi.1997.1299.
 48. Neuhauser C, Krone SM: The genealogy of samples in models with selection. Genetics. 1997, 145 (2): 519-534.
 49. Pokalyuk C, Pfaffelhuber P: The ancestral selection graph under strong directional selection. Theor Popul Biol. 2013, 87: 25-33.
 50. Wiuf C, Hein J: The coalescent with gene conversion. Genetics. 2000, 155 (1): 451-462.
 51. Wiuf C: A coalescence approach to gene conversion. Theor Popul Biol. 2000, 57 (4): 357-367. 10.1006/tpbi.2000.1462.
 52. Beerli P, Felsenstein J: Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics. 1999, 152 (2): 763-773.
 53. Arenas M, Posada D: Recodon: coalescent simulation of coding DNA sequences with recombination, migration and demography. BMC Bioinforma. 2007, 8: 458. 10.1186/1471-2105-8-458.
 54. Notohara M: An application of the central limit theorem to coalescence times in the structured coalescent model with strong migration. J Math Biol. 2010, 61 (5): 695-714. 10.1007/s00285-009-0318-z.
 Steinrucken M, Paul JS, Song YS: A sequentially Markov conditional sampling distribution for structured populations with migration and recombination. Theor Popul Biol. 2013, 87: 5161.View ArticlePubMed CentralPubMedGoogle Scholar
 Zhang J, Li F, Li J, Zhang MQ, Zhang X: Evidence and characteristics of putative human alpha recombination hotspots. Hum Mol Genet. 2004, 13 (22): 28232828. 10.1093/hmg/ddh310.View ArticlePubMedGoogle Scholar
 Arnheim N, Calabrese P, TiemannBoege I: Mammalian meiotic recombination hot spots. Annu Rev Genet. 2007, 41: 369399. 10.1146/annurev.genet.41.110306.130301.View ArticlePubMedGoogle Scholar
 Rana NA, Ebenezer ND, Webster AR, Linares AR, Whitehouse DB, Povey S, Hardcastle AJ: Recombination hotspots and block structure of linkage disequilibrium in the human genome exemplified by detailed analysis of PGM1 on 1p31. Hum Mol Genet. 2004, 13 (24): 30893102. 10.1093/hmg/ddh337.View ArticlePubMedGoogle Scholar
 Fullerton SM, Bernardo Carvalho A, Clark AG: Local rates of recombination are positively correlated with GC content in the human genome. Mol Biol Evol. 2001, 18 (6): 11391142. 10.1093/oxfordjournals.molbev.a003886.View ArticlePubMedGoogle Scholar
 Clark AG, Wang X, Matise T: Contrasting methods of quantifying fine structure of human recombination. Annu Rev Genomics Hum Genet. 2010, 11: 4564. 10.1146/annurevgenom082908150031.View ArticlePubMed CentralPubMedGoogle Scholar
 Zheng J, Khil PP, CameriniOtero RD, Przytycka TM: Detecting sequence polymorphisms associated with meiotic recombination hotspots in the human genome. Genome Biol. 2010, 11 (10): R10310.1186/gb20101110r103.View ArticlePubMed CentralPubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.