- Methodology article
- Open Access
Haplotype allelic classes for detecting ongoing positive selection
© Hussin et al; licensee BioMed Central Ltd. 2010
- Received: 21 August 2009
- Accepted: 28 January 2010
- Published: 28 January 2010
Natural selection eliminates detrimental and favors advantageous phenotypes. This process leaves characteristic signatures in underlying genomic segments that can be recognized through deviations in allelic or haplotypic frequency spectra. To provide an identifiable signature of recent positive selection that can be detected by comparison with the background distribution, we introduced a new way of looking at genomic polymorphisms: haplotype allelic classes.
The model combines information on segregating sites with haplotypic information in order to reveal useful characteristics of the data. We developed a summary statistic, Svd, to compare the distribution of the haplotypes carrying the selected allele with the distribution of the remaining ones. Coalescent simulations are used to study these distributions under standard population models assuming neutrality, demographic scenarios and selection models. To test in practice how well haplotype allelic classes and the derived statistic capture deviations from neutrality due to positive selection, we analyzed haplotypic variation in detail at the lactase persistence locus in the three HapMap Phase II populations.
We showed that the Svd statistic is less sensitive than other tests to confounding factors such as demography or recombination. Our approach succeeds in identifying candidate loci, such as the lactase-persistence locus, as targets of strong positive selection and provides a new tool complementary to other tests to study natural selection in genomic data.
- Selective Sweep
- Detection Power
- Sliding Window Approach
- Lactase Persistence
- Haplotype Phase
The role of positive selection in the evolution and local adaptation of modern humans has been extensively studied using DNA variation data [1–6]. The increasing availability of such data led to the development of new statistical methods to detect signatures of natural selection along DNA sequences. As these techniques use and analyze DNA diversity in different ways, the overlap between the reported candidate loci under selection is relatively low . Indeed, different summary statistics may capture different types of selection events. In addition, signatures may differ depending on the sequence context, time and strength of selection . In the context of human evolution, it is particularly interesting to look for recent selection events resulting from local adaptations. These should have left signatures of incomplete selective sweeps in the human genome, where the selected allele dominates but is not yet fixed in a population. Loci affected by such selective events are likely to be of functional importance and responsible for inter-individual differences in genetic susceptibility to disease and/or to therapeutic outcome. Most of the early techniques to detect selection from DNA variation analyze allelic frequency spectra of individual polymorphic sites [7–10]. Newer methods look at haplotypes, their frequencies and length to capture those with extended linkage disequilibrium (LD), suggestive of a rapid and recent rise in population frequency and thus plausibly due to selection [1–3]. Other tests, such as that of Fu  or Depaulis and Veuille  propose to integrate information on haplotypes and their underlying sites. However, these tests are inadequate in the presence of recombination.
In order to combine information on alleles of single nucleotide polymorphisms (SNPs) with that of the resulting haplotypes, we propose to plot haplotype allelic classes (HACs), which group haplotypes at the same mutational distance from a predefined reference haplotype . This distance, also called the HAC, is calculated as the number of allelic differences between the reference haplotype and each individual haplotype in the sample. The HAC distribution (i.e., the number of haplotypes belonging to each class) expected under neutrality can be evaluated by computer simulations. If a sample shows a significant deviation from the neutral HAC distribution, it may be concluded that the genetic variation observed in the sample is not neutral.
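Computing a HAC amounts to taking a Hamming distance to the reference haplotype. The following is a minimal Python sketch. It assumes that haplotypes are given as equal-length sequences of allele codes (for example 0/1) over the same SNPs. The function names `hac_values` and `hac_distribution` and the toy data are illustrative and are not taken from the authors' software.

```python
from collections import Counter
from typing import List, Sequence


def hac_values(haplotypes: Sequence[Sequence[int]], reference: Sequence[int]) -> List[int]:
    """HAC of each haplotype: the number of SNPs at which it differs
    from the reference haplotype (a Hamming distance)."""
    for hap in haplotypes:
        if len(hap) != len(reference):
            raise ValueError("haplotype and reference must cover the same SNPs")
    return [sum(a != b for a, b in zip(hap, reference)) for hap in haplotypes]


def hac_distribution(haplotypes: Sequence[Sequence[int]], reference: Sequence[int]) -> List[int]:
    """Number of haplotypes in each allelic class 0..S, with S = len(reference)."""
    counts = Counter(hac_values(haplotypes, reference))
    return [counts.get(c, 0) for c in range(len(reference) + 1)]


if __name__ == "__main__":
    reference = [0, 0, 0, 0]  # hypothetical reference haplotype over S = 4 SNPs
    sample = [[0, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 0, 0]]
    print(hac_values(sample, reference))        # [0, 1, 2, 2, 0]
    print(hac_distribution(sample, reference))  # [2, 1, 2, 0, 0]
```

The text proposes comparing such a histogram with one obtained from neutral coalescent simulations. That simulation step is not shown here.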
A critical point is the choice of the reference haplotype defining the classes. This haplotype does not have to exist in the sample and can be chosen to suit a particular application. If we aim to study patterns of genetic variation and haplotype diversity in a population sample, the ancestral haplotype would be an appropriate reference haplotype . The HAC of a given haplotype would thus correspond to the number of non-ancestral (derived) alleles it carries, ranging from zero to the total number of SNPs within the analyzed DNA sequences. Under an incomplete selective sweep model, haplotypes carrying a positively selected allele on its way to fixation are very likely to also carry a large proportion of the major frequency alleles of the accompanying SNPs . It is therefore practical to define as a reference a haplotype carrying only the major frequency alleles of its constituting SNPs. This major-allele-reference haplotype (MARH) is expected to be structurally close to haplotypes carrying a positively selected allele. Using the MARH, the HAC of a given haplotype corresponds to the number of minor alleles it carries. A selective sweep is expected to favor haplotypes similar to the MARH and thus to narrow the HAC distribution relative to the neutral distribution. Therefore, we propose that HAC-derived statistics should help identify selection events in genetic diversity data.
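Below is a minimal sketch of the MARH construction in Python, assuming biallelic SNPs coded 0/1. At a site where the allele frequency is exactly 0.5, no unique major allele exists. This sketch breaks such ties toward the smaller allele code, which is an arbitrary assumption. The example checks that, when the MARH is the reference, each haplotype's HAC equals the number of minor alleles it carries. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def major_allele_reference(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Major-allele-reference haplotype (MARH): the most frequent allele at
    each SNP. Ties are broken toward the smaller allele code (assumption)."""
    marh = []
    for site in range(len(haplotypes[0])):
        counts = Counter(hap[site] for hap in haplotypes)
        marh.append(min(counts, key=lambda allele: (-counts[allele], allele)))
    return marh


def minor_allele_counts(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """HAC relative to the MARH, i.e. the number of minor alleles per haplotype."""
    marh = major_allele_reference(haplotypes)
    return [sum(a != m for a, m in zip(hap, marh)) for hap in haplotypes]


if __name__ == "__main__":
    haps = [[1, 0, 1], [1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
    print(major_allele_reference(haps))  # [1, 0, 1]
    print(minor_allele_counts(haps))     # [0, 0, 2, 1, 0]
```

The MARH is assembled one site at a time. As the text notes, it therefore need not occur as an actual haplotype in the sample.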
In this paper, we present Svd, the first summary statistic based on HAC distribution intended to detect ongoing selective sweeps. The resulting test can be used on a specific DNA region or to scan larger sequences using a sliding window approach. It appears less sensitive than other tests to confounding factors such as changes in population size or recombination. We successfully tested our approach using the lactase persistence locus on human chromosome 2, known to be under recent positive selection in a range of human populations [14–17].
To evaluate the likelihood that a given SNP is affected by an ongoing selective sweep, we considered each of its two alleles separately. This SNP is referred to as the evaluated segregating site. We compared the HAC distribution of all haplotypes carrying the major allele of the evaluated site with the distribution of the remaining haplotypes, which carry the minor allele. To compare these distributions, we considered their variances. For a neutrally evolving sequence, the spread of both distributions is expected to be a function of the frequency of the evaluated allele, the extent of the associated haplotypes and the recombination rate. When a sequence evolves under positive selection, the selected allele rises in frequency and drags along all alleles of adjacent SNPs carried on the same haplotype, a process known as genetic hitchhiking . Hence, the HAC distribution of haplotypes carrying the selected allele (or a linked hitchhiking allele) will be tight, with low variance, whereas the other allele is expected to occur on a variety of haplotypes with a broader HAC distribution, i.e., greater variance.
The sample variance of the HAC values, $\hat{V}(HAC)$, is a consistent and asymptotically normal estimator of $V(HAC)$, the variance of the HAC distribution.
Svd: a Statistic Based on the HAC Variance Difference
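For each evaluated site $k$, let $R_k$ denote the sub-sample of haplotypes carrying its major allele and $r_k$ the sub-sample carrying its minor allele. We define the variance difference
$$vd_k = \hat{V}(r_k) - \hat{V}(R_k),$$
where $\hat{V}(R_k)$ and $\hat{V}(r_k)$ are the variance estimators for the sub-samples $R_k$ and $r_k$, respectively. Under neutrality, $vd_k$ is expected to be close to zero when $R_k$ and $r_k$ contain similar numbers of sequences, or negative when $R_k$ contains significantly more sequences than $r_k$.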
When the selected allele reaches major frequency due to positive selection, the speed of this frequency rise leaves little time for the carrier haplotypes to diversify by mutation or recombination. The HAC distribution of $R_k$ is then expected to be tight and close to 0, making $\hat{V}(R_k)$ particularly small. Hence, $vd_k$ is expected to be positive when computed for a selected SNP and/or its linked sites.
To make the $vd_k$ values independent of haplotype length, we normalize them by the number $S$ of contributing SNPs. We show (see Additional File 1) that the HAC variance is $O(S)$, so dividing $vd_k$ by $S$ yields a normalized difference of variance estimators. Furthermore, we only consider cases in which selection drives new alleles to major frequencies, and high-frequency ancestral alleles are of little interest. The normalized $vd_k$ values are therefore weighted by the derived allele frequency $f_{d,k}$ of SNP $k$ to obtain the following summary statistic:
$$\mathrm{Svd}_k = f_{d,k}\,\frac{vd_k}{S}$$
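The following Python sketch computes Svd for one evaluated SNP by combining the definitions above. Several choices are assumptions of this sketch and are not fixed by the text:

- Haplotypes are 0/1-coded over a window of S SNPs that contains the evaluated site.
- The sample variance uses the n − 1 denominator (`statistics.variance`).
- The statistic is left undefined (returned as `None`) when either sub-sample has fewer than two haplotypes.

The function name `svd_statistic` is hypothetical.

```python
from collections import Counter
from statistics import variance
from typing import List, Optional, Sequence


def major_allele_reference(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """MARH: most frequent allele at each SNP (ties toward the smaller code)."""
    marh = []
    for site in range(len(haplotypes[0])):
        counts = Counter(hap[site] for hap in haplotypes)
        marh.append(min(counts, key=lambda allele: (-counts[allele], allele)))
    return marh


def svd_statistic(haplotypes: Sequence[Sequence[int]], k: int,
                  derived_allele: int) -> Optional[float]:
    """Svd for evaluated SNP k: f_dk * (V(r_k) - V(R_k)) / S."""
    n_hap = len(haplotypes)
    s = len(haplotypes[0])  # number of SNPs in the window
    marh = major_allele_reference(haplotypes)
    hacs = [sum(a != m for a, m in zip(hap, marh)) for hap in haplotypes]

    # R_k: haplotypes carrying the major allele at k; r_k: those carrying the minor allele.
    big_r = [h for h, hap in zip(hacs, haplotypes) if hap[k] == marh[k]]
    small_r = [h for h, hap in zip(hacs, haplotypes) if hap[k] != marh[k]]
    if len(big_r) < 2 or len(small_r) < 2:
        return None  # sample variance undefined for a sub-sample of size < 2

    vd_k = variance(small_r) - variance(big_r)
    f_dk = sum(hap[k] == derived_allele for hap in haplotypes) / n_hap
    return f_dk * vd_k / s


if __name__ == "__main__":
    # Toy sweep: the derived allele 1 at SNP 0 sits on near-identical haplotypes.
    haps = ([[1, 0, 0, 0]] * 3 + [[1, 0, 0, 1]] + [[1, 0, 0, 0]] * 2
            + [[0, 1, 1, 0], [0, 1, 1, 1]])
    print(round(svd_statistic(haps, k=0, derived_allele=1), 4))  # 0.0625
```

In the toy sample, the six carriers of the derived allele have HACs of 0 or 1 and the two other haplotypes have HACs of 3 and 4. As a result, $vd_k$ = 0.5 − 1/6 is positive, which is the behavior expected under a sweep. Including the evaluated site itself in the HAC only shifts every value in $r_k$ by one, so it does not change either variance.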
Statistical Test of Neutrality using Svd
Svd can be used as the decision variable of a test that statistically distinguishes a site evolving under neutrality from one subjected to ongoing positive selection. Neutrality is rejected when Svd exceeds a critical value. For all subsequent analyses, the critical value c of the test is defined by Pr(Svd > c | neutrality) = p, with p = 0.05. The detection power is the sensitivity of the test, i.e., the probability that Svd > c when a selective sweep is in progress.
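The test decision and its power can be sketched as follows, assuming that Svd values have already been computed for neutral (s = 0) replicates and for replicates simulated under selection. Following the calibration used in the simulation study, c is chosen so that a fraction p of the neutral values is greater than or equal to c. A replicate is then counted as detected when Svd ≥ c. When neutral values are tied, the realized tail fraction can differ from p. The function names are illustrative.

```python
import math
from typing import Sequence


def critical_value(neutral_scores: Sequence[float], p: float = 0.05) -> float:
    """Critical value c such that (without ties) a fraction p of the
    neutral scores is >= c: the m-th largest score, with m = floor(p * n)."""
    ranked = sorted(neutral_scores, reverse=True)
    m = math.floor(p * len(ranked) + 1e-9)  # guard against float round-off
    if m < 1:
        raise ValueError("not enough neutral replicates for this p")
    return ranked[m - 1]


def detection_power(selected_scores: Sequence[float], c: float) -> float:
    """Fraction of replicates simulated under selection with a score >= c."""
    return sum(score >= c for score in selected_scores) / len(selected_scores)


if __name__ == "__main__":
    neutral = [float(x) for x in range(1, 21)]  # 20 hypothetical neutral Svd values
    c = critical_value(neutral, p=0.05)
    print(c)                                              # 20.0
    print(detection_power([19.0, 20.0, 21.0, 25.0], c))   # 0.75
```

With 1,000 neutral replicates and p = 0.05, this rule takes the 50th largest neutral value as c.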
Test Validation Using Simulations
Svd power to detect selection in the context of various population scenarios, by population model parameters (small population size, 2 × population size, constant recombination rate, weak recombination hotspot, strong recombination hotspot) and window size (S).
Coalescent simulations under selective neutrality were carried out using the ms program . In the standard scenario, the population evolves for 4,000 generations without recombination. In the population bottleneck scenario, the same population evolves for 3,660 generations, experiences a 95% reduction in size for 80 generations and recovers during the subsequent 260 generations (see Additional File 1). In the demographic expansion scenario, a population of $N_e$ = 500 grows to $N_e$ = 1,000 during the last 300 generations (see Additional File 1). Recombination was tested under the standard scenario with a population recombination rate ρ = Θ/2, kept constant along the sequence.
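The sketch below makes the time scale of these demographic scenarios concrete. In ms, time runs backwards in units of 4N0 generations, where N0 is the present-day effective size. The `-eN t x` switch resets the population size to x·N0 at time t, and `-G α` sets an exponential growth rate, so that N(t) = N0·exp(−αt). The code converts the bottleneck and expansion scenarios into such switches.

The following inputs are assumptions and are not stated in the text:

- N0 = 1,000 for the bottleneck example.
- The pre-bottleneck size equals N0.
- Growth during the expansion is exponential.

The helper names are hypothetical, and the settings actually used in the study may differ.

```python
import math
from typing import List


def ms_bottleneck_args(n0: float, recovery_gens: int = 260,
                       duration_gens: int = 80, reduction: float = 0.95) -> List[str]:
    """-eN switches for a bottleneck that ended `recovery_gens` generations ago
    and lasted `duration_gens` generations (time in units of 4*N0 generations)."""
    t_end = recovery_gens / (4 * n0)                      # most recent edge of the bottleneck
    t_start = (recovery_gens + duration_gens) / (4 * n0)  # older edge of the bottleneck
    return ["-eN", f"{t_end:g}", f"{1 - reduction:g}",    # size drops to 5% of N0
            "-eN", f"{t_start:g}", "1"]                   # back to N0 before it (assumption)


def ms_expansion_args(n_now: float, n_past: float, gens: int) -> List[str]:
    """Exponential growth from n_past to n_now over the last `gens` generations
    (assumption: growth was exponential), constant size further back in time."""
    t = gens / (4 * n_now)
    alpha = math.log(n_now / n_past) / t  # ms: N(t) = N0 * exp(-alpha * t)
    return ["-G", f"{alpha:.4g}", "-eN", f"{t:g}", f"{n_past / n_now:g}"]


if __name__ == "__main__":
    print(" ".join(ms_bottleneck_args(1000)))           # -eN 0.065 0.05 -eN 0.085 1
    print(" ".join(ms_expansion_args(1000, 500, 300)))  # -G 9.242 -eN 0.075 0.5
```

These switches would be appended to an ms command of the form `ms nsam nreps -t θ`, using the θ (and, with recombination, `-r ρ nsites`) of the scenario being simulated.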
SelSim  was used to simulate sets of replicates under an ongoing selective sweep. In the default selection scenario, a population evolves under the standard scenario while the evaluated SNP is brought to a frequency of f = 0.75 by ongoing positive selection with a selection coefficient s = 0.15. In the small and large population selection scenarios, populations of $N_e$ = 500 and $N_e$ = 2,000, respectively, evolved under the default selection scenario. Recombination was tested under the default selection scenario with a population recombination rate ρ = Θ/2, either kept constant along the sequence or in the presence of recombination hotspots. In the latter case, the background rate is again $\rho_b$ = Θ/2 and the hotspot rate $\rho_{HS}$ equals 10$\rho_b$ (weak hotspot) or 100$\rho_b$ (strong hotspot). Hotspots are located 2 kb downstream of the evaluated site. In addition, samples were simulated for f = 0.6, 0.7, 0.75, 0.8 and 0.9 and for s = 0.05, 0.15 and 0.5.
Ascertainment Bias and Haplotype Phasing
In some ascertainment protocols, SNPs are reported only if they reach a minimum frequency in the sample. Sites with a minor allele frequency (MAF) below 0.05 are considered more likely to reflect sequencing errors and are less useful for genome-wide mapping, so they have typically been excluded from genotyping chips. To approximate such situations, singletons and doubletons were removed from the simulated replicates (with n = 50, these SNPs have a MAF below 0.05). In addition, we recreated an ascertainment scheme in which SNPs are identified in a smaller sequencing panel of m chromosomes and then genotyped in a larger panel of size n. To evaluate the impact of the sequencing panel size, we considered m = 4, 8, 12, 16, 20, 26, 32, 38, 44 and 50 (at m = 50, there is no ascertainment bias). The ascertainment procedures were applied to each replicate simulated under the default selection scenario.
To recreate the effect of haplotype phasing, for each replicate of a simulated dataset, we randomly assigned the n = 50 sequences to 25 individuals. We then resolved the resulting genotypes back into haplotypes using the fastPHASE program . The Svd statistic was then computed on haplotypes of length S = 50, 200, 400, 600 and 800 SNPs, centered on the evaluated site. This procedure was applied to the set of replicates simulated under the default selection scenario.
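The three data-degradation steps described above can be sketched in Python as follows, assuming 0/1-coded haplotypes. Two assumptions of this sketch are not stated in the text:

- The discovery panel of m chromosomes is a random subset of the n genotyped chromosomes. This is consistent with the absence of bias at m = n.
- Unphased genotypes are coded as allele counts 0/1/2.

The re-phasing itself was done with fastPHASE and is not reproduced here. All function names are hypothetical.

```python
import random
from typing import List, Sequence

Haplotypes = List[List[int]]


def keep_sites(haplotypes: Sequence[Sequence[int]], sites: Sequence[int]) -> Haplotypes:
    """Restrict every haplotype to the given SNP indices."""
    return [[hap[j] for j in sites] for hap in haplotypes]


def drop_rare(haplotypes: Sequence[Sequence[int]], min_count: int = 3) -> Haplotypes:
    """Remove SNPs whose minor allele occurs fewer than min_count times
    (min_count=3 removes singletons and doubletons)."""
    n = len(haplotypes)
    sites = []
    for j in range(len(haplotypes[0])):
        ones = sum(hap[j] for hap in haplotypes)
        if min(ones, n - ones) >= min_count:
            sites.append(j)
    return keep_sites(haplotypes, sites)


def panel_ascertain(haplotypes: Sequence[Sequence[int]], m: int,
                    rng: random.Random) -> Haplotypes:
    """Keep only SNPs that are polymorphic in a random discovery panel of m chromosomes."""
    panel = rng.sample(list(haplotypes), m)
    sites = [j for j in range(len(haplotypes[0])) if len({hap[j] for hap in panel}) > 1]
    return keep_sites(haplotypes, sites)


def pair_into_genotypes(haplotypes: Sequence[Sequence[int]],
                        rng: random.Random) -> List[List[int]]:
    """Randomly pair n haplotypes into n/2 individuals; return unphased genotypes (0/1/2)."""
    if len(haplotypes) % 2:
        raise ValueError("need an even number of haplotypes")
    order = list(range(len(haplotypes)))
    rng.shuffle(order)
    return [[a + b for a, b in zip(haplotypes[order[i]], haplotypes[order[i + 1]])]
            for i in range(0, len(order), 2)]


if __name__ == "__main__":
    haps = [[1, 1, 1], [0, 1, 1], [0, 1, 1], [0, 0, 1], [0, 0, 0], [0, 0, 0]]
    print(drop_rare(haps))  # [[1], [1], [1], [0], [0], [0]]
    print(pair_into_genotypes(haps, random.Random(0)))
```

In the example, the first SNP is a singleton and the third has a minor-allele count of 2, so only the middle SNP survives `drop_rare`.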
To assess the detection power of Svd, iHS, D and H under different selection scenarios, we needed to determine critical values at p = 0.05 for each set of parameters. These critical values were obtained by computing the statistics on datasets simulated under the same scenarios, with identical ascertainment and haplotype reconstruction procedures and with identical parameters except for the selection coefficient, which was set to s = 0. The critical Svd value c was determined for each scenario so that the proportion of Svd values greater than or equal to c, at s = 0, was exactly 0.05.
Application to Data
Experimental data were taken from the HapMap Project, Phase II Release 21a . The Japanese (JPT) and Chinese (CHB) samples were combined into an East Asian (ASI) sample of 89 unrelated individuals. The West European (CEU) and Yoruba from Nigeria (YRI) samples contain 60 unrelated individuals each. The phased haplotype data were downloaded from the BioMart HapMap browser http://hapmart.hapmap.org/BioMart/martview, which no longer gives access to the Phase II Release 21a dataset; this dataset is currently available from the HapMap ftp site ftp://ftp.ncbi.nlm.nih.gov/hapmap/. The chimp allele, or the macaque allele when the chimp allele was unavailable, was obtained through the UCSC Table Browser http://genome.ucsc.edu/cgi-bin/hgTables?command=start and used as a proxy for the ancestral allele of each human SNP. SNPs for which neither the chimp nor the macaque orthologous allele was available in the UCSC database were discarded.
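The polarization rule above can be sketched as follows: use the chimp allele first, fall back to the macaque allele, and otherwise discard the SNP. The derived allele frequency computed this way is the weight $f_{d,k}$ used in Svd. The function names are hypothetical. This sketch also discards a SNP whose proxy ancestral allele is not observed in the human sample, which is our own assumption and not a step stated in the text.

```python
from typing import Optional, Sequence


def ancestral_proxy(chimp: Optional[str], macaque: Optional[str]) -> Optional[str]:
    """Chimp allele if available, otherwise macaque; None means discard the SNP."""
    if chimp is not None:
        return chimp
    return macaque


def derived_allele_frequency(sample_alleles: Sequence[str],
                             chimp: Optional[str] = None,
                             macaque: Optional[str] = None) -> Optional[float]:
    """Fraction of sampled alleles that differ from the ancestral proxy."""
    ancestral = ancestral_proxy(chimp, macaque)
    if ancestral is None or ancestral not in sample_alleles:
        return None  # SNP discarded (the second condition is an assumption)
    derived = sum(allele != ancestral for allele in sample_alleles)
    return derived / len(sample_alleles)


if __name__ == "__main__":
    alleles = list("AAAG")
    print(derived_allele_frequency(alleles, chimp="A"))                # 0.25
    print(derived_allele_frequency(alleles, chimp=None, macaque="G"))  # 0.75
    print(derived_allele_frequency(alleles))                           # None
```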
Scan and Candidate Approach
We used a sliding window approach with different window lengths to analyze the entire chromosome 2 in ASI, CEU and YRI. The number of SNPs analyzed was 221,956, 206,665 and 252,249, respectively. The window of fixed length S slides one SNP at a time. We assigned p-values to each SNP according to the empirical distribution of Svd values, computed for all SNPs of chromosome 2.
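The following minimal sketch covers the scan bookkeeping. Each evaluated SNP gets one window of S consecutive SNPs. Its empirical p-value is the fraction of chromosome-wide Svd values that are greater than or equal to its own. Centering the window on the evaluated SNP follows the simulation setup. Shifting the window inward at the chromosome ends is an assumption of this sketch, and the function names are illustrative.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def centered_windows(n_snps: int, s: int) -> List[Tuple[int, int]]:
    """[start, end) indices of an S-SNP window around each SNP k, sliding one SNP
    at a time and shifted inward at the chromosome ends (assumption)."""
    if not 0 < s <= n_snps:
        raise ValueError("window size must be between 1 and the number of SNPs")
    windows = []
    for k in range(n_snps):
        start = min(max(k - s // 2, 0), n_snps - s)
        windows.append((start, start + s))
    return windows


def empirical_p_values(scores: Sequence[float]) -> List[float]:
    """p_k = fraction of all scores >= score_k (chromosome-wide empirical distribution)."""
    ranked = sorted(scores)
    n = len(ranked)
    return [(n - bisect_left(ranked, x)) / n for x in scores]


if __name__ == "__main__":
    w = centered_windows(10, 4)
    print(w[0], w[5], w[9])                           # (0, 4) (3, 7) (6, 10)
    print(empirical_p_values([0.1, 0.5, 0.3, 0.5]))  # [1.0, 0.5, 0.75, 0.5]
```

Sorting once and using binary search keeps the p-value step at O(N log N) for the roughly 200,000 to 250,000 SNPs per population scanned on chromosome 2.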
In addition, we analyzed the lactase persistence locus in CEU, where we considered 26 polymorphic sites (rs IDs are listed in Additional File 2, Table S1) from the MCM6 gene in the genomic region Chr2:136424478..136459810. To measure confidence in inference of selection in this genomic region, for each SNP we evaluated its associated p-value based on a simulated distribution of Svd values (see below).
Replicates Matching the MCM6 Locus
To assign p-values to the observed CEU data, we simulated a set of 1,000 replicates of 120 chromosomes at a population mutation rate Θ = 223. In all replicates, the evaluated SNP was under positive selection with s = 0.15 and a current frequency f = 0.78, which corresponds to the frequency of the MCM6 T variant (rs4988235) in CEU. To model SNP ascertainment, we used rejection sampling, as described by Voight and collaborators , to adjust the simulated frequency spectrum to the observed frequency spectrum of SNPs on chromosome 2. To match the MCM6 locus in CEU, haplotypes of 26 SNPs were chosen such that the 8th SNP of each replicate is the one under positive selection. P-values were estimated by comparing the Svd values computed from the experimental data with the simulated Svd distribution.
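The general idea of rejection sampling to match a target frequency spectrum can be sketched as follows. Each simulated SNP in frequency bin b is kept with a probability proportional to target(b)/simulated(b), so the kept SNPs follow the target spectrum. This is a generic textbook version written under our own assumptions: SNPs are binned by frequency, and the target spectrum is given as proportions. The exact procedure of Voight and collaborators may differ in its details. The function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def acceptance_probabilities(sim_bins: Sequence[Hashable],
                             target: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Per-bin acceptance probability proportional to target_b / simulated_b,
    scaled so that the largest probability equals 1."""
    counts = Counter(sim_bins)
    total = len(sim_bins)
    ratio = {b: target.get(b, 0.0) / (counts[b] / total) for b in counts}
    top = max(ratio.values())
    if top <= 0:
        raise ValueError("target gives zero weight to every simulated bin")
    return {b: r / top for b, r in ratio.items()}


def rejection_sample(sim_bins: Sequence[Hashable], target: Dict[Hashable, float],
                     rng: random.Random) -> List[int]:
    """Indices of the simulated SNPs kept by rejection sampling."""
    accept = acceptance_probabilities(sim_bins, target)
    return [i for i, b in enumerate(sim_bins) if rng.random() < accept[b]]


if __name__ == "__main__":
    rng = random.Random(42)
    sim = [1] * 7500 + [2] * 2500  # hypothetical simulated bins (75% / 25%)
    kept = rejection_sample(sim, {1: 0.5, 2: 0.5}, rng)
    share_bin2 = sum(sim[i] == 2 for i in kept) / len(kept)
    print(round(share_bin2, 2))    # close to 0.5
```

Rejection sampling discards data, so the number of simulated SNPs must be large enough for the rarest target bins to be represented after rejection.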
Distribution of Svd Values
Svd Power to Detect Ongoing Positive Selection
Under the default selection scenario, the detection power of Svd at p = 0.05 is 0.81 (Table 1). At false discovery rates (FDR) above 0.05, its detection power exceeds that of the three other statistics compared (Additional File 2, Figure S1). Conversely, Svd is less efficient than iHS at FDR < 0.05, and at even lower FDRs its performance becomes comparable to that of D. Overall, iHS appears to have the highest specificity, whereas Svd has the highest sensitivity, with a detection power reaching 0.95 at FDR = 0.1.
Application to the Data
Results of Svd scan in ten chromosome 2 genes under positive selection according to previous reports (for each population sample: most significant SNP, its Svd p value for window size S, and its iHS p value).
The neutral theory of molecular evolution  recognizes genetic drift as the main force shaping genetic variation. However, many recent studies suggest that substantial portions of the human genome have evolved under positive selection . Selected loci can cause changes in the frequency of genetically linked sites that are remarkably similar to the fluctuations caused by genetic drift, as Gillespie's model of genetic draft suggests . Hence, if many genes in the human genome are undergoing partial selective sweeps, genetic variation might be shaped by selective forces acting on adaptive mutations, and not mainly by genetic drift. To test whether genetic variation should be interpreted in the light of draft rather than drift models, a good strategy seemed to be to develop a statistical test specific to the detection of incomplete selective sweeps.
In this paper, we have presented a novel, intuitive and computationally efficient statistical test based on Svd, a statistic specifically designed to look for genomic signatures of strong incomplete selective sweeps. When developing this statistic, we found it useful to start by displaying genomic diversity data as histograms of haplotype allelic classes, which capture information on haplotype diversity combined with that on the contributing SNPs. In this way, HACs provide a convenient framework for developing summary statistics on which new neutrality tests can be based.
The Svd statistic is based on the allelic variability of SNPs and of the resulting haplotypes. It also relies on the expected difference in how this variability is apportioned between the selected allele and the complementary allele at the site under a sweep. It is thus likely to behave differently from other statistics such as iHS, D or H, and to be less sensitive to demographic changes. Although our simulation experiments were based on a restricted set of parameters, they illustrate that the Svd test has good detection power and suggest that it should perform well under a variety of population models. Our analysis of human chromosome 2 (Figures 4 and 5) demonstrates the potential of the Svd test for genomic data when it is used with a sliding window approach. To evaluate the statistical significance of the test outcome, we first used an empirical approach: we assigned p-values to observed Svd values based on the empirical distribution of all Svd values obtained by scanning the whole of chromosome 2 in the analyzed population sample. Subsequently, to validate a candidate locus such as MCM6, we evaluated the p-value of each of its SNPs by simulations. These simulations take into account any prior information available on the locus itself and on the population in which the signal was found (recombination rates, allele frequencies, demography, SNP ascertainment protocol). A strong signal of ongoing positive selection at the lactase persistence locus is found only in the European-derived population. This result was expected: in Europe, cattle were domesticated 10,000 years ago, and cultural habits associated with milk consumption may have been advantageous for individuals (nutritional benefit, improved calcium absorption ).
The SNP with the strongest Svd signal, based on the p-value obtained by simulation, was already known to be associated with lactase persistence in European populations. Even so, our analysis illustrates the potential of the proposed method for detecting new candidate polymorphisms for association studies.
The majority of available genotyping datasets are biased in the choice of the genetic markers typed, because they were collected for use in linkage and association studies, and the analysis of such data should focus on tests of overall diversity . Svd can be applied to such datasets because the HAC distribution provides a summary of overall haplotype diversity. In addition, removing rare SNPs from simulated data increases detection power. This suggests that the Svd test may perform even better on data containing only common SNPs than on data with both rare and common variants. One explanation is the greater informativeness of common SNPs: because a window contains a fixed number S of SNPs, removing rare SNPs increases the effective window size and thus the detection power (Table 1, Figure 3A). When a site under selection is not among the genotyped SNPs, selection would still be detected by the Svd test through the surrounding linked SNPs, although the detection power may be reduced (data not shown).
Inaccuracy in haplotype inference is known to hamper the detection of signatures of positive selection in genetic data, and strategies to infer haplotypes accurately (e.g., using trio data) must be applied before using selection detection methods . With simulated data, we observed a loss of power of the Svd selection test due to haplotype phasing. However, the test remains conservative in the sense that phasing errors do not create false positive results. Using longer, and thus potentially more informative, haplotypes can compensate for this effect. Therefore, large windows, in the range of hundreds of SNPs, could be recommended to increase the signal. If this works, it suggests that the selective sweep is relatively young or that its signature persists longer because of a relatively low local recombination rate. In other words, longer haplotypes appear to be more robust but, at the same time, more sensitive to recombination and to the age of a sweep. This explains why certain significant Svd signals may fade with increasing window size. Different haplotype lengths should thus be explored when scanning the genome or a specific region of interest. Given the data and the recombination rates, we used a pre-treatment method to determine a "pseudo-optimal" haplotype length around each SNP, which serves as a starting point to guide the practical analysis (see Additional File 1).
The idea behind the Svd statistic is very similar to the approach used to compute the iHS statistic . The advantageous alleles favored by positive selection are generally found within large shared haplotypes where the level of diversity is reduced. These haplotypes contrast with the more variable haplotypes, which do not carry alleles under selection. With iHS, one can look at the decay of identity of haplotypes that carry a specific allele. With Svd, rather than looking at haplotype homozygosity, we contrast haplotypes carrying one or the other allele of the evaluated site. For haplotypes of 50 SNPs, at FDR = 0.05, iHS and Svd have the same detection power when the selected allele frequency is over 0.5 (Figure 1). When the selected allele frequency is under 0.5, Svd is not expected to find the signal whereas iHS can detect low frequency sweeps.
Furthermore, iHS outperforms Svd when FDR < 0.05. On the other hand, Svd power increases with haplotype length: even if the edges of the selected haplotype are broken by recombination, portions of the originally selected haplotype still remain within the analyzed pool, distributed among different sequences. Using simulated data in which the selected site is surrounded by one or two recombination hotspots, we showed that Svd had better detection power than long-range haplotype tests (Additional File 2, Table S2), because these tests require intact haplotypes to remain in the population. Recombination hotspots are expected on average every 50 kb . Svd can therefore be considered a useful complement to long-range haplotype statistics for detecting signatures of recent positive selection.
Different steps in the analysis of selection signatures proposed in this study can be modified, depending on the data and specific questions. Here, our reference haplotype was composed of predominant alleles in the population, but other reference haplotypes can be considered . Other applications are also possible, such as the use of Svd to compare groups of haplotypes in case-control studies. Furthermore, because the HAC distribution is also sensitive to a complete selective sweep, an approach similar to the one proposed by Kimura and collaborators  to identify fixed loci under positive selection could be developed using HAC distribution instead of haplotype homozygosity.
We thank John Keebler for providing a script to compute the iHS statistic and Philip Awadalla, Nicolas Lartillot and Tomi Pastinen for helpful discussions. JH and PN were recipients of studentships from biT, the Canadian Institutes of Health Research sponsored program and from the Fonds Québecois de Recherche sur la Nature et les Technologies. This study is a part of GRID project supported by GenomeQuebec and GenomeCanada.
- Kim Y, Nielsen R: Linkage disequilibrium as a signature of selective sweeps. Genetics 2004, 167(3):1513–1524. 10.1534/genetics.103.025387
- Voight BF, Kudaravalli S, Wen X, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol 2006, 4(3):e72. 10.1371/journal.pbio.0040072
- Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, Ackerman HC, Campbell SJ, Altshuler D, Cooper R, Kwiatkowski D, Ward R, Lander ES: Detecting recent positive selection in the human genome from haplotype structure. Nature 2002, 419(6909):832–837. 10.1038/nature01140
- Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES: Positive natural selection in the human lineage. Science 2006, 312(5780):1614–1620. 10.1126/science.1124309
- Kelley JL, Swanson WJ: Positive selection in the human genome: from genome scans to biological significance. Annu Rev Genomics Hum Genet 2008, 9:143–160. 10.1146/annurev.genom.9.081307.164411
- Nielsen R, Hellmann I, Hubisz M, Bustamante C, Clark AG: Recent and ongoing selection in the human genome. Nat Rev Genet 2007, 8(11):857–868. 10.1038/nrg2187
- Fay JC, Wu CI: Hitchhiking under positive Darwinian selection. Genetics 2000, 155(3):1405–1413.
- Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989, 123(3):585–595.
- Fu YX, Li WH: Statistical tests of neutrality of mutations. Genetics 1993, 133(3):693–709.
- Zeng K, Fu YX, Shi S, Wu CI: Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 2006, 174(3):1431–1439. 10.1534/genetics.106.061432
- Fu YX: Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics 1997, 147(2):915–925.
- Depaulis F, Veuille M: Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol Biol Evol 1998, 15(12):1788–1790.
- Labuda D, Labbe C, Langlois S, Lefebvre JF, Freytag V, Moreau C, Sawicki J, Beaulieu P, Pastinen T, Hudson TJ, Sinnett D: Patterns of variation in DNA segments upstream of transcription start sites. Hum Mutat 2007, 28(5):441–450. 10.1002/humu.20463
- Mace R, Jordan F, Holden C: Testing evolutionary hypotheses about human biological adaptation using cross-cultural comparison. Comp Biochem Physiol A Mol Integr Physiol 2003, 136(1):85–94. 10.1016/S1095-6433(03)00019-9
- Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, Ibrahim M, Omar SA, Lema G, Nyambo TB, Ghori J, Bumpstead S, Pritchard JK, Wray GA, Deloukas P: Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet 2007, 39(1):31–40. 10.1038/ng1946
- Enattah NS, Jensen TG, Nielsen M, Lewinski R, Kuokkanen M, Rasinpera H, El-Shanti H, Seo JK, Alifrangis M, Khalil IF, Natah A, Ali A, Natah S, Comas D, Mehdi SQ, Groop L, Vestergaard EM, Imtiaz F, Rashed MS, Meyer B, Troelsen J, Peltonen L: Independent introduction of two lactase-persistence alleles into human populations reflects different history of adaptation to milk culture. Am J Hum Genet 2008, 82(1):57–72. 10.1016/j.ajhg.2007.09.012
- Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA, Rhodes M, Reich DE, Hirschhorn JN: Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet 2004, 74(6):1111–1120. 10.1086/421051
- Smith JM, Haigh J: The hitch-hiking effect of a favourable gene. Genet Res 1974, 23(1):23–35. 10.1017/S0016672300014634
- Watterson GA: On the number of segregating sites in genetical models without recombination. Theor Popul Biol 1975, 7(2):256–276. 10.1016/0040-5809(75)90020-9
- Hudson RR: Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 2002, 18(2):337–338. 10.1093/bioinformatics/18.2.337
- Spencer CC, Coop G: SelSim: a program to simulate population genetic data with natural selection and recombination. Bioinformatics 2004, 20(18):3673–3675. 10.1093/bioinformatics/bth417
- Scheet P, Stephens M: A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet 2006, 78(4):629–644. 10.1086/502802
- The International HapMap Consortium: A haplotype map of the human genome. Nature 2005, 437(7063):1299–1320. 10.1038/nature04226
- Stephens M, Smith NJ, Donnelly P: A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 2001, 68(4):978–989. 10.1086/319501
- Tang K, Thornton KR, Stoneking M: A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol 2007, 5(7):e171. 10.1371/journal.pbio.0050171
- Williamson SH, Hubisz MJ, Clark AG, Payseur BA, Bustamante CD, Nielsen R: Localizing recent adaptive evolution in the human genome. PLoS Genet 2007, 3(6):e90. 10.1371/journal.pgen.0030090
- Carlson CS, Thomas DJ, Eberle MA, Swanson JE, Livingston RJ, Rieder MJ, Nickerson DA: Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res 2005, 15(11):1553–1565. 10.1101/gr.4326505
- Enattah NS, Sahi T, Savilahti E, Terwilliger JD, Peltonen L, Jarvela I: Identification of a variant associated with adult-type hypolactasia. Nat Genet 2002, 30(2):233–237. 10.1038/ng826
- Kimura M: The neutral theory of molecular evolution. New York: Cambridge University Press; 1983.
- Nielsen R: Molecular signatures of natural selection. Annu Rev Genet 2005, 39:197–218. 10.1146/annurev.genet.39.073003.112420
- Gillespie JH: Genetic drift in an infinite population. The pseudohitchhiking model. Genetics 2000, 155(2):909–919.
- Koichiro Higasa YK, Kiyoko Kato, Norio Wake, Tomoko Tahira, Kenshi Hayashi: Evaluation of Haplotype Inference Using Definitive Haplotype Data Obtained from Complete Hydatidiform Moles, and Its Significance for the Analyses of Positively Selected Regions. PLoS Genetics 2009, 5(5).
- Myers S, Bottolo L, Freeman C, McVean G, Donnelly P: A fine-scale map of recombination rates and hotspots across the human genome. Science 2005, 310(5746):321–324. 10.1126/science.1117196
- Kimura R, Fujimoto A, Tokunaga K, Ohashi J: A practical genome scan for population-specific strong selective sweeps that have reached fixation. PLoS ONE 2007, 2(3):e286. 10.1371/journal.pone.0000286
- Lefebvre JF, Labuda D: Fraction of informative recombinations: a heuristic approach to analyze recombination rates. Genetics 2008, 178(4):2069–2079. 10.1534/genetics.107.082255
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.