Haplotype based testing for a better understanding of the selective architecture

Background The identification of genomic regions affected by selection is one of the most important goals in population genetics. If temporal data are available, allele frequency changes at SNP positions are often used for this purpose. Here we provide a new testing approach that uses haplotype frequencies instead of allele frequencies. Results Using simulated data, we show that compared to SNP based test, our approach has higher power, especially when the number of candidate haplotypes is small or moderate. To improve power when the number of haplotypes is large, we investigate methods to combine them with a moderate number of haplotype subsets. Haplotype frequencies can often be recovered with less noise than SNP frequencies, especially under pool sequencing, giving our test an additional advantage. Furthermore, spurious outlier SNPs may lead to false positives, a problem usually not encountered when working with haplotypes. Post hoc tests for the number of selected haplotypes and for differences between their selection coefficients are also provided for a better understanding of the underlying selection dynamics. An application on a real data set further illustrates the performance benefits. Conclusions Due to less multiple testing correction and noise reduction, haplotype based testing is able to outperform SNP based tests in terms of power in most scenarios. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-023-05437-3.

S.1.1 Vovk's methods [3] proposed combination methods that promise type I error control under any dependencies given that the number of p-values is large.The methods are extremely conservative similar to the Bonferroni correction, which is a special case of the proposed methods, but can be useful if having very few false positives is the goal.The methods depend on a parameter value r, with guidelines suggesting that more dependent p-values should use larger values of r.The merging function dependent on the total number of p-values N and parameter r is defined as follows: with evidence of rejecting the null hypothesis present if: > > : (r + 1) ln N e Mr,N , r= 1 The Bonferroni correction would correspond to the case when r = 1.

S.2 Test statistic under di↵erent scenarios
In the paper we have shown the test statistic for haplotype based testing in a scenario where haplotype frequencies are known, and e↵ective population size does not change over time.This however, might not be realistic.Usually, haplotype and allele frequencies are estimated with either or both sampling and pool sequencing noises.When such is the case, variances from these noises need to be taken into account to avoid type I error violations.Here, we provide test statistics for a wider range of scenarios, where the frequencies are estimated with di↵erent sources of variance.

S.2.1
Test statistic when all haplotype frequencies are estimated with sampling variance Assuming a sample size of size n (r) t is used to estimate the haplotype frequencies at time t and replicate r, an additional variance component of frequency estimation is present when compared to the case with known frequencies.being consistent estimators of V ar(f (r) j,0 ) and V ar(f (r) j,T ) respectively: S.2.2 Test statistic when starting haplotype frequencies are known, others estimated with sampling variance Similar to the scenario in Section S.2.1, except the variance component for starting frequencies is nullified. with with ⇣ s(r) being consistent estimators of V ar(f (r) j,0 ) and V ar(f (r) j,T ) respectively.For SNP based test, this can be computed by: and for haplotype based test, these are the variances of the estimated regression coe cients F(r) P at time points 0 and T .Further derivations for the SNP based tests can be found in [4].

S.2.4 Drift variance when Ne changes over time
If Ne changes due to experimental design, and is measured at all sequenced time points, or due to testing for number of selected haplotypes, the variance of drift with changing Ne, drif t, Ne can be computed by: where Ne i is the Ne at timepoint ti.drif t, Ne can then be used in place of drift in the previous test statistics.

S.2.5
Testing for the number of selected haplotypes when frequencies are estimated with sampling variance Assuming the presence of R replicates and estimated frequencies, the m1 haplotype that provides the maximum change in frequency outcome is calculated as follows: Notice that due to the noise introduced by sequencing and sampling, estimated intermediate frequencies can be zero, while latter frequencies deviate from zero again.

S.3 Combining haplotypes using SNP based test
As introduced in Section 2.5, by removing certain SNPs, it could be possible that several haplotypes now share the same set of nucleotides, and their frequencies can thus be combined.Even though due to hitchhiking, smaller p-values do not necessarily indicate a selected target, selected targets will unlikely result in large p-values.Thus through the removal of SNPs that give p-values higher than some certain threshold , the combined haplotypes are likely to either be both targets or non-targets of selection, achieving the goal of haplotype number reduction while not diluting the original signal.Below we explain this idea in more detail.
Step 1: From allele frequency matrix A N SNP ⇥(T +1) , apply adapted chi-square test [4] as in haplotype based test.
Step 3: Create the new haplotype structure matrix H filt by removing all SNPs from H not present in the filtered SNP set.
Step 4: Combine all haplotype frequencies that now have the same nucleotide Step 5: Apply the haplotype based test on the new combined haplotype frequency matrix F filt .

S.4 Combining haplotypes using haplotype blocks
In this section we provide more details about the computation of haplotype blocks.Suppose in a simple setup where we have 2 SNPs and hence 4 haplotypes, with the allele frequencies of the two SNPs being u, v respectively, we write the haplotype frequencies as h1,1, h1,0, h0,1, h0,0 dependent on the nucleotide they hold.The coe cient of linkage disequilibrium, D, is then calculated by: This value is not informative as its size is relative to the allele frequencies, therefore usually a normalised version D 0 proposed by Lewontin (1964) [5] is used to measure LD: The variance of D 0 can be estimated as suggested in [6]: where n is the total number of haplotypes of the entire interested SNP region, and a = v, b = (1 v) if D 0 < 0 and vice versa.If we write the normalised coe cient of linkage disequilibrium between some SNP pairs i and j as D 0 i,j , by using this variance approximation and assuming D 0 i,j to be normally distributed, we can calculate its 90% confidence interval, with the upper and lower intervals being Ui,j and Li,j respectively.The definition of an haplotype block is then given by Gabriel et al. (2002) [7], where the pair of SNPs are said to be in strong linkage disequilibrium (LD) if Li,j 0.7 and Ui,j 0.98 and the pair shows strong evidence of historical recombination, or EHR if Ui,j < 0.9.A haplotype block from SNP i to SNP j can only exist if the SNP pairs i and j are in strong LD.The number of SNP pairs that are in strong LD within the block needs to be at least 19 times the number of SNP pairs that are in EHR.After finding all possible haplotype block within the given window, we rank the haplotype blocks according to SNP length, and select non-overlapping blocks as our block choice starting from the highest rank.We only use haplotype blocks that extend over more than 2 SNPs, as little information is gained from very short blocks.

S.5 Type I error control without replicates
In Figure S1 we show that there can be type I error control issues, when no replicates are present, and the number of founder haplotypes are very small.This issue however is solved if there are more founder haplotypes, or if any replicate population is present.2.            S.8 Further results on testing for the number of selected haplotypes Here we show a wider range of scenarios where we perform the post hoc test for the number of selected haplotypes.We show results when using omnibus and HMP as multiple testing corrections and consider scenarios with and without replicate in Figures S14, S15 and S16.The heat map is simulated using 5 founder haplotypes and a selection strength of s = 0.05, 1 replicate population, with all other parameters default as outlined in Table 2.In the left panel, the test is conditional on the presence of selection.In the right panel, the test is unconditional.

S.6 E↵ect on power and AUC when changing model parameters
S.9 Type I error of testing for the number of selected haplotypes under low selective strength Figure S17 shows a scenario where we observed violations in the conditional type I error for our proposed post hoc test.However, as shown below, the unconditional type I error remains under control.
Figure S17: Type I error plot when testing for one vs more than one selected haplotype.The type I errors were computed either conditional or unconditional on the rejection of the initial test for selection.The omnibus test is used for multiple testing correction.Data is simulated using 5 founder haplotypes and di↵erent selection coe cients s, with all other parameters as outlined in Table 2.

S.10 Further results on pairwise test
Figure S18: Results for pairwise tests when all haplotypes have di↵erent fitness.B&H is used for multiple testing correction.Data is simulated using 5 founder haplotypes, each having a fitness of (1,1.02,1.04,1.06,1.08), 1 or 3 replicates, with all other parameters default as outlined in Table 2.

S.11 Comparing chi-squared and CMH test under low selective strength
In Section S.10 we observed that the pairwise test with the adapted CMH test has lower power than with the adapted chi-squared test at very low fitness di↵erences, despite the increase in replicates.We show in Figure S19 that, this also apply more generally to non-pairwise test scenarios.Counter-intuitively, here we observe that at very low selective strength, increasing the number of replicates resulted in lower power when applying the adapted CMH test.This is caused by the CMH test being much more conservative than the adapted chi-squared test.In Figure  2. S20 we show an additional scenario where we apply the adapted chi-squared test separately on each of the three replicates, and combine their results using a Benjamini & Hochberg multiple testing correction.Here we see that this method provides a significant increase in terms of power when compared to the scenario with one replicate, but also a much higher type I error when compared to the CMH test.We also plotted ROC curves in Figure S21 and show that even at 0.02 selective strength, despite the low power and conservativeness, CMH test is still the best performing test overall for haplotype frequencies, in terms of AUC.

Figure S20:
Results when changing selection strength, number of replicate, using 2 testing methods.Power and type I error curves are plotted for haplotype based test using omnibus as p-value combination method under various s and R, with all other parameters default as outlined in Table 2.
Figure S21: Results when changing the number of replicate and testing methods.ROC curves of haplotype and SNP based test when changing the testing method and number of replicates.The selective strength is set to be 0.02, with other parameters default as outlined in Table 2.

S.12 Further results for combining haplotypes
In this section we provide additional results for the proposed methods to combine haplotypes when the total number of haplotypes is too large.Figure S22 shows the ROC curves for the original haplotype and SNP based tests for selection and the tests where the SNP and block based methods for combining the haplotypes are used.Figures S23 and S24 compare the methods when changing the total number of haplotypes with respect to type I error and AUC. Figure S25 explores a special scenario where only 1 haplotype is selected despite a large number of founder haplotypes.Figure S26 shows the e↵ect of changing inclusion threshold of the combination methods.2. Hapblock refers to the method of using haplotype blocks to reduce haplotype number outlined in Section 2.5.Here, the number of selected haplotypes is set to be half of the total haplotype number h Sel = N Hap /2, no replicate population, with all other parameters default as outlined in Table 2.  S.13 Scenarios when the haplotype starting frequencies are not equal In all other simulations, we assume that the haplotype starting frequencies are always equal.Since this might not be the case in reality, we want to explore the scenarios of unequal starting frequency.To illustrate this, we plot the di↵erences between AUC of ROCs for haplotype and SNP based tests.Additionally, all plots shown assume no replicate populations.From supplement Section S.6 we know that this would a more disadvantageous scenario for haplotype compared with the SNP based test.We first explore the scenario of unequal starting frequencies with 3 founder haplotypes.Figure S27 and S28 show the case where there are 1 or 2 selected haplotypes respectively.We see that the haplotype based test consistently outperforms the SNP based test under almost all scenarios.In the few parameter sets where SNP based test outperforms the haplotype based test the AUC di↵erence between the two tests was not significant.It is therefore likely that these disadvantageous scenarios are simply due to noise.2.
We next explore a scenario with 4 founder haplotypes.When one haplotype is selected (Figure S29), the results are similar to the scenario with 3 founder haplotypes (Figure S27) where the haplotype based test consistently performs better.We note in Figure S30 that when there are 4 founder haplotypes and 2 selected haplotypes however, there are some scenarios where SNP based test is performing slightly better (blue dots).Whilst with replicates, haplotype based test might perform better at these scenarios, we want to explore whether these slightly worse performance can simply be regarded as noise.In supplement Figure S31, we compute the 95% confidence interval of the AUC di↵erences, and choose to only plot the parameter sets that provide a significant AUC di↵erence.We note that all scenarios where the haplotype based test is performing worse can potentially just be noise.With 3 selected haplotypes shown in Figure S32, there is again little to no combination of starting frequencies where SNP based test is performing better, and similar to the other scenarios, the haplotype based test obtains the largest advantage whenever the combined frequency of the selected haplotypes is large.This biases the SNP based test, as the fixation or loss of an allele can happen any time between cycle 6 and cycle 12, which would be roughly 105 generations.As no information is available between the cycles, we will not be able to obtain an accurate result using these data with the considered methods.Moreover, the amount of average absolute frequency change between the first and second time point is much smaller than the frequency change between the second and third time point (Table S2).This might be explained by a shrinking population size, and thus much higher drift at latter cycles.Indeed, using the Ne estimates proposed by [9], we observe a much lower median Ne estimate between the latter cycles, which makes the frequency changes noisier.Table S3 shows that it becomes therefore harder to separate selection and drift.Table S4: Absolute number of windows each method is able to detect as selected for each chromosome.

Figure S1 :
Figure S1: Results when changing the number of total haplotypes with no replicate population.Power and Type I error curves are plotted for both haplotype and SNP based test under various N Hap , no replicate population, with all other parameters default as outlined in Table2.

Figure S2 :
Figure S2: Results when changing the number of selected haplotypes h Sel .Area under curve (AUC) of the ROC curves are provided for the haplotype and SNP based tests under various values of h Sel , with all other parameters default as outlined in Table2.

Figure S3 :
Figure S3: Results when changing the number of replicate populations.Power and Type I error curves are plotted for both haplotype and SNP based test under various values for the number of replicates R, with all other parameters default as outlined in Table2.

Figure S4 :
Figure S4: Results when changing selection strength.Power and Type I error curves are plotted for both haplotype and SNP based test under various values for the selection coe cient s, with all other parameters default as outlined in Table2.

Figure S5 :
Figure S5: Results when changing selection strength.AUC of the ROC curves are plotted for both haplotype and SNP based test under various values for the selection coe cient s, with all other parameters default as outlined in Table2.

Figure S6 :
Figure S6: Results when changing the total number of generations.Power and Type I error curves are plotted for both haplotype and SNP based test under various total number of generations, with all other parameters default as outlined in Table2.

Figure S7 :
Figure S7: Results when changing the total number of generations.AUC of the ROC curves are plotted for both haplotype and SNP based test under various total number of generations, with all other parameters default as outlined in Table2.

Figure S8 :
Figure S8: Results when changing e↵ective population size.Power and Type I error curves are plotted for both haplotype and SNP based test under various N e , with all other parameters default as outlined in Table2.

Figure S9 :
Figure S9: Results when changing e↵ective population size.AUC of the ROC curves are plotted for both haplotype and SNP based test under various N e , with all other parameters default as outlined in Table2.

Figure S10 :
Figure S10: Results when changing the number of SNPs.Power and Type I error curves are plotted for both haplotype and SNP based test under various N SNP , with all other parameters default as outlined in Table2.

Figure S11 :
Figure S11: Results when changing the number of SNPs.AUC of the ROC curves are plotted for both haplotype and SNP based test under various N SNP , with all other parameters default as outlined in Table2.

Figure S12 :
Figure S12: Results when changing the number of total haplotypes under 3 scenarios.Power and Type I error curves are plotted for both haplotype and SNP based test under various N Hap , with all other parameters default as outlined in Table 2 under 3 scenarios.Scenario 1: all haplotype frequencies are known.Scenario 2: starting haplotype frequencies are known and frequencies for later time points are estimated with a sample size of 80. Scenario 3: all haplotype frequencies are estimated with a sample size of 80.

Figure S13 :
Figure S13: Results when changing the coverage.Power and type I error curves are plotted for haplotype and SNP based test under various sampling size.All frequencies are estimated with a sample size of 500, with other parameters default as outlined in Table2.

Figure S14 :
Figure S14: Result showing the predictability of haplotype based iterative testing.Omnibus is used for multiple testing correction under all scenarios.The heat map is simulated using 5 founder haplotypes and a selection strength of s = 0.05, 1 replicate population, and all other parameters default as outlined in Table2.In the left panel, the test is conditional on the presence of selection.In the right panel, the test is unconditional.

Figure S15 :
Figure S15: Results showing the prediction accuracy of haplotype based iterative testing.The HMP method is used for multiple testing correction in all scenarios.The heat map has been obtained using 5 founder haplotypes and a selection strength of s = 0.05, and all other parameters default as outlined in Table2.In the left panel, the test is conditional on the presence of selection.In the right panel, the test is unconditional.

Figure S16 :
Figure S16: Result showing the predictability of haplotype based iterative testing.HMP is used for multiple testing correction under all scenarios.The heat map is simulated using 5 founder haplotypes and a selection strength of s = 0.05, 1 replicate population, with all other parameters default as outlined in Table2.In the left panel, the test is conditional on the presence of selection.In the right panel, the test is unconditional.

Figure S19 :
Figure S19: Results when changing selection strength and number of replicate populations.Power and type I error curves are plotted for haplotype based test using omnibus as p-value combination method under various s and R, with all other parameters default as outlined in Table2.

Figure S22 :
Figure S22: ROC curves for di↵erent testing method with 100 haplotypes.ROC curves of haplotype, HapSNP, hapblock and SNP based tests are shown for 100 total haplotypes, 50 selected haplotypes and no replicate population with all other parameters default as outlined in Table2.Hapblock refers to the method of using haplotype blocks to reduce haplotype number outlined in Section 2.5.

Figure S23 :
Figure S23: Type I error for di↵erent values for the number of haplotypes N Hap .Type I error curve is plotted for the original haplotype, HapSNP, and SNP based test under various choices of N Hap .HapSNP refers to the method of using the SNP based test to reduce the haplotype number prior to haplotype based testing outlined in Section 2.5.Here, the number of selected haplotypes is set to h Sel = N Hap /2.All other parameter values can be found in Table2.

Figure S24 :
Figure S24: AUC for di↵erent values for the number of haplotypes N Hap .AUC of the ROC curves are plotted for haplotype, HapSNP, Hapblock and SNP based test under various N Hap .Here, the number of selected haplotypes is set to be half of the total haplotype number h Sel = N Hap /2, no replicate population, with all other parameters default as outlined in Table2.

Figure S25 :
Figure S25: ROC curves when there are 30 haplotypes, one of them selected.Haplotype, HapSNP, Hapblock and SNP based tests are considered for 30 haplotypes, 1 selected haplotype and no replicate population, with all other parameters default as outlined in Table2.

Figure S26 :
Figure S26: AUC when changing the inclusion threshold of SNPs from the HapSNP procedure.AUC for the HapSNP test is plotted under various inclusion threshold, compared with the AUC with haplotype based test, under parameters identical to Figure S25.

Figure S27 :
Figure S27: Results when there is 1 selected haplotype.The AUC of haplotype minus SNP based test is plotted under various starting haplotype frequencies.The frequency on the y-axis refers to the selected haplotype.HMP multiple testing correction is used in the left panel and the omnibus multiple testing correction in the right panel.Simulations are performed with 3 founders and 1 selected haplotype.The other parameters are as outlined in Table2.

Figure S28 :
Figure S28: Results when there are 2 selected haplotypes.The AUC of haplotype minus SNP based test is plotted under various starting haplotype frequencies.The frequency on both the y and x-axis are always for the selected haplotypes.The Figure on the left panel is using HMP as multiple testing correction, while the Figure on the right panel is using omnibus.Simulations are performed with 3 founder and 2 selected haplotypes.The other parameters are as outlined in Table2.

Figure S29 :
Figure S29: Results when there is 1 selected haplotype.The AUC of haplotype minus SNP based test is plotted under various starting haplotype frequencies.The Figure on the left panel is using HMP as multiple testing correction, while the Figure on the right panel is using omnibus.Simulations are performed with 4 founders and one selected haplotype.Here only 1,000 simulation is performed on each parameter set.The other parameters are as outlined in Table2.

Figure S30 :
Figure S30: Results when there are 2 selected haplotypes.The AUC of haplotype minus SNP based test is plotted under various starting haplotype frequencies.The Figure on the left panel is using HMP as multiple testing correction, while the Figure on the right panel is using omnibus.Simulations are performed with 4 founder and 2 selected haplotypes.Here only 1,000 simulation is performed on each parameter set.The other parameters are as outlined in Table2.

Figure S31 :
Figure S31: Results when there are 2 selected haplotypes.The AUC of haplotype minus SNP based test is plotted under various starting haplotype frequencies.We only plot the parameter sets giving results outside the 95% confidence interval of the AUC di↵erences.The Figure on the left panel is using HMP as multiple testing correction, while the Figure on the right panel is using omnibus.Simulations are performed with 4 founder and 2 selected haplotypes.Here only 1,000 simulations is performed on each parameter set.The other parameters are as outlined in Table2.

Figure S32 :
Figure S32: Results when there are 3 selected haplotypes.The AUC of haplotype minus SNP based test is plotted under various starting haplotype frequencies.The Figure on the left panel is using HMP as multiple testing correction, while the Figure on the right panel is using omnibus.Simulations are performed with 4 founder and 3 selected haplotypes.Here only 1,000 simulation is performed on each parameter set.The other parameters are as outlined in Table2.
S.2.3Test statistic when all frequencies are estimated with both sampling and pool-sequencing variances Assuming a sample size of n at time t and replicate r, additional components of variance are needed to be taken into account.The new test statistic under this scenario is given as follows:

Table 2 .
[8]4 Justification of not using cycle 12 in Section 3.2 Wen analysing the real data from[8], we noticed an exceptionally high percentage of SNPs either fixed or lost at cycle 12 for population 4k, as shown in the table S1 below:

Table S1 :
Percentage of SNPs either fixed or lost at last time point

Table S2 :
Average absolute frequency change at each chromosome, between di↵erent cycles

Table S3 :
Median of estimated N e using allele frequencies between di↵erent cycles S.15 Number of windows where selection is detected in real data application