LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms
© Yang et al.; licensee BioMed Central Ltd. 2014
Received: 11 November 2012
Accepted: 27 January 2014
Published: 17 February 2014
As fundamental genomic elements, meiotic recombination hotspots play important roles in the life sciences, so uncovering their regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators of meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, an approach that ignores the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs.
Recently, an algorithm called “LDsplit” has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified.
LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that utilizes the genetic variation of recombination hotspots among individuals, opening a new avenue for motif finding. Tested on an established motif and simulated datasets, LDsplit shows promise to discover novel DNA motifs for meiotic recombination hotspots.
Keywords: Meiotic recombination hotspots; Single nucleotide polymorphism (SNP); DNA sequence motif; Genome instability; Linkage disequilibrium (LD)
During meiosis, homologous recombination is initiated by SPO11-catalyzed machinery at DNA double strand breaks. In many species, recombination events are clustered in localized regions of a few kb long, called recombination hotspots. Understanding the regulatory mechanisms of recombination hotspots can shed light on birth defect diseases, molecular evolution, genome instability, etc. [1, 2]. Therefore, it is desirable to understand how the locations and intensities of recombination hotspots are regulated.
For the regulatory mechanisms of recombination hotspots, striking progress has been made recently, thanks to the high-throughput genomic technology and bioinformatics techniques. Myers et al. used the LDhat software to estimate the fine-scale genome-wide recombination hotspots from HapMap Phase II SNP data . From the hotspots, a list of motifs has been discovered, in which the two most prominent motifs are the 7-mer CCTCCCT and the 9-mer CCCCACCCC. Interestingly, when located inside THE1A/B repeats, the motifs have much stronger association with proximal hotspots than outside the repeats. The 7-mer motif was later extended to a degenerate 13-mer motif CCNCCNTNNCCNC . Moreover, the 13-mer motif contains the sequence pattern of 3-periodicity, indicating a zinc finger binding array. Then, three groups reported the discovery of PRDM9 protein as a trans-acting regulator of the locations and intensities of meiotic recombination hotspots in human and mouse [5–7]. Strikingly, PRDM9 protein binds to the aforementioned 13-mer motif. The discovery of PRDM9 protein and its binding motifs was a major breakthrough in the understanding of the regulation of meiotic recombination hotspots. However, it has been observed that PRDM9 is not indispensable for recombination hotspots; for example, in PRDM9 knockout mouse germ line, meiotic recombination hotspots are still observed [8, 9]. Overall, PRDM9 can explain about 18% of the variations in human recombination hotspots  and the 13-mer motif occurs in about 40% of human hotspots . Therefore, additional trans and cis-regulatory elements for meiotic recombination hotspots remain to be discovered. Some recent efforts have been made toward this goal [10–12]. Moreover, follow-up investigation of the functions of PRDM9, e.g. its detailed mechanism to mediate the location and intensity of meiotic recombination, is also under intense research [9, 13, 14].
The discovery of DNA sequence motifs that stimulate meiotic recombination is crucial to the uncovering of the regulatory mechanisms of recombination hotspots. Several approaches have been developed for this purpose. The first approach is based on yeast mutagenesis. After genetically mapping the locations of meiotic recombination hotspots, Steiner et al. carried out base-pair substitution screening on the genome of fission yeast to scan for DNA sequences responsible for the activities of hotspots . They identified 5 DNA motifs stimulating recombination hotspots, and showed evidence for the existence of more motifs. The second approach is the computational search for short DNA sequences that are significantly enriched in hotspots against cold spots [3, 4]. Its success in identifying the 13-mer binding motif of the PRDM9 protein shows the power of bioinformatics methods for the study of genetic recombination . However, this approach has several caveats. First, the statistical associations based on counting of co-localization of hotspot and motif may not correspond to biological causal relations. Second, due to the limited power of computational detection of hotspots based on LD patterns, and the difficulty of finding degenerate motifs genome-wide, false negative remains a serious issue. The discovery of the 13-mer motif was based on two exact shorter motifs (CCTCCCT and CCAC) with two bases in between, which required close manual inspections .
It is desirable and challenging to design bioinformatics algorithms that automatically detect degenerate motifs for recombination hotspots on a large scale. To this end, additional information is needed to increase the statistical power of motif detection. One type of such information is the association between sequence polymorphisms and the variation in recombination rates of a hotspot. Evidence of such association has been demonstrated by sperm typing. For instance, individuals with different alleles at the FG11 SNP located within the DNA2 hotspot have a 20-fold difference in recombination rate of the DNA2 hotspot . A similar case was reported for the NID1 hotspot . Myers et al. noted that the two SNPs associated with the DNA2 and NID1 hotspots discovered by sperm typing are located within the motifs (CCTCCCT and CCCCACCCC) after the motifs had been identified; however, information on sequence polymorphism did not contribute to the discovery of the motifs. In extending the 7-mer motif to the degenerate 13-mer motif , Myers et al. did compare the original and disrupted forms of motifs due to mutations; however, they only compared motif occurrences in different hotspots in the human reference genome, rather than polymorphisms of the same motif occurrence and variations of intensities at the same hotspot (e.g. DNA2 and NID1) among different individuals. Therefore, in the above computational approach, genetic variation information encoded in the DNA sequence polymorphisms has not been fully used for motif discovery.
Recently we have designed an algorithm called LDsplit to detect DNA sequence polymorphisms associated with individual meiotic recombination hotspots, and thereby discover cis-regulatory elements of these hotspots . Given a sample of haplotypes (e.g. about 200 consecutive SNPs from HapMap Phase 2 data) flanking a hotspot H, LDsplit first splits the haplotypes into two subsets according to alleles of a candidate SNP A (i.e. haplotypes in each subset have the same allele at that SNP). Then, LDsplit compares the population recombination rate of the hotspot H between the two subsets calculated by LDhat algorithm . The rationale is that, if the SNP A is located in a cis-regulatory element (say, a motif) of the hotspot H, then one allele of SNP A represents a mutation that disrupts the motif. As a result, the subset of haplotypes with that disruptive allele should have lower intensity of hotspot H than the other allele. This is the case for the sperm typing examples of DNA2 hotspot and NID1 hotspot. This method is based on the assumption that population genetics methods (e.g. LDhat) for estimation of recombination rate by inferring historical crossovers from LD patterns should be able to capture the extant difference of hotspot intensities between the two alleles, which has been supported by some evidence . Extensive tests on simulated and real data demonstrated the effectiveness of LDsplit to capture cis-regulatory genomic loci associated with recombination hotspots . However, the software implementation of LDsplit needs an accessible user interface to be shared with research community, and it needs further validation with real data.
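The splitting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LDsplit implementation: haplotypes are assumed to be lists of 0/1 alleles, and `estimate_rate` is a hypothetical stand-in for the LDhat estimate of the population recombination rate at hotspot H.

```python
from typing import Callable, Dict, List, Sequence

Haplotype = Sequence[int]  # assumed encoding: one 0/1 allele per SNP


def split_by_allele(haplotypes: List[Haplotype], snp_index: int) -> Dict[int, List[Haplotype]]:
    """Group haplotypes by the allele they carry at the candidate SNP."""
    subsets: Dict[int, List[Haplotype]] = {}
    for hap in haplotypes:
        subsets.setdefault(hap[snp_index], []).append(hap)
    return subsets


def rate_difference(
    haplotypes: List[Haplotype],
    snp_index: int,
    estimate_rate: Callable[[List[Haplotype]], float],
) -> float:
    """Difference in estimated hotspot rate between the two allele subsets.

    `estimate_rate` is a hypothetical placeholder for an LDhat-based
    estimate; it is not part of LDsplit or LDhat.
    """
    subsets = split_by_allele(haplotypes, snp_index)
    if set(subsets) != {0, 1}:
        raise ValueError("candidate SNP must be biallelic and polymorphic in the sample")
    return estimate_rate(subsets[1]) - estimate_rate(subsets[0])
```

A large difference in either direction suggests that one allele of the candidate SNP is associated with a hotter or colder hotspot; whether the difference is significant still has to be judged against a null distribution.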
In this paper we report a significant upgrade of the implementation of the LDsplit algorithm, and also demonstrate that LDsplit can help discover DNA sequence motifs stimulating recombination by narrowing down to cis-regulatory genomic regions. First, we have implemented a user-friendly graphical user interface (GUI) for the LDsplit algorithm. Compared with the previous implementation in Perl, this version is implemented in Java, which is more efficient and platform independent, running under Windows, Linux and Mac OS. It gives computationally unsophisticated users from the life sciences convenient access to the software. More importantly, the integrative scientific visualization of genetic data (e.g. recombination rate profile, physical locations of SNPs, shape of recombination hotspots) shows informative details about the genomic context of hotspots, which greatly facilitates exploratory analysis of genetic factors of recombination hotspots. Far more than being incremental, this upgrade significantly enhances the usefulness and impact of the LDsplit algorithm for biomedical research. Second, to demonstrate the effectiveness of LDsplit, we carry out a focused analysis of SNPs falling inside the well-known 7-mer motif CCTCCCT, which is the core of the binding motif of PRDM9. Running LDsplit on HapMap (Phase 2) SNPs flanking the 7-mer motif, we observe that the original form of the 7-mer motif tends to have higher LD-based recombination rates than the disrupted form (i.e. disrupted by the alternative allele of the SNP inside the 7-mer motif). This is evidence that, in spite of biased gene conversion, LDsplit is able to capture the allele-specific intensities of hotspots. Third, to validate LDsplit by demonstrating its ability to uncover the 7-mer motif, we picked haplotype windows of about 200 SNPs each flanking a hotspot containing an occurrence of the 7-mer motif CCTCCCT, and then ran LDsplit on each SNP window.
From the output of LDsplit, SNPs with the most significant p-values are extended to short DNA sequences, which were fed into the motif-finding algorithm of MEME . We found that the top two motifs found by MEME from the top SNPs given by LDsplit match closely the 7-mer motif. In contrast, when running on DNA sequences flanking SNPs randomly chosen from the windows (i.e. without guide of LDsplit), none of the output motifs is similar to the 7-mer motif. This test on the well known 7-mer motif shows that LDsplit can narrow down to target cis-regulatory elements for meiotic recombination, which significantly increases the power of motif-finding algorithms like MEME . Moreover, we carried out simulation studies by inserting two artificially generated motifs near SNPs simulated to be associated with hotspots. LDsplit is able to guide MEME to identify both motifs as top hits. Therefore, LDsplit can be used to discover novel DNA sequence motifs, including hotspots of human and mouse that have not been covered by PRDM9 and many species whose cis-regulatory motifs for recombination hotspots are yet unknown.
The implementation of LDsplit consists of two stages. First, LDhat is called to calculate the recombination profiles for subpopulations split by SNP alleles and pseudo-population in permutation tests. Second, the p-values of hotspot-SNP associations are estimated and the recombination profiles of sub-populations are displayed. In the rest of this section, we will describe the two stages of LDsplit in detail. The source code and user manual of LDsplit are available for free download on its website (http://www.ntu.edu.sg/home/zhengjie/software/LDsplit.htm).
Stage 1: Calculating recombination profiles
We apply LDhat to calculate recombination profiles for a window consisting of sequences of SNPs (or haplotypes). There are three types of recombination profiles: (1) the profile of the whole input population of haplotypes; (2) profiles of sub-populations of haplotypes each corresponding to an allele of a candidate SNP (i.e. for each SNP, splitting the population into two sub-populations according to the two alleles of the SNP); (3) profiles of pseudo-populations from a random split of the input population. Meanwhile, the input SNP data in this stage consist of two text files, namely the sites file (consisting of haplotypes in FASTA format) and the locs file (consisting of physical locations of the SNPs on the chromosomes). Both types of files can be extracted from HapMap SNP database by cutting a window of haplotypes. LDsplit provides a friendly user interface to cut a window from HapMap data and generate the sites and locs files. In addition, the user can set the parameters involved in this stage (instructions provided in the user manual). As LDhat is computationally costly, this stage of LDsplit usually takes a long time on a regular personal computer (e.g. a few hours for 180 haplotypes each of 200 SNPs). Hence it is better to batch this computation. The output of this stage can be exported to hard disk as a Java serialization file, which can be loaded back into LDsplit for analysis in the next stage.
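The pseudo-populations from random splits serve as a null distribution for the hotspot-SNP association. A minimal Python sketch of that idea follows. It makes several assumptions that the text does not state: random splits keep the same subset sizes as the real allele split, the test statistic is the absolute rate difference, and the p-value uses the common add-one correction. `estimate_rate` is again a hypothetical stand-in for the LDhat computation.

```python
import random
from typing import Callable, List, Sequence, Tuple

Haplotype = Sequence[int]


def random_split(
    haplotypes: List[Haplotype], size_a: int, rng: random.Random
) -> Tuple[List[Haplotype], List[Haplotype]]:
    """Randomly split the sample into pseudo-populations of sizes size_a and n - size_a."""
    shuffled = list(haplotypes)
    rng.shuffle(shuffled)
    return shuffled[:size_a], shuffled[size_a:]


def permutation_p_value(
    haplotypes: List[Haplotype],
    snp_index: int,
    estimate_rate: Callable[[List[Haplotype]], float],
    n_permutations: int = 100,
    seed: int = 0,
) -> float:
    """Empirical p-value of the rate difference between the two allele subsets."""
    rng = random.Random(seed)
    group_a = [h for h in haplotypes if h[snp_index] == 1]
    group_b = [h for h in haplotypes if h[snp_index] != 1]
    observed = abs(estimate_rate(group_a) - estimate_rate(group_b))
    hits = 0
    for _ in range(n_permutations):
        pseudo_a, pseudo_b = random_split(haplotypes, len(group_a), rng)
        if abs(estimate_rate(pseudo_a) - estimate_rate(pseudo_b)) >= observed:
            hits += 1
    # Add-one correction (an assumption) keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)
```

Because each call to the rate estimator corresponds to a costly LDhat run, the number of permutations directly drives the running time of this stage.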
Stage 2: Deriving hotspot-SNP associations
The software of LDsplit was implemented in Java, with a user-friendly graphical interface (as shown in Figure 2). For detailed instructions of using LDsplit, please see its user manual included in the package of LDsplit.
To demonstrate that LDsplit can help narrow down to hotspot-associated motifs, we simulated recombination hotspots and flanking SNPs using simuPOP (version 1.0.3), a forward-time population genetics simulation framework . In each test, a simuPOP-based Python script was run to simulate the evolution of a population of around 5,000 individuals over many generations (for example, 3,000) using a forward-time model. Each individual had a pair of homologous haplotypes spanning 200 kb of DNA. Each haplotype was represented as a list of SNPs with alleles 0 and 1.
A simulated hotspot was located at the central point of haplotypes, and a causal SNP, whose alleles resulted in different probabilities of crossover at the recombination hotspot, was inserted at the position of 100 kb. In each generation, we randomly selected pairs of individuals as parents. Assuming one crossover event in a chromosome of 200 Mb per meiosis, in 200 kb the background probability of a crossover would be 0.001. When the hot allele of the causal SNP is present, the probability of a crossover would be increased to 0.01. In addition, the crossover position was chosen with the probability under normal distribution with mean at position 100 kb where the center of hotspot is located. At the end of simulation, a simulated population was exported, from which we randomly collected 10 subsets, each consisting of 90 individuals (180 haplotypes) as benchmark SNP data. The SNPs were extended with randomly generated DNA sequences, and artificial motifs were inserted spanning the causal SNPs. The SNP and DNA sequence data would then be fed into LDsplit and MEME to test if they are able to identify the motifs.
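The crossover probabilities quoted above follow from simple arithmetic, which the Python sketch below reproduces with the standard library only (it does not use simuPOP). The hotspot standard deviation `HOTSPOT_SD_BP` is a hypothetical value chosen for illustration, and drawing the position from the normal distribution for both alleles is also an assumption.

```python
import random
from typing import Optional

CHROMOSOME_LENGTH_BP = 200_000_000  # one crossover per 200 Mb per meiosis
WINDOW_LENGTH_BP = 200_000          # simulated haplotype length
HOTSPOT_CENTER_BP = 100_000         # hotspot center and causal SNP position
HOT_ALLELE_PROB = 0.01              # crossover probability with the hot allele
HOTSPOT_SD_BP = 1_000               # hypothetical spread, not given in the text


def background_crossover_prob() -> float:
    """Background probability of a crossover inside the 200 kb window."""
    return WINDOW_LENGTH_BP / CHROMOSOME_LENGTH_BP


def crossover_prob(has_hot_allele: bool) -> float:
    """Per-meiosis crossover probability given the causal SNP allele."""
    return HOT_ALLELE_PROB if has_hot_allele else background_crossover_prob()


def sample_crossover(has_hot_allele: bool, rng: random.Random) -> Optional[int]:
    """Return a crossover position in bp, or None if no crossover occurs."""
    if rng.random() >= crossover_prob(has_hot_allele):
        return None
    while True:  # redraw until the position falls inside the window
        pos = rng.gauss(HOTSPOT_CENTER_BP, HOTSPOT_SD_BP)
        if 0 <= pos <= WINDOW_LENGTH_BP:
            return int(round(pos))
```

With these settings the hot allele raises the per-meiosis crossover probability tenfold over the background value of 200 kb / 200 Mb = 0.001.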
Results and discussion
As a case study to demonstrate the usefulness of LDsplit software for the discovery of cis-regulatory motifs of meiotic recombination hotspots, we will analyze the 7-mer motif of CCTCCCT, which has been established as the core binding motif of PRDM9. First, running LDsplit, we confirm that, when the 7-mer motif containing a SNP is disrupted by one allele of the SNP, its proximal hotspot would have lower intensity estimated from LD patterns. Second, we show that LDsplit can guide the discovery of the 7-mer motif by narrowing down to genomic regions proximal to motifs. Third, we also discuss the effects of biased gene conversion implicated in this study. Then, LDsplit is further validated through simulation studies. At the end, we outline directions for future work.
Disrupting effect of SNPs inside motifs
As introduced in the background section of this paper, the DNA binding of PRDM9 protein regulates the locations and intensities of many recombination hotspots of human and mouse. Particularly, the 7-mer DNA sequence motif CCTCCCT is at the core of the 13-mer binding motif of PRDM9 in the human genome . Thus, this 7-mer motif has the function of stimulating human recombination hotspots, which will be confirmed by LDsplit in this section.
First, we searched the human genome for all occurrences of the 7-mer motif that each contain a SNP (using data from HapMap Phase 2 release 22 and human genome Build 36.1, hg18). We did not observe any occurrence of the 7-mer motif containing more than one SNP. In our previous study  we focused on chromosome 6 of the Asian (Chinese and Japanese) population from HapMap Phase 2, and obtained successful results. Rather than repeating our past results, we chose SNPs in the European population and excluded chromosome 6 from the HapMap Phase 2 data. In this way, we collected 228 occurrences of the 7-mer motif each containing a SNP. Second, from the 228 SNPs we selected those with MAF (minor allele frequency) of at least 30% in the European population, so that LDsplit would not give biased predictions due to small numbers of haplotypes carrying the minor allele. After this filtering, 70 SNPs inside the 7-mer motif were left, which are proximal to 15 recombination hotspots. Details of the 70 SNPs and their flanking motif occurrences can be found in Additional file 2. Then, we extended each of the 70 selected SNPs to a flanking window of 101 SNPs (with 50 SNPs on each side), and ran LDsplit on the haplotypes of the 101-SNP window. From the results of LDsplit on this window, we identified the hotspot H that is located nearest to the candidate SNP A (at the center of the window and falling inside the motif). In the following analysis, we compare the recombination rates of hotspot H between the two alleles of SNP A, where one allele preserves the original form of the 7-mer motif, and the other allele disrupts the motif.
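The selection of SNPs inside motif occurrences can be illustrated with a short Python sketch. It is not the authors' pipeline: positions are assumed to be 0-based indices into a reference string, scanning both strands is an assumption, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

MOTIF = "CCTCCCT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_hits_covering(seq: str, snp_pos: int, motif: str = MOTIF) -> List[Tuple[int, str]]:
    """Return (start, strand) for motif occurrences that overlap snp_pos."""
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        first = max(0, snp_pos - len(pattern) + 1)
        last = min(snp_pos, len(seq) - len(pattern))
        for start in range(first, last + 1):
            if seq[start:start + len(pattern)] == pattern:
                hits.append((start, strand))
    return hits


def classify_alleles(ref_seq: str, snp_pos: int, alleles: Iterable[str]) -> Dict[str, bool]:
    """Map each allele to True if it preserves a motif occurrence at the SNP."""
    result = {}
    for allele in alleles:
        seq = ref_seq[:snp_pos] + allele + ref_seq[snp_pos + 1:]
        result[allele] = bool(motif_hits_covering(seq, snp_pos))
    return result


def passes_maf(allele_counts: Dict[str, int], threshold: float = 0.30) -> bool:
    """True if the minor allele frequency is at least the threshold."""
    total = sum(allele_counts.values())
    return total > 0 and min(allele_counts.values()) / total >= threshold
```

For example, `classify_alleles("AAACCTCCCTGGG", 5, ["T", "C"])` marks `T` as the motif-preserving allele and `C` as the disrupting one.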
Motif finding guided by LDsplit
In the previous section, we have shown that LDsplit confirms the 7-mer motif as a cis-regulatory element for meiotic recombination hotspots. However, our goal is to use LDsplit to select SNPs whose positions can guide the discovery of DNA sequence motifs. Figure 3 shows that some SNPs inside the 7-mer motif may not have significant LDsplit p-values, whereas some SNPs outside the motif can have significant p-values. In this section, we will demonstrate that, despite false positives and false negatives, LDsplit is able to provide signals that are sufficient for the detection of the 7-mer motif, by narrowing down to the target genomic regions.
From each of the aforementioned 70 SNPs inside the 7-mer motif, we extended to a window of 101 SNPs (with 50 SNPs on each side). Then we ran LDsplit on the haplotypes of the SNP window to get p-values of the SNPs in the window. From the 70 windows, a total of 332 SNPs were collected, each of which had a significant LDsplit p-value (p < 0.05). Given reference DNA sequences (human genome assembly Build 36.1, hg18, March 2006, UCSC genome browser) and SNP physical locations in the reference genome, each of the 332 candidate SNPs was extended to a window of 101 DNA bases (50 bases on each side), by cutting the reference sequences flanking the candidate SNPs. Then, we ran the MEME program  (version 4.8.1, release date Feb. 7, 2012) on the 332 DNA sequences, setting appropriate parameters for MEME, e.g. ‘one motif occurrence per sequence’ (OOPS), minimum sites 10 and motif length between 10 and 20 (see Section C “Motifs flanking randomly selected SNPs found by MEME” in Additional file 1: Supplementary materials). For comparison, we also randomly selected SNPs (i.e. regardless of whether their LDsplit p-values are significant), and extended them into 101-base DNA sequences as input for MEME, which was run with the same parameters. To get stable comparison results, we carried out such random tests for three groups. Additional file 3 contains the DNA sequences flanking the SNPs found by LDsplit to be significantly associated with hotspots, and the DNA sequences flanking randomly selected SNPs.
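Cutting the 101-base windows around candidate SNPs and writing them as MEME input can be sketched as follows. This is an illustrative Python example with assumptions: the chromosome is held as an in-memory string, SNP positions are 0-based, and SNPs too close to a sequence end are skipped (the text does not say how such cases were handled).

```python
from typing import Iterable, List, Optional, Tuple


def flanking_window(chrom_seq: str, snp_pos: int, flank: int = 50) -> Optional[str]:
    """Return the 2 * flank + 1 bases centered on snp_pos, or None near an edge."""
    start = snp_pos - flank
    end = snp_pos + flank + 1
    if start < 0 or end > len(chrom_seq):
        return None
    return chrom_seq[start:end]


def to_fasta(records: Iterable[Tuple[str, str]], line_width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA text."""
    lines: List[str] = []
    for name, seq in records:
        lines.append(f">{name}")
        for i in range(0, len(seq), line_width):
            lines.append(seq[i:i + line_width])
    return "\n".join(lines) + "\n"


def windows_for_snps(chrom_seq: str, snps: Iterable[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Build (snp_id, window) records for all SNPs with a complete window."""
    records = []
    for snp_id, pos in snps:
        window = flanking_window(chrom_seq, pos)
        if window is not None:
            records.append((snp_id, window))
    return records
```

The resulting FASTA text could then be passed to MEME with the settings described above (OOPS model, minimum sites 10, motif width 10 to 20).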
In our method, the input to MEME consists of short DNA sequences (101 bases long) flanking SNPs of significant association with hotspots, as detected by LDsplit. By contrast, most existing methods for finding motifs of recombination hotspots use DNA sequences under whole hotspots (2 kb to 10 kb long) as input [3, 4]. In the following, we compare the two types of methods and show that, by focusing on short DNA sequences around SNPs, LDsplit can improve the accuracy of motif finding. We first selected two groups of hotspots from the same genome-wide list of hotspots as previously used (i.e. estimated by LDhat from HapMap Phase 2 data, European population). The first group consists of the same 15 hotspots on which we used LDsplit and MEME to detect CCTCCCT, the core 7-mer motif of PRDM9. The other group of 15 hotspots was selected randomly, and may or may not contain the 7-mer motif. The locations and lengths of the hotspots are listed in Additional file 1: Tables S1 and S2 in Supplementary materials. Then, for each of the two groups, the DNA sequences under the hotspots were cut and fed into MEME, using the same parameters as we did for LDsplit-based motif finding (Figure 5). For each of the two groups, the top 10 motifs ranked by significance (E-values) were recorded (see Additional file 1: Figures S5 and S6 in Supplementary materials). On the list of motifs from the first group of hotspots (Additional file 1: Figure S5), the second motif contains the 6-mer “CCTCCC” as a sub-motif, but none of the other motifs is similar to the 7-mer motif. Likewise, from the second group of hotspots, only the third and fifth motifs contain “CCTCCC” as a sub-motif (Additional file 1: Figure S6). Therefore, for the 7-mer core motif of PRDM9, motif finding using the whole sequences under hotspots is less accurate than motif finding guided by SNPs selected by LDsplit.
One reason could be that DNA sequences of whole hotspots are much longer than those around SNPs, and hence contain various enriched patterns besides the true motifs, which tend to weaken the true signals. Nevertheless, the other enriched motifs might also have regulatory functions for recombination hotspots, and need further study.
In the above, we have demonstrated the usefulness of LDsplit for the detection of recombination associated motifs, using the 7-mer core motif of PRDM9 as an example. Several recent papers suggest that PRDM9 may be responsible for more recombination hotspots in mammalian genomes than previously appreciated [13, 9, 22]. However, finding and analysis of DNA motifs responsible for meiotic recombination hotspots are still challenging in many situations. Even for species in which PRDM9 is known to determine recombination hotspots, the specific DNA motifs may not be clear, due to various reasons such as rapid evolution and extensive diversity of PRDM9 zinc finger arrays, variation of genetic and epigenetic contexts flanking the binding sites of PRDM9. In , for example, although a consensus motif was found to be present in at least 73% of mouse hotspots, the consensus motif is not the same as the putative PRDM9 binding site generated by ZIFIBI database  from zinc fingers in the PRDM9 protein. Moreover, for human and mouse, some transcription factor binding sites in addition to the PRDM9 motif may also regulate recombination hotspots [24, 10]. For other species, PRDM9 is either known to be not functional for hotspots (e.g. dogs ) or a protein with similar functions as PRDM9 has not been found. Sequence based detection of DNA motifs would still be important for these species. Therefore, for species with or without PRDM9 determining initiation of recombination hotspots, LDsplit will be useful for understanding DNA motifs associated with meiotic recombination. In addition to guiding motif finding, LDsplit can also facilitate detailed visualization and analysis of individual hotspots with specific functions, e.g. immunology related hotspots in the major histocompatibility complex (MHC) region.
Finding simulated motifs
Table: Parameter setting on simulated data. Parameters: causal SNP position (kb); window length (kb); hotspot center (kb); hotspot width (kb); beginning hot allele frequency (%); ending hot allele frequency (%); number of sample subsets; sample subset size.
Effect of biased gene conversion
While the idea of LDsplit appears simple, there are subtle caveats because a meiotic recombination hotspot tends to kill itself through biased gene conversion (BGC) [26, 27]. Hellenthal et al. suggested that, due to BGC, it is likely that an extant hot allele may have experienced no more crossovers than the cold allele in the history of a population, and therefore it is difficult to detect the allele-specific effect of DNA sequence polymorphisms on the intensity of a recombination hotspot from LD patterns. This conclusion was drawn from a mathematical model that estimates the probability that a chromosome with the hot (or cold) allele of a SNP experienced a crossover in the previous generation, taking BGC into account. However, their mathematical model also predicted that, when the probability of crossover of a hot allele is, say, 20 times higher than that of the cold allele, the number of crossover events in the sub-population of the hot allele is about 3.4 times that of the cold allele in the transmission from the previous generation. This is the case for the DNA2 hotspot and FG11 SNP, as reported by sperm typing . Hence, the mathematical model of Hellenthal et al. offers an explanation for the success of LDsplit in predicting the association between the DNA2 hotspot and FG11 SNP . Although the general conclusion in  highlighted the difficulty of association studies for recombination hotspots using LD patterns, it does not rule out the effectiveness of LD-based methods like LDsplit, especially for those SNPs with a big difference in hotspot penetrance between two alleles. Indeed, LDsplit has been verified on both real and simulated data, with promising performance. In addition to correctly associating the DNA2 hotspot with the FG11 SNP, it identified an 11-mer motif whose complement closely matches the 13-mer binding motif of PRDM9 .
In this paper, we confirmed once again the prediction of the mathematical model of Hellenthal et al. As Figure 3 shows, for 22 out of 70 SNPs inside the 7-mer motif CCTCCCT, the disrupting allele, which is supposed to be cold, has higher intensities in proximal hotspots than the “hot” allele that preserves the original motif form. This is probably due to the effect of BGC. Meanwhile, the fact that in 48 out of 70 cases the hot allele still has the higher LD-based recombination rate (Figure 3) suggests that the 7-mer motif has a strong stimulating penetrance on proximal recombination hotspots. Whereas we did not give any theoretical explanation for the effectiveness of LDsplit in spite of BGC in our previous work , here we have made the connection between experimental results and the mathematical theory of population genetics. Our new experiments and analysis suggest that LDsplit may be more likely to be effective when there is a big difference in hotspot penetrance between two alleles of an associated SNP (e.g. the DNA2-FG11 case).
Since our presentation of the LDsplit software at a recent conference , several researchers have inquired about, downloaded and used LDsplit, showing their interest in this software tool. For broader application and impact on biological research, we plan to further improve LDsplit and continue the bioinformatics analysis of recombination hotspots as follows. First, as LDsplit is based on the computation-intensive LDhat algorithm, it is still time consuming for large data sets. Thus, we will speed up LDsplit by parallel computing, algorithm optimization, and pre-computing of results on large-scale data (e.g. HapMap SNPs). Recently we have obtained a significant speedup of the LDhat algorithm by optimization and parallel computing . Second, as biased gene conversion (BGC) plays a key role in the success of LDsplit, we will investigate the relation between recombination hotspots and BGC (see  for recent research in this direction).
One implication of the discovery of PRDM9 is that epigenetic factors (e.g. DNA methylation and histone modification) play important roles in the regulation of recombination hotspots. Several recent studies have pointed to this connection [10, 13, 32]. Recently we have made an effort to integrate genetic and epigenetic factors for regulating meiotic recombination into one predictive model, with promising results . An important future work is to incorporate epigenetic factors into the detection of cis-regulatory DNA motifs.
In this paper we presented LDsplit, an open source software tool for predicting cis-regulatory motifs of meiotic recombination hotspots through SNP analysis. With its graphical user interface, which allows convenient interactive and integrative analysis of hotspots and associated DNA sequences, LDsplit will be a useful tool for many researchers working in areas including (but not limited to) DNA recombination, genomic evolution, disease gene mapping, and genome instability. We demonstrated the usefulness and accuracy of LDsplit by testing on the 7-mer DNA motif CCTCCCT, which is bound by PRDM9 and is a well-known cis-regulator of recombination hotspots. We showed that SNP alleles disrupting the 7-mer motif have lower hotspot intensities as estimated by LDsplit. Moreover, LDsplit is able to guide the discovery of the 7-mer motif and simulated motifs by the MEME software. Such a bioinformatics approach is promising for the discovery of cis- and trans-regulators, to uncover the molecular machinery regulating recombination hotspots and genome stability.
Availability and requirements
Project name: LDsplit.
Project home page: http://www.ntu.edu.sg/home/zhengjie/software/LDsplit.htm.
Operating system(s): Windows, Linux, Mac OS.
Programming language: Java.
Other requirements: JDK, JFreeChart package.
License: GNU General Public License.
Any restrictions to use by non-academics: None.
BGC: Biased gene conversion
MAF: Minor allele frequency
SNP: Single nucleotide polymorphism
GUI: Graphical user interface.
This work was supported by the following funding sources: Tier 1 AcRF Grant MOE RG 32/11 (M4010977.020) from Ministry of Education, Singapore; Tier 2 AcRF Grant MOE2008-T2-1-074 from Ministry of Education, Singapore; Intramural Research Program of NIH (National Library of Medicine), USA.
- Hey J: What’s so hot about recombination hotspots? PLoS Biol. 2004, 2 (6): e190. 10.1371/journal.pbio.0020190.
- McVean G: What drives recombination hotspots to repeat DNA in humans? Philos Trans R Soc Lond B Biol Sci. 2010, 365 (1544): 1213-1218. 10.1098/rstb.2009.0299.
- Myers S, Bottolo L, Freeman C, McVean G, Donnelly P: A fine-scale map of recombination rates and hotspots across the human genome. Science. 2005, 310 (5746): 321-324. 10.1126/science.1117196.
- Myers S, Freeman C, Auton A, Donnelly P, McVean G: A common sequence motif associated with recombination hot spots and genome instability in humans. Nat Genet. 2008, 40 (9): 1124-1129. 10.1038/ng.213.
- Myers S, Bowden R, Tumian A, Bontrop RE, Freeman C, MacFie TS, McVean G, Donnelly P: Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination. Science. 2010, 327 (5967): 876-879. 10.1126/science.1182363.
- Baudat F, Buard J, Grey C, Fledel-Alon A, Ober C, Przeworski M, Coop G, de Massy B: PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science. 2010, 327 (5967): 836-840. 10.1126/science.1183439.
- Parvanov ED, Petkov PM, Paigen K: Prdm9 controls activation of mammalian recombination hotspots. Science. 2010, 327 (5967): 835. 10.1126/science.1181495.
- Hayashi K, Yoshida K, Matsui Y: A histone H3 methyltransferase controls epigenetic events required for meiotic prophase. Nature. 2005, 438 (7066): 374-378. 10.1038/nature04112.
- Brick K, Smagulova F, Khil P, Camerini-Otero RD, Petukhova GV: Genetic recombination is directed away from functional genomic elements in mice. Nature. 2012, 485 (7400): 642-645. 10.1038/nature11089.
- Wu M, Kwoh CK, Przytycka TM, Li J, Zheng J: Epigenetic functions enriched in transcription factors binding to mouse recombination hotspots. Proteome Sci. 2012, 10 (Suppl 1): S11. 10.1186/1477-5956-10-S1-S11.
- Wahls WP, Davidson MK: New paradigms for conserved, multifactorial, cis-acting regulation of meiotic recombination. Nucleic Acids Res. 2012, 40 (10): 9983-9989.
- Wu M, Kwoh CK, Przytycka TM, Li J, Zheng J: Integration of Genomic and Epigenomic Features to Predict Meiotic Recombination Hotspots in Human and Mouse. Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB 2012). 2012, New York, NY, USA: ACM, 297-304.
- Smagulova F, Gregoretti IV, Brick K, Khil P, Camerini-Otero RD, Petukhova GV: Genome-wide analysis reveals novel molecular features of mouse recombination hotspots. Nature. 2011, 472 (7343): 375-378. 10.1038/nature09869.
- Grey C, Barthes P, Chauveau-Le Friec G, Langa F, Baudat F, de Massy B: Mouse PRDM9 DNA-binding specificity determines sites of histone H3 lysine 4 trimethylation for initiation of meiotic recombination. PLoS Biol. 2011, 9 (10): e1001176. 10.1371/journal.pbio.1001176.
- Steiner WW, Davidow PA, Bagshaw AT: Important characteristics of sequence-specific recombination hotspots in Schizosaccharomyces pombe. Genetics. 2011, 187 (2): 385-396. 10.1534/genetics.110.124636.
- Jeffreys AJ, Neumann R: Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat Genet. 2002, 31 (3): 267-271. 10.1038/ng910.
- Jeffreys AJ, Neumann R: Factors influencing recombination frequency and distribution in a human meiotic crossover hotspot. Hum Mol Genet. 2005, 14 (15): 2277-2287. 10.1093/hmg/ddi232.
- Zheng J, Khil PP, Camerini-Otero RD, Przytycka TM: Detecting sequence polymorphisms associated with meiotic recombination hotspots in the human genome. Genome Biol. 2010, 11 (10): R103. 10.1186/gb-2010-11-10-r103.
- Auton A, McVean G: Recombination rate estimation in the presence of hotspots. Genome Res. 2007, 17 (8): 1219-1227. 10.1101/gr.6386707.
- Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol. 1994, 2: 28-36.
- Peng B, Kimmel M: simuPOP: a forward-time population genetics simulation environment. Bioinformatics. 2005, 21 (18): 3686-3687. 10.1093/bioinformatics/bti584.
- Auton A, Fledel-Alon A, Pfeifer S, Venn O, Segurel L, Street T, Leffler EM, Bowden R, Aneas I, Broxholme J, Humburg P, Iqbal Z, Lunter G, Maller J, Hernandez RD, Melton C, Venkat A, Nobrega MA, Bontrop R, Myers S, Donnelly P, Przeworski M, McVean G: A fine-scale chimpanzee genetic map from population sequencing. Science. 2012, 336 (6078): 193-198. 10.1126/science.1216872.
- Cho SY, Chung M, Park M, Park S, Lee YS: ZIFIBI: prediction of DNA binding sites for zinc finger proteins. Biochem Biophys Res Commun. 2008, 369 (3): 845-848. 10.1016/j.bbrc.2008.02.106.
- Zhang J, Li F, Li J, Zhang MQ, Zhang X: Evidence and characteristics of putative human alpha recombination hotspots. Hum Mol Genet. 2004, 13 (22): 2823-2828. 10.1093/hmg/ddh310.
- Axelsson E, Webster MT, Ratnakumar A, Ponting CP, Lindblad-Toh K: Death of PRDM9 coincides with stabilization of the recombination landscape in the dog genome. Genome Res. 2012, 22 (1): 51-63. 10.1101/gr.124123.111.
- Coop G, Myers SR: Live hot, die young: transmission distortion in recombination hotspots. PLoS Genet. 2007, 3 (3): e35. 10.1371/journal.pgen.0030035.
- Boulton A, Myers RS, Redfield RJ: The hotspot conversion paradox and the evolution of meiotic recombination. Proc Natl Acad Sci U S A. 1997, 94 (15): 8058-8063. 10.1073/pnas.94.15.8058.
- Hellenthal G, Pritchard JK, Stephens M: The effects of genotype-dependent recombination, and transmission asymmetry, on linkage disequilibrium. Genetics. 2006, 172 (3): 2001-2005.
- Yang P, Wu M, Kwoh CK, Khil PP, Camerini-Otero RD, Przytycka TM, Zheng J: Predicting DNA sequence motifs of recombination hotspots by integrative visualization and analysis. Proceedings of International Symposium on Integrative Bioinformatics. 2012, Hangzhou, China, 52-58.
- Guo J, Jain R, Yang P, Fan R, Kwoh CK, Zheng J: Reliable and Fast Estimation of Recombination Rates by Convergence Diagnosis and Parallel Markov Chain Monte Carlo. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014, IEEE Computer Society, http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.133
- Katzman S, Capra JA, Haussler D, Pollard KS: Ongoing GC-biased evolution is widespread in the human genome and enriched near recombination hot spots. Genome Biol Evol. 2011, 3: 614-626. 10.1093/gbe/evr058.
- Wahls WP, Davidson MK: Discrete DNA sites regulate global distribution of meiotic recombination. Trends Genet. 2010, 26 (5): 202-208. 10.1016/j.tig.2010.02.003.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.