CODA (crossover distribution analyzer): quantitative characterization of crossover position patterns along chromosomes
© Gauthier et al; licensee BioMed Central Ltd. 2011
Received: 21 July 2010
Accepted: 20 January 2011
Published: 20 January 2011
During meiosis, homologous chromosomes exchange segments via the formation of crossovers. This phenomenon is highly regulated; in particular, crossovers are distributed heterogeneously along the physical map and rarely arise in close proximity, a property referred to as "interference". Crossover positions form patterns that give clues about how crossovers are formed. In several organisms including yeast, tomato, Arabidopsis, and mouse, it is believed that crossovers form via at least two pathways, one interfering, the other not.
We have developed a software package, "CODA" (CrossOver Distribution Analyzer), which allows one to quantitatively characterize crossover patterns by fitting interference models to experimental data. Two families of interference models are provided: the "gamma" model and the "beam-film" model. The user can specify single- or two-pathways modeling, and the software package infers the model's parameters and their confidence intervals. CODA can handle data produced from measurements on bivalents or gametes, in the form of continuous crossover positions or marker genotyping. We illustrate these possibilities on data from wheat, maize and mouse.
CODA extends the kinds of crossover data that can be analyzed with two-pathways modeling to include gametic data (rather than only bivalents/tetrads). It will also enable users to perform analyses based on the beam-film model. CODA implements that model's complex physics and mathematics, and uses a summary statistic to overcome the lack of a computable likelihood, which has hampered its use until now.
In sexually reproducing organisms, haploid gametes are produced during meiosis. Fertilization then restores the diploid number of chromosomes via the fusion of two gametes to form a zygote. A major consequence of meiosis is that the genetic material of the parents undergoes two levels of shuffling: (1) the chromosomes segregate independently, and (2) intra-chromosomal recombination occurs through reciprocal exchanges of chromosome segments due to crossing-over between homologs during prophase I of meiosis. Crossovers (COs) thus drive genetic diversity through recombination, and they are also essential for proper chromosome segregation because they hold homologous chromosomes together at metaphase I. The genetic distance between two loci is defined as the average number of COs within this interval, per meiosis, when considering gametes. Thus a 100 cM (1 Morgan) segment has on average 1 CO per gamete, and 2 COs per bivalent (the bivalent including two homologous chromosomes during the first meiotic division). In most organisms, CO formation is highly regulated with respect to the number and the distribution of CO events on the chromosomes. This distribution is clearly non-random; some regions of the physical chromosome are much more prone to CO formation and recombination than others.
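The definition of genetic distance above can be turned into a small calculation. The sketch below is illustrative only: the function names are hypothetical and not part of CODA, and it assumes the CO counts within the interval are already known for each gamete or bivalent. It relies on the equivalence stated above (100 cM corresponds on average to 1 CO per gamete and 2 COs per bivalent), which holds because each bivalent CO involves two of the four chromatids, so a given gamete inherits it with probability 1/2 in the absence of chromatid interference.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: genetic length (cM) of an interval from per-gamete CO counts.
// 1 Morgan = 100 cM = on average 1 CO per gamete in the interval.
double genetic_length_cM_from_gametes(const std::vector<int>& cos_per_gamete) {
    assert(!cos_per_gamete.empty());
    const double total = std::accumulate(cos_per_gamete.begin(), cos_per_gamete.end(), 0.0);
    return 100.0 * total / static_cast<double>(cos_per_gamete.size());
}

// Hypothetical helper: same quantity from per-bivalent CO counts.
// 1 Morgan corresponds on average to 2 COs per bivalent, so each CO counts for 50 cM.
double genetic_length_cM_from_bivalents(const std::vector<int>& cos_per_bivalent) {
    assert(!cos_per_bivalent.empty());
    const double total = std::accumulate(cos_per_bivalent.begin(), cos_per_bivalent.end(), 0.0);
    return 50.0 * total / static_cast<double>(cos_per_bivalent.size());
}
```

For example, four gametes carrying 0, 1, 2 and 1 COs in an interval give 100 cM, as do four bivalents carrying 2, 2, 1 and 3 COs.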
Furthermore, a phenomenon called CO interference lowers the probability that two COs occur close to each other in the same meiosis. Recent evidence indicates that COs form via two different pathways: the interfering pathway (hereafter referred to as P1) and the non-interfering pathway (hereafter referred to as P2). The proportion p of COs formed through P2 varies considerably. It has been estimated in different species via biological [4, 5] or computational [6–8] approaches, with typical values between 5 and 30%.
At present there is no efficient tool for targeting COs to desired chromosomal locations in higher plants. Since breeding relies on selecting progeny in which COs have brought together favorable alleles that were separated in the parental genotypes, gaining insight into the mechanisms of CO formation and characterizing the number and localization of COs is of key importance. A first characterization of CO distributions is achieved by extracting (i) the strength of interference at work in P1 and (ii) the proportions of P1 and P2 COs.
To characterize interference strength, the most powerful approach is to fit mathematical models to experimental data sets. Such models may be grouped into two classes: physically motivated models and statistically oriented models. The main physical model is the beam-film (BF) model. It considers the establishment and propagation of mechanical stress by analogy with a ceramic film on a metallic beam, with COs seen as "cracks" that release the stress locally in the film and thus forbid nearby COs. The statistical models are mainly based on the statistics of genetic distances between successive COs, using stationary renewal processes (SRPs) and assuming no chromatid interference [10, 11]. Among SRP-based models, one of the most studied is the gamma model, which generates inter-CO distances from a gamma distribution. To include a non-interfering pathway, non-interfering COs are simply added to those of the interfering pathway, leading to the gamma sprinkling (GS) [6–8] and BF sprinkling (BFS) models.
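To make the gamma sprinkling model concrete, here is a minimal simulation sketch for one bivalent. It assumes a common parametrization that the text above does not spell out: positions are in Morgans, the total CO density on a bivalent is 2 per Morgan, a fraction p of COs comes from the Poisson (non-interfering) pathway P2, and P1 spacings follow a gamma distribution with shape nu (the interference strength) and mean 1/(2(1 - p)). Stationarity is only approximated, by starting the renewal process far to the left of the chromosome. The function name is hypothetical; this is not CODA's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: CO positions (Morgans) on one bivalent of genetic length L
// under a gamma sprinkling model with interference strength nu and P2 fraction p.
std::vector<double> simulate_gs_bivalent(double L, double nu, double p, std::mt19937& rng) {
    assert(L > 0.0 && nu > 0.0 && p >= 0.0 && p < 1.0);
    std::vector<double> positions;

    // P1 (interfering): renewal process with gamma(shape nu) spacings whose mean is
    // 1 / (2 (1 - p)) Morgans, i.e. gamma scale 1 / (2 nu (1 - p)).
    std::gamma_distribution<double> spacing(nu, 1.0 / (2.0 * nu * (1.0 - p)));
    // Approximate stationarity: start well to the left of the chromosome (burn-in)
    // and keep only the events that fall inside [0, L].
    const double burn_in = 10.0 * std::max(1.0, nu);
    for (double x = -burn_in + spacing(rng); x <= L; x += spacing(rng)) {
        if (x >= 0.0) positions.push_back(x);
    }

    // P2 (non-interfering): Poisson number of COs with mean 2 p L, uniform on [0, L].
    if (p > 0.0) {  // std::poisson_distribution requires a strictly positive mean
        std::poisson_distribution<int> n_p2(2.0 * p * L);
        std::uniform_real_distribution<double> where(0.0, L);
        for (int k = n_p2(rng); k > 0; --k) positions.push_back(where(rng));
    }

    std::sort(positions.begin(), positions.end());
    return positions;
}
```

With nu = 1 and p = 0, P1 reduces to a Poisson process (no interference), while larger nu makes P1 spacings more regular. To obtain gametic rather than bivalent data, one would additionally keep each CO with probability 1/2, consistent with the absence of chromatid interference.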
Fitting these kinds of models to data requires rather complex mathematics and computer programming. It has thus long been difficult for data producers to analyze their CO distributions without collaborating with specialized groups. To help fill this gap, Viswanath & Housworth developed a Java application based on two-pathways modeling, where P1 is described by the counting model (a particular case of the gamma model). However, this software tool is restricted to marker segregation data in tetrads, a data type that is available only for particular organisms like Ascomycetes or for one particular Arabidopsis mutant. It is thus not applicable to the vast majority of organisms, for which marker segregation data are obtained from individual gametes (or their progeny) rather than from tetrads or bivalents. More recently, Housworth & Stahl published an R script to estimate interference strength from CO position data using the single-pathway gamma model. Unfortunately, the script does not allow two-pathways analyses, which compromises realistic descriptions of CO distributions whenever two pathways are at work, as is the case in the vast majority of species studied so far. Moreover, this R script was designed to analyze positions of protein foci immunolocalized on synaptonemal complexes, which are equivalent to positions of P1 COs on the bivalent. So, as with the previous software, it is not possible to analyze marker segregation data from gametes or genetic mapping populations; this is all the more of a drawback since these are the most common data sets available from plants and animals.
To our knowledge, no tool has been available to perform two-pathways analyses of CO distributions on segregation data obtained from individual gametes, for instance through linkage mapping experiments. Moreover, we do not know of any software for using physically motivated models like the BF model to analyze CO distributions. This last point is important because model-based inferences are often viewed with suspicion if their robustness to different model choices cannot be evaluated. Having predictions from two very different models thus gives further credence to these kinds of analyses and to their inferred parameters.
CODA allows one to analyze CO position data sets or genetic mapping data to quantitatively characterize CO distributions along chromosomes. It works with two different types of datasets: (1) marker segregation data at the gamete level, obtained for instance from backcross or doubled-haploid linkage mapping populations, or from sperm typing; or (2) continuous CO positions, such as those obtained from immunolocalization of protein foci on synaptonemal complexes, corresponding to bivalents. These positions can be given in any units, including micrometers of synaptonemal complex or Mb of DNA sequence. CODA converts all of these into genetic positions and performs the fits in this space.
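The two data types can be illustrated with a short sketch; both functions are hypothetical helpers, not CODA's API. For gametic data, a CO is inferred in each interval where the parental origin switches between consecutive informative markers (two COs within the same interval remain invisible). For continuous positions, the text does not say how CODA converts them to genetic positions; the sketch assumes one simple option, an empirical map in which the genetic position of a point is 50 cM times the mean number of COs per bivalent located at or before it. This assumption requires positions that are comparable across bivalents (for instance relative positions along the synaptonemal complex).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (1) Gametic data. Hypothetical coding: one character per ordered marker,
// 'A' or 'B' for parental origin, any other character for missing data.
// Returns (left marker, right marker) index pairs bracketing each detected CO.
std::vector<std::pair<std::size_t, std::size_t>> co_intervals(const std::string& genotypes) {
    std::vector<std::pair<std::size_t, std::size_t>> intervals;
    bool have_prev = false;
    std::size_t prev = 0;
    for (std::size_t i = 0; i < genotypes.size(); ++i) {
        const char g = genotypes[i];
        if (g != 'A' && g != 'B') continue;  // skip missing data
        if (have_prev && g != genotypes[prev]) intervals.emplace_back(prev, i);
        prev = i;
        have_prev = true;
    }
    return intervals;
}

// (2) Bivalent data. Assumed empirical map: 2 COs per bivalent = 100 cM, so the
// genetic position of x is 50 cM times the mean number of COs per bivalent at or
// before x. sorted_co_positions pools all COs from n_bivalents bivalents.
double physical_to_cM(double x, const std::vector<double>& sorted_co_positions,
                      std::size_t n_bivalents) {
    assert(n_bivalents > 0);
    assert(std::is_sorted(sorted_co_positions.begin(), sorted_co_positions.end()));
    const auto n_left = std::upper_bound(sorted_co_positions.begin(),
                                         sorted_co_positions.end(), x) -
                        sorted_co_positions.begin();
    return 50.0 * static_cast<double>(n_left) / static_cast<double>(n_bivalents);
}
```

For instance, the genotype string "AAB-BBA" yields COs between markers 1 and 2 and between markers 5 and 6, while the missing marker at index 3 is simply skipped.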
Two different interference models are implemented: the statistical gamma model and the physical BF model. When using the gamma model (in either a single- or two-pathways framework), parameters are fitted by maximum likelihood [6, 10]. In the case of the BF model, no likelihood can be computed, so we have developed a score based on a projected likelihood to measure the goodness of fit. This score quantifies the differences between histograms produced from the experimental data and those predicted by the model with its parameters; these histograms are for both inter-CO distances and numbers of COs per chromosome. The user can compare scores for different models to see which fits best. CODA provides two methods for determining the optimum model parameters: either scanning the parameter space via a grid, or hill-climbing (taking small steps in the two-dimensional parameter space towards better goodness-of-fit until no local improvement is found). In terms of speed, both methods have comparable efficiency for single-pathway models, but with two pathways, hill-climbing is much faster and is strongly recommended to keep computation times reasonable. In terms of reliability, hill-climbing may be affected by local maxima, but in all of our tests with real CO data, this did not happen when the simulated population size was 10^6 or more. In case of doubt, users may prefer the scanning method, which is insensitive to local maxima.
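The histogram-based score for the BF model can be sketched as follows. The exact definition of the projected likelihood is given in the work cited by the authors and may differ; here we assume, for illustration, that it is the multinomial log-likelihood of the observed histogram counts under bin probabilities estimated from a large simulation of the model, summed over the inter-CO distance and CO-count histograms. The pseudocount and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical score: log-likelihood of observed bin counts under bin probabilities
// estimated from a simulation of the model. The pseudocount keeps bins that are
// empty in the simulation from producing log(0).
double histogram_log_score(const std::vector<long>& observed,
                           const std::vector<long>& simulated,
                           double pseudocount = 0.5) {
    assert(observed.size() == simulated.size() && !observed.empty());
    double sim_total = 0.0;
    for (long c : simulated) sim_total += static_cast<double>(c) + pseudocount;
    double score = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        const double q = (static_cast<double>(simulated[i]) + pseudocount) / sim_total;
        score += static_cast<double>(observed[i]) * std::log(q);
    }
    return score;
}

// Combined score over the two kinds of histograms named in the text:
// inter-CO distances and numbers of COs per chromosome.
double combined_score(const std::vector<long>& obs_dist, const std::vector<long>& sim_dist,
                      const std::vector<long>& obs_count, const std::vector<long>& sim_count) {
    return histogram_log_score(obs_dist, sim_dist) + histogram_log_score(obs_count, sim_count);
}
```

Higher (less negative) scores indicate better agreement. Because the simulated histograms are themselves noisy, such a score is a random function of the parameters, which is presumably why a large simulated population helps the search avoid spurious local maxima.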
Using the scanning algorithm, the likelihood or score can be displayed as a 3D surface plot, whereas using the hill-climbing algorithm, CODA provides an on-the-fly graph of the trajectory in the space spanned by parameter 1 and parameter 2. All these aspects will be illustrated with figures in the "Results" section. CODA may be used in command-line mode to compute confidence intervals based on re-simulations, as was done in . However, since that approach requires substantial CPU resources, the CODA GUI provides confidence intervals for the gamma model based on Fisher's information matrix.
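Confidence intervals based on Fisher's information can be illustrated for a single parameter. In this sketch (hypothetical names, not CODA's code), the observed information is the negative second derivative of the log-likelihood at the maximum, estimated by a central finite difference, and an approximate 95% interval is theta_hat ± 1.96 / sqrt(information). For the two-parameter gamma sprinkling model, one would instead invert the 2 × 2 observed information matrix and take the square roots of its diagonal elements as standard errors.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct Interval {
    double lower;
    double upper;
};

// Approximate 95% confidence interval from the observed Fisher information
// I = -d^2 logL / d theta^2 at the maximum-likelihood estimate theta_hat,
// estimated with a central finite difference of step h; SE = 1 / sqrt(I).
Interval fisher_ci(const std::function<double(double)>& log_lik, double theta_hat,
                   double h = 1e-4) {
    const double d2 = (log_lik(theta_hat + h) - 2.0 * log_lik(theta_hat) +
                       log_lik(theta_hat - h)) / (h * h);
    assert(d2 < 0.0);  // the log-likelihood must be locally concave at its maximum
    const double se = 1.0 / std::sqrt(-d2);
    return {theta_hat - 1.96 * se, theta_hat + 1.96 * se};
}
```

For instance, for the log-likelihood 100 log(lambda) - 50 lambda, maximized at lambda = 2, the information is 100 / 2^2 = 25, giving a standard error of 0.2 and an interval of about [1.61, 2.39].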
The core of CODA (i.e., the computing part) is written in standard C/C++, while the CODA graphical user interface (GUI) uses the cross-platform C++ Qt libraries, and the Qt-based Qwt and QwtPlot3D libraries for specialized 2D and 3D plotting widgets, respectively.
Qt, Qwt and QwtPlot3D are distributed under the terms of the GNU Lesser General Public License.
The result of running CODA is an estimate of interference strength (parameter 1) and, in the case of two-pathways models, an estimate of the proportion of COs coming from each pathway (parameter 2). In addition, the user interface provides three characteristic graphs displaying features of the experimental CO patterns, as well as a visual comparison between models and experimental data during the fitting process. Upon completion of the analysis, graphs may be exported as bitmap or vector images.
The GUI allows the user to choose the search method for determining the optimum model parameters: either scanning the parameter space with a grid, or hill-climbing on the goodness-of-fit.
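The difference between the two search methods can be sketched on a discretized two-dimensional grid (index i for parameter 1, index j for parameter 2). The functions below are hypothetical and only illustrate the principles described in the text: the complete scan evaluates every grid point, while hill-climbing moves to the best neighbouring point until none improves the score, so it evaluates far fewer points but can stop at a local maximum. CODA's own step rules may differ.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Score to maximize (likelihood or goodness-of-fit) at grid point (i, j).
using GridScore = std::function<double(int, int)>;

// Complete scan: evaluates all n1 * n2 grid points, so it cannot be trapped by
// local maxima.
std::pair<int, int> grid_scan(const GridScore& score, int n1, int n2) {
    assert(n1 > 0 && n2 > 0);
    std::pair<int, int> best{0, 0};
    double best_s = score(0, 0);
    for (int i = 0; i < n1; ++i)
        for (int j = 0; j < n2; ++j) {
            const double s = score(i, j);
            if (s > best_s) { best_s = s; best = {i, j}; }
        }
    return best;
}

// Hill-climbing: from (i0, j0), repeatedly move to the best of the 8 neighbouring
// grid points while this strictly improves the score; stop at a local maximum.
std::pair<int, int> hill_climb(const GridScore& score, int n1, int n2, int i0, int j0) {
    assert(i0 >= 0 && i0 < n1 && j0 >= 0 && j0 < n2);
    int i = i0, j = j0;
    double current = score(i, j);
    while (true) {
        int bi = i, bj = j;
        double best_s = current;
        for (int di = -1; di <= 1; ++di)
            for (int dj = -1; dj <= 1; ++dj) {
                const int ni = i + di, nj = j + dj;
                if ((di == 0 && dj == 0) || ni < 0 || ni >= n1 || nj < 0 || nj >= n2) continue;
                const double s = score(ni, nj);
                if (s > best_s) { best_s = s; bi = ni; bj = nj; }
            }
        if (bi == i && bj == j) return {i, j};  // no neighbour improves: local maximum
        i = bi;
        j = bj;
        current = best_s;
    }
}
```

On a smooth, single-peaked score both functions return the same point; with a noisy, simulation-based score, the hill-climbing result can depend on the starting point.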
We first present how to use the GUI to analyze a specific dataset (using the wheat chromosome IIIB segregation data of ). Then we compare the outcomes of analyses (using mouse genetic mapping data of ) when employing (1) the gamma sprinkling vs the beam-film sprinkling model, and (2) the single- vs two-pathways gamma model. Finally, we benchmark the speed of our software against that of existing tools on yet another data set (namely maize late nodule data obtained by electron microscopy, from ).
Graphical User Interface
Description of the "Settings" tab
Description of the "Graphs" tab
The upper part of this tab provides three types of graphs; the lower part displays the current and best parameter values, as well as the confidence interval for each parameter (gamma-based models only). Both parts are updated in real time throughout the fit. At any time, one can switch from one graph to another while keeping the parameters panel visible.
Bitmap (png, jpeg, bmp) or vector (pdf, eps, ps, svg) images of all graphs can be generated.
Comparing different models
Gamma sprinkling vs Beam-film sprinkling
One vs two pathways modeling
Performance benchmarks (computing time)
To quantify computational performance, we used a GNU/Linux Ubuntu 10.04 (32-bit) platform with 4 GB RAM and 4 × Intel Xeon E5410 CPU at 2.33 GHz.
Comparison of fitting algorithms in CODA
In our first benchmark, we compare the computing times of the hill-climbing and complete scan methods for fitting the model's parameters. The analysis presented here uses maize late nodule data, chromosome 1, with the gamma sprinkling model; these data provide crossover positions on bivalents, a situation that is also relevant for the next benchmark.
Benchmark comparison between algorithms "complete scan" and "hill-climbing"
Algorithm | Interference strength (nu) | Computing time
Complete scan | | 7 h 10 min
Hill-climbing | | 2 min 40 s
Existing tools comparison: CODA vs Interference Analyzer
Benchmark comparison between CODA and Interference Analyzer (IA)
2 min 40 s
CODA provides both a quantitative and a qualitative advance for the analysis of crossover data. With its ability to handle two-pathways models and data coming from gametes, along with its ease of use thanks to a GUI, CODA will allow researchers to perform their own analyses quite straightforwardly. It also allows users to compare the relative merits of the gamma model and the BF model.
Furthermore, the software package is evolvable: because it is open source, users can modify the details of the implemented models, or even substitute models of their own choice. With data sets of crossover patterns growing in size and number, the number of potential users will increase. One can also expect researchers to request more sophisticated models; we thus anticipate upgrading CODA to include such new models so that their added value can be tested.
The open-access utility CODA provides the user with an easy interface for model-based characterizations of crossover patterns along chromosomes. It allows one to estimate the strength of crossover interference, using either the statistically motivated gamma model or the mechanically formulated beam-film model. It can also be used for two-pathways modeling, where a second (non-interfering) pathway is superposed on the first, and it generates multiple histograms that summarize the features of crossover patterns. The experimental input files can contain marker segregation data coming from genetic linkage mapping experiments, or crossover positions on chromosomes coming for instance from cytological imaging, significantly extending the possibilities of previous software packages. This kind of modeling can support the presence of putative pathways, as was done in . Also, as the mechanisms of crossover formation become better known, more sophisticated models can be added to CODA to exhibit their characteristics and to quantify their level of agreement with experiments, thereby advancing the detailed understanding of meiotic processes.
Availability and requirements
Project name: CrossOver Distribution Analyzer.
Project home page: http://cms.moulon.inra.fr/content/view/25/56/lang,en/
Operating system(s): GNU/Linux, MacOS X (10.4 or higher), Windows (XP or higher)
Programming language: C++
Other requirements: Ready-to-use executables are provided for MacOS and Windows, but installing from the sources (e.g. on Linux systems) requires the Qt4, Qwt and QwtPlot3D development packages to be installed on the system. Compiler versions: g++ v4.0.1 under MacOS X and g++ v4.4 under Windows and Linux platforms. Recommended hardware: at least a Pentium 4 or equivalent, and 512 MB RAM.
License: GNU GPL
Any restrictions to use by non-academics: None
Binary files for Windows and MacOS can be downloaded freely from the project home page. All sources are available under the GPL license from the same URL. The software can also be used from the command line, making it easy to perform calculations on a remote server and/or to launch analyses automatically using scripts. Detailed documentation is included in the distribution, as well as a sample data file. We request that publications using CODA cite the present article.
FG, OM, and MF were trained in computer science, physics, and genetics, respectively.
Abbreviations
GUI: Graphical User Interface
SRP: stationary renewal process
This work was supported by funding from the Agence nationale de la recherche [ANR-07-BLANC-COPATH and ANR-09-GENM-022-003 SingleMeiosis].
The authors thank Olivier Langella for helpful technical advice, and Pavel Borodin, Victor Sabarly, and Olivier Sosnowski for beta-testing CODA.
- Jones GH, Franklin FCH: Meiotic Crossing-over: Obligation and Interference. Cell 2006, 126: 246–248. doi:10.1016/j.cell.2006.07.010
- Anderson LK, Stack SM: Meiotic Recombination in Plants. Current Genomics 2002, 3: 507–525. doi:10.2174/1389202023350200
- Sturtevant AH: The behavior of the chromosomes as studied through linkage. Molecular and General Genetics MGG 1915, 13: 234–287.
- Berchowitz LE, Francis KE, Bey AL, Copenhaver GP: The Role of AtMUS81 in Interference-Insensitive Crossovers in A. thaliana. PLoS Genet 2007, 3: e132. doi:10.1371/journal.pgen.0030132
- Lhuissier FG, Offenberg HH, Wittich PE, Vischer NO, Heyting C: The Mismatch Repair Protein MLH1 Marks a Subset of Strongly Interfering Crossovers in Tomato. Plant Cell 2007, 19: 862–876. doi:10.1105/tpc.106.049106
- Copenhaver GP, Housworth EA, Stahl FW: Crossover Interference in Arabidopsis. Genetics 2002, 160: 1631–1639.
- Housworth E, Stahl F: Crossover Interference in Humans. The American Journal of Human Genetics 2003, 73: 188–197. doi:10.1086/376610
- Falque M, Anderson LK, Stack SM, Gauthier F, Martin OC: Two Types of Meiotic Crossovers Coexist in Maize. Plant Cell 2009, 21: 3915–3925. doi:10.1105/tpc.109.071514
- Kleckner N, Zickler D, Jones GH, Dekker J, Padmore R, Henle J, Hutchinson J: A mechanical basis for chromosome function. Proc Natl Acad Sci USA 2004, 101: 12592–12597. doi:10.1073/pnas.0402724101
- McPeek MS, Speed TP: Modeling Interference in Genetic Recombination. Genetics 1995, 139: 1031–1044.
- Zhao H, McPeek MS, Speed TP: Statistical Analysis of Chromatid Interference. Genetics 1995, 139: 1057–1065.
- Viswanath L, Housworth E: InterferenceAnalyzer: Tools for the analysis and simulation of multi-locus genetic data. BMC Bioinformatics 2005, 6: 297. doi:10.1186/1471-2105-6-297
- Housworth E, Stahl F: Is There Variation in Crossover Interference Levels Among Chromosomes From Human Males? Genetics 2009, 183: 403–405. doi:10.1534/genetics.109.103853
- Saintenac C, Falque M, Martin OC, Paux E, Feuillet C, Sourdille P: Detailed Recombination Studies Along Chromosome 3B Provide New Insights on Crossover Distribution in Wheat (Triticum aestivum L.). Genetics 2009, 181: 393–403. doi:10.1534/genetics.108.097469
- Broman KW, Rowe LB, Churchill GA, Paigen K: Crossover Interference in the Mouse. Genetics 2002, 160: 1123–1131.
- Anderson LK, Doyle GG, Brigham B, Carter J, Hooker KD, Lai A, Rice M, Stack SM: High-Resolution Crossover Maps for Each Bivalent of Zea mays Using Recombination Nodules. Genetics 2003, 165: 849–865.
- Foss E, Lande R, Stahl FW, Steinberg CM: Chiasma Interference as a Function of Genetic Distance. Genetics 1993, 133: 681–691.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.