OrgConv: detection of gene conversion using consensus sequences and its application in plant mitochondrial and chloroplast homologs
© Hao; licensee BioMed Central Ltd. 2010
Received: 6 October 2009
Accepted: 2 March 2010
Published: 2 March 2010
The ancestry of mitochondria and chloroplasts traces back to separate endosymbioses of once free-living bacteria. The highly reduced genomes of these two organelles therefore contain very distant homologs that only recently have been shown to recombine inside the mitochondrial genome. Detection of gene conversion between mitochondrial and chloroplast homologs was previously impossible due to the lack of suitable computer programs. Recently, I developed a novel method and have, for the first time, discovered recurrent gene conversion between chloroplast and mitochondrial genes. The method will further our understanding of plant organellar genome evolution and help identify and remove gene regions with incongruent phylogenetic signals for several genes widely used in plant systematics. Here, I implement this method and make it available through a user-friendly web interface.
OrgConv (Organellar Conversion) is a computer package developed for detection of gene conversion between mitochondrial and chloroplast homologous genes. OrgConv is available in two forms: source code that can be installed and run on a Linux platform, and a web interface that can be used from multiple operating systems. The input files of the feature program are two multiple sequence alignments, in FASTA format, from different organellar compartments. The program compares every examined sequence against the consensus sequence of each sequence alignment rather than exhaustively examining every possible combination. Making use of consensus sequences significantly reduces the number of comparisons and therefore the overall computational time, which allows for analysis of very large datasets. Most importantly, because the number of comparisons is so much smaller, the statistical power remains high in the face of correction for multiple tests.
Both the source code and the web interface of OrgConv are available for free from the OrgConv website http://www.indiana.edu/~orgconv. Although OrgConv has been developed with a main focus on detection of gene conversion between mitochondrial and chloroplast genes, it may also be used for detection of gene conversion between any two distinct groups of homologous sequences.
Efforts to detect gene conversion and homologous recombination (HR) have increased in the past two decades [1, 2]. This has sparked the development of many computer programs, such as RDP [3], geneconv [4], Max χ² [5], the Homoplasy test [6], Phi [7] and many others. In general, the vast majority of interspecific HR events involve closely related species [8–11], and the frequency of HR tends to decrease sharply as the relatedness between donor and recipient decreases [12–14]. Nonetheless, several cases of gene conversion between distantly related homologous sequences have been reported in recent years [15–17]. Mitochondria (mt) and chloroplasts (cp) originated from endosymbiotic bacteria and last shared common ancestry some 2 billion years ago. Plant mitochondrial genomes harbor a substantial amount of chloroplast sequence (up to 8.8% of the complete mitochondrial genome) due to intracellular gene transfer from chloroplast to mitochondria [18, 19]. The coexistence of homologous genes inside the mitochondrial genome creates the potential for gene conversion between ancient homologs. Plant mitochondrial and chloroplast genomes share 3 ribosomal RNA genes and about half of the 40 protein-coding genes, which together serve as the substrate for recombination. The discovery of several chimeric plant mitochondrial genes, in this case between native and horizontally transferred mitochondrial genes [20, 21], further suggests that mitochondrial genes are involved in recombination/conversion during or after DNA exchange events. Despite this abundance of factors that would seem to facilitate conversion in mitochondrial genes, evidence of gene conversion from ancient chloroplast homologs into mitochondrial genes had, until recently, not been shown.
One possible reason is that the relatively low substitution rate in both plant mitochondrial and chloroplast genes [22, 23] prevents mt-cp conversion from being detected, since both empirical and simulation studies have shown that existing programs lack sensitivity at very low sequence diversity [24–27]. In this article, I describe a new method [28] that makes use of consensus sequences and thereby achieves good computational efficiency while retaining high statistical power. The development of the method led to the discovery of recurrent conversion between the mitochondrial and chloroplast homologs of the alpha subunit of ATP synthase in mitochondrial genes [28]. Here, I implement the method in a computer program and make it publicly available both as source code and through a user-friendly web interface.
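The consensus-based comparison can be illustrated with a short Python sketch. It is not taken from the OrgConv source code (which is written in C++); the function names `read_fasta` and `consensus`, the majority-rule definition of the consensus, the treatment of gaps and the alphabetical tie-breaking are all assumptions made for illustration only.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def consensus(sequences, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Assumptions of this sketch: gaps are ignored when counting, an all-gap
    column yields a gap, and ties are broken alphabetically.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*sequences):
        counts = Counter(base for base in column if base != gap)
        if not counts:
            result.append(gap)
            continue
        best = max(counts.values())
        result.append(min(b for b, c in counts.items() if c == best))
    return "".join(result)


# Toy mitochondrial alignment (hypothetical data).
mt_alignment = """>mt_seq1
ACGTAC
>mt_seq2
ACGTTC
>mt_seq3
ACGAAC
"""
mt_consensus = consensus([seq for _, seq in read_fasta(mt_alignment)])
print(mt_consensus)
```

With one consensus per alignment, each examined sequence needs to be compared only against the two consensus sequences, rather than against every other sequence or pair of sequences.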
The core calculation for conversion identification
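Following the RDP method [3], the probability P of observing the putative recombinant segment by chance is approximated by a binomial tail:

$$P = \frac{L}{N} \sum_{m=M}^{N} \binom{N}{m} \, p^{m} (1-p)^{N-m}$$

where L is the number of informative sites, N is the length of the putative recombinant segment, M is the number of common nucleotides shared between the putative recombinant sequences within that segment, and p is the proportion of nucleotides common between the same pair of sequences. There are L/N non-overlapping windows of size N in the sequence (L sites). The term L/N was used in the RDP method to correct for multiple windows.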
In this study, two improvements were made to the above calculation. First, the parameter p (the proportion of nucleotides common between sequences) was calculated from the sequence excluding the examined region instead of from the entire sequence. The calculation based on the entire sequence in the original RDP method assumes the null hypothesis that there is no recombination. However, when there is recombination, the proportion of nucleotides common between the entire donor and recipient sequences is inflated by the recombinant region, and consequently the calculated probability P will be less significant than it should be. It is therefore more appropriate to exclude the examined region from the overall p calculation. Second, in addition to the term L/N, a second term (L - N) was introduced to correct for multiple windows. In this study, the calculation was performed in sliding windows, incrementing one informative site at a time. For a given window size N, there are (L - N) instead of L/N windows, but these (L - N) windows are not independent of each other. The "effective" number of windows that needs to be corrected for multiple tests should fall between L/N and (L - N). The use of (L - N) provides an upper bound on the probability P. P-values based on both L/N and (L - N) are presented in the output.
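The two modifications can be sketched in Python as follows. This is an illustrative sketch rather than the OrgConv implementation: it assumes that the underlying probability is the binomial tail used in the RDP approach, that both input strings already contain informative sites only, and the function names `binomial_tail` and `window_p_values` are hypothetical.

```python
from math import comb


def binomial_tail(n, m, p):
    """P(at least m matches in n sites) when each site matches with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def window_p_values(query, candidate, start, size):
    """Return (P corrected by L/N, P corrected by L - N) for one window.

    query and candidate are equal-length strings of informative sites (L sites);
    the window covers sites start .. start + size - 1, so N = size.
    """
    L, N = len(query), size
    if len(candidate) != L:
        raise ValueError("sequences must have equal length")
    if start < 0 or N < 1 or start + N > L or N >= L:
        raise ValueError("window must fit and leave sites outside it")
    matches = [a == b for a, b in zip(query, candidate)]
    M = sum(matches[start:start + N])
    # First modification: estimate p from the sites OUTSIDE the examined window.
    outside = matches[:start] + matches[start + N:]
    p = sum(outside) / len(outside)
    tail = binomial_tail(N, M, p)
    # Second modification: report both multiple-window corrections.
    return (L / N) * tail, (L - N) * tail


# Hypothetical example: 21 informative sites, a 5-site window that matches fully,
# and 8 of the 16 sites outside the window matching (p = 0.5).
query = "A" * 21
candidate = "ACACA" + "AAAAA" + "CACACACACAC"
p_ln, p_l_minus_n = window_p_values(query, candidate, start=5, size=5)
print(p_ln, p_l_minus_n)
```

Because (L - N) is larger than L/N for typical window sizes, the second value is the more conservative of the two.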
Unlike in the RDP program, the size of the sliding window is not fixed in the OrgConv package. Instead, for each site at which a window begins, the final window size is that of the window with the smallest P-value. This is computationally more expensive than the calculation using a fixed window size in the RDP program, but it is used because there is no easy way for users to pre-set a window size that is guaranteed to be optimal for their data. Finally, the performance of the improved calculations and of the original RDP method was evaluated via simulation.
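The variable window-size search can be sketched as below. Again this is only an illustration under stated assumptions, not the OrgConv code: the probability is taken to be the binomial tail with the L/N correction, p is estimated from the sites outside the window, and the names `best_window_from`, `scan` and the `min_size` parameter are hypothetical.

```python
from math import comb


def binomial_tail(n, m, p):
    """P(at least m matches in n sites) when each site matches with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def best_window_from(matches, start, min_size=2):
    """Try every window size from one start site; return (size, P) with the smallest P.

    matches[i] is True when the two compared sequences agree at informative site i.
    """
    L = len(matches)
    total = sum(matches)
    inside = sum(matches[start:start + min_size - 1])
    best = None
    for N in range(min_size, L - start + 1):
        if N == L:  # no sites would remain outside the window to estimate p
            break
        inside += matches[start + N - 1]  # extend the window by one site
        p = (total - inside) / (L - N)
        P = (L / N) * binomial_tail(N, inside, p)
        if best is None or P < best[1]:
            best = (N, P)
    return best


def scan(matches, min_size=2):
    """Move the start one informative site at a time and optimise the size at each."""
    results = {}
    for start in range(len(matches) - min_size + 1):
        best = best_window_from(matches, start, min_size)
        if best is not None:
            results[start] = best
    return results


# Hypothetical match pattern: a run of six matching sites inside a noisy background.
matches = [True, False, True, False] + [True] * 6 + [False, True, False, True, False, True]
size, P = best_window_from(matches, start=4)
print(size, P)
```

In this toy example the search started at the first site of the matching run settles on a window that covers exactly that run. The cost is one binomial-tail evaluation per candidate size for every start site, which is the additional expense relative to a fixed window size.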
The OrgConv package
The OrgConv web interface
Results and Discussion
Substantially increased numbers of comparisons, and correspondingly decreased P-values required for significance under Bonferroni correction, as the number of sequences increases when every possible combination is calculated

Using 2 sequences (e.g. geneconv)      | Using 3 sequences (e.g. RDP)
Comparisons  | P-value threshold       | Comparisons  | P-value threshold
5.0 × 10^3   | 1.0 × 10^-05            | 1.6 × 10^5   | 3.1 × 10^-07
5.0 × 10^5   | 1.0 × 10^-07            | 1.7 × 10^8   | 2.9 × 10^-10
5.0 × 10^7   | 1.0 × 10^-09            | 1.7 × 10^11  | 2.9 × 10^-13
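The growth in the number of comparisons can be reproduced with a few lines of Python. The sequence counts (100, 1000 and 10000) and the family-wise significance level of 0.05 are assumptions: they are not stated in the surviving text, but they yield numbers of pairs and triplets of the same magnitude as the tabulated values. The function name `bonferroni_thresholds` is hypothetical.

```python
from math import comb

ALPHA = 0.05  # assumed family-wise significance level


def bonferroni_thresholds(n_sequences, alpha=ALPHA):
    """Count all pairs and all triplets of sequences and the Bonferroni cut-offs."""
    pairs = comb(n_sequences, 2)      # every 2-sequence combination
    triplets = comb(n_sequences, 3)   # every 3-sequence combination
    return pairs, alpha / pairs, triplets, alpha / triplets


for n in (100, 1000, 10000):  # assumed numbers of sequences
    pairs, p2, triplets, p3 = bonferroni_thresholds(n)
    print(f"{n:>6}  {pairs:.1e}  {p2:.1e}  {triplets:.1e}  {p3:.1e}")
```

Under these assumptions the computed thresholds agree with the tabulated ones up to rounding in the last digit. By contrast, comparing each sequence only against consensus sequences makes the number of tests grow linearly with the number of sequences, so the Bonferroni threshold shrinks far more slowly.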
[Table: Performance of mtcpconv, geneconv, and RDP on various numbers of mitochondrial atp1 and chloroplast atpA sequences in different angiosperm groups. Columns include sequence group and mt-cp segments detected.]
The use of consensus sequences carries a risk that recombination events involving chloroplast regions that differ significantly from the chloroplast consensus sequence will be missed. A possible approach to overcoming this is to compare mitochondrial sequences against chloroplast sequences from closely related species. Indeed, 23 segments in the Asterids group did show slightly smaller initial (before Bonferroni correction) P-values when comparing Asterids mitochondrial and chloroplast sequences than when comparing all angiosperm mitochondrial and chloroplast sequences (data not shown). However, comparison of sequences from the same taxonomic group might not always outperform comparison of larger groups. For example, an mt-cp conversion was detected in Myrtus communis by analyzing the entire angiosperm dataset, with two P-values of 6.19 × 10^-08 and 4.70 × 10^-06, whereas the P-values obtained when comparing Rosids mitochondrial and chloroplast genes are only 2.74 × 10^-03 and 2.50 × 10^-01, which are not considered significant in Figure 4.
The OrgConv package was developed for detection of mt-cp conversion. It makes use of the consensus sequence from each group of sequences and compares each examined sequence against the consensus sequences rather than examining every possible sequence combination. By doing so, the computational burden is significantly reduced and it becomes feasible to analyze very large data sets. More importantly, the statistical power of the program is retained in the face of Bonferroni correction because of the reduced number of comparisons. Furthermore, although developed for detection of mt-cp conversion, the program may be applied to sequences other than mitochondrial and chloroplast sequences, e.g., when two large groups of sequences have very low diversity within each group and high diversity between groups.
Availability and requirements
Project name: OrgConv
Project home page: http://www.indiana.edu/~orgconv
Operating system(s): Linux for the distributed source code; operating-system independent for the web interface
Programming language: C++ for the source code and Perl CGI scripts for the web-interface
License: Free for academic use
List of abbreviations
- HGT: horizontal gene transfer
- mt-cp conversion: gene conversion between mitochondrial and chloroplast genes.
Acknowledgements
I thank Jeffrey Palmer, Andy Alverson, and Danny Rice for helpful discussions, and I acknowledge the High Performance Systems at Indiana University for computational facilities. I would also like to thank an anonymous reviewer for suggesting the removal of the examined region in the calculation of the proportion of common nucleotides (p), and the use of both L/N and (L - N) to correct for multiple windows in the calculation of P-values. This research was supported in part by a Natural Sciences and Engineering Research Council of Canada (NSERC) postdoctoral fellowship to W.H., by National Institutes of Health Grant R01-GM-70612 and by the METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc. (both to Jeffrey D. Palmer).
References
1. Posada D, Crandall KA, Holmes EC: Recombination in evolutionary genomics. Annu Rev Genet 2002, 36: 75–97. 10.1146/annurev.genet.36.040202.111115
2. Fraser C, Hanage WP, Spratt BG: Recombination and the nature of bacterial speciation. Science 2007, 315: 476–480. 10.1126/science.1127573
3. Martin D, Rybicki E: RDP: detection of recombination amongst aligned sequences. Bioinformatics 2000, 16: 562–563. 10.1093/bioinformatics/16.6.562
4. Sawyer S: Statistical tests for detecting gene conversion. Mol Biol Evol 1989, 6: 526–538.
5. Smith JM: Analyzing the mosaic structure of genes. J Mol Evol 1992, 34: 126–129.
6. Maynard Smith J, Smith NH: Detecting recombination from gene trees. Mol Biol Evol 1998, 15: 590–599.
7. Bruen TC, Philippe H, Bryant D: A simple and robust statistical test for detecting the presence of recombination. Genetics 2006, 172: 2665–2681. 10.1534/genetics.105.048975
8. Zhou J, Spratt BG: Sequence diversity within the argF, fbp and recA genes of natural isolates of Neisseria meningitidis: interspecies recombination within the argF gene. Mol Microbiol 1992, 6: 2135–2146. 10.1111/j.1365-2958.1992.tb01387.x
9. Gogarten JP, Doolittle WF, Lawrence JG: Prokaryotic evolution in light of gene transfer. Mol Biol Evol 2002, 19: 2226–2238.
10. Papke RT, Koenig JE, Rodriguez-Valera F, Doolittle WF: Frequent recombination in a saltern population of Halorubrum. Science 2004, 306: 1928–1929.
11. Jaramillo-Correa JP, Bousquet J: Mitochondrial genome recombination in the zone of contact between two hybridizing conifers. Genetics 2005, 171: 1951–1962. 10.1534/genetics.105.042770
12. Stratz M, Mau M, Timmis KN: System to study horizontal gene exchange among microorganisms without cultivation of recipients. Mol Microbiol 1996, 22: 207–215. 10.1046/j.1365-2958.1996.00099.x
13. Vulic M, Dionisio F, Taddei F, Radman M: Molecular keys to speciation: DNA polymorphism and the control of genetic exchange in enterobacteria. Proc Natl Acad Sci USA 1997, 94: 9763–9767. 10.1073/pnas.94.18.9763
14. Majewski J, Cohan FM: DNA sequence similarity requirements for interspecific recombination in Bacillus. Genetics 1999, 153: 1525–1533.
15. Archibald JM, Roger AJ: Gene conversion and the evolution of euryarchaeal chaperonins: a maximum likelihood-based method for detecting conflicting phylogenetic signals. J Mol Evol 2002, 55: 232–245. 10.1007/s00239-002-2321-5
16. Miller SR, Augustine S, Olson TL, Blankenship RE, Selker J, Wood AM: Discovery of a free-living chlorophyll d-producing cyanobacterium with a hybrid proteobacterial/cyanobacterial small-subunit rRNA gene. Proc Natl Acad Sci USA 2005, 102: 850–855. 10.1073/pnas.0405667102
17. Inagaki Y, Susko E, Roger AJ: Recombination between elongation factor 1 α genes from distantly related archaeal lineages. Proc Natl Acad Sci USA 2006, 103: 4528–4533. 10.1073/pnas.0600744103
18. Kubo T, Mikami T: Organization and variation of angiosperm mitochondrial genome. Physiol Plant 2007, 129: 6–13. 10.1111/j.1399-3054.2006.00768.x
19. Goremykin VV, Salamini F, Velasco R, Viola R: mtDNA of Vitis vinifera and the issue of rampant horizontal gene transfer. Mol Biol Evol 2008, 26: 99–110. 10.1093/molbev/msn226
20. Bergthorsson U, Adams KL, Thomason B, Palmer JD: Widespread horizontal transfer of mitochondrial genes in flowering plants. Nature 2003, 424: 197–201. 10.1038/nature01743
21. Barkman TJ, McNeal JR, Lim SH, Coat G, Croom HB, Young ND, Depamphilis CW: Mitochondrial DNA suggests at least 11 origins of parasitism in angiosperms and reveals genomic chimerism in parasitic plants. BMC Evol Biol 2007, 7: 248. 10.1186/1471-2148-7-248
22. Wolfe KH, Li WH, Sharp PM: Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci USA 1987, 84: 9054–9058. 10.1073/pnas.84.24.9054
23. Mower JP, Touzet P, Gummow JS, Delph LF, Palmer JD: Extensive variation in synonymous substitution rates in mitochondrial genes of seed plants. BMC Evol Biol 2007, 7: 135. 10.1186/1471-2148-7-135
24. Drouin G, Prat F, Ell M, Clarke GD: Detecting and characterizing gene conversions between multigene family members. Mol Biol Evol 1999, 16: 1369–1390.
25. Posada D, Crandall KA: Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci USA 2001, 98: 13757–13762. 10.1073/pnas.241370698
26. Wiuf C, Christensen T, Hein J: A simulation study of the reliability of recombination detection methods. Mol Biol Evol 2001, 18: 1929–1939.
27. Posada D: Evaluation of methods for detecting recombination from DNA sequences: empirical data. Mol Biol Evol 2002, 19: 708–717.
28. Hao W, Palmer JD: Fine-scale mergers of chloroplast and mitochondrial genes create functional, transcompartmentally chimeric mitochondrial genes. Proc Natl Acad Sci USA 2009, 106: 16728–16733. 10.1073/pnas.0908766106
29. Jansen RK, Cai Z, Raubeson LA, Daniell H, Depamphilis CW, Leebens-Mack J, Muller KF, Guisinger-Bellian M, Haberle RC, Hansen AK, Chumley TW, Lee SB, Peery R, McNeal JR, Kuehl JV, Boore JL: Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc Natl Acad Sci USA 2007, 104: 19369–19374. 10.1073/pnas.0709121104
30. Moore MJ, Bell CD, Soltis PS, Soltis DE: Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms. Proc Natl Acad Sci USA 2007, 104: 19363–19368. 10.1073/pnas.0708072104
31. Rambaut A, Grassly NC: Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci 1997, 13: 235–238.
32. Lefebure T, Stanhope MJ: Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol 2007, 8(5): R71. 10.1186/gb-2007-8-5-r71
33. Marri PR, Hao W, Golding GB: Gene gain and gene loss in Streptococcus: Is it driven by habitat? Mol Biol Evol 2006, 23: 2379–2391. 10.1093/molbev/msl115
34. Springman AC, Lacher DW, Wu G, Milton N, Whittam TS, Davies HD, Manning SD: Selection, recombination, and virulence gene diversity among group B streptococcal genotypes. J Bacteriol 2009, 191: 5419–5427. 10.1128/JB.00369-09
35. Lopez P, Casane D, Philippe H: Heterotachy, an important process of protein evolution. Mol Biol Evol 2002, 19: 1–7.
36. McGuire G, Wright F: TOPAL: recombination detection in DNA and protein sequences. Bioinformatics 1998, 14: 219–220. 10.1093/bioinformatics/14.2.219
37. Salminen MO, Carr JK, Burke DS, McCutchan FE: Identification of breakpoints in intergenotypic recombinants of HIV type 1 by bootscanning. AIDS Res Hum Retroviruses 1995, 11: 1423–1425. 10.1089/aid.1995.11.1423
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.