OrgConv: detection of gene conversion using consensus sequences and its application in plant mitochondrial and chloroplast homologs
© Hao. 2010
Received: 6 October 2009
Accepted: 2 March 2010
Published: 2 March 2010
The ancestry of mitochondria and chloroplasts traces back to separate endosymbioses of once free-living bacteria. The highly reduced genomes of these two organelles therefore contain very distant homologs that only recently have been shown to recombine inside the mitochondrial genome. Detection of gene conversion between mitochondrial and chloroplast homologs was previously impossible due to the lack of suitable computer programs. Recently, I developed a novel method and have, for the first time, discovered recurrent gene conversion between chloroplast and mitochondrial genes. The method will further our understanding of plant organellar genome evolution and help identify and remove gene regions with incongruent phylogenetic signals for several genes widely used in plant systematics. Here, I implement the method and make it available through a user-friendly web interface.
OrgConv (Organellar Conversion) is a computer package developed for detection of gene conversion between mitochondrial and chloroplast homologous genes. OrgConv is available in two forms; source code can be installed and run on a Linux platform and a web interface is available on multiple operating systems. The input files of the feature program are two multiple sequence alignments from different organellar compartments in FASTA format. The program compares every examined sequence against the consensus sequence of each sequence alignment rather than exhaustively examining every possible combination. Making use of consensus sequences significantly reduces the number of comparisons and therefore reduces overall computational time, which allows for analysis of very large datasets. Most importantly, with the significantly reduced number of comparisons, the statistical power remains high in the face of correction for multiple tests.
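As an illustration of the consensus step described above, the following is a minimal Python sketch, not the OrgConv source code. It assumes a simple majority-rule consensus per alignment column with ties broken alphabetically; the function names `read_fasta` and `consensus` and the toy alignment are hypothetical, and the rule OrgConv actually uses may differ.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def consensus(seqs):
    """Majority-rule consensus of equal-length aligned sequences.

    Assumption: ties are broken by taking the alphabetically smallest
    character; gap characters are counted like any other character.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    out = []
    for column in zip(*seqs):
        counts = Counter(column)
        best = max(counts.values())
        out.append(min(c for c, k in counts.items() if k == best))
    return "".join(out)


# Toy mitochondrial alignment (hypothetical data).
mt_alignment = """>mt1
ACGTAC
>mt2
ACGTTC
>mt3
ACGAAC
"""
mt_consensus = consensus([seq for _, seq in read_fasta(mt_alignment)])
print(mt_consensus)
```

With one consensus per organellar alignment, each examined sequence needs only two comparisons (one per consensus) instead of one comparison per pair or triplet of sequences.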
Both the source code and the web interface of OrgConv are available for free from the OrgConv website http://www.indiana.edu/~orgconv. Although OrgConv has been developed with a main focus on detection of gene conversion between mitochondrial and chloroplast genes, it may also be used for detection of gene conversion between any two distinct groups of homologous sequences.
Efforts to detect gene conversion and homologous recombination (HR) have increased in the past two decades [1, 2]. This has sparked the development of many computer programs, such as RDP, geneconv, Max χ2, Homoplasy test, Phi and many others. In general, the vast majority of interspecific HR events involve closely related species [8–11], and the frequency of HR tends to decrease sharply with the level of relatedness between donor and recipient [12–14]. Nonetheless, several cases of gene conversion between distantly related homologous sequences have been reported in recent years [15–17]. Mitochondria (mt) and chloroplasts (cp) originated from endosymbiotic bacteria and last shared common ancestry some 2 billion years ago. Plant mitochondrial genomes harbor a significant amount of chloroplast sequences (up to 8.8% of the complete mitochondrial genomes) due to intracellular gene transfer from chloroplast to mitochondria [18, 19]. The coexistence of homologous genes inside the mitochondrial genome creates the potential for gene conversion between ancient homologs. Plant mitochondrial and chloroplast genomes share 3 ribosomal RNA genes and about half of the 40 protein coding genes, which together serve as the substrate for recombination. The discovery of several chimeric plant mitochondrial genes, in this case between native and horizontally transferred mitochondrial genes [20, 21], further suggests that mitochondrial genes are involved in recombination/conversion during or after DNA exchange events. Despite this abundance of factors that would seem to facilitate conversion in mitochondrial genes, evidence of gene conversion from ancient chloroplast homologs into mitochondrial genes has, until recently, not been shown.
One possible reason is that the relatively low substitution rate in both plant mitochondrial and chloroplast genes [22, 23] prevents mt-cp conversion from being detected, since both empirical and simulation studies have shown that none of the existing programs is sensitive at very low sequence diversity [24–27]. In this article, I describe a new method that makes use of consensus sequences, which gives good computational efficiency and retains high statistical power. The development of the method led to the discovery of recurrent conversion between the mitochondrial and chloroplast homologs of the alpha subunit of ATP synthase in the mitochondrial genes. Here, I implement the method in a computer program and make it available to the public as both source code and a user-friendly web interface.
where L is the length of informative sites, N is the length of the putative recombinant segment, M is the number of common nucleotides shared between the putative recombinant sequences, and p is the proportion of nucleotides common between the same pair of sequences. There are L/N non-overlapping windows of size N in the sequence (L sites). The term L/N was used in the RDP method to correct for multiple windows.
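Taken together, these definitions describe an upper-tail binomial probability (at least M shared nucleotides among N sites, each shared with probability p) multiplied by the number of non-overlapping windows. A sketch of that form, assuming the standard RDP formulation rather than quoting it, is:

```latex
P = \frac{L}{N} \sum_{i=M}^{N} \binom{N}{i} \, p^{i} \, (1 - p)^{N - i}
```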
In this study, two improvements were made to the above calculation. First, the parameter p (the proportion of nucleotides common between sequences) was calculated from the sequence excluding the examined region instead of from the entire sequence. The calculation based on the entire sequence in the original RDP method is made under the null hypothesis that there is no recombination. However, when there is recombination, the proportion of nucleotides common between the entire donor and recipient sequences is inflated by the recombinant region, and consequently the calculated probability P will be less significant than it should be. It is therefore more appropriate to exclude the examined region from the overall p calculation. Second, in addition to the term L/N, a second term (L - N) was introduced to correct for multiple windows. In this study, the calculation was performed in sliding windows by incrementing one informative site at a time. For a given window size N, there are (L - N) instead of L/N windows, but these (L - N) windows are not independent of each other. The "effective" number of windows that need to be corrected for multiple tests should fall between L/N and (L - N). The use of (L - N) gives an upper bound on the probability P. P-values based on both the term L/N and the term (L - N) are presented in the output.
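The two modifications can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the OrgConv implementation: `shared` is a hypothetical list with one 0/1 flag per informative site (1 where the examined sequence carries the nucleotide of the putative donor), the probability is taken to be a binomial upper tail, and the function names are invented for this example.

```python
from math import comb


def binomial_upper_tail(n, m, p):
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))


def window_p_values(shared, start, n):
    """Return (P corrected by L/N, P corrected by L - N) for one window.

    shared: 0/1 flags, one per informative site (hypothetical input format).
    start:  index of the first informative site in the window.
    n:      window size N in informative sites.
    """
    L = len(shared)
    if not (0 < n < L) or start < 0 or start + n > L:
        raise ValueError("window must lie inside the sequence and be shorter than it")
    m = sum(shared[start:start + n])              # M: shared sites inside the window
    outside = shared[:start] + shared[start + n:]
    p = sum(outside) / len(outside)               # p estimated WITHOUT the examined region
    tail = binomial_upper_tail(n, m, p)
    # Both corrections are capped at 1 so they remain valid probabilities.
    return min(1.0, (L / n) * tail), min(1.0, (L - n) * tail)


# Toy example: 50 informative sites, two background matches and a 10-site tract.
shared = [1] + [0] * 9 + [1] + [0] * 9 + [1] * 10 + [0] * 20
p_ln, p_lmn = window_p_values(shared, start=20, n=10)
print(p_ln, p_lmn)
```

Because L - N is larger than L/N for this window, the second value is the more conservative of the two, which matches its role as an upper bound.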
Unlike in the RDP program, the size of the sliding window is not fixed in the OrgConv package. Instead, for each site at which windows begin, the final window size is that of the window with the smallest P-value. This is computationally more expensive than the fixed window-size calculation in the RDP program, but it is used because there is no easy way for users to pre-set a window size that is guaranteed to be optimal for their data. Finally, the performance of the improved calculations and the original RDP method was evaluated via simulation.
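A minimal Python sketch of this variable window-size search is given below. It is an assumption-laden illustration, not the OrgConv code: the input `flags` (one 0/1 flag per informative site), the minimum window size, the use of the (L - N) correction for ranking, and the names `upper_tail` and `best_window_from` are all hypothetical.

```python
from math import comb


def upper_tail(n, m, p):
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))


def best_window_from(flags, start, min_size=2):
    """Try every window size from one start site; keep the smallest P-value.

    Returns (window size, P-value), or None if no window could be evaluated.
    """
    L = len(flags)
    best = None
    for n in range(min_size, L - start + 1):
        if n >= L:
            break                                  # no sites left outside the window
        m = sum(flags[start:start + n])
        outside = flags[:start] + flags[start + n:]
        p = sum(outside) / len(outside)            # p excludes the examined window
        p_val = min(1.0, (L - n) * upper_tail(n, m, p))
        if best is None or p_val < best[1]:
            best = (n, p_val)
    return best


# Toy example: a 10-site tract of matches starting at informative site 20.
flags = [1] + [0] * 9 + [1] + [0] * 9 + [1] * 10 + [0] * 20
best_n, best_p = best_window_from(flags, start=20)
print(best_n, best_p)
```

Repeating this search for every start site is what makes the approach more expensive than a single fixed window size.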
Number of comparisons, and the P-value required for significance after Bonferroni correction, as the number of sequences increases when every possible combination is examined

| Using 2 sequences (e.g. geneconv): comparisons | Using 2 sequences: P-value threshold | Using 3 sequences (e.g. RDP): comparisons | Using 3 sequences: P-value threshold |
|---|---|---|---|
| 5.0 × 10^3 | 1.0 × 10^-5 | 1.6 × 10^5 | 3.1 × 10^-7 |
| 5.0 × 10^5 | 1.0 × 10^-7 | 1.7 × 10^8 | 2.9 × 10^-10 |
| 5.0 × 10^7 | 1.0 × 10^-9 | 1.7 × 10^11 | 2.9 × 10^-13 |
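The arithmetic behind this table can be sketched in Python. The sketch assumes that the comparisons are all pairs or all triplets of sequences and that the Bonferroni threshold is a significance level of 0.05 divided by the number of comparisons. The dataset sizes of 100, 1,000 and 10,000 sequences are inferred from the tabulated counts and are not stated in the text; the function name is hypothetical.

```python
from math import comb

ALPHA = 0.05  # assumed family-wise significance level


def bonferroni_threshold(n_sequences, k, alpha=ALPHA):
    """Return (number of k-sequence combinations, Bonferroni-corrected threshold)."""
    comparisons = comb(n_sequences, k)
    return comparisons, alpha / comparisons


# Dataset sizes are inferred, not taken from the text.
for n in (100, 1000, 10000):
    pairs, pair_cutoff = bonferroni_threshold(n, 2)
    triplets, triplet_cutoff = bonferroni_threshold(n, 3)
    print(f"{n:>6}  {pairs:.1e}  {pair_cutoff:.1e}  {triplets:.1e}  {triplet_cutoff:.1e}")
```

By contrast, comparing each sequence against two consensus sequences grows only linearly with the number of sequences, so the corrected threshold stays far less stringent.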
Performance of mtcpconv, geneconv, and RDP on varying numbers of mitochondrial atp1 and chloroplast atpA sequences in different angiosperm groups
Columns: Sequence group ‡; mt-cp segments detected
The use of consensus sequences carries a risk that recombination events involving chloroplast regions that differ significantly from the chloroplast consensus sequence will be missed. A possible approach to overcoming this is to compare mitochondrial sequences against chloroplast sequences from closely related species. Indeed, 23 segments in the Asterids group did show slightly smaller initial (before Bonferroni correction) P-values when comparing Asterids mitochondrial and chloroplast sequences than when comparing all angiosperm mitochondrial and chloroplast sequences (data not shown). However, comparison of sequences from the same taxonomic group might not always outperform comparison of larger groups. For example, Myrtus communis was detected to have an mt-cp conversion when the entire angiosperm dataset was analyzed, with two P-values of 6.19 × 10^-8 and 4.70 × 10^-6, whereas the P-values when comparing Rosids mitochondrial and chloroplast genes are only 2.74 × 10^-3 and 2.50 × 10^-1, which are not considered significant in Figure 4.
The OrgConv package was developed for detection of mt-cp conversion. It makes use of the consensus sequence from each group of sequences and compares each examined sequence against the consensus sequences rather than examining every possible sequence combination. By doing so, the computational burden is significantly reduced and it becomes feasible to analyze very large data sets. More importantly, the statistical power of the program is retained in the face of Bonferroni correction because of the reduced number of comparisons. Furthermore, although developed for detection of mt-cp conversion, the program may be applied to sequences other than mitochondrial and chloroplast sequences, e.g., when two large groups of sequences have very low diversity within each group and high diversity between groups.
Project name: OrgConv
Project home page: http://www.indiana.edu/~orgconv
Operating system(s): Linux for the distributed source code and operating systems independent for the web-interface
Programming language: C++ for the source code and Perl CGI scripts for the web-interface
License: Free for academic use
horizontal gene transfer
gene conversion between mitochondrial and chloroplast genes
I thank Jeffrey Palmer, Andy Alverson, and Danny Rice for helpful discussions and I acknowledge the High Performance Systems at Indiana University for computational facilities. I would also like to thank an anonymous reviewer for suggesting the removal of the examined region in the calculation of the proportion of common nucleotides (p), and the use of both L/N and (L - N) to correct for multiple windows in the calculation of P-values. This research was supported in part by a Natural Sciences and Engineering Research Council of Canada (NSERC) postdoctoral fellowship to W.H., by National Institutes of Health Grant R01-GM-70612 and by the METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc. (both to Jeffrey D. Palmer).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.