Changes in transcriptional orientation are associated with increases in evolutionary rates of enterobacterial genes
© Lin et al; licensee BioMed Central Ltd. 2011
Published: 5 October 2011
Changes in transcriptional orientation (“CTOs”) occur frequently in prokaryotic genomes. Such changes usually result from genomic inversions, which may cause a conflict between the directions of replication and transcription and an increase in mutation rate. However, CTOs do not always lead to replication-transcription confrontation. Furthermore, CTOs may cause deleterious disruptions of operon structure and/or gene regulation. Currently existing CTOs may therefore indicate relaxation of selection pressure. It is thus of interest to investigate whether CTOs have an independent effect on the evolutionary rates of the affected genes, and whether these genes are subject to any type of selection pressure in prokaryotes.
Three closely related enterobacteria, Escherichia coli, Klebsiella pneumoniae and Salmonella enterica serovar Typhimurium, were selected for comparisons of synonymous (dS) and nonsynonymous (dN) substitution rates between the genes that have experienced changes in transcriptional orientation (changed-orientation genes, “COGs”) and those that have not (same-orientation genes, “SOGs”). The dN/dS ratio was also derived to evaluate the selection pressure on the analyzed genes. Confounding factors in the estimation of evolutionary rates, such as gene essentiality, gene expression level, replication-transcription confrontation, and decreased dS at gene terminals, were controlled in the COG-SOG comparisons.
We demonstrate that COGs have significantly higher dN and dS than SOGs when a series of confounding factors are controlled. However, the dN/dS ratios are similar between the two gene groups, suggesting that the increase in dS can sufficiently explain the increase in dN in COGs. Therefore, the increases in evolutionary rates in COGs may be mainly mutation-driven.
Here we show that CTOs can increase the evolutionary rates of the affected genes. This effect is independent of replication-transcription confrontation, which has been suggested to be the major cause of inversion-associated increases in evolutionary rate. The real cause of these increases remains unclear and merits further exploration.
Genome rearrangements occur frequently in the evolution of prokaryotes. Among the rearrangement events, inversions usually occur symmetrically around the origin (designated as “Ori”) or terminus (“Ter”) of replication between closely related bacterial genomes. These rearrangement events often lead to changes in transcriptional orientation (designated as “CTOs”) and increases in mutation pressure in the affected genes because of a conflict between the directions of transcription and replication [1, 3, 4]. However, when the inversion events result from flipping across Ori or Ter, replication-transcription conflict may not occur. Furthermore, CTOs may cause changes in homology-based recombination or impediments in DNA replication (by altering, for example, DNA protein binding sites or secondary structure), which may increase mutation rate. Meanwhile, CTOs may lead to disruptions of operon structure and transcriptional regulation, which are potentially deleterious. Therefore, currently existing CTOs may signify relaxation of selection pressure on the affected genes. Nevertheless, it remains unknown whether selection actually plays an important role in maintaining CTOs. In view of the potential influences of CTOs on gene evolution, we are interested in investigating whether CTO per se has any effect on the evolutionary rates of prokaryotic genes.
Note that the evolution of prokaryotic genes is driven by two major forces: mutation and natural selection. An increased mutation rate can accelerate short-term nucleotide substitutions, which are then retained or eliminated by natural selection according to their fitness effects. The evolutionary rates of prokaryotic genes mainly reflect the combined effects of these two forces. In general, substitutions that cause protein sequence changes (nonsynonymous substitutions) are subject to stronger selection pressure than those that do not (synonymous substitutions). By comparing nonsynonymous (dN) and synonymous (dS) substitution rates, we may evaluate whether mutation or selection plays a more important role in the evolution of prokaryotic genes.
Here we attempt to examine (1) the effects of CTOs on prokaryotic gene evolution; and (2) the relative contributions of the two abovementioned driving forces to dS and dN in the genes that have experienced CTOs. To this end, we compared the dS and dN between the genes that have experienced CTOs (changed-orientation genes, “COGs”) and those that have not (same-orientation genes, “SOGs”) in closely related prokaryotic genomes. In the case of mutation-driven evolution, COGs and SOGs should have significantly different dS but approximately the same dN/dS ratios. Alternatively, if selection has been the major driver of the changes in evolutionary rates, the dN/dS ratios are expected to differ significantly between the two gene groups. Clarifying the molecular mechanisms by which prokaryotic genomes evolve may help us understand how prokaryotes develop novel functions, which is in turn relevant to ecological and biomedical studies.
Accordingly, we compared the genomes of three closely related enterobacteria [5, 6]: Escherichia coli (ECO), Klebsiella pneumoniae (KPN) and Salmonella enterica serovar Typhimurium (STM). A series of analyses were performed to control for potential confounding factors, including gene essentiality, expression level, background mutation rate, the pattern of replication-transcription confrontation, and codon usage bias. These potential confounding factors have been reported to be associated with evolutionary rates. For example, highly expressed genes, essential genes, and genes with large codon usage bias tend to evolve slowly [7–9]. Meanwhile, a higher mutation rate was observed in the genes near Ter and the genes that were subject to orientation conflicts between DNA replication and transcription. Our results suggest that COGs have significantly higher dN and dS than SOGs. Such increases are independent of the analyzed confounding factors, and are mainly mutation-driven.
The complete genomic sequences and gene annotations of Escherichia coli K-12 MG 1655 (GenBank accession number U00096), Klebsiella pneumoniae MGH 78578 (CP000647) and Salmonella enterica serovar Typhimurium LT2 (AE006468) were retrieved from GenBank. These species were selected because their genomes had been completely sequenced and carefully annotated, and because genome-scale gene expression data were available for these species. The orthologous gene pairs were identified according to reciprocal BLASTP best matches with the following parameters: (1) E value < 1.0 × 10^-5; and (2) > 50% amino acid sequence identity. Only the orthologous genes that were found in all three compared species were retained. A total of 2,574 one-to-one orthologous gene groups were therefore obtained for subsequent analyses. Since one of our analyses measured the evolutionary rates separately for the terminal and central regions of individual genes, the analyzed genes must be sufficiently long to reduce variations in evolutionary rate estimates. Previous studies suggested that the first 50 and the last 20 codons usually evolve more slowly than the rest [12, 13]. Therefore, we required twice the combined length of these terminal regions (2*(50+20) = 140 codons), so that each gene includes a central region of at least 70 codons. Accordingly, genes shorter than 420 bp (140*3) were discarded. Note that this practice leads to differences in the number of analyzed gene pairs between different pair-wise species comparisons. The numbers of analyzed gene pairs are listed in Additional file 1. Since the reciprocal-BLASTP approach may not be an optimal solution for finding orthologous genes, we also retrieved orthologous genes for the analyzed species from the OMA database for comparison. In fact, over 98% of the orthologous gene pairs retrieved from OMA were identical to those identified by using reciprocal BLASTP matches (see Additional file 2).
We actually observed similar results when these 98% of orthologous genes were analyzed (i.e. COGs have significantly higher dS and dN but similar dN/dS when compared with SOGs; all p<0.05, see Additional file 3).
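The ortholog selection described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the hit tuple layout, the function names, and the use of bit score to rank competing hits are assumptions made for the example, while the E value, identity, and 420 bp thresholds are taken from the text.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit layout: (query_id, subject_id, percent_identity, e_value, bit_score)
Hit = Tuple[str, str, float, float, float]


def best_hits(hits: Iterable[Hit],
              max_evalue: float = 1e-5,
              min_identity: float = 50.0) -> Dict[str, str]:
    """Return the best subject for each query among hits passing both thresholds."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, identity, evalue, bits in hits:
        # Keep only hits with E value < 1e-5 and identity > 50%, as in the text.
        if evalue >= max_evalue or identity <= min_identity:
            continue
        # Assumption: the "best" match is the one with the highest bit score.
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


def long_enough(cds_length_bp: int, min_bp: int = 420) -> bool:
    """Genes shorter than 420 bp (2 * (50 + 20) codons * 3) are discarded."""
    return cds_length_bp >= min_bp
```

Applying `reciprocal_best_hits` to each species pair and keeping only the genes present in all three pairings would give one-to-one ortholog groups of the kind described in the text.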
The locations of Ori and Ter were determined by using the program Oriloc.
Estimation of evolutionary rates and statistical analyses
The amino acid sequences of orthologous gene pairs were aligned by using ClustalW 2.0 and back-translated to nucleotide sequences. dS and dN were estimated by using the CODEML module of PAML 4.1. Only orthologous gene pairs with a dS value of < 3 were considered. Note that this filter also leads to differences in the number of analyzed genes in different pair-wise species comparisons. In the end, 1,784, 1,809 and 2,069 orthologous gene pairs were obtained for the ECO-KPN, STM-KPN, and ECO-STM comparisons, respectively (Additional file 1). Since dS > 3 may indicate an extremely high mutation rate and loss of function, we also investigated whether COGs tend to have dS > 3. Indeed, the proportion of COGs with dS > 3 is higher than that of all of the analyzed genes (see Additional file 4). This observation is consistent with our results that COGs tend to evolve faster than SOGs.
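The back-translation step and the dS cutoff mentioned above can be illustrated with a short sketch. It assumes that the coding sequence is in frame, excludes the stop codon, and that gaps in the protein alignment are written as "-"; the function names are hypothetical and do not correspond to any tool used in the study.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread the codons of `cds` onto a gapped protein alignment row.

    Each residue is replaced by its own codon and each gap by three gap
    characters, so the codon alignment mirrors the protein alignment.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence lengths do not match")
    codon_iter = iter(codons)
    return "".join(gap * 3 if aa == gap else next(codon_iter)
                   for aa in aligned_protein)


def passes_ds_filter(ds: float, max_ds: float = 3.0) -> bool:
    """Keep only gene pairs with dS < 3, the cutoff stated in the text."""
    return ds < max_ds
```

For example, `back_translate("M-K", "ATGAAA")` yields `"ATG---AAA"`, which keeps the two sequences of a pair in codon register for the subsequent rate estimation.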
Spearman’s rank correlations and partial correlations between evolutionary rates and the rate-determining factors were performed by using the R program (http://www.r-project.org). The statistical significance of the evolutionary rate differences was evaluated by using the Wilcoxon Rank Sum test throughout the study.
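For readers who want to see what the rank sum comparison computes, the following is a minimal pure-Python sketch. It is a stand-in for the R routine used in the study, not a reproduction of it: it uses the normal approximation with a tie correction and no continuity correction, so its p-values may differ slightly from those of dedicated statistical software, especially for small samples. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p_value
```

In the setting of this study, `x` and `y` would be, for instance, the dS values of COGs and SOGs within one pair-wise species comparison.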
Analysis of gene expression data
The genome-scale gene expression data of ECO (GSE15534) and STM (GSE11486) were downloaded from the Gene Expression Omnibus (GEO) database. The gene expression data of KPN were generated by using a custom-made NimbleGen tiling array (provided by Dr. Bernhard O. Palsson at University of California, San Diego). In all three data sets, the gene expression levels were measured separately at the log and stationary phases. The signal intensity of gene expression was log2-transformed and normalized by using quantile normalization. For cross-species comparisons, the gene expression levels for each species were standardized to a median value of 0 and a variance of 1.
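The three processing steps named above (log2 transformation, quantile normalization, and standardization to median 0 and variance 1) can be sketched as follows. This is an illustration under stated assumptions: each inner list is treated as one array (sample) over the same genes, ties are broken by sort order rather than averaged, and the population variance is the one scaled to 1. The text does not specify these details, and the function names are hypothetical.

```python
import math
from statistics import median, pvariance
from typing import List, Sequence


def log2_transform(intensities: Sequence[float]) -> List[float]:
    """Log2-transform positive signal intensities."""
    return [math.log2(v) for v in intensities]


def quantile_normalize(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Force every column (sample) to share the same empirical distribution.

    The reference distribution is the mean of the sorted columns; each value
    is replaced by the reference value at its within-column rank.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[i] for col in sorted_cols) / len(columns)
                 for i in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=col.__getitem__)
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = reference[rank]
        normalized.append(out)
    return normalized


def standardize(values: Sequence[float]) -> List[float]:
    """Shift to median 0 and scale to (population) variance 1."""
    center = median(values)
    scale = math.sqrt(pvariance(values))
    return [(v - center) / scale for v in values]
```

Centering on the median rather than the mean leaves the variance unchanged, so the standardized values have median 0 and variance 1 as described in the text.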
Functional enrichment analysis
Gene Ontology (GO) enrichment analysis was performed by using the DAVID Bioinformatics Resources [20, 21]. Since the genes of ECO are more extensively studied than those of the other two species, only the COGs in ECO-STM and ECO-KPN comparisons were included in this analysis. The gene clustering and GO term enrichment were assessed with reference to the enrichment score and the p-values of the modified Fisher’s Exact test.
Results and discussion
COGs have higher dN and dS but similar dN/dS ratio when compared with SOGs
Confounding factor 1: gene essentiality
Table 1. The median evolutionary rates of COGs and SOGs when different confounding factors are controlled. (Table body not recovered; the surviving labels are "non-highly expressed gene" and the ECO-KPN and STM-KPN comparisons, each at log phase and stationary phase.)
Confounding factor 2: gene expression level
Highly expressed genes are under strong selection pressure to maintain functional stability, and thus usually evolve more slowly than lowly expressed genes [23, 24]. The question is therefore whether SOGs tend to be highly expressed, which would make them evolve more slowly than COGs. We thus classified the analyzed genes into highly (top 20%) and non-highly (other 80%) expressed genes. In the ECO-KPN and STM-KPN comparisons, the dS of COGs is significantly larger than that of SOGs in both highly and non-highly expressed genes for both growth phases (stationary and log phase, Table 1). dN values show similar trends. However, the dN/dS ratios are approximately the same between COGs and SOGs (Table 1). Notably, it has been previously reported that mutation rate may increase with gene expression level [25–27]. It is therefore of interest to investigate whether the increases in evolutionary rate in COGs are associated with increased expression levels in these genes. Our results indicate that this is not the case: highly expressed COGs actually have lower dN and dS than non-highly expressed COGs (Additional file 6). Therefore, the increases in evolutionary rates (particularly dS) in COGs do not result from increased expression levels in these genes.
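The stratification described above (top 20% versus the remaining 80% by expression, then a COG-SOG comparison within each stratum) can be sketched as below. The dictionary-based data layout and the function names are assumptions made for the example; only the 20% cutoff comes from the text.

```python
from statistics import median
from typing import Dict, Optional, Set, Tuple


def split_by_expression(expression: Dict[str, float],
                        top_fraction: float = 0.2) -> Tuple[Set[str], Set[str]]:
    """Split genes into the top `top_fraction` by expression and the rest."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_top = int(round(len(ranked) * top_fraction))
    return set(ranked[:n_top]), set(ranked[n_top:])


def stratified_medians(rates: Dict[str, float],
                       is_cog: Dict[str, bool],
                       expression: Dict[str, float],
                       top_fraction: float = 0.2
                       ) -> Dict[Tuple[str, str], Optional[float]]:
    """Median rate (e.g. dS) of COGs and SOGs within each expression stratum."""
    high, non_high = split_by_expression(expression, top_fraction)
    result: Dict[Tuple[str, str], Optional[float]] = {}
    for stratum, genes in (("high", high), ("non-high", non_high)):
        for group, flag in (("COG", True), ("SOG", False)):
            values = [rates[g] for g in genes if is_cog[g] == flag]
            # None marks an empty cell rather than a rate of zero.
            result[(stratum, group)] = median(values) if values else None
    return result
```

Comparing the COG and SOG entries within the same stratum holds expression level roughly constant, which is the purpose of the control described in the text.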
Confounding factor 3: background mutation rate
Confounding factor 4: replication–transcription confrontation
Head-on collision between transcription and replication has been reported to increase the mutation rate of the affected genes [3, 10, 29]. When a gene is involved in an inversion event, its transcriptional orientation is likely to change, which may cause replication-transcription confrontation. We thus investigated whether COGs experience replication-transcription confrontation more often than SOGs. Interestingly, most of the COGs maintain their status in terms of replication-transcription confrontation. For example, using KPN as an outgroup, we found that 157 of 162 COGs in the ECO-STM comparison maintain the status of head-on collision or co-orientation between replication and transcription after the occurrence of the inversion events. Meanwhile, the ratios of head-on-collision to co-orientation genes in COGs and SOGs are 1.1 and 0.64, respectively, in the ECO–STM comparison. Therefore, COGs are in fact more likely to be head-on-collision genes than SOGs. To further clarify whether the difference in the degree of head-on collision is a major determinant of the COG-SOG differences in evolutionary rate, we compared the evolutionary rates of COGs and SOGs while controlling for the replication-transcription confrontation pattern. In the ECO-STM comparison, the dN, dS and dN/dS of head-on-collision COGs do not differ significantly from those of co-orientation COGs (Additional file 8). By contrast, head-on-collision SOGs have significantly higher evolutionary rates than co-orientation SOGs. These results suggest that the higher evolutionary rates of COGs do not result from a higher proportion of head-on-collision genes. Rather, CTO by itself may be the major reason for the increased evolutionary rate.
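The following sketch shows one way to classify a gene as head-on or co-oriented on a circular chromosome, and why an inversion that is symmetric around Ori can change transcriptional orientation without changing that status. The coordinate conventions (strand encoded as +1 or -1, the fork moving toward increasing coordinates on the replichore that runs from Ori to Ter) and the function names are assumptions of this example, not details given in the text.

```python
def on_clockwise_replichore(pos: int, ori: int, ter: int, genome_len: int) -> bool:
    """True if `pos` lies on the arc from Ori to Ter in increasing coordinates."""
    return (pos - ori) % genome_len < (ter - ori) % genome_len


def is_head_on(pos: int, strand: int, ori: int, ter: int, genome_len: int) -> bool:
    """True if transcription (strand +1 or -1) opposes the replication fork.

    On the clockwise replichore the fork moves toward increasing coordinates
    (+1); on the other replichore it moves toward decreasing coordinates (-1).
    """
    fork_direction = 1 if on_clockwise_replichore(pos, ori, ter, genome_len) else -1
    return strand != fork_direction


def invert_across_ori(pos: int, strand: int, ori: int, genome_len: int):
    """Mirror a gene around Ori: it changes replichore and flips strand."""
    return (2 * ori - pos) % genome_len, -strand
```

With a hypothetical 1,000 bp chromosome, Ori at 100 and Ter at 600, a gene at position 300 on the + strand is co-oriented. After `invert_across_ori` it sits at position 900 on the - strand, which is a change in transcriptional orientation, yet `is_head_on` still reports co-orientation because the gene has also moved to the other replichore.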
Confounding factor 5: decreased dS at gene terminals (codon usage bias)
Functional enrichment analysis of COGs
We next asked whether the rapidly evolving COGs are enriched in certain functional categories. Additional file 10 indicates that the protein products of COGs tend to be located on membranes and cell walls, and to be involved in a variety of metabolic reactions. This observation is biologically sensible because (1) proteins that are located on the cell surface tend to evolve faster; and (2) different bacterial species have very different metabolic capacities. However, the relationship between CTOs and the functional preferences of these genes remains unknown.
To our knowledge, this is the first study to demonstrate that changes in transcriptional orientation may increase the evolutionary rates (dN and dS) of prokaryotic genes. We show that the evolutionary effects of CTOs are independent of gene essentiality, gene expression level, replication-transcription confrontation, and the decrease in dS at gene terminals. However, the increase in dN may be partially related to gene locations. Furthermore, our results suggest that the increases in evolutionary rates in COGs are mainly mutation-driven, as the dN/dS ratios are similar between COGs and SOGs. The real cause of the increases in evolutionary rate in COGs (particularly dS) remains unclear and merits further exploration. We speculate that CTOs may result in impediments in DNA replication (by altering, for example, DNA protein binding sites or secondary structure), which may in turn lead to the recruitment of error-prone DNA repair polymerases and an increase in mutation rate [30, 31].
We thank Dr. Bernhard O. Palsson and Dr. Jay Hong for sharing the K. pneumoniae gene expression data. This work was supported by the intramural funding of National Health Research Institutes (NHRI), Taiwan (PH-100-SP-02), and represented part of the collaborative project between NHRI and Dr. Bernhard O. Palsson’s laboratory at University of California, San Diego.
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 9, 2011: Proceedings of the Ninth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S9.
References
1. Mackiewicz P, Mackiewicz D, Kowalczuk M, Cebrat S: Flip-flop around the origin and terminus of replication in prokaryotic genomes. Genome Biol 2001, 2(12):INTERACTIONS1004.
2. Eisen JA, Heidelberg JF, White O, Salzberg SL: Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol 2000, 1(6):RESEARCH0011.
3. Srivatsan A, Tehranchi A, MacAlpine DM, Wang JD: Co-orientation of replication and transcription preserves genome integrity. PLoS Genet 2010, 6(1):e1000810. doi:10.1371/journal.pgen.1000810
4. Tillier ER, Collins RA: Replication orientation affects the rate and direction of bacterial gene evolution. J Mol Evol 2000, 51(5):459–463.
5. Blattner FR, Plunkett G 3rd, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, et al.: The complete genome sequence of Escherichia coli K-12. Science 1997, 277(5331):1453–1462. doi:10.1126/science.277.5331.1453
6. McClelland M, Sanderson KE, Spieth J, Clifton SW, Latreille P, Courtney L, Porwollik S, Ali J, Dante M, Du F, et al.: Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 2001, 413(6858):852–856. doi:10.1038/35101614
7. Drummond DA, Wilke CO: Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 2008, 134(2):341–352. doi:10.1016/j.cell.2008.05.042
8. Jordan IK, Rogozin IB, Wolf YI, Koonin EV: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res 2002, 12(6):962–968.
9. Sharp PM, Emery LR, Zeng K: Forces that influence the evolution of codon bias. Philos Trans R Soc Lond B Biol Sci 2010, 365(1544):1203–1212. doi:10.1098/rstb.2009.0305
10. Mira A, Ochman H: Gene location and bacterial sequence divergence. Mol Biol Evol 2002, 19(8):1350–1358. doi:10.1093/oxfordjournals.molbev.a004196
11. GenBank Database [ftp://ftp.ncbi.nih.gov/genomes/Bacteria/]
12. Eyre-Walker A, Bulmer M: Reduced synonymous substitution rate at the start of enterobacterial genes. Nucleic Acids Res 1993, 21(19):4599–4603. doi:10.1093/nar/21.19.4599
13. Eyre-Walker A: The close proximity of Escherichia coli genes: consequences for stop codon and synonymous codon use. J Mol Evol 1996, 42(2):73–78. doi:10.1007/BF02198830
14. Altenhoff AM, Schneider A, Gonnet GH, Dessimoz C: OMA 2011: orthology inference among 1000 complete genomes. Nucleic Acids Res 2011, 39(suppl 1):D289–D294.
15. Frank AC, Lobry JR: Oriloc: prediction of replication boundaries in unannotated bacterial chromosomes. Bioinformatics 2000, 16(6):560–561. doi:10.1093/bioinformatics/16.6.560
16. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22(22):4673–4680. doi:10.1093/nar/22.22.4673
17. Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 1997, 13(5):555–556.
18. Smith NG, Eyre-Walker A: Nucleotide substitution rate estimation in enterobacteria: approximate and maximum-likelihood methods lead to similar conclusions. Mol Biol Evol 2001, 18(11):2124–2126. doi:10.1093/oxfordjournals.molbev.a003754
19. Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002, 30(1):207–210. doi:10.1093/nar/30.1.207
20. Huang da W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009, 4(1):44–57.
21. Huang da W, Sherman BT, Lempicki RA: Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 2009, 37(1):1–13. doi:10.1093/nar/gkn923
22. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H: Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2006, 2: 2006 0008.
23. Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH: Why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A 2005, 102(40):14338–14343. doi:10.1073/pnas.0504070102
24. Cherry JL: Expression level, evolutionary rate, and the cost of expression. Genome Biol Evol 2010, 2: 757–769. doi:10.1093/gbe/evq059
25. Akashi H: Gene expression and molecular evolution. Curr Opin Genet Dev 2001, 11(6):660–666. doi:10.1016/S0959-437X(00)00250-1
26. Hudson RE, Bergthorsson U, Ochman H: Transcription increases multiple spontaneous point mutations in Salmonella enterica. Nucleic Acids Res 2003, 31(15):4517–4522. doi:10.1093/nar/gkg651
27. Aguilera A: The connection between transcription and genomic instability. EMBO J 2002, 21(3):195–201.
28. Sharp PM, Shields DC, Wolfe KH, Li WH: Chromosomal location and evolutionary rate variation in enterobacterial genes. Science 1989, 246(4931):808–810. doi:10.1126/science.2683084
29. Wang JD, Berkmen MB, Grossman AD: Genome-wide coorientation of replication and transcription reduces adverse effects on replication in Bacillus subtilis. Proc Natl Acad Sci U S A 2007, 104(13):5608–5613. doi:10.1073/pnas.0608999104
30. McDonald MJ, Wang WC, Huang HD, Leu JY: Clusters of nucleotide substitutions and insertion/deletion mutations are associated with repeat sequences. PLoS Biol 2011, 9(6):e1000622. doi:10.1371/journal.pbio.1000622
31. Mirkin EV, Mirkin SM: Replication Fork Stalling at Natural Impediments. Microbiol Mol Biol Rev 2007, 71(1):13–35. doi:10.1128/MMBR.00030-06
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.