- Research article
- Open Access
nocoRNAc: Characterization of non-coding RNAs in prokaryotes
© Herbig and Nieselt; licensee BioMed Central Ltd. 2011
- Received: 17 August 2010
- Accepted: 31 January 2011
- Published: 31 January 2011
Interest in non-coding RNAs (ncRNAs) has risen steadily over the past few years because of the wide spectrum of biological processes in which they are involved. This has led to the discovery of numerous ncRNA genes across many species. However, for most organisms the non-coding transcriptome remains largely unexplored. Various experimental techniques for the identification of ncRNA transcripts are available, but these methods are costly and time-consuming. Computational methods are therefore needed that detect functional RNAs in complete genomes and suggest elements for further experiments. Several programs for the genome-wide prediction of functional RNAs have been developed, but most of them predict a genomic locus with no indication of whether the element is transcribed or not.
We present nocoRNAc, a program for the genome-wide prediction of ncRNA transcripts in bacteria. nocoRNAc incorporates various procedures for the detection of transcriptional features, which are then integrated with functional ncRNA loci to determine the transcript coordinates. We applied RNAz and nocoRNAc to the genome of Streptomyces coelicolor and detected more than 800 putative ncRNA transcripts, most of which are located antisense to protein-coding regions. Using a custom-designed microarray, we profiled the expression of about 400 of these elements and found more than 300 to be transcribed, 38 of which are predicted novel ncRNA genes in intergenic regions. The expression patterns of many ncRNAs are as complex as those of the protein-coding genes; in particular, many antisense ncRNAs show a high expression correlation with their protein-coding partner.
We have developed nocoRNAc, a framework that facilitates the automated characterization of functional ncRNAs. nocoRNAc increases the confidence of predicted ncRNA loci, especially if they contain transcribed ncRNAs. nocoRNAc is not restricted to intergenic regions but is applicable to the prediction of ncRNA transcripts in whole microbial genomes. The software, a user guide and example data are available at http://www.zbit.uni-tuebingen.de/pas/nocornac.htm.
- Intergenic Region
- Transcription Factor Binding Site
- False Positive Prediction
- Rfam Database
- SIDD Site
In the past few years non-coding RNAs (ncRNAs) have been increasingly recognized to be involved in a variety of biological functions, especially gene regulation [1–4]. Several classes of regulatory or catalytic ncRNAs have been discovered. Some of them, such as miRNAs or snoRNAs, occur only in eukaryotes . In prokaryotes ncRNAs are of interest, for example because of their potential role in pathogenicity [6–9], their specialized housekeeping functions, or their involvement in various stress situations [10–12]. A special class of ncRNAs comprises antisense RNAs (asRNAs), which are located antisense to protein-coding genes and act as putative regulators via base-pairing interactions with their antisense gene .
Several experimental techniques are used to identify bacterial ncRNAs [14–16]. However, these methods are laborious and expensive, especially if a large number of elements is analysed. Next-generation sequencing techniques have been applied to analyse complete transcriptomes of bacteria under various conditions, which also led to the discovery of numerous novel ncRNA transcripts [17–21]. However, ncRNAs that are not expressed under the specific conditions of the experiment will not be detected.
Therefore, computational predictions of genomic loci which contain a functional ncRNA are usually conducted either to complement the analyses of experimental data or to suggest candidates for further experiments . A plethora of computational methods for the prediction of functional ncRNAs have been developed (see  for a review). Most of them exploit the structural conservation and the higher structural stability of ncRNAs [24–29]. Other methods are based on sequence clustering , graph processing  or various machine learning approaches [32–35]. The aim of most of these methods is to identify regions that contain functional ncRNAs. However, most of the programs do not directly address the question of whether the predicted ncRNA is transcribed or whether the region contains an untranscribed RNA motif. Furthermore, when applied to large genome alignments, programs such as RNAz use a window-based approach, so that the predicted boundaries, in particular those of ncRNA transcripts, are often imprecise. Another problem is determining correctly on which strand the ncRNA resides.
To address these problems some approaches, e.g. SIPHT, sRNAFinder, sRNAPredict, or sRNAscanner integrate heterogeneous data such as transcription start sites (TSS) and transcription termination signals [36–39]. In principle, known transcription factor binding sites (TFBS) could be used to predict the 5' start of ncRNA transcripts. However, the number of different transcription factors varies between species. In Streptomyces coelicolor, for example, there are 65 sigma factors  and for most of them a sequence pattern of their specific binding site is not known. Furthermore, since genome-wide TFBS annotations are often not available, a more general model is needed.
Here, we introduce nocoRNAc (non-coding RNA characterization), a Java program for the prediction and characterization of ncRNA transcripts in bacteria. nocoRNAc takes the coordinates of putative ncRNA loci as input and annotates them with transcriptional features to conduct strand-specific transcript predictions. While previous computational approaches to identify non-coding RNAs in bacterial genomes have restricted the analysis to intergenic regions [41–43], our approach is not limited to intergenic regions but is also applied to predict cis-encoded asRNA transcripts. For the detection of the transcript's 3' end, nocoRNAc integrates the program TransTermHP  to predict Rho-independent terminator signals. The 5' start is predicted by the detection of destabilized regions in the genomic DNA. For this purpose we implemented the so-called SIDD model , which has been shown to be applicable to the detection of promoter regions in microbial genomes [46, 47]. Therefore, nocoRNAc does not have to rely on information about known TFBS. The putative transcriptional features are then combined to classify each ncRNA locus as either containing an ncRNA transcript or not. For loci classified as transcripts, the strand is determined automatically, and the transcript boundaries are derived from the SIDD sites and the Rho-independent transcription termination signal. Loci that are not classified as transcripts may be false positive predictions or may contain cis-regulatory motifs. For the latter, nocoRNAc incorporates further functionality for the analysis of the ncRNA loci, such as the search for known RNA motifs from the Rfam database. Furthermore, nocoRNAc provides methods for the prediction of RNA-RNA interactions between ncRNAs and mRNAs. All results can be studied in detail in nocoRNAc's integrated interactive R environment.
We used RNAz [24, 25] and nocoRNAc to perform a genome-wide computational screen for ncRNAs in Streptomyces coelicolor. Predictions and experimental validations of ncRNAs of S. coelicolor have been reported previously [48, 49], but these studies restricted their search to intergenic regions. We used RNAz for the detection of ncRNA loci in S. coelicolor, including the prediction of cis-encoded asRNA loci . These results were used to design a custom expression microarray targeting asRNA regions in the genome of S. coelicolor in addition to protein-coding genes and intergenic regions . Within the trans-national Systems Biology consortium SysMO/STREAM, we used this array to generate high-resolution time-series gene expression data for S. coelicolor grown in fermenters . In the current study we use these data to validate predicted ncRNA transcripts and to compare the expression profiles of asRNA transcripts predicted by nocoRNAc with those of their sense partner genes.
Identification of transcription termination signals
To predict Rho-independent termination signals we integrated the program TransTermHP . This tool detects stem-loop motifs in whole genomes and scores them with respect to their potential to act as transcription terminators. Each motif is scored in three parts: the stem, the loop and the tail, which is the single-stranded region following the 3' end of the stem-loop. The stem is scored with respect to its size and GC content, and the loop by its size. The tail is scored with respect to its nucleotide composition because, for example, a large number of A-U base pairs in this region promotes the dissociation of the transcript owing to the lower stability of such base pairs. The three scores are then combined into a single confidence value for each predicted terminator.
Identification of promoter regions
For the identification of promoter regions we implemented the so-called SIDD model (Stress Induced Duplex Destabilization) . The approach not only considers the thermodynamic stability of the base pairs on a dinucleotide level, but it also takes into account the torsional energy that is needed to unwind the helix as well as the influence of superhelical stress.
Using this model, a SIDD profile is calculated for a stretch of genomic DNA. For each position it gives the expected additional free energy needed to separate the base pair at that position. To calculate this profile for a region of length $n$, the model would in theory have to consider all $2^n$ possible ways of separating the helix in that region. As this would be too time-consuming, only biologically plausible separation patterns are taken into account, which results in a worst-case runtime complexity of $O(n^3)$. Partition functions are used to calculate the SIDD value for each position. For further details we refer to the original publication .
We implemented the model as described in . To maximize memory and runtime efficiency, only native Java arrays (int, double) were used. The SIDD profile of a complete prokaryotic genome is calculated with a sliding window. For the genome of S. coelicolor we used a window size of 10,000 nt and a step size of 1,000 nt, so each position is contained in 10 windows and 10 values are calculated for it. We summarize them using a weighted average in which windows where the position lies near the center receive a higher weight than windows where it lies near the border. This approach has been suggested in . Calculating the SIDD profile for the genome of S. coelicolor takes about 48 h on a single-core CPU and needs less than 512 MB of memory. If more than one core or processor is available, nocoRNAc calculates the window profiles in parallel, so the procedure takes only a few hours on a modern multicore system.
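The sliding-window summary can be illustrated with a minimal Python sketch; it is not nocoRNAc's Java code. The per-window SIDD calculation is replaced by a caller-supplied function (`window_profile`, hypothetical), and the triangular weighting that favours positions near the window centre is an assumption, since the exact weights are not specified here.

```python
from typing import Callable, List, Sequence


def combine_window_profiles(
    genome_length: int,
    window_profile: Callable[[int, int], Sequence[float]],
    window: int = 10_000,
    step: int = 1_000,
) -> List[float]:
    """Merge overlapping per-window profiles into one genome-wide profile.

    window_profile(start, end) is a hypothetical stand-in for the SIDD
    calculation of one window; it must return one value per position in
    the half-open interval [start, end).
    """
    if not 0 < step <= window:
        raise ValueError("step must be positive and not larger than window")
    weighted_sum = [0.0] * genome_length
    weight_total = [0.0] * genome_length
    centre = (window - 1) / 2.0
    start = 0
    while True:
        end = min(start + window, genome_length)
        values = window_profile(start, end)
        for offset, value in enumerate(values):
            # Assumed triangular weight: 1 at the window centre, decreasing
            # linearly towards the borders but always staying positive.
            weight = 1.0 - abs(offset - centre) / (centre + 1.0)
            weighted_sum[start + offset] += weight * value
            weight_total[start + offset] += weight
        if end == genome_length:
            break
        start += step
    return [s / t for s, t in zip(weighted_sum, weight_total)]
```

Because the windows are independent of each other, they could be computed in parallel (for example with `concurrent.futures`) and merged afterwards, which mirrors the multicore behaviour described above.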
Prediction of ncRNA transcripts
All ncRNA loci are annotated with the transcriptional features that have been predicted at their locus. This annotation is used to decide whether a locus potentially contains a transcript or might be an untranscribed RNA motif. For the transcript prediction step, terminator signals and SIDD sites are combined. This not only allows the strand of the potential ncRNA transcript to be specified but also allows a more precise delineation of the element. First, the SIDD sites associated with predicted ncRNA regions are considered. The prediction process is applied to each SIDD site of the predicted ncRNA region and, since SIDD sites are not strand-specific, to both strands for each site. Taking a SIDD site as the start point, the predicted transcript is extended in the direction of the currently processed strand. The end point is either the first high-confidence terminator, which is a terminator signal with a confidence value of at least 76  or, if all signals have a lower value, the terminator with the highest confidence value found downstream of the SIDD site. If no terminator signal is found at all, the transcript is extended to the end of the predicted ncRNA region, but only if the SIDD site taken as the start point cannot belong to a protein-coding gene. Overlapping transcripts located on the same strand are joined after the prediction procedure. Furthermore, if transcripts are predicted on both strands and the two predictions overlap, the transcript with the better terminator confidence value is kept unchanged. The other prediction is trimmed by assigning an alternative terminator signal closer to its SIDD site, so that the two transcripts no longer overlap. If this is not possible, the transcript with the weaker terminator signal is discarded.
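The end-point rule described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not nocoRNAc's implementation. The `Terminator` record and `predict_transcript` function are hypothetical. Coordinates are plain integers, and the terminators passed in are assumed to be already restricted to the genomic context of the locus. The joining and trimming of overlapping transcripts is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

HIGH_CONFIDENCE = 76  # confidence value of a "high confidence" terminator (from the text)


@dataclass(frozen=True)
class Terminator:
    position: int    # genomic coordinate of the predicted terminator
    strand: str      # '+' or '-'
    confidence: int  # TransTermHP confidence value


def predict_transcript(
    sidd_site: int,
    strand: str,
    region: Tuple[int, int],
    terminators: List[Terminator],
    site_in_coding_gene: bool,
) -> Optional[Tuple[int, int]]:
    """Return (left, right) coordinates of a transcript starting at sidd_site, or None."""
    region_start, region_end = region
    if strand == "+":
        # downstream on '+' means increasing coordinates, nearest first
        downstream = sorted(
            (t for t in terminators if t.strand == "+" and t.position > sidd_site),
            key=lambda t: t.position,
        )
    else:
        # downstream on '-' means decreasing coordinates, nearest first
        downstream = sorted(
            (t for t in terminators if t.strand == "-" and t.position < sidd_site),
            key=lambda t: -t.position,
        )
    if downstream:
        strong = [t for t in downstream if t.confidence >= HIGH_CONFIDENCE]
        # first high-confidence terminator, otherwise the best-scoring one
        chosen = strong[0] if strong else max(downstream, key=lambda t: t.confidence)
        end = chosen.position
    elif not site_in_coding_gene:
        # no terminator at all: extend to the end of the predicted ncRNA region
        end = region_end if strand == "+" else region_start
    else:
        return None
    return (sidd_site, end) if strand == "+" else (end, sidd_site)
```

Calling this for both strands of every SIDD site in a locus yields the candidate transcripts that the joining and trimming steps then reconcile.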
Searching the predicted elements for motifs from the Rfam database
We integrated a functionality to automatically search ncRNA loci for ncRNA motifs stored in the Rfam database . For this task we incorporated the programs cmsearch  and Erpin . Using a set of Rfam seeds, which can be retrieved from the database, motif descriptors are generated for both programs. By default, motifs are searched with Erpin. However, for certain motifs an Erpin search cannot be set up automatically; in these cases cmsearch is used instead. On a multicore system the procedure is parallelized.
Interactive R environment
Parts of the data structure are provided within an interactive R [56, 57] environment, allowing the user to perform a variety of statistical analyses on the results and to visualize them. This also includes some basic sequence operations with which the user can, for example, extract the genomic sequences of previously selected features such as predicted ncRNA regions. Furthermore, each predicted ncRNA transcript can be visualized in the context of all detected transcriptional features with a predefined plotting function. It is also possible to perform individual RNA-RNA interaction predictions between any elements contained in the environment.
All intermediate data are stored in the project folder, so specific feature information (e.g. predicted terminators or the SIDD profile) can also be accessed manually. In addition, time-consuming procedures such as the SIDD calculation only have to be performed once, as nocoRNAc reads previously generated results if they are available. nocoRNAc can also perform RNA-RNA interaction predictions using IntaRNA . The user can specify the elements to be included in the analysis. The interaction prediction can also be started from nocoRNAc's R environment.
There are different ways to access the generated results. On the one hand, all results are condensed into a single GFF file, which can be viewed in standard genome browsers. In addition, some general statistics are written to standard output, e.g. the number of ncRNA loci provided as input or the number of predicted ncRNA transcripts. On the other hand, the user can access the data in a targeted way through nocoRNAc's R environment. This is especially useful for the detailed investigation of subsets of the data or of particular predicted elements of interest.
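To illustrate the single-file output, the sketch below formats one predicted element as a GFF3 line. GFF3 lines have nine tab-separated columns and 1-based inclusive coordinates, and a GFF3 file starts with the `##gff-version 3` pragma. The choice of GFF3, the source label `nocoRNAc`, the feature type name and the function name `gff3_line` are assumptions for illustration; the columns and attributes actually written by nocoRNAc may differ.

```python
from typing import Dict, Optional

# GFF3 reserves these characters in attribute values; '%' must be escaped first
_RESERVED = [("%", "%25"), (";", "%3B"), ("=", "%3D"), ("&", "%26"),
             (",", "%2C"), ("\t", "%09"), ("\n", "%0A")]


def _escape(value: str) -> str:
    for char, code in _RESERVED:
        value = value.replace(char, code)
    return value


def gff3_line(
    seqid: str,
    feature_type: str,
    start: int,
    end: int,
    strand: str,
    score: Optional[float] = None,
    attributes: Optional[Dict[str, str]] = None,
    source: str = "nocoRNAc",
) -> str:
    """Return one GFF3 feature line; start and end are 1-based and inclusive."""
    attrs = ";".join(f"{_escape(k)}={_escape(v)}" for k, v in (attributes or {}).items())
    columns = [
        seqid, source, feature_type, str(start), str(end),
        "." if score is None else str(score),
        strand,
        ".",  # phase is only meaningful for CDS features
        attrs or ".",
    ]
    return "\t".join(columns)
```

For example, `gff3_line("NC_003888.3", "ncRNA_transcript", 1001, 1150, "+", attributes={"ID": "nc1"})` produces a line that a genome browser can display on the S. coelicolor chromosome.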
Genome-wide functional ncRNA prediction in S. coelicolor
For the genome-wide prediction of ncRNA loci we used the program RNAz , which takes a sequence alignment as input and classifies it as 'RNA' or 'OTHER'. The prediction approach of RNAz is mainly based on two principles. The first principle exploits the fact that functional ncRNAs usually exhibit a significantly more stable structure than non-functional sequences. This holds at least when the function depends on a specific structure, which is not the case for protein-coding RNAs, for example. The second principle is based on the so-called structure conservation index (SCI), which measures the structure conservation between the aligned sequences. It is assumed that the structure of functional RNAs is usually more conserved between related species than the structure of other sequences. The final classification is made by a support vector machine (SVM) that has been trained on the RNA families contained in the Rfam database.
As RNAz needs a multiple sequence alignment as input, we aligned the genomic sequences of S. coelicolor [RefSeq:NC_003888.3], S. avermitilis [RefSeq:NC_003155.4] and S. griseus [RefSeq:NC_010572.1] using the genome alignment software Mauve (version 2.3.1) [59, 60]. The resulting alignment was converted to MAF format. To detect ncRNAs of different sizes, we performed several runs of RNAz with window sizes of 60, 80, 100, 120 and 160 nt. The step size was set to 20 nt. Windows that did not contain sequence information for all three species (e.g. because of a large deletion in one of the genomes) were excluded from further analyses. After the application of RNAz, overlapping windows classified as 'RNA' were joined into predicted ncRNA loci. As a threshold, an SVM P-value of 0.5 was used. The predicted ncRNA loci were then used as input for nocoRNAc.
For expression studies we used a custom-designed microarray containing 226,576 perfect-match oligonucleotide probes that interrogate 8,205 protein-coding regions, 10,834 intergenic regions (with a tiling approach) and 3,672 regions antisense to protein-coding genes in the genome of Streptomyces coelicolor . In a previous study this array was used to produce high-resolution time-series expression data for the model organism Streptomyces coelicolor grown in submerged batch fermentations . S. coelicolor M145 wild type was cultivated under phosphate-limited conditions to monitor the effect of this limitation on the expression of protein-coding genes. Phosphate was depleted 35 h after inoculation. Samples were taken at 32 time points covering the interval from 20 h to 60 h after inoculation.
In order to profile the expression of the predicted ncRNA transcripts, we aligned all probes of the chip to the predicted ncRNAs. All predicted transcripts with at least 4 probes completely overlapping their genomic locus were added as new probesets to the Affymetrix CDF descriptor of the chip. Normalized expression values were generated using RMA as described for the protein-coding genes [51, 52]. Expression profile analysis and visualization were performed using Mayday .
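Joining overlapping 'RNA' windows into loci amounts to interval merging. The following is a minimal sketch, assuming that each RNAz window is given as a half-open `(start, end, p_value)` tuple on a single strand and that the windows from all window-size runs are pooled. The function name `join_rna_windows` is hypothetical.

```python
from typing import List, Tuple


def join_rna_windows(
    windows: List[Tuple[int, int, float]], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Merge overlapping windows with P >= threshold into ncRNA loci.

    Windows are half-open intervals [start, end); windows that merely touch
    (start == previous end) are not considered overlapping.
    """
    hits = sorted((start, end) for start, end, p in windows if p >= threshold)
    loci: List[List[int]] = []
    for start, end in hits:
        if loci and start < loci[-1][1]:
            # overlaps the current locus: extend it if necessary
            loci[-1][1] = max(loci[-1][1], end)
        else:
            loci.append([start, end])
    return [(start, end) for start, end in loci]
```

Running the merge separately for each strand keeps loci on opposite strands apart, which matters for the later strand-specific transcript prediction.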
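The assignment of probes to predicted transcripts can be sketched as follows. The coordinate convention (closed intervals), the per-probe strand annotation and the function name `build_probesets` are assumptions. Writing the resulting probesets into the CDF file is not shown.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int, str]  # (start, end, strand), closed coordinates


def build_probesets(
    transcripts: Dict[str, Interval],
    probes: Dict[str, Interval],
    min_probes: int = 4,
) -> Dict[str, List[str]]:
    """Map each transcript to the probes lying completely inside its locus.

    Only transcripts covered by at least min_probes such probes are kept.
    """
    probesets: Dict[str, List[str]] = {}
    for name, (t_start, t_end, t_strand) in transcripts.items():
        inside = sorted(
            probe_id
            for probe_id, (p_start, p_end, p_strand) in probes.items()
            # the probe must target the transcript's strand and be fully contained
            if p_strand == t_strand and t_start <= p_start and p_end <= t_end
        )
        if len(inside) >= min_probes:
            probesets[name] = inside
    return probesets
```

The nested loop is quadratic. For a genome-scale chip, sorting probes by start coordinate and using binary search would be the natural refinement.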
Genome-wide detection and classification of ncRNAs
The alignment of the genomes of S. coelicolor, S. avermitilis and S. griseus produced by Mauve, after pre-processing by rnazWindow, covered 34.6% of S. coelicolor's genomic sequence. Starting from the genome alignment and using a desktop PC with 4 GB RAM, the prediction of ncRNA loci with RNAz took 24 hours, the computation of the SIDD profile took 48 hours, and the prediction of terminators using TransTermHP finished after 30 s. Finally, nocoRNAc needed another 3 s for the transcript models and the generation of the results.
RNAz predicted 4,707 ncRNA loci (P-value ≥ 0.5) for the reference organism S. coelicolor. Of these loci, nocoRNAc annotated 2,358 with a Rho-independent terminator signal and 2,237 with a SIDD site. Combining these annotations, nocoRNAc predicted 843 ncRNA transcripts, of which 653 are located antisense to a protein-coding region. Ten predicted transcripts partially overlap a coding region in sense direction, and 180 are located in an intergenic region. Comparing these intergenic elements to annotated ncRNAs revealed that 96 map to known ncRNA genes such as rRNAs or tRNAs. Thus, nocoRNAc predicted 84 putative novel intergenic ncRNA transcripts.
A GFF file containing all predicted elements is provided as additional file 1. In addition, a table listing all predicted ncRNA transcripts together with supplementary information is provided as additional file 2.
Table 1: Comparison of predicted ncRNA loci and transcripts to annotation from NCBI and Rfam for S. coelicolor (annotated elements include 21 ncRNA genes and 28 cis-regulatory motifs)
For a further assessment of nocoRNAc's performance we also applied SIPHT to the genome of S. coelicolor. SIPHT is a computational pipeline for the prediction and annotation of bacterial non-coding RNAs . It restricts its ncRNA predictions to intergenic regions. However, like nocoRNAc, it uses sequence and structure conservation, Rho-independent transcription terminators and, if available, transcription factor binding sites. We therefore deemed it the tool most comparable with nocoRNAc. We used the SIPHT web interface with standard parameters. Altogether, SIPHT reported 391 intergenic ncRNA transcripts. We then also compared these results to the annotated elements. As for nocoRNAc, the strand information of the predictions was taken into account. A summary of both comparisons is given in Table 1. SIPHT incorrectly predicts only two cis-regulatory elements to be ncRNA transcripts, while nocoRNAc falsely predicts only one such element. SIPHT finds 14 of the 86 known ncRNAs, while nocoRNAc correctly predicts 46 of these 86. In particular, SIPHT predicted only one of the 65 annotated tRNAs, while nocoRNAc's sensitivity for this class of ncRNAs is over 50%.
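For reference, the sensitivities implied by these counts (86 known ncRNAs and 65 annotated tRNAs) work out as follows; the percentages are simple arithmetic on the numbers stated above.

```latex
\begin{aligned}
\text{SIPHT, known ncRNAs:}\quad    & 14/86 \approx 16.3\,\% \\
\text{nocoRNAc, known ncRNAs:}\quad & 46/86 \approx 53.5\,\% \\
\text{SIPHT, tRNAs:}\quad           & 1/65 \approx 1.5\,\%
\end{aligned}
```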
Time-series expression analysis of predicted ncRNA transcripts
For 403 of the 843 predicted ncRNA transcripts we measured the expression profile at 32 time points along the growth curve of S. coelicolor under phosphate-limited conditions  using a custom-designed Affymetrix microarray . Of these, 92 elements are located in an intergenic region, 47 of which are putative novel ncRNA transcripts. First, we assessed for how many predicted ncRNA transcripts expression was detected. As a threshold for minimal expression we chose the first quartile of the expression value distribution of the protein-coding genes. Using this threshold, we found 317 of the 403 measured ncRNA transcripts to be expressed in at least one time point. After variance filtering (regularized variance ≥ 0.025) we considered 71 of these predicted transcripts to be differentially expressed across the time series.
For all 47 asRNAs with a variable expression profile we computed the correlation of their expression profiles with those of their respective antisense genes. A boxplot of the distribution is shown in Figure 5 (right). The median pairwise correlation is 0.78, and about 75% of the pairs show an expression profile correlation above 0.4. The remaining 25% tend to have a low correlation or even a slight anticorrelation.
The genes in clusters C and D of Figure 6 encode, for example, developmental proteins involved in chromosome replication or RNA synthesis. They also show a downregulation that is probably triggered by the depletion of phosphate.
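The two filtering steps can be sketched as below. Two assumptions apply. First, the quartile is taken over all protein-coding expression values pooled across time points. Second, plain sample variance stands in for the regularized variance, whose exact definition is not given here. The function name `filter_ncrna_profiles` is hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def filter_ncrna_profiles(
    ncrna: Dict[str, List[float]],
    coding: Dict[str, List[float]],
    min_variance: float = 0.025,
) -> Tuple[List[str], List[str]]:
    """Return (expressed, variable) ncRNA names from normalized time-series data."""
    # Assumption: pool the values of all protein-coding genes and time points.
    pooled = [value for profile in coding.values() for value in profile]
    first_quartile = statistics.quantiles(pooled, n=4)[0]
    # Expressed: the threshold is reached in at least one time point.
    expressed = sorted(
        name for name, profile in ncrna.items() if max(profile) >= first_quartile
    )
    # Stand-in for the regularized variance filter: plain sample variance.
    variable = [
        name for name in expressed if statistics.variance(ncrna[name]) >= min_variance
    ]
    return expressed, variable
```

Because RMA values are on a log2 scale, both thresholds act on log-transformed expression values.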
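The correlation summary can be sketched as below. The sketch computes the Pearson correlation between each asRNA profile and the profile of its antisense gene, then reports the median and the fraction of pairs above 0.4. Pearson correlation is an assumption, since the correlation measure is not named here. The function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sum(a * b for a, b in zip(dx, dy)) / denominator


def summarize_pairs(
    pairs: List[Tuple[Sequence[float], Sequence[float]]]
) -> Tuple[float, float]:
    """Return the median correlation and the fraction of pairs with r > 0.4."""
    correlations = [pearson(asrna, sense) for asrna, sense in pairs]
    above = sum(1 for r in correlations if r > 0.4)
    return statistics.median(correlations), above / len(correlations)
```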
A table containing expression data for all predicted ncRNA transcripts that have been measured is provided as additional file 3.
We presented nocoRNAc, a program for the genome-wide prediction and characterization of ncRNA transcripts. As input, nocoRNAc uses predicted loci containing functional ncRNAs. In our study we used RNAz to predict the coordinates of ncRNA loci. However, nocoRNAc is not limited to data generated by RNAz. Loci can also be predicted using other programs such as QRNA  or EvoFold . In addition, loci from RNA-seq experiments or from manual annotation can be used as input. As nocoRNAc itself runs on a single genome, the loci do not have to be generated by a comparative approach. Nevertheless, we plan to integrate comparative methods in order to assess in more detail the confidence of the predicted transcriptional features used for transcript prediction.
To classify which of the loci contain transcribed ncRNAs and to further characterize the loci, nocoRNAc combines different methods for the prediction of transcriptional features. We demonstrated that nocoRNAc can predict ncRNA transcripts, including their strand, in the context of previously detected ncRNA loci.
Most bacterial ncRNAs are transcribed from their own promoters, and transcription most often terminates at a strong Rho-independent terminator. For the detection of the latter we integrated TransTermHP. One of the main advantages of this approach is that it is very fast and can define the 3' end of a transcript quite precisely. However, the model fails for transcripts whose transcription is terminated in a Rho-dependent manner. Therefore, nocoRNAc can only be applied to bacteria in which Rho-independent termination is the major mechanism of transcription termination. One problem is the choice of a threshold value for a terminator signal. The authors of TransTermHP recommend using 50 , which is implemented as the default in nocoRNAc. During transcript prediction, all terminator signals detected in the genomic context of an ncRNA locus are considered, and our model chooses the best one with regard to the local context and the confidence value. This, however, does not rule out that some false positive predictions remain.
For the prediction of transcription start sites nocoRNAc integrates the SIDD model. Although SIDD sites do not occur specifically at transcription start sites [63, 64] and their association with promoter regions has mainly been shown for protein-coding genes [46, 47], we were able to show that this approach is also applicable to ncRNA genes. Of the 21 known ncRNAs in S. coelicolor, we found 15 with a clear SIDD site. Although the energy values of SIDD sites of predicted ncRNAs were generally weaker than those of protein-coding genes, the signal is still specific enough to detect their promoter regions.
Furthermore, we showed that there is a clear correlation between the presence of transcriptional features at an ncRNA locus and its RNAz P-value. This indicates that the transcriptional features used for the transcript predictions can further increase the confidence of predicted ncRNAs.
nocoRNAc does not predict long ncRNAs such as the 23S ribosomal RNA. For such ncRNAs transcript prediction is more difficult because RNAz is not able to detect a single contiguous locus for such long transcripts; instead, several loci scattered over the respective region are predicted. This makes it very difficult to predict transcripts correctly, as nocoRNAc performs transcript prediction in the context of these ncRNA loci. Thus, the quality of nocoRNAc's transcript predictions strongly depends on the quality of the loci provided as input. Nevertheless, we have shown that nocoRNAc can compensate for inaccurate locus predictions to some extent.
To demonstrate nocoRNAc's functionalities, we applied it to characterize non-coding RNAs in the genome of S. coelicolor. nocoRNAc correctly predicted over 75% of the known ncRNA transcripts and classified over 90% of the cis-regulatory motifs correctly. The identification of intergenic ncRNAs in S. coelicolor has been reported in previous studies. Pánek et al. found 32 ncRNAs , of which we detect 15. Of the 9 ncRNAs found by Swiercz et al.  we detected 2. A comparison to SIPHT, a commonly used tool for bacterial ncRNA transcript prediction in intergenic regions, revealed that on S. coelicolor nocoRNAc is not only competitive but slightly better with respect to ncRNA genes, and that its sensitivity for tRNA genes is considerably higher. Altogether, SIPHT detected more than twice as many intergenic ncRNA transcripts as nocoRNAc (391 versus 180). One possible reason is that SIPHT uses known transcription factor binding sites (TFBS) for promoter prediction; as these are not sufficiently available for S. coelicolor, this may result in a larger number of false positive predictions in this organism. nocoRNAc is superior in its general applicability, since it can always use promoter signals computed by the SIDD model, whereas TFBS data are insufficiently available for many bacteria.
In a previous study, transcriptomic time-series data of unprecedented resolution were used to study the metabolic switch of S. coelicolor and to precisely profile expression changes and allocate them to specific points of time during growth . In that study a custom-designed Affymetrix microarray was used that contained probes interrogating not only protein-coding genes but also predicted asRNA regions as well as intergenic regions. These data therefore allow us not only to validate our predictions but also to compare the expression profiles of asRNAs with those of the protein-coding genes. Our analysis reveals that ncRNAs show expression dynamics as complex as those of the coding genes, suggesting that they are involved in the same biological processes. Interestingly, antisense RNAs often showed a high expression correlation with their respective antisense gene. However, for predicted elements for which no significant expression was detected, we cannot decide whether they are false positive predictions or are expressed under different conditions. As the proteome of the time-series samples is currently being analysed, we will integrate these data with the transcriptomic data to infer hypotheses about the potential functions of the predicted ncRNA transcripts for which expression was detected.
With nocoRNAc we provide a program for the prediction of ncRNA transcripts that complements either in silico predictions of functional ncRNA loci or experimentally derived loci of expressed ncRNAs. A genome-wide expression study integrating the results of applying nocoRNAc to Streptomyces coelicolor indicated highly interesting expression dynamics of ncRNAs.
Determining the function of ncRNAs is the major challenge following their computational prediction and experimental validation. Although first high-throughput methods exist that shed light on the functional potential of ncRNAs , the experimental assessment of functionality usually concentrates on single elements. Therefore, we integrated approaches into nocoRNAc that allow the generation of hypotheses about the putative functions of the predicted elements. This includes, for example, the prediction of RNA-RNA interactions with mRNAs of protein-coding genes, which can provide hints about the potential regulatory function of the ncRNAs. A first application of this method to a subset of ncRNA transcripts predicted in S. coelicolor suggests that ncRNAs might even act as regulators in important metabolic processes such as antibiotic production.
We acknowledge the excellent technical help of K. Klein, J. Prechtel, S. Poths and M. Walter at the Microarray Facility Tübingen. We thank Florian Battke and Stephan Symons for proof-reading the manuscript. This project was supported by the BMBF grant No 0315003.
- Eddy SR: Non-coding RNA genes and the modern RNA world. Nat Rev Genet 2001, 2(12):919–929. 10.1038/35103511
- Storz G: An expanding universe of noncoding RNAs. Science 2002, 296(5571):1260–1263. 10.1126/science.1072249
- Repoila F, Darfeuille F: Small regulatory non-coding RNAs in bacteria: physiology and mechanistic aspects. Biol Cell 2009, 101(2):117–131. 10.1042/BC20070137
- Barrandon C, Spiluttini B, Bensaude O: Non-coding RNAs regulating the transcriptional machinery. Biol Cell 2008, 100(2):83–95. 10.1042/BC20070090
- Wang X, Song X, Glass CK, Rosenfeld MG: The Long Arm of Long Noncoding RNAs: Roles as Sensors Regulating Gene Transcriptional Programs. Cold Spring Harb Perspect Biol 2011, 3. 10.1101/cshperspect.a003756
- Pichon C, Felden B: Small RNA genes expressed from Staphylococcus aureus genomic and pathogenicity islands with specific expression among pathogenic strains. Proc Natl Acad Sci USA 2005, 102(40):14249–14254. 10.1073/pnas.0503838102PubMed CentralView ArticlePubMedGoogle Scholar
- Giangrossi M, Prosseda G, Tran CN, Brandi A, Colonna B, Falconi M: A novel antisense RNA regulates at transcriptional level the virulence gene icsA of Shigella flexneri. Nucleic Acids Res 2010, 38(10):3362–3375. 10.1093/nar/gkq025PubMed CentralView ArticlePubMedGoogle Scholar
- Geissmann T, Chevalier C, Cros MJ, Boisset S, Fechter P, Noirot C, Schrenzel J, Francçis P, Vandenesch F, Gaspin C, Romby P: A search for small noncoding RNAs in Staphylococcus aureus reveals a conserved sequence motif for regulation. Nucleic Acids Res 2009, 37(21):7239–7257. 10.1093/nar/gkp668PubMed CentralView ArticlePubMedGoogle Scholar
- Abu-Qatouseh LF, Chinni SV, Seggewiss J, Proctor RA, Brosius J, Rozhdestvensky TS, Peters G, von Eiff C, Becker K: Identification of differentially expressed small non-protein-coding RNAs in Staphylococcus aureus displaying both the normal and the small-colony variant phenotype. J Mol Med 2010, 88(6):565–575. 10.1007/s00109-010-0597-2View ArticlePubMedGoogle Scholar
- Muffler A, Fischer D, Hengge-Aronis R: The RNA-binding protein HF-I, known as a host factor for phage Qbeta RNA replication, is essential for rpoS translation in Escherichia coli. Genes Dev 1996, 10(9):1143–1151. 10.1101/gad.10.9.1143View ArticlePubMedGoogle Scholar
- Zhang A, Altuvia S, Tiwari A, Argaman L, Hengge-Aronis R, Storz G: The OxyS regulatory RNA represses rpoS translation and binds the Hfq (HF-I) protein. EMBO J 1998, 17(20):6061–6068. 10.1093/emboj/17.20.6061PubMed CentralView ArticlePubMedGoogle Scholar
- Sledjeski DD, Whitman C, Zhang A: Hfq is necessary for regulation by the untranslated RNA DsrA. J Bacteriol 2001, 183(6):1997–2005. 10.1128/JB.183.6.1997-2005.2001PubMed CentralView ArticlePubMedGoogle Scholar
- Brantl S: Regulatory mechanisms employed by cis-encoded antisense RNAs. Curr Opin Microbiol 2007, 10(2):102–109. 10.1016/j.mib.2007.03.012View ArticlePubMedGoogle Scholar
- Hüttenhofer A, Vogel J: Experimental approaches to identify non-coding RNAs. Nucleic Acids Res 2006, 34(2):635–646.PubMed CentralView ArticlePubMedGoogle Scholar
- Altuvia S: Identification of bacterial small non-coding RNAs: experimental approaches. Curr Opin Microbiol 2007, 10(3):257–261. 10.1016/j.mib.2007.05.003View ArticlePubMedGoogle Scholar
- Sharma CM, Vogel J: Experimental approaches for the discovery and characterization of regulatory small RNA. Curr Opin Microbiol 2009, 12(5):536–546. 10.1016/j.mib.2009.07.006View ArticlePubMedGoogle Scholar
- Sittka A, Sharma CM, Rolle K, Vogel J: Deep sequencing of Salmonella RNA associated with heterologous Hfq proteins in vivo reveals small RNAs as a major target class and identifies RNA processing phenotypes. RNA Biol 2009, 6(3):266–275. 10.4161/rna.6.3.8332View ArticlePubMedGoogle Scholar
- Albrecht M, Sharma CM, Reinhardt R, Vogel J, Rudel T: Deep sequencing-based discovery of the Chlamy-dia trachomatis transcriptome. Nucleic Acids Res 2010, 38(3):868–877. 10.1093/nar/gkp1032PubMed CentralView ArticlePubMedGoogle Scholar
- Irnov I, Sharma CM, Vogel J, Winkler WC: Identification of regulatory RNAs in Bacillus subtilis. Nucleic Acids Res 2010, 38(19):6637–6651. 10.1093/nar/gkq454PubMed CentralView ArticlePubMedGoogle Scholar
- Sharma CM, Hoffmann S, Darfeuille F, Reignier J, Findeiss S, Sittka A, Chabas S, Reiche K, Hackermüller J, Reinhardt R, Stadler PF, Vogel J: The primary transcriptome of the major human pathogen Heli-cobacter pylori. Nature 2010, 464(7286):250–255. 10.1038/nature08756View ArticlePubMedGoogle Scholar
- Schlüter J, Reinkensmeier J, Daschkey S, Evguenieva-Hackenberg E, Janssen S, Jänicke S, Becker J, Giegerich R, Becker A: A genome-wide survey of sRNAs in the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti. BMC Genomics 2010, 17: 245.View ArticleGoogle Scholar
- Vogel J, Sharma CM: How to find small non-coding RNAs in bacteria. Biol Chem 2005, 386(12):1219–1238. 10.1515/BC.2005.140PubMedGoogle Scholar
- Pichon C, Felden B: Small RNA gene identification and mRNA target predictions in bacteria. Bioinformatics 2008, 24(24):2807–2813. 10.1093/bioinformatics/btn560View ArticlePubMedGoogle Scholar
- Washietl S, Hofacker IL, Stadler PF: Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci USA 2005, 102(7):2454–2459. 10.1073/pnas.0409169102PubMed CentralView ArticlePubMedGoogle Scholar
- Gruber AR, Findeiss S, Washietl S, Hofacker IL, Stadler PF: RNAZ 2.0: IMPROVED NONCODING RNA DETECTION. Pac Symp Biocomput 2010, 15: 69–79.Google Scholar
- Rivas E, Eddy SR: Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics 2001, 2: 8–8. 10.1186/1471-2105-2-8PubMed CentralView ArticlePubMedGoogle Scholar
- Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D: Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol 2006, 2: e33. 10.1371/journal.pcbi.0020033PubMed CentralView ArticlePubMedGoogle Scholar
- Yao Z, Barrick J, Weinberg Z, Neph S, Breaker R, Tompa M, Ruzzo WL: A computational pipeline for high-throughput discovery of cis-regulatory noncoding RNA in prokaryotes. PLoS Comput Biol 2007., 3(7): 10.1371/journal.pcbi.0030126Google Scholar
- Uzilov AV, Keegan JM, Mathews DH: Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics 2006, 7: 173–173. 10.1186/1471-2105-7-173PubMed CentralView ArticlePubMedGoogle Scholar
- Tseng HH, Weinberg Z, Gore J, Breaker RR, Ruzzo WL: Finding non-coding RNAs through genome-scale clustering. J Bioinform Comput Biol 2009, 7(2):373–388. 10.1142/S0219720009004126PubMed CentralView ArticlePubMedGoogle Scholar
- Childs L, Nikoloski Z, May P, Walther D: Identification and classification of ncRNA molecules using graph properties. Nucleic Acids Res 2009., 37(9): 10.1093/nar/gkp206Google Scholar
- Tran TT, Zhou F, Marshburn S, Stead M, Kushner SR, Xu Y: De novo computational prediction of non-coding RNA genes in prokaryotic genomes. Bioinformatics 2009, 25(22):2897–2905. 10.1093/bioinformatics/btp537PubMed CentralView ArticlePubMedGoogle Scholar
- Saetrom P, Sneve R, Kristiansen KI, Snove O, Grünfeld T, Rognes T, Seeberg E: Predicting non-coding RNA genes in Escherichia coli with boosted genetic programming. Nucleic Acids Res 2005, 33(10):3263–3270. 10.1093/nar/gki644PubMed CentralView ArticlePubMedGoogle Scholar
- Yachie N, Numata K, Saito R, Kanai A, Tomita M: Prediction of non-coding and antisense RNA genes in Escherichia coli with Gapped Markov Model. Gene 2006, 372: 171–181. 10.1016/j.gene.2005.12.034View ArticlePubMedGoogle Scholar
- Salari R, Aksay C, Karakoc E, Unrau PJ, Hajirasouliha I, Sahinalp SC: smyRNA: a novel Ab initio ncRNA gene finder. PLoS One 2009., 4(5): 10.1371/journal.pone.0005433Google Scholar
- Livny J, Teonadi H, Livny M, Waldor MK: High-throughput, kingdom-wide prediction and annotation of bacterial non-coding RNAs. PLoS One 2008., 3(9): 10.1371/journal.pone.0003197Google Scholar
- Tjaden B: Prediction of small, noncoding RNAs in bacteria using heterogeneous data. J Math Biol 2008, 56(1–2):183–200. 10.1007/s00285-007-0079-5View ArticlePubMedGoogle Scholar
- Livny J, Brencic A, Lory S, Waldor MK: Identification of 17 Pseudomonas aeruginosa sRNAs and prediction of sRNA-encoding genes in 10 diverse pathogens using the bioinformatic tool sRNAPredict2. Nucleic Acids Res 2006, 34(12):3484–3493. 10.1093/nar/gkl453PubMed CentralView ArticlePubMedGoogle Scholar
- Sridhar J, Narmada SR, Sabarinathan R, Ou HY, Deng Z, Sekar K, Rafi ZA, Rajakumar K: sRNAscanner: A Computational Tool for Intergenic Small RNA Detection in Bacterial Genomes. PLoS One 2010., 5(8): 10.1371/journal.pone.0011970Google Scholar
- Bentley SD, Chater KF, Cerdeño-Tárraga AM, Challis GL, Thomson NR, James KD, Harris DE, Quail MA, Kieser H, Harper D, Bateman A, Brown S, Chandra G, Chen CW, Collins M, Cronin A, Fraser A, Goble A, Hidalgo J, Hornsby T, Howarth S, Huang CH, Kieser T, Larke L, Murphy L, Oliver K, O'Neil S, Rabbinowitsch E, Rajandream MA, Rutherford K, Rutter S, Seeger K, Saunders D, Sharp S, Squares R, Squares S, Taylor K, Warren T, Wietzorrek A, Woodward J, Barrell BG, Parkhill J, Hopwood DA: Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 2002, 417(6885):141–147. 10.1038/417141aView ArticlePubMedGoogle Scholar
- Axmann IM, Kensche P, Vogel J, Kohl S, Herzel H, Hess WR: Identification of cyanobacterial non-coding RNAs by comparative genome analysis. Genome Biol 2005., 6(9): 10.1186/gb-2005-6-9-r73Google Scholar
- Voss B, Georg J, Schön V, Ude S, Hess WR: Biocomputational prediction of non-coding RNAs in model cyanobacteria. BMC Genomics 2009, 10: 123–123. 10.1186/1471-2164-10-123PubMed CentralView ArticlePubMedGoogle Scholar
- Xiao B, Li W, Guo G, Li B, Liu Z, Jia K, Guo Y, Mao X, Zou Q: Identification of small noncoding RNAs in Helicobacter pylori by a bioinformatics-based approach. Curr Microbiol 2009, 58(3):258–263. 10.1007/s00284-008-9318-2View ArticlePubMedGoogle Scholar
- Kingsford CL, Ayanbule K, Salzberg SL: Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake. Genome Biol 2007., 8(2): 10.1186/gb-2007-8-2-r22Google Scholar
- Benham CJ, Bi C: The analysis of stress-induced duplex destabilization in long genomic DNA sequences. J Comput Biol 2004, 11(4):519–543. 10.1089/cmb.2004.11.519View ArticlePubMedGoogle Scholar
- Wang H, Noordewier M, Benham CJ: Stress-induced DNA duplex destabilization (SIDD) in the E. coli genome: SIDD sites are closely associated with promoters. Genome Res 2004, 14(8):1575–1584. 10.1101/gr.2080004PubMed CentralView ArticlePubMedGoogle Scholar
- Wang H, Benham CJ: Promoter prediction and annotation of microbial genomes based on DNA sequence and structural responses to superhelical stress. BMC Bioinformatics 2006, 7: 248–248. 10.1186/1471-2105-7-248PubMed CentralView ArticlePubMedGoogle Scholar
- Pánek J, Bobek J, Mikulík K, Basler M, J V: Biocomputational prediction of small non-coding RNAs in Streptomyces. BMC Genomics 2008, 9: 217–217.PubMed CentralView ArticlePubMedGoogle Scholar
- Swiercz JP, Bobek J, Bobek J, Haiser HJ, Di Berardo C, Tjaden B, Elliot MA: Small non-coding RNAs in Streptomyces coelicolor. Nucleic Acids Res 2008, 36(22):7240–7251. 10.1093/nar/gkn898PubMed CentralView ArticlePubMedGoogle Scholar
- D'Alia D, Nieselt K, Steigele S, Müller J, Verburg I, Takano E: Noncoding RNA of glutamine synthetase I modulates antibiotic production in Streptomyces coelicolor A3(2). J Bacteriol 2010, 192(4):1160–1164.PubMed CentralView ArticlePubMedGoogle Scholar
- Battke F, Herbig A, Wentzel A, Jakobsen Ø, Bonin M, Hodgson D, Wohlleben W, Ellingsen T, SysMO Stream Consortium, Nieselt K: A Technical Platform for Generating Reproducible Expression Data from Streptomyces Coelicolor Batch Cultivations. In Software Tools and Algorithms for Biological Systems. Edited by: Arabnia H. Springer; 2010:in press.Google Scholar
- Nieselt K, Battke F, Herbig A, Bruheim P, Wentzel A, Jakobsen ØM, Sletta H, Alam MT, Merlo ME, Moore J, Omara WA, Morrissey ER, Juarez-Hermosillo MA, Rodríguez-García A, Nentwich M, Thomas L, Iqbal M, Legaie R, Gaze WH, Challis GL, Jansen RC, Dijkhuizen L, Rand DA, Wild DL, Bonin M, Reuther J, Wohlleben W, Smith MC, Burroughs NJ, Martin JF, Hodgson DA, Takano E, Breitling R, Ellingsen TE, Wellington EM: The dynamic architecture of the metabolic switch in Streptomyces coelicolor. BMC Genomics 2010, 11: 10–10. 10.1186/1471-2164-11-10PubMed CentralView ArticlePubMedGoogle Scholar
- Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A: Rfam: updates to the RNA families database. Nucleic Acids Res 2009, (37 Database):136–140. 10.1093/nar/gkn766Google Scholar
- Nawrocki EP, Kolbe DL, Eddy SR: Infernal 1.0: inference of RNA alignments. Bioinformatics 2009, 25(10):1335–1337. 10.1093/bioinformatics/btp157PubMed CentralView ArticlePubMedGoogle Scholar
- Gautheret D, Lambert A: Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles. J Mol Biol 2001, 313(5):1003–1011. 10.1006/jmbi.2001.5102View ArticlePubMedGoogle Scholar
- R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria; 2009. [ISBN 3–900051–07–0] [ISBN 3-900051-07-0]Google Scholar
- Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang J, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 2004, 5(10):R80. 10.1186/gb-2004-5-10-r80PubMed CentralView ArticlePubMedGoogle Scholar
- Busch A, Richter AS, Backofen R: IntaRNA: efficient prediction of bacterial sRNA targets incorporating target site accessibility and seed regions. Bioinformatics 2008, 24(24):2849–2856. 10.1093/bioinformatics/btn544PubMed CentralView ArticlePubMedGoogle Scholar
- Darling AC, Mau B, Blattner FR, Perna NT: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 2004, 14(7):1394–1403. 10.1101/gr.2289704PubMed CentralView ArticlePubMedGoogle Scholar
- Darling AE, Mau B, Perna NT: progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 2010., 5(6): 10.1371/journal.pone.0011147Google Scholar
- Battke F, Symons S, Nieselt K: Mayday-integrative analytics for expression data. BMC Bioinformatics 2010, 11: 121–121.PubMed CentralView ArticlePubMedGoogle Scholar
- Rodríguez-García A, Barreiro C, Santos-Beneit F, Sola-Landa A, Martín JF: Genome-wide transcriptomic and proteomic analysis of the primary response to phosphate limitation in Streptomyces coelicolor M145 and in a DeltaphoP mutant. Proteomics 2007, 7(14):2410–2429.View ArticlePubMedGoogle Scholar
- Bode J, Winkelmann S, Götze S, Spiker S, Tsutsui K, Bi C, A K P, Benham C: Correlations between scaffold/matrix attachment region (S/MAR) binding activity and DNA duplex destabilization energy. J Mol Biol 2006, 358(2):597–613. 10.1016/j.jmb.2005.11.073View ArticlePubMedGoogle Scholar
- Polonskaya Z, Benham CJ, Hearing J: Role for a region of helically unstable DNA within the Epstein-Barr virus latent cycle origin of DNA replication oriP in origin function. Virology 2004, 328(2):282–291. 10.1016/j.virol.2004.07.023View ArticlePubMedGoogle Scholar
- Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 1997, 25(5):955–964. 10.1093/nar/25.5.955PubMed CentralView ArticlePubMedGoogle Scholar
- Rederstorff M, Bernhart SH, Tanzer A, Zywicki M, Perfler K, Lukasser M, Hofacker IL, Hüttenhofer A: RN-Pomics: defining the ncRNA transcriptome by cDNA library generation from ribonucleo-protein particles. Nucleic Acids Res 2010., 38(10): 10.1093/nar/gkq057Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.