An enhanced computational platform for investigating the roles of regulatory RNA and for identifying functional RNA motifs
© Chang et al.; licensee BioMed Central Ltd. 2013
Published: 21 January 2013
Functional RNA molecules participate in numerous biological processes, ranging from gene regulation to protein synthesis. Analysis of functional RNA motifs and elements in RNA sequences can yield useful information for deciphering RNA regulatory mechanisms. Our previous work, RegRNA, is widely used to identify regulatory motifs; this work extends it by incorporating more comprehensive and up-to-date data sources and analytical approaches into a new platform.
An integrated web-based system, RegRNA 2.0, has been developed for comprehensively identifying functional RNA motifs and sites in an input RNA sequence. Numerous data sources and analytical approaches are integrated, and RegRNA 2.0 can identify several types of functional RNA motifs and sites: (i) splicing donor/acceptor sites; (ii) splicing regulatory motifs; (iii) polyadenylation sites; (iv) ribosome binding sites; (v) rho-independent terminators; (vi) motifs in the mRNA 5'-untranslated region (5'UTR) and 3'UTR; (vii) AU-rich elements; (viii) C-to-U editing sites; (ix) riboswitches; (x) RNA cis-regulatory elements; (xi) transcriptional regulatory motifs; (xii) user-defined motifs; (xiii) similar functional RNA sequences; (xiv) microRNA target sites; (xv) non-coding RNA hybridization sites; (xvi) long stems; (xvii) open reading frames; (xviii) related information of an RNA sequence. Users can submit an RNA sequence and obtain the predicted results through the RegRNA 2.0 web page.
RegRNA 2.0 is an easy-to-use web server for identifying regulatory RNA motifs and functional sites. Through its integrated, user-friendly interface, users can apply various analytical approaches and conveniently view the results with graphical visualization. RegRNA 2.0 is now available at http://regrna2.mbc.nctu.edu.tw.
Numerous functional RNA motifs have been shown to play significant roles in many essential biological processes, including transcriptional and post-transcriptional regulation of gene expression, control of mRNA stability, alternative splicing, and transcription termination. The biological activities of functional RNA motifs usually rely on a combination of their primary sequences and specific secondary structures, which act as target sites for RNA-binding factors or interact directly with the translation machinery. For instance, riboswitches are metabolite-binding domains within specific mRNAs that can regulate both transcription and translation by binding their corresponding targets [2, 3].
Several databases have been established to collect functional RNA molecules [1, 4–14]. UTRdb is a database of 5' and 3' untranslated sequences of eukaryotic mRNAs. It provides specialized information, including the presence of nucleotide sequence patterns whose functional roles have been demonstrated experimentally; these patterns are collected in the UTRsite database. Rfam [4, 5] is a comprehensive database of non-coding RNA (ncRNA) gene families as well as cis-regulatory RNA elements. Each family is represented by a multiple sequence alignment of known and predicted representative members, annotated with a consensus base-paired secondary structure. Rfam facilitates the identification and classification of new members of known RNA families and offers glimpses of the conservation of many ncRNA families across a wide taxonomic range. fRNAdb [6, 7] hosts a large collection of ncRNA sequences drawn from public non-coding databases and provides related annotations, such as sequence ontology classification and source organisms. AEdb is a database of alternative exons and their properties from numerous species; it forms the manually curated component of the Alternative Splicing Database (ASD). The data in AEdb are gathered from the literature, in which these exons have been experimentally verified. Adenylate-uridylate-rich elements (AREs, or AU-rich elements) mediate the rapid turnover of mRNAs encoding proteins that regulate cellular growth and the body's response to exogenous agents such as microbes and environmental stimuli. ARED [9, 10] is a database of human mRNAs containing AU-rich elements. A 13-bp ARE pattern was derived computationally using MEME, and the ARE sequences were grouped into five clusters. NONCODE [11, 12] is an integrated knowledge database designed for the analysis of ncRNAs. All ncRNAs in NONCODE were confirmed by manually consulting the references, and more than 80% of the data are derived from experiments.
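Patterns such as the ARE and UTRsite motifs described above are often written as degenerate consensus strings. The sketch below shows one way to scan an RNA sequence for such a pattern, assuming the pattern uses standard IUPAC nucleotide codes (W = A or U, N = any base, and so on). The function names are hypothetical, and the example pattern `WAUUUAW` is an illustration built around the AUUUA pentamer, not the 13-bp ARED pattern or any UTRsite entry.

```python
import re

# Standard IUPAC nucleotide codes, written with U for RNA.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def iupac_to_regex(pattern):
    """Translate a degenerate IUPAC pattern into a regular expression."""
    parts = []
    for code in pattern.upper().replace("T", "U"):
        bases = IUPAC_RNA[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_motif(seq, pattern):
    """Return (1-based start, matched text) for every match, overlaps included."""
    seq = seq.upper().replace("T", "U")
    # A zero-width lookahead lets finditer report overlapping occurrences.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
```

For example, `find_motif("CCUAUUUAUUUAUUGG", "WAUUUAW")` reports matches at positions 3 and 7; the lookahead is what allows these two overlapping occurrences to both be returned.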
microRNAs (miRNAs) are small RNA molecules of ~22 nt that participate in post-transcriptional gene regulation and mRNA degradation by hybridizing to miRNA target sites. miRBase is the central online repository for miRNA nomenclature, sequence data, annotation and target prediction. It provides a range of data to facilitate studies of miRNA genomics. TRANSFAC is a knowledge base containing published data on eukaryotic transcription factors, their experimentally proven binding sites, and regulated genes.
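Hybridization of a miRNA to its target site is often approximated, as a first filter, by perfect complementarity to the miRNA "seed" (nucleotides 2 to 8). The following is a minimal sketch of that seed-matching heuristic only; it is not miRanda's algorithm, which scores full alignments and duplex free energy, and the function names and default seed window are assumptions.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, target, seed_start=2, seed_end=8):
    """1-based positions in `target` perfectly complementary to miRNA nt seed_start..seed_end."""
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement(seed)  # the target reads the seed antiparallel
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos + 1)
        pos = target.find(site, pos + 1)
    return hits
```

With let-7a (`UGAGGUAGUAGGUUGUAUAGUU`), the seed `GAGGUAG` corresponds to the site `CUACCUC`, so `seed_sites` on `AAACUACCUCAAAGGCUACCUCU` returns positions 4 and 16.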
Various approaches have been developed for identifying functional RNA motifs or elements [15–26]. GeneSplicer detects splice sites in eukaryotic mRNA by combining several techniques, such as maximal dependence decomposition (MDD) and Markov models, that have already proven successful in characterizing the patterns around donor and acceptor sites. polya_svm predicts mRNA polyadenylation sites using a support vector machine (SVM) whose features are 15 over-represented cis-regulatory elements in the regions surrounding the site. RBSfinder is a probabilistic method that improves the accuracy of gene identification systems at finding precise translation start sites. TransTermHP rapidly and accurately detects rho-independent transcription terminators. CURE predicts C-to-U RNA editing sites in plant mitochondria by incorporating both evolutionary and biochemical information. miRanda finds genomic targets for miRNAs. RiboSW is a systematic method for identifying 12 kinds of riboswitches based on their conserved functional sequences and conformations. PatSearch searches for specific combinations of oligonucleotide consensus sequences, secondary structure motifs and position-weight matrices (PWMs). ERPIN automatically derives an RNA signature from a sequence alignment and its secondary structure and finds occurrences of that signature in sequence databases; several profiles have been built on the ERPIN web server to search an input sequence for the presence of certain RNA genes and elements. INFERNAL implements a general approach based on stochastic context-free grammars (SCFGs) for RNA database searches and multiple alignment. In conjunction with the Rfam families, it is used to annotate RNAs in genomes with covariance models, a special case of SCFGs designed to model RNA consensus sequence and structure.
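Rho-independent terminators, the class TransTermHP detects, are classically described as a GC-rich hairpin immediately followed by a run of uridines. The sketch below is a naive encoding of that description, assuming a fixed stem length, perfect pairing (Watson-Crick or G-U), and thresholds chosen only for illustration; the function name and parameters are hypothetical, and TransTermHP itself scores hairpin energy and tail quality, so it should be used for real predictions.

```python
# Base pairs allowed in the stem: Watson-Crick plus G-U wobble.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def terminator_candidates(seq, stem=6, min_loop=3, max_loop=8, min_gc=0.5, u_tail=4):
    """Return 1-based inclusive (start, end) spans of hairpin + U-tract candidates.

    A candidate is a perfectly paired stem of `stem` bp around a loop of
    min_loop..max_loop nt, with stem GC fraction >= min_gc, immediately
    followed by at least `u_tail` uridines.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    hits = []
    for i in range(n):
        for loop in range(min_loop, max_loop + 1):
            right_start = i + stem + loop   # first base of the 3' stem arm
            right_end = right_start + stem  # exclusive end of the 3' arm
            tail_end = right_end + u_tail   # exclusive end of the U-tract
            if tail_end > n:
                break  # longer loops would only run further past the end
            left = seq[i:i + stem]
            right = seq[right_start:right_end]
            # The 5' arm read forward must pair with the 3' arm read backward.
            if not all((a, b) in PAIRS for a, b in zip(left, reversed(right))):
                continue
            gc = sum(base in "GC" for base in left + right) / (2 * stem)
            if gc >= min_gc and seq[right_end:tail_end] == "U" * u_tail:
                hits.append((i + 1, tail_end))
    return hits
```

On `AAAAGCCGCCGAAAGGCGGCUUUUUUUAAAA` the scan reports the intended hairpin as `(5, 24)`, but also shifted alternatives such as `(4, 25)`, because every start and loop length is tried; a practical tool would merge or rank overlapping candidates.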
MATCH searches for transcription factor binding sites using specific position-weight matrices (PWMs). RNAMotif is an RNA secondary structure definition and search algorithm that is commonly used to search for user-defined RNA motifs.
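MATCH and PatSearch both rely on position-weight matrices. The sketch below shows the basic idea of PWM scanning with log-odds scores, assuming a uniform background and a small pseudocount; it is not MATCH's core/matrix similarity scoring, and the four-column matrix in the example is a hypothetical toy rather than a TRANSFAC matrix.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=None):
    """Turn per-position base counts into log2(observed / background) scores."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm


def scan_pwm(seq, pwm, threshold):
    """Return (1-based start, score) for every window scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[k][b] for k, b in enumerate(window))
        if score >= threshold:
            hits.append((i + 1, score))
    return hits


# Hypothetical toy matrix: ten observations of T, A, T, A at the four positions.
toy_pwm = log_odds_pwm([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
```

Scanning `GGTATAAGCTATAC` with `toy_pwm` and a threshold of 6.0 keeps only the two exact TATA windows (positions 3 and 10), each scoring 4 × log2(3.5) ≈ 7.23, while any window with a single mismatch falls below 3.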
Analysis of functional RNA motifs and sites in RNA sequences can yield useful information for deciphering RNA regulatory mechanisms. Our previous work, RegRNA, is widely used to identify regulatory motifs and miRNA target sites and has been cited 50 times. However, new types of functional RNA motifs and identification approaches have continued to accumulate in recent years. A more complete and up-to-date analysis platform is therefore needed to identify functional RNA motifs comprehensively.
This work presents an integrated web server, RegRNA 2.0, for identifying functional RNA motifs and sites in an input RNA sequence. Numerous data sources, such as Rfam, fRNAdb and UTRsite, and identification approaches, such as GeneSplicer, RiboSW and RBSfinder, are integrated in RegRNA 2.0, and additional information, such as the GC-content ratio and RNA accessibility, is also presented on the web page. Users can submit an RNA sequence through the user-friendly interface and obtain the predicted results with graphical visualization.
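The GC-content ratio mentioned above is the fraction of G and C bases in the sequence. A minimal sketch, assuming ambiguous symbols such as N are excluded from the denominator; the windowed variant and its default window and step sizes are illustrative assumptions, since the text does not say how RegRNA 2.0 computes or reports GC content.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous nucleotides (A, C, G, U/T)."""
    seq = seq.upper().replace("T", "U")
    counted = [b for b in seq if b in "ACGU"]
    if not counted:
        return 0.0
    return sum(b in "GC" for b in counted) / len(counted)


def sliding_gc(seq, window=50, step=10):
    """(1-based window start, GC fraction) for each full-length window."""
    return [(i + 1, gc_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]
```

For instance, `sliding_gc("GGGGAAAA", window=4, step=2)` gives `[(1, 1.0), (3, 0.5), (5, 0.0)]`; a windowed profile makes local GC-rich regions, such as terminator stems, easier to spot than a single whole-sequence ratio.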
The functional RNA motifs and sites supported in RegRNA 2.0 fall into the following types: (i) splicing donor/acceptor sites; (ii) splicing regulatory motifs; (iii) polyadenylation sites; (iv) ribosome binding sites; (v) rho-independent terminators; (vi) motifs in the mRNA 5'-untranslated region (5'UTR) and 3'UTR; (vii) AU-rich elements; (viii) C-to-U editing sites; (ix) riboswitches; (x) RNA cis-regulatory elements; (xi) transcriptional regulatory motifs; (xii) user-defined motifs; (xiii) similar functional RNA sequences; (xiv) microRNA target sites; (xv) non-coding RNA hybridization sites; (xvi) long stems; (xvii) open reading frames; (xviii) related information of an RNA sequence.
Table 1. Statistics of types of functional RNA motifs supported in RegRNA 2.0
Types of functional RNA motifs | Number of entries
Splicing regulatory motifs | 294 splicing motifs
Ribosome binding sites | –
UTR motifs | 48 UTRsite motifs
AU-rich elements | 5 ARE patterns
RNA editing sites | –
RNA elements | 11 RNA elements
RNA cis-regulatory elements | 209 Rfam cis-reg families (Rfam CMs)
Known functional RNAs | 475,318 fRNAdb sequences
miRNA target sites | 21,643 miRNA sequences
ncRNA hybridization sites | 170,581 ncRNA sequences
Transcriptional regulatory motifs | 2,171 transcription factor binding matrices
Open reading frames | –
Numerous analytical approaches and data sources are integrated in RegRNA 2.0 (Table 1). GeneSplicer, polya_svm, RBSfinder, TransTermHP, CURE, RiboSW and ERPIN are incorporated for identifying splicing sites, polyadenylation sites, ribosome binding sites, rho-independent terminators, C-to-U editing sites, riboswitches and RNA elements, respectively. MATCH is used with the matrices collected in TRANSFAC to search for a variety of transcription factor binding sites. PatSearch and the UTRsite models are integrated for identifying UTR motifs. INFERNAL and the Rfam CMs are integrated for identifying cis-regulatory families. miRanda and the miRNA sequences of miRBase are integrated for identifying miRNA target sites. BLAST and the fRNAdb sequences are integrated for finding similar functional RNA sequences. The einverted program of the EMBOSS package is used to identify long stems, which might be involved in gene regulatory mechanisms [31–33]. To identify putative RNA-RNA interaction sites, BLAST is used to search the NONCODE database for subsequences complementary to the input sequence, and RNAcofold of the Vienna RNA Package is used to compute the free energy of the hybridization sites. RNAMotif is integrated for searching user-defined RNA motifs. In addition, RegRNA 2.0 can predict ORFs in the input RNA sequence. By default, an ORF must encode a protein of at least 80 amino acids, beginning with a start codon (AUG, GUG or UUG) and ending with a stop codon (UAA, UAG or UGA); ORFs that are fully overlapped by other ORFs are not shown. Other related information, such as the GC-content ratio and RNA accessibility, is also provided for the input RNA sequence. RNAplfold and RNAfold of the Vienna RNA Package are used to predict RNA accessibility and RNA secondary structure, respectively.
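The default ORF rules described above can be sketched as follows. This is a hypothetical illustration, not RegRNA 2.0's code: it assumes only the three forward reading frames of the input RNA are scanned, each ORF runs from the earliest start codon after the previous in-frame stop, the 80-amino-acid minimum counts sense codons (stop excluded), and "fully overlapped" means lying entirely inside another reported ORF.

```python
START_CODONS = {"AUG", "GUG", "UUG"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(rna, min_aa=80):
    """Return sorted (start, end, frame); 1-based inclusive, end = last base of stop."""
    rna = rna.upper().replace("T", "U")
    orfs = []
    for frame in range(3):
        start = None  # earliest start codon since the last in-frame stop
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # sense codons, stop excluded
                if n_aa >= min_aa:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None
        # An ORF still open at the end of the sequence has no stop and is ignored.
    # Drop ORFs lying entirely inside another reported ORF.
    kept = [o for o in orfs
            if not any(p != o and p[0] <= o[0] and o[1] <= p[1] for p in orfs)]
    return sorted(kept)
```

For example, `find_orfs("AUG" + "GCU" * 80 + "UAA")` returns `[(1, 246, 1)]` (81 amino acids), whereas the same construct with only 10 `GCU` codons yields nothing. Lowering `min_aa` to 2 on `AUGCAUGAAAUAGCCUAA` finds a frame-2 ORF at 5 to 13, but it is dropped because it lies inside the frame-1 ORF at 1 to 18.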
RegRNA 2.0 presents the predicted results as an intuitive graphical visualization (map view, Figure 2c) and a summary information table (table view, Figure 2d). The graphical location maps display the positions of the predicted motifs. The top-most graph shows the predicted ORFs, and the graphs below it show the predicted functional RNA motifs or sites. Hovering the cursor over a predicted motif displays a pop-up with a brief description, such as its name, start/end positions and binding factors (Figure 2e). Further analysis and additional information on a predicted motif, such as its predicted secondary structure and the corresponding RNALogo graph, can be viewed by clicking on the motif of interest (Figure 2f). Details of the predicted results are listed in the summary information table (Figure 2g).
The purine riboswitch is used as a case study to demonstrate the capability of RegRNA 2.0. Purine riboswitches, which are found in the 5'UTRs of mRNAs, act as cis-acting genetic regulatory elements composed of a metabolite-responsive aptamer domain with a specific secondary structure. They can regulate both transcription and translation by binding their corresponding targets. Additional file 1 illustrates a cartoon representation of the mechanism of genetic regulation by the guanine riboswitch. In the presence of high concentrations of guanine or hypoxanthine, ligand binding stabilizes the three-way junction structure, allowing the mRNA to form the terminator element (cyan). Without ligand binding, the 3' side of the P1 stem (green) and the 5' side of the terminator form an antiterminator element, allowing transcription to continue.
An RNA sequence with EMBL accession number X83878 was used as the input for RegRNA 2.0. According to the Rfam and EMBL annotations, X83878 contains a purine riboswitch and an operon of two B. subtilis genes, xpt and pbuX. X83878 is 2,413 bp long, and the purine riboswitch is located at positions 168 to 276. The CDS regions of xpt and pbuX are located at positions 357 to 941 and 938 to 2254, respectively.
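The coordinates quoted above can be checked with simple interval arithmetic, the same bookkeeping a location map needs when it draws motifs relative to coding regions. A minimal sketch using only those positions; the helper names and region labels are hypothetical.

```python
# 1-based, inclusive coordinates quoted from the X83878 annotations.
SEQ_LENGTH = 2413
RIBOSWITCH = (168, 276)
CDS = {"xpt": (357, 941), "pbuX": (938, 2254)}


def span_length(span):
    """Length of an inclusive interval."""
    start, end = span
    return end - start + 1


def overlap(a, b):
    """Number of positions shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def locate(span, cds_spans):
    """Hypothetical helper: place an interval relative to the annotated CDSs."""
    if any(overlap(span, cds) for cds in cds_spans):
        return "overlaps a CDS"
    if span[1] < min(start for start, _ in cds_spans):
        return "upstream of all CDSs"
    if span[0] > max(end for _, end in cds_spans):
        return "downstream of all CDSs"
    return "between CDSs"


if __name__ == "__main__":
    print("riboswitch length:", span_length(RIBOSWITCH), "nt")
    for name, cds in CDS.items():
        print(name, "CDS:", span_length(cds), "nt,", span_length(cds) // 3, "codons")
    print("xpt/pbuX overlap:", overlap(CDS["xpt"], CDS["pbuX"]), "nt")
    print("after pbuX:", SEQ_LENGTH - CDS["pbuX"][1], "nt")
    print("riboswitch:", locate(RIBOSWITCH, list(CDS.values())))
```

Running it reports a 109-nt riboswitch lying upstream of both coding regions, CDS lengths of 585 and 1,317 nt (both multiples of three), a 4-nt overlap between the xpt and pbuX CDSs, and 159 nt after the end of pbuX.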
A comparison between RegRNA 2.0 and RegRNA
Ribosome binding sites
RNA editing sites
Yes (Rfam & ERPIN)
Similar functional RNAs
ncRNA hybridization region
Open reading frame
Yes (RegRNA 2.0)
Motif region structure
Yes (RegRNA 2.0)
Yes (TRANSFAC 7.4)
Yes & Updated (TRANSFAC 2012.1)
Splicing regulatory motifs
Yes (AEdb 278 motifs)
Yes & Updated (AEdb 294 motifs)
Yes (UTRsite 40 motifs)
Yes & Updated (UTRsite 48 motifs)
Yes & Updated (RiboSW & Rfam)
miRNA target sites
Yes (744 miRNAs)
Yes & Updated (miRBase 21,643 miRNAs)
Yes (EMBOSS einverted)
RegRNA 2.0 is an easy-to-use web server for comprehensively identifying regulatory RNA motifs and functional sites. It extends the widely used analysis platform RegRNA by taking more types of motifs and analytical approaches into consideration. RegRNA 2.0 lets users run these programs without having to download the code and get the programs running locally. Through its integrated, user-friendly interface, users can apply various analytical approaches and conveniently view the results with graphical visualization. In the future, the platform will be enhanced to support input of multiple RNA sequences and to provide conservation analysis.
The RegRNA 2.0 system is freely available at http://regrna2.mbc.nctu.edu.tw.
The authors approved the submission of this paper to BMC Bioinformatics for publication. The payment of publishing charges to BioMed Central for this article was supported by the National Science Council of the Republic of China, No. NSC 101-2311-B-009-003-MY3 and NSC 100-2627-B-009-002. This publishing charge was supported in part by the UST-UCSD International Center of Excellence in Advanced Bio-engineering sponsored by the Taiwan National Science Council I-RiCE Program under Grant Number: NSC 101-2911-I-009-101, and Veterans General Hospitals and University System of Taiwan (VGHUST) Joint Research Program under Grant Number: VGHUST101-G5-1-1. This publishing charge is also partially supported by MOE ATU.
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 2, 2013: Selected articles from the Eleventh Asia Pacific Bioinformatics Conference (APBC 2013): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S2.
The authors would like to thank the National Science Council of the Republic of China, No. NSC 101-2311-B-009-003-MY3 and NSC 100-2627-B-009-002. This work was supported in part by the UST-UCSD International Center of Excellence in Advanced Bio-engineering sponsored by the Taiwan National Science Council I-RiCE Program under Grant Number: NSC 101-2911-I-009-101, and Veterans General Hospitals and University System of Taiwan (VGHUST) Joint Research Program under Grant Number: VGHUST101-G5-1-1. This work was also partially supported by MOE ATU.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.