Riboswitch Detection Using Profile Hidden Markov Models
© Singh et al; licensee BioMed Central Ltd. 2009
Received: 19 March 2009
Accepted: 8 October 2009
Published: 8 October 2009
Riboswitches are a type of noncoding RNA that regulate gene expression by switching from one structural conformation to another on ligand binding. The various classes of riboswitches discovered so far are differentiated by the ligand, which on binding induces a conformational switch. Every class of riboswitch is characterized by an aptamer domain, which provides the site for ligand binding, and an expression platform that undergoes conformational change on ligand binding. The sequence and structure of the aptamer domain are highly conserved in riboswitches belonging to the same class. We propose a method for fast and accurate identification of riboswitches using profile Hidden Markov Models (pHMM). Our method exploits the high degree of sequence conservation that characterizes the aptamer domain.
Our method can detect riboswitches in genomic databases rapidly and accurately. Its sensitivity is comparable to the method based on the Covariance Model (CM). For six out of ten riboswitch classes, our method detects more than 99.5% of the candidates identified by the much slower CM method while being several hundred times faster. For three riboswitch classes, our method detects 97-99% of the candidates relative to the CM method. Our method works very well for those classes of riboswitches that are characterized by distinct and conserved sequence motifs.
Riboswitches play a crucial role in controlling the expression of several prokaryotic genes involved in metabolism and transport processes. As more new classes of riboswitches are discovered, it is important to understand the patterns of their intra- and inter-genomic distribution. Understanding such patterns will enable us to better understand the evolutionary history of these genetic regulatory elements. However, a complete picture of the distribution pattern of riboswitches will emerge only after accurate identification of riboswitches across genomes. We believe that the riboswitch detection method developed in this paper will aid in that process. The significant speed advantage of our pHMM-based approach over the CM-based method allows us to scan entire databases (rather than 5'UTRs only) in a relatively short period of time in order to accurately identify riboswitch candidates.
Recent discoveries of noncoding RNAs (ncRNAs), RNA molecules that do not code for proteins but function directly, reveal that they are abundant, widespread and perform truly diverse functions [1, 2]. Significant and rapid advancements in RNA-mediated genetic control studies have established the importance of RNA in gene regulation [3, 4]. The catalytic and regulatory roles of RNAs like ribozymes and riboswitches lend support to the RNA world hypothesis and highlight the importance of RNA in the primordial world [5, 6].
Riboswitches are cis-acting regulatory RNAs residing in the 5' untranslated regions (UTRs) of primarily prokaryotic mRNAs. They are complex folded structures that act as high affinity receptors for specific cellular metabolites [7–9]. On metabolite binding they undergo conformational change, which modulates gene expression at the post-transcriptional level, either through premature termination of transcription or inhibition of translation initiation. They are composed of two structural domains: an aptamer domain and an expression platform. The aptamer domain binds the metabolite with high specificity, resulting in the alteration of the RNA folding pattern mainly in the expression platform. Switching between two alternative RNA conformations, one of which is favoured in the absence of the bound metabolite and the other in its presence, leads to regulation of gene expression. The aptamer domain is highly conserved at both the sequence and structure levels among widely divergent organisms, whereas the expression platform is highly variable even within the same riboswitch class. Riboswitches regulate genes in several metabolic pathways involved in the biosynthesis of vitamins, amino acids and purines [14, 15].
Riboswitches have various important applications. Since they are believed to be the descendants of ancient metabolite sensors, they can be useful in gaining valuable insights into how gene regulation mechanisms evolved from the primitive forms of life to the more complex ones. Riboswitches have also been used as potential drug targets for antibacterial and antifungal agents. Examples of such antimicrobial drugs are Pyrithiamine, which targets the TPP riboswitch, and S-(2-aminoethyl)-L-cysteine (AEC), which acts by binding to the lysine riboswitch. Artificial riboswitches have also been engineered for the manipulation of gene expression; for example, a theophylline-sensing synthetic RNA switch causes reduced access to an adjacent Shine-Dalgarno sequence on theophylline addition. Elucidating the underlying principles of riboswitch-mediated regulation may lead to the development of engineered ligands capable of modulating gene expression. More detailed characterization of the distribution and function of riboswitches across and within different genomes is essential to determine their precise role as riboregulators and potential drug targets.
The enormous growth of genome sequence data makes it practically infeasible to discover riboswitches solely by experimental means. In order to understand the extent to which organisms use these regulatory RNAs, time-efficient algorithms for genome-wide identification of riboswitches are required. Algorithms for detecting RNA homologs can be divided into two classes: those which are specific to a particular RNA class (e.g. tRNAscan-SE, miRscan etc.) and those which are general approaches applicable to all structured RNAs (e.g. INFERNAL). Each approach has its advantages and disadvantages. The specific tools use family-specific properties to maximize speed and sensitivity, but a new approach is required for each new RNA class. General tools can be used to detect members of any RNA class; however, they are slower.
The most sensitive general-purpose method available for riboswitch search is the Covariance Model (CM). CM can be viewed as profile stochastic context free grammar which scores a combination of sequence consensus and RNA secondary structure consensus. Searches using CM require high quality hand curated RNA sequence alignments along with covariation information. These searches are complicated due to the incorporation of two levels of information and therefore require a huge amount of computing time. The search time scales roughly with the cube of the query length, so it becomes practically infeasible to search databases using larger RNA models.
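To give a feel for the cubic scaling mentioned above, the following minimal sketch computes how the search cost would grow relative to a reference model if the dependence on query length were exactly cubic. The function name and the lengths used are hypothetical and chosen only for illustration; real CM search times also depend on the database size and the implementation.

```python
def relative_cm_cost(query_length: int, reference_length: int) -> float:
    """Cost relative to a reference model, assuming cost ~ (query length)^3."""
    return (query_length / reference_length) ** 3


# Hypothetical model lengths (in nucleotides), not taken from the paper.
for length in (50, 100, 200):
    factor = relative_cm_cost(length, reference_length=50)
    print(f"length {length}: about {factor:.0f}x the cost of a 50-nt model")
```

Under this assumption, doubling the length of the query model multiplies the search cost by eight, which is why long RNA models quickly become impractical for whole-database scans.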
The aim of this study is to develop a fast and efficient method for riboswitch identification. We propose profile Hidden Markov Models (pHMMs) for consensus modelling of riboswitch sequences and assess their applicability for riboswitch detection. The method was used to search the Refseq database for riboswitches belonging to different classes. The whole genome search results as well as the computational time required for the searches were compared with the Covariance Model. We find that our pHMM-based method is able to detect riboswitches belonging to eight of the ten families with high sensitivity and specificity while being more than a hundred times faster than the CM. We also compared our method with other web-based tools available for riboswitch discovery such as RibEx and Riboswitch finder. In both cases, our method is either more sensitive or as sensitive as the other method in detecting riboswitches. Our results indicate that pHMMs provide a fast and effective alternative for genome-wide riboswitch searches.
Results and Discussion
Hidden Markov Models (HMMs) [21, 22] provide a coherent theory for probabilistic modelling of proteins and nucleotide sequences. HMMs have been demonstrated to be effective in detecting conserved patterns in multiple sequences. A profile HMM (pHMM) [20, 24] is an HMM with a structure that allows insertions and deletions in the model, and models gaps in a position-dependent manner to give position-sensitive gap scores. pHMMs can be constructed from a set of sequences belonging to a family and can be used for selective and sensitive database searches to find other members of that family. In this study we used two well known pHMM packages, SAM (referred to in the text in uppercase italics to distinguish it from the Sam riboswitch family) and HMMER, to construct pHMMs for each riboswitch family and used them to search for riboswitches in the Refseq database. SAM is known to be sensitive at model estimation while HMMER is known for more accurate model scoring. Therefore SAM was used for pHMM construction and HMMER was used for database searching (as described in "Methods").
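The idea of position-specific match, insert and delete states can be illustrated with a toy example. The sketch below is not the SAM or HMMER implementation: it estimates match emissions from a small gap-free alignment, uses fixed transition probabilities that are assumed here for illustration, and scores a query with a global Viterbi recursion in log-odds form against a uniform background. All names, the toy alignment and the parameter values are hypothetical.

```python
import math

NEG_INF = float("-inf")
ALPHABET = "ACGU"
BACKGROUND = 0.25  # uniform background frequency (assumption)

# Assumed transition probabilities shared by all positions (illustrative only).
TRANS = {"MM": 0.90, "MI": 0.05, "MD": 0.05,
         "IM": 0.50, "II": 0.50, "DM": 0.50, "DD": 0.50}


def build_match_emissions(alignment, pseudocount=1.0):
    """Estimate per-column emission probabilities from a gap-free alignment."""
    emissions = []
    for col in range(len(alignment[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for seq in alignment:
            counts[seq[col]] += 1
        total = sum(counts.values())
        emissions.append({base: c / total for base, c in counts.items()})
    return emissions


def viterbi_log_odds(seq, emissions):
    """Best-path log-odds score of seq against the toy profile (global alignment)."""
    n, length = len(seq), len(emissions)
    t = {key: math.log(value) for key, value in TRANS.items()}
    # match[i][j], insert[i][j], delete[i][j]: best score after consuming
    # seq[:i] and ending in the match/insert/delete state of model position j.
    match = [[NEG_INF] * (length + 1) for _ in range(n + 1)]
    insert = [[NEG_INF] * (length + 1) for _ in range(n + 1)]
    delete = [[NEG_INF] * (length + 1) for _ in range(n + 1)]
    match[0][0] = 0.0  # the begin state is treated as match state 0
    for i in range(n + 1):
        for j in range(length + 1):
            if i > 0 and j > 0:
                emit = math.log(emissions[j - 1][seq[i - 1]] / BACKGROUND)
                match[i][j] = emit + max(match[i - 1][j - 1] + t["MM"],
                                         insert[i - 1][j - 1] + t["IM"],
                                         delete[i - 1][j - 1] + t["DM"])
            if i > 0:
                # Insert states emit with background frequencies: log-odds 0.
                insert[i][j] = max(match[i - 1][j] + t["MI"],
                                   insert[i - 1][j] + t["II"])
            if j > 0:
                delete[i][j] = max(match[i][j - 1] + t["MD"],
                                   delete[i][j - 1] + t["DD"])
    return max(match[n][length], insert[n][length], delete[n][length])


toy_alignment = ["GGCAU", "GGCAU", "GGCUU"]  # hypothetical training sequences
profile = build_match_emissions(toy_alignment)
print(viterbi_log_odds("GGCAU", profile))   # consensus-like query
print(viterbi_log_odds("AAUCG", profile))   # unrelated query
```

A query that resembles the consensus receives a positive log-odds score, while an unrelated sequence of the same length scores lower. Real packages additionally learn position-specific transition probabilities, use more elaborate priors and support local alignment, none of which is modelled here.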
Performance evaluation of the models constructed for different riboswitch families
Sensitivity and Specificity for different riboswitch families.
Comparison of pHMMs with the Covariance Models
CMs and pHMMs were used to scan the Refseq database for the candidates belonging to each of the ten riboswitch families. These families show different levels of sequence conservation and are of variable length. Some families like FMN and Sam are highly conserved while others like Cobalamin and Lysine show low sequence conservation.
Table 2. Percentage of CM hits covered by pHMM (columns: % of CM hits covered by pHMM I; % of CM hits covered by pHMM II).
This riboswitch class is characterized by the greatest degree of sequence conservation among members that are widely distributed across diverse bacterial species. When CM search results for the FMN family were compared with those of the pHMM, it was found that 99.40% of the CM search hits were obtained using the pHMM-based search. Exclusive CM and exclusive pHMM hits were analyzed in detail. When hits located upstream of hypothetical or putative genes were ignored, the percentage of hits covered by pHMM increased to 100%. Thus it is plausible that none of the exclusive CM hits are true positives. However, one genuine hit was picked exclusively by pHMM.
This riboswitch class is also widely distributed amongst bacterial genomes. It has the largest average length and shows poor sequence conservation. A comparison of CM and pHMM results showed that 99.41% of the CM hits were reported by pHMM. After removing hits upstream to hypothetical and putative genes, this coverage increased to 99.59%. Seven genuine hits were found exclusively by CM search and forty-seven genuine riboswitch candidates were detected exclusively by pHMM search. The validity of the exclusive pHMM hits was determined by taking into account the genomic context in which they appeared.
Glms is the only known riboswitch to exhibit ribozyme activity. It also shows high degree of sequence conservation and is found only in a few bacterial groups. For this family 96.83% of the total CM hits were also picked by the pHMM method. On closer inspection of the exclusive CM hits, it was found that many of these were in AT-rich repetitive regions that are unlikely to be valid riboswitches. Considering them as false positives and after excluding hits in the upstream of hypothetical and putative genes, only five genuine riboswitches were found exclusively by CM search and one genuine riboswitch candidate was found exclusively by the pHMM method.
The Lysine riboswitch shows low sequence conservation and is not very abundant in bacterial species. As in the case of the Glms riboswitch, many of the exclusive CM hits were in AT-rich repeat regions. After removing all such spurious hits, 97.45% of CM hits were recovered by pHMM search. When hits lying upstream to hypothetical and putative genes were discarded, only eleven exclusive CM hits and two exclusive pHMM hits were obtained.
The Purine riboswitch is found in a few bacterial groups and shows intermediate sequence conservation. For the Purine riboswitch 99.83% of the total CM hits were found using the pHMM model. One exclusive pHMM hit and one exclusive CM hit were found. There were no hits lying upstream to hypothetical or putative genes.
The Sam riboswitch shows high-level sequence conservation. 99.44% of the total CM hits were recovered using the pHMM search method. After removing hits upstream to hypothetical as well as putative genes only three exclusive CM hits and seventeen exclusive pHMM hits were obtained.
This is the most abundant riboswitch and is known to be present even in eukaryotes. It has intermediate level of sequence conservation. When CM hits were compared with those obtained using the pHMM method, it was found that 99.95% of the CM hits overlapped with the pHMM set. One exclusive CM hit and five exclusive pHMM hits were found to be true riboswitches on the basis of their genomic context. No hits upstream to hypothetical or putative genes were present in exclusive CM set.
PreQ1 has an unusually small aptamer domain with a simplified secondary structure consisting of a single stem loop structure. 90.94% of the CM hits were also obtained by pHMM search. After hits upstream of hypothetical and putative genes were eliminated, the coverage increased to 98.84%. However twenty four exclusive pHMM hits were found.
Glycine riboswitch is the only known metabolite binding riboswitch that consists of two metabolite binding aptamer domains in tandem. 99.66% of the CM search hits were obtained using the pHMM method. After discarding hits lying upstream to putative and hypothetical genes, twenty-seven exclusive pHMM hits were obtained; however no exclusive CM hits were detected.
Sam alpha riboswitch
The Sam alpha riboswitch is found predominantly in alpha proteobacteria. It is a short riboswitch with a relatively simple structure composed of a single hairpin. When CM hits were compared with the profile HMM results, it was found that the pHMM method covered only 69.09% of the CM hits. After discarding hits lying upstream to putative and hypothetical genes, forty-two exclusive CM hits were obtained and pHMM coverage of CM hits increases to 75.14%. Only two hits were detected exclusively by the pHMM method.
When we had nearly completed our analysis with covariance models using Infernal 0.72, the new Infernal version 1.0 was released. Since CM search requires a large amount of computing time, the new version implements two rounds of filtering to reduce the search time. The HMM filtering technique described in [33, 34] is applied first, and then a query-dependent banded CYK maximum likelihood search algorithm is used as a second filter. It has been found that the default filters accelerate the similarity search by about 30-fold overall, while sacrificing a small amount of sensitivity. However, models with little primary sequence conservation cannot be effectively accelerated by primary sequence based filters. Although version 1.0 is faster than 0.72, it is still quite slow compared to pHMM searches. The comparison of riboswitch search times using Infernal 1.0 and our pHMM-based method, for different riboswitch families, is shown in Additional file 9.
We also used Infernal 1.0 to scan the Refseq database for the riboswitch families and found that at the same threshold (i.e. the one used for Infernal 0.72), the hits reported by both versions were similar except for TPP and PreQ1, where Infernal 0.72 reported more hits than Infernal 1.0. However, Infernal 1.0 was found to be more specific as it did not report spurious hits in AT repetitive regions. Comparison of pHMMs with CMs generated using Infernal 1.0 did not change the reported pHMM coverage of CM hits much (data not shown).
Comparison with pHMM based heuristic for ncRNA detection
Extremely slow scans using CMs have inspired the use of heuristics to improve speed. Rfam uses a BLAST based heuristic. For each ncRNA family, the known members are BLASTed against RFAMSEQ; the full CM is run only on matches returned by BLAST. These searches are acceptably fast, but the BLAST heuristic may miss family members that would be found with a regular (slower) CM search. Profile HMM based filters such as rigorous filters and Maximum-Likelihood (ML) heuristics have also been developed [34, 35]. Rigorous filters guarantee that all homologs detectable by a given CM are selected by the filter (i.e. they ensure high sensitivity), but do so at the expense of speed since building rigorous filters can take several hours. In the ML-heuristic, profile HMMs are constructed from a given CM. The HMM transition and emission probabilities are designed to make the HMM maximally similar to the CM. These pHMM based filters have been implemented in the RAVENNA package. For each family CM, an ML-heuristic profile HMM was built and used to scan the RefSeq database. The search speed was greatly enhanced compared to CMs; nevertheless these searches were still slower (ranging from twice as slow to more than 10 times slower, depending on the riboswitch family) than purely sequence based profiles. The computational time required by an ML-heuristic profile HMM and sequence based pHMMs is compared in Figure 2. The number of hits obtained for most of the families (when an ML-heuristic profile HMM is used) is the same as that obtained from the CM searches. Therefore the percentage coverage statistics do not change.
Comparison with other web based tools available for riboswitch identification
To determine the efficacy of our method relative to other riboswitch detection methods, we carried out a comparison of our approach with the Riboswitch finder and RibEx packages.
Table: Comparison of the performance of the RibEx package with pHMMs (columns: number of sequences in the test set; number of sequences predicted by RibEx; number of sequences predicted by pHMM).
Another tool available for riboswitch identification is Riboswitch finder. It uses sequence patterns, secondary structure prediction and scoring functions for the detection of a riboswitch in a given sequence. However, this software is specifically designed for the purine-sensing riboswitch only. Riboswitch finder had earlier reported a total of 18 putative purine riboswitches in genomic sequences of Bacillus anthracis, Bacillus cereus, Enterococcus faecalis, Lactobacillus plantarum, Bacillus stearothermophilus, Clostridium tetani, Listeria innocua and Vibrio parahaemolyticus. We scanned these genomes with the Purine-specific pHMM and not only recovered the hits reported by Riboswitch finder but also found two new hits, one in Bacillus anthracis and the other in Bacillus cereus. We also scanned the full set of Purine riboswitch family members available in Rfam using Riboswitch finder. Riboswitch finder could detect only 114 out of the 122 sequences listed in Rfam. In contrast, our pHMM-based method detected all of them.
Accurate identification of riboswitches across entire genomes of varying lengths is the first step towards analysing the patterns in their intra- and inter-genomic distribution. The distribution patterns of riboswitches can reveal important information regarding their evolution. It is therefore imperative to develop a framework for rapid and efficient detection of riboswitches across diverse genomes. Riboswitches differ from other ncRNAs by virtue of their relatively longer lengths and distinctive folding patterns. This is often manifest in the high level of primary sequence conservation that is observed between riboswitches belonging to the same family. This aspect has been exploited in our method of riboswitch detection.
The strength of the pHMM based approach for riboswitch identification lies in its speed as well as its accuracy (for all except two families) in identifying riboswitches. The success of the pHMM based approach to riboswitch identification depends on several factors: the degree of primary sequence conservation, the presence of distinct and easily distinguishable sequence motifs in the aptamer domain, and the availability of a sufficiently large number of training sequences for model building that adequately capture the distinct features of each riboswitch class. If the training set is small but the primary sequence conservation is high with distinct and easily identifiable motifs, then the effectiveness of the pHMMs in detecting riboswitches will be high, as in the case of FMN, Glms and Purine. Even for families with overall low sequence conservation (such as Cobalamin and Lysine) but which carry short stretches of multiple distinct motifs, pHMM performs extremely well. However, if a family lacks highly conserved sequence motifs or has low complexity motifs, then the performance of pHMM will be poor, as in the case of Sam alpha and PreQ1. Therefore these riboswitch families, which are characterized by short aptamer domains lacking highly conserved sequence motifs, cannot be found with high sensitivity and specificity using this approach.
We believe that the riboswitch identification framework developed in this paper (see also http://ccbb.jnu.ac.in/data/models/ for resources related to this paper) will be useful in screening genomic sequences to accurately and rapidly identify not only riboswitches but any other class of RNAs that are relatively long and characterized by multiple distinct sequence motifs.
Training dataset for model building
Sequences for pHMM construction for each riboswitch family were obtained from the Rfam database (version 8.1) http://www.sanger.ac.uk/Software/Rfam/. Rfam is a comprehensive collection of ncRNA families, represented by multiple sequence alignments and profile stochastic context-free grammars (CM). "Seed sequences", which represent a set of known members of a riboswitch family, were used to train the pHMM. It is necessary to remove redundant sequences from the training and testing data as redundancy influences the performance of a method [42, 43]. Therefore, prior to model building, the training sequences were clustered on the basis of sequence similarity using blastclust. Sequences that were 90% similar over 90% of their length were considered to be duplicates and hence were eliminated from the seed sequences, thus generating the training set.
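The effect of this redundancy filter can be sketched in a few lines. The example below is only a crude stand-in for blastclust: it uses Python's difflib to estimate the fraction of matching characters between two sequences and greedily keeps a sequence only if it is less than 90% similar to every sequence already kept. The similarity measure, the greedy strategy and the toy sequences are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Fraction of the longer sequence covered by matching blocks (rough proxy)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(a), len(b))


def remove_redundant(sequences, threshold=0.9):
    """Greedily keep sequences that are below the similarity threshold to all kept ones."""
    kept = []
    for seq in sequences:
        if all(similarity(seq, other) < threshold for other in kept):
            kept.append(seq)
    return kept


# Hypothetical seed sequences; the second differs from the first at one position.
seeds = [
    "ACGUACGUACGUACGUACGU",
    "ACGUACGUACGUACGUACGA",
    "UUUUGGGGCCCCAAAAUUGG",
]
training_set = remove_redundant(seeds)
print(training_set)
```

In this toy run the near-duplicate second sequence is dropped, while the dissimilar third sequence is retained for training.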
Test dataset for model evaluation
The Rfam database was developed for the annotation of structured RNA families of genomic sequences, but it has been widely used as a source of reliable alignments and structures for the purposes of training as well as benchmarking RNA sequence and secondary structure analysis software. In order to test the performance of our method we obtained sequences from Rfam. For each RNA family Rfam provides seed sequences, which represent the known members of a particular family, and the full collection of family sequences, which contains known members as well as those predicted by CM search. We downloaded fasta sequences of all the Rfam members filtered to less than 90% identity. This data set includes not only the riboswitch family sequences but also over 597 other regulatory RNAs, which have been compiled after scanning over 400 complete genomes. The data set was screened and the training sequences used for building the pHMMs for each riboswitch family were removed. Fifty random sequences were also generated and included in the test set.
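A test set of this kind, containing true family members alongside unrelated and random sequences, lends itself to a simple sensitivity and specificity calculation. The sketch below uses the common definitions TP/(TP+FN) and TN/(TN+FP); whether these are exactly the definitions used for the reported values is an assumption, and the sequence identifiers are hypothetical.

```python
def sensitivity_specificity(predicted, positives, negatives):
    """Return (sensitivity, specificity) for a set of predicted sequence IDs."""
    true_pos = len(predicted & positives)
    false_neg = len(positives - predicted)
    false_pos = len(predicted & negatives)
    true_neg = len(negatives - predicted)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical test set: four family members and six non-members.
family_members = {"p1", "p2", "p3", "p4"}
non_members = {"n1", "n2", "n3", "n4", "n5", "n6"}
model_hits = {"p1", "p2", "p3", "n1"}  # hypothetical search output

sens, spec = sensitivity_specificity(model_hits, family_members, non_members)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```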
Profile Hidden Markov Model construction
There are two packages available for pHMM construction, SAM and HMMER. The model was estimated using SAM's expectation maximization algorithm, buildmodel. The alignment of the training sequences to the resulting HMM was accomplished with SAM's align2model program. pHMMs were then constructed using the modelfromalign program which uses alignment generated by the align2model program. The profiles thus obtained were converted to HMMER-compatible format using the program sam2hmmer available with the SAM package. The profiles were then used to search microbial sequences in the RefSeq database version 28 using the hmmsearch program from HMMER. The pHMMs for different riboswitch families are provided in Additional files 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19. The detailed commands and the pHMMs can be obtained at http://ccbb.jnu.ac.in/data/models/
CM model construction
In order to objectively compare the computing times of the pHMM and CM methods, it was necessary to carry out riboswitch searches using both methods on the same computing platform. Therefore, covariance models were constructed using the Infernal software package version 0.72 http://infernal.janelia.org/. CM describes both the secondary structure and the sequence consensus of an RNA. CM construction needs sequence alignment along with secondary structure annotation, therefore they were trained on the seed sequence alignments available in Rfam (version 8.1) using the cmbuild program from Infernal. These are manually adjusted alignments annotated with secondary structure information. CMs thus constructed were then used to search microbial genomes in the Refseq database using cmsearch program from Infernal. Rfam "gathering threshold" was taken as the cutoff threshold for each family (both for CMs as well as for ML-heuristic pHMMs). All the hits scoring above the threshold for the respective families were considered as legitimate riboswitch candidates.
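The thresholding step can be sketched as a simple filter over search hits. The threshold values, field names and hits below are hypothetical and are not the actual Rfam gathering thresholds; treating a score equal to the threshold as passing is also an assumption.

```python
# Hypothetical per-family cutoffs standing in for Rfam gathering thresholds.
GATHERING_THRESHOLDS = {"FMN": 40.0, "Purine": 35.0}


def legitimate_candidates(hits, thresholds):
    """Keep hits whose score reaches the cutoff of their own family."""
    return [hit for hit in hits if hit["score"] >= thresholds[hit["family"]]]


# Hypothetical search output.
hits = [
    {"id": "hit1", "family": "FMN", "score": 62.3},
    {"id": "hit2", "family": "FMN", "score": 21.7},
    {"id": "hit3", "family": "Purine", "score": 35.0},
]
candidates = legitimate_candidates(hits, GATHERING_THRESHOLDS)
print([hit["id"] for hit in candidates])
```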
Calculating pHMM coverage of CM hits
The results of the pHMM and the CM searches were compared to obtain the sets of common hits picked by both the approaches and the hits picked exclusively either by the pHMM or the CM method. Known riboswitches are generally present at the 5'-ends (UTRs) of the genes implicated in the metabolism of their target molecules. Therefore, genomic contexts of the hits can be used to ascertain the authenticity of the riboswitches identified exclusively by either of the search methods. The exclusive hits obtained from both the approaches were examined with respect to the genomic context of the downstream gene to calculate the percentage of CM hits covered by the pHMM. The percentage was calculated in two different ways and is reported in Table 2. Hits located within the genes or far upstream of the genes (thousands of base pairs upstream) were considered as false positives. Hits lying in repetitive regions were ignored. Hence, the estimation of the percentage coverage of CM hits by pHMM hits was calculated after removing all the above mentioned false positives. For a conservative estimate we included the hits lying upstream of hypothetical or putative genes because such hits may possibly be indicative of genuine riboswitches. However, in the second case we calculated the percentage coverage by removing the hits upstream of hypothetical and putative genes also. In this case, only the hits upstream of genes known to be involved in the corresponding ligand biosynthesis pathway were considered to be legitimate candidates for calculation of percentage coverage.
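The two coverage estimates described above can be sketched as follows. Hits are matched by coordinate overlap on the same genome, which is an assumption about how common hits were identified; the `Hit` structure, the context labels and the example coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    genome: str
    start: int
    end: int
    context: str  # "known", "hypothetical" or "false_positive"


def overlaps(a: Hit, b: Hit) -> bool:
    """True if two hits lie on the same genome and share at least one base."""
    return a.genome == b.genome and a.start <= b.end and b.start <= a.end


def coverage(cm_hits, phmm_hits, drop_hypothetical=False):
    """Percentage of retained CM hits that overlap some pHMM hit."""
    excluded = {"false_positive"}
    if drop_hypothetical:
        excluded.add("hypothetical")  # second, stricter estimate
    retained = [hit for hit in cm_hits if hit.context not in excluded]
    covered = sum(any(overlaps(c, p) for p in phmm_hits) for c in retained)
    return 100.0 * covered / len(retained)


# Hypothetical hits for one family.
cm = [
    Hit("genomeA", 100, 250, "known"),
    Hit("genomeA", 5000, 5140, "known"),
    Hit("genomeB", 300, 450, "hypothetical"),
    Hit("genomeB", 9000, 9100, "false_positive"),
]
phmm = [
    Hit("genomeA", 110, 240, "known"),
    Hit("genomeA", 5010, 5150, "known"),
]
print(f"coverage I:  {coverage(cm, phmm):.2f}%")
print(f"coverage II: {coverage(cm, phmm, drop_hypothetical=True):.2f}%")
```

In this toy case the coverage rises from two of three retained CM hits to two of two once the hit upstream of a hypothetical gene is discarded, mirroring the way the second estimate is higher than or equal to the first for the families discussed above.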
Funding: The work was partially funded by a Centre of Excellence (COE) grant provided by the Dept. of Biotechnology, Govt. of India to the Centre for Computational Biology and Bioinformatics (CCBB). Part of the computing work was carried out using the High Performance Computing Facility at Jawaharlal Nehru University.
- Eddy SR: Non-coding RNA genes and the modern RNA world. Nature Rev Genetics 2001, 2: 919–929. 10.1038/35103511
- Moulton V: Tracking down noncoding RNAs. Proc Nat Acad Sci 2005, 102: 2269–2270. 10.1073/pnas.0500129102
- Kutter C, Svoboda P: miRNA, siRNA, piRNA: Knowns of the unknown. RNA Biol 2008, 5: 181–188.
- Moazed D: Small RNAs in transcriptional gene silencing and genome defence. Nature 2009, 457: 413–420. 10.1038/nature07756
- Poole AM, Jeffares DC, Penny D: The path from the RNA world. J Mol Evol 1998, 46: 1–17. 10.1007/PL00006275
- Jeffares DC, Poole AM, Penny D: Relics from the RNA world. J Mol Evol 1998, 46: 18–36. 10.1007/PL00006280
- Tucker BJ, Breaker RR: Riboswitches as versatile gene control element. Curr Opin Struct Biol 2005, 15: 342–348. 10.1016/j.sbi.2005.05.003
- Mandal M, Breaker RR: Gene regulation by riboswitches. Nat Rev Mol Cell 2004, 5: 451–463. 10.1038/nrm1403
- Winkler WC, Breaker RR: Regulation of bacterial gene expression by riboswitches. Annu Rev Microbiol 2005, 59: 487–517. 10.1146/annurev.micro.59.030804.121336
- Winkler WC, Cohen-Chalamish S, Breaker RR: An mRNA structure that controls gene expression by binding FMN. Proc Natl Acad Sci USA 2002, 99: 15908–15913. 10.1073/pnas.212628899
- Nou X, Kadner RJ: Adenosylcobalamin inhibits ribosome binding to btuB RNA. Proc Natl Acad Sci USA 2000, 97: 7190–7195. 10.1073/pnas.130013897
- Ellington AD, Szostak JW: In vitro selection of RNA molecules that bind specific ligands. Nature 1990, 346: 818–822. 10.1038/346818a0
- Nudler E, Gusarov I: Analysis of the intrinsic transcription termination mechanism and its control. Methods Enzymol 2003, 371: 369–382.
- Rodionov DA, Vitreschak AG, Mironov AA, Gelfand MS: Regulation of lysine biosynthesis and transport genes in bacteria: yet another RNA riboswitch? Nucleic Acids Res 2003, 31: 6748–6757. 10.1093/nar/gkg900
- Mandal M, Breaker RR: Adenine riboswitches and gene activation by disruption of a transcription terminator. Nat Struct Mol Biol 2003, 11: 29–35. 10.1038/nsmb710
- Lea CR, Piccirilli JA: 'Turning on' riboswitches to their antibacterial potential. Nat Chem Biol 2007, 3: 16–17. 10.1038/nchembio0107-16
- Sudarsan N, Cohen-Chalamish S, Nakamura S, Emilsson GM, Breaker RR: Thiamine Pyrophosphate riboswitches are targets for the antimicrobial compound Pyrithiamine. Chem Biol 2005, 12: 1325–1335. 10.1016/j.chembiol.2005.10.007
- Blount KF, Wang JX, Lim J, Sudarsan N, Breaker RR: Antibacterial lysine analogs that target lysine riboswitches. Nat Chem Biol 2007, 3: 44–49. 10.1038/nchembio842
- Wieland M, Hartig JS: Artificial riboswitches: synthetic mRNA-based regulators of gene expression. Chembiochem 2008, 9: 1873–1878. 10.1002/cbic.200800154
- Durbin R, Eddy SR, Krogh A, Mitchison G: Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press; 1998.
- Krogh A, Brown M, Mian IS, Sjolander K, Haussler D: Hidden Markov models in computational biology: applications to protein modeling. J Mol Biol 1994, 235: 1501–1531.
- Hughey R, Krogh A: Hidden Markov models for sequence analysis: Extension and analysis of the basic method. CABIOS 1996, 12: 95–107.
- Eddy S, Mitchison G, Durbin R: Maximum discrimination hidden Markov models of sequence consensus. J Comput Biol 1995, 2: 9–23. 10.1089/cmb.1995.2.9
- Eddy SR: Profile hidden markov models. Bioinformatics 1998, 14: 755–763. 10.1093/bioinformatics/14.9.755
- Karplus K, Barrett C, Hughey R: Hidden Markov Models for detecting remote protein homologies. Bioinformatics 1998, 14: 846–856. 10.1093/bioinformatics/14.10.846
- Eddy SR: HMMER: Profile Hidden Markov Models for biological sequence analysis. 2001. [http://hmmer.janelia.org/]
- Wistrand M, Sonnhammer EL: Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER. BMC Bioinformatics 2005, 6: 99–109. 10.1186/1471-2105-6-99
- Baldi P, Brunak S, Chauvin Y, Andersen CA, Nielsen H: Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 2000, 16: 412–424. 10.1093/bioinformatics/16.5.412
- Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2007, 35: D61-D65. 10.1093/nar/gkl842
- Eddy SR, Durbin R: RNA sequence analysis using covariance models. Nucleic Acids Res 1994, 22: 2079–88. 10.1093/nar/22.11.2079
- Nawrocki EP, Eddy SR: Query-dependent banding (QDB) for faster RNA similarity searches. PLoS Comput Biol 2007, 3: e56. 10.1371/journal.pcbi.0030056
- Nawrocki EP, Kolbe DL, Eddy SR: Infernal 1.0: inference of RNA alignments. Bioinformatics 2009, 25: 1335–1337. 10.1093/bioinformatics/btp157
- Weinberg Z, Ruzzo WL: Faster genome annotation of non-coding RNA families without loss of accuracy. In Proceedings of the Eighth Annual International Conference on Computational Molecular Biology (RECOMB). ACM press; 2004:243–251.
- Weinberg Z, Ruzzo WL: Exploiting conserved structure for faster annotation of non-coding RNAs without Loss of accuracy. Bioinformatics 2004, 20(Suppl 1):i334-i341. 10.1093/bioinformatics/bth925
- Weinberg Z, Ruzzo WL: Sequence based heuristics for faster annotation of non-coding RNA families. Bioinformatics 2005, 22: 35–39. 10.1093/bioinformatics/bti743
- Abreu-Goodger C, Merino E: RibEx: a web server for locating riboswitches and other conserved bacterial regulatory elements. Nucleic Acids Res 2005, 33: W690-W692. 10.1093/nar/gki445
- Abreu-Goodger C, Ontiveros-Palacios N, Ciria R, Merino E: Conserved regulatory motifs in bacteria: riboswitches and beyond. Trends Genet 2004, 20: 475–479. 10.1016/j.tig.2004.08.003
- Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of the 2nd International Conference on ISMB. AAAI Press; 1994:28–36.Google Scholar
- Bailey TL, Gribskov M: Combining evidence using p-values: application to sequence homology searches. Bioinformatics 1998, 14: 48–54. 10.1093/bioinformatics/14.1.48View ArticlePubMedGoogle Scholar
- Bengert P, Dandekar T: Riboswitch finder - a tool for identification of riboswitch RNAs. Nucleic Acids Res 2004, 32: W154–159. 10.1093/nar/gkh352PubMed CentralView ArticlePubMedGoogle Scholar
- Griffiths-Jones Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A: Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res 2005, 33: D121–124. 10.1093/nar/gki081View ArticlePubMedGoogle Scholar
- Rashid M, Saha S, Raghava GPS: Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs. BMC Bioinformatics 2007, 8: 337–346. 10.1186/1471-2105-8-337PubMed CentralView ArticlePubMedGoogle Scholar
- Bagos PG, Liakopoulos TD, Spyropoulos IC, Hamodrakas SJ: A Hidden Markov Model method, capable of predicting and discriminating β-barrel outer membrane proteins. BMC Bioinformatics 2004, 5: 29–42. 10.1186/1471-2105-5-29PubMed CentralView ArticlePubMedGoogle Scholar
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.