R3D-BLAST2: an improved search tool for similar RNA 3D substructures
© The Author(s) 2017
Published: 28 December 2017
RNA molecules have been known to play a variety of significant roles in cells. In principle, the functions of RNAs are largely determined by their three-dimensional (3D) structures. As more and more RNA 3D structures are available in the Protein Data Bank (PDB), a bioinformatics tool, which is able to rapidly and accurately search the PDB database for similar RNA 3D structures or substructures, is helpful to understand the structural and functional relationships of RNAs.
Since its first release in 2011, R3D-BLAST has become a useful tool for searching the PDB database for similar RNA 3D structures and substructures. It was implemented by a structural-alphabet (SA)-based method, which utilizes an SA with 23 structural letters to encode RNA 3D structures into one-dimensional (1D) structural sequences and applies BLAST to the resulting structural sequences for searching similar substructures of RNAs. In this study, we have upgraded R3D-BLAST to develop a new web server named R3D-BLAST2 based on a higher quality SA newly constructed from a representative and sufficiently non-redundant list of RNA 3D structures. In addition, we have modified the kernel program in R3D-BLAST2 so that it can accept an RNA structure in the mmCIF format as an input. The results of our experiments on a benchmark dataset have demonstrated that R3D-BLAST2 indeed performs very well in comparison to its earlier version R3D-BLAST and other similar tools RNA FRABASE, FASTR3D and RAG-3D by searching a larger number of RNA 3D substructures resembling those of the input RNA.
R3D-BLAST2 is a valuable BLAST-like search tool that can more accurately scan the PDB database for similar RNA 3D substructures. It is publicly available at http://genome.cs.nthu.edu.tw/R3D-BLAST2/.
Besides being involved in protein synthesis, RNAs have been found to perform other diverse functions in the cell, such as processing and modification of RNAs, regulation of gene expression, and degradation and translocation of proteins. In principle, it is widely believed that the functions of RNAs are largely determined by their three-dimensional (3D) structures. In the past few years, both the number and the size of experimentally solved RNA 3D structures in the Protein Data Bank (PDB) and Nucleic Acid Database (NDB) have dramatically increased. Therefore, automatic software tools capable of rapidly and accurately searching the PDB database for similar RNA 3D structures or substructures are helpful for the annotation of RNA structures and functions. Since computing similarity between two RNA 3D structures is an intractable task, currently existing tools, including RNA FRABASE [5, 6], FASTR3D, R3D-BLAST and RAG-3D, all employ heuristic approaches to scan the PDB database for similar RNA 3D structures and/or substructures.
In principle, both RNA FRABASE and FASTR3D use pattern-based approaches to search the PDB database for RNAs that have exactly the same secondary (2D) structure as that of the query RNA. As for R3D-BLAST, it reduces RNA 3D structures into one-dimensional (1D) structural sequences according to some local structure features in the nucleotide backbone conformation and applies BLAST to the resulting 1D structural sequences for searching similar RNA 3D substructures. RAG-3D exploits a coarse-grained graph representation to describe RNA 3D structures as simplified 3D graphs and searches for RNA 3D substructures with the same graph topology (i.e. pattern of vertex connectivity) as the query RNA substructure.
The method we used to implement R3D-BLAST is the so-called structural alphabet (SA)-based method. It uses an SA with 23 structural letters to encode RNA 3D structures from the PDB database into 1D sequences of structural letters, and then applies BLAST, a popular bioinformatics tool for finding homologous nucleotide or amino acid sequences based solely on their sequence similarity, to search the SA-encoded sequences for similar RNA 3D substructures. In fact, the search performance of R3D-BLAST largely depends on the capability of the SA letters for representing the most common backbone conformations of RNA nucleotides. As previously reported, two pseudo-torsion angles (i.e. η and θ), which are dihedral angles defined on the C4′ and P atoms of consecutive nucleotides, are adequate to represent the backbone conformation of an RNA nucleotide. Therefore, the SA mentioned above was previously constructed from a collection of 117 RNA 3D structures (with 9527 nucleotides in total) using the η and θ values of their nucleotide backbones. Since the public release of R3D-BLAST in 2011, however, several hundred new RNA 3D structures have been experimentally determined and deposited in the PDB database. Therefore, it can be expected that these newly determined RNA 3D structures should allow us to construct a new and sufficiently high-quality SA that can be used to further improve the search performance of R3D-BLAST.
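The encoding step described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in this text: η is taken as the dihedral over C4′(i−1), P(i), C4′(i), P(i+1) and θ as the dihedral over P(i), C4′(i), P(i+1), C4′(i+1), following the usual definition in the pseudo-torsion literature; angles are wrapped to [0, 360); and a nucleotide is assigned the letter of the nearest cluster center under a circular distance. The three centers in `HYPOTHETICAL_CENTERS` are invented for illustration and are not the 23 centers of the actual SA.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle of four 3D points, in degrees, wrapped to [0, 360)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x)) % 360.0

def eta_theta(c4_prev, p, c4, p_next, c4_next):
    """Pseudo-torsions of nucleotide i (atom quadruples are an assumption)."""
    eta = dihedral(c4_prev, p, c4, p_next)
    theta = dihedral(p, c4, p_next, c4_next)
    return eta, theta

def circ_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

# Hypothetical (eta, theta) centers; NOT the real structural alphabet.
HYPOTHETICAL_CENTERS = {
    "A": (170.0, 210.0),
    "B": (60.0, 150.0),
    "C": (300.0, 90.0),
}

def encode(eta, theta, centers=HYPOTHETICAL_CENTERS):
    """Return the letter whose center is nearest to (eta, theta)."""
    return min(
        centers,
        key=lambda k: circ_diff(eta, centers[k][0]) ** 2
        + circ_diff(theta, centers[k][1]) ** 2,
    )

# Encoding a short run of (eta, theta) pairs into a 1D structural sequence.
structural_sequence = "".join(
    encode(e, t) for e, t in [(175.0, 205.0), (5.0, 150.0), (355.0, 95.0)]
)
```

The resulting string is the kind of 1D sequence that a sequence search tool such as BLAST can then compare, given a substitution matrix defined over the structural letters.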
Another reason to upgrade R3D-BLAST is that the PDB data files it used to retrieve RNA 3D structures, as well as the files uploaded by users to run it, had to be in the PDB format. However, the PDB format is now a legacy format, because a structure represented in a single PDB-formatted file is limited to 99,999 atoms and the relationships among its data items are implicit. The mmCIF (macromolecular Crystallographic Information File) format, released in 1997, does not have these limitations. Therefore, PDB entries have been distributed mainly in the mmCIF format since it became the standard format of PDB archive distribution in 2014.
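To illustrate why mmCIF makes the relationships among data items explicit, the following Python sketch reads the `_atom_site` loop of an mmCIF file into dictionaries keyed by column name. The item names used (`_atom_site.label_atom_id`, `_atom_site.Cartn_x`, and so on) are standard mmCIF data items, but the parser itself is a simplified illustration, not the kernel program of R3D-BLAST2: it assumes one atom per line and simple quoting, and it does not cover the full mmCIF grammar. The two-atom `SAMPLE` record is made up.

```python
import shlex

def read_atom_site(mmcif_text):
    """Return the rows of the _atom_site loop as dicts keyed by column name."""
    prefix = "_atom_site."
    columns, rows = [], []
    state = "seek"
    for raw in mmcif_text.splitlines():
        line = raw.strip()
        if state == "seek":
            if line == "loop_":
                state, columns = "header", []
        elif state == "header":
            if line.startswith(prefix):
                columns.append(line[len(prefix):])
            elif columns and line and not line.startswith(("_", "#", "loop_")):
                # First data row of the _atom_site loop.
                state = "data"
                rows.append(dict(zip(columns, shlex.split(line))))
            else:
                # Some other loop, or an empty one: keep looking.
                state = "header" if line == "loop_" else "seek"
                columns = []
        elif state == "data":
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                break
            rows.append(dict(zip(columns, shlex.split(line))))
    return rows

# Made-up two-atom example; atom names containing a prime are quoted in mmCIF.
SAMPLE = """data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 P G A 1 10.000 20.000 30.000
ATOM 2 "C4'" G A 1 11.500 21.000 29.000
#
"""

atoms = read_atom_site(SAMPLE)
backbone = [a for a in atoms if a["label_atom_id"] in ("P", "C4'")]
```

Because every value is addressed by its column name rather than by a fixed character position, the atom serial number is not confined to a fixed-width field as it is in the PDB format.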
In this study, we have upgraded our RNA structural search tool R3D-BLAST to develop a new web server named R3D-BLAST2 (meaning R3D-BLAST version 2) based on a totally new SA that is constructed from a representative and sufficiently non-redundant list of 876 atomic-resolution RNA 3D structures (with 65,154 nucleotides in total). In addition, we have modified the kernel program in R3D-BLAST2 so that it can retrieve RNA 3D structures from the PDB data files in the mmCIF format and also allows the user to upload an mmCIF formatted file to search for similar RNA 3D substructures. For validation, we have used a benchmark dataset of RNA 3D structures to test R3D-BLAST2 and compare its search performance with its previous version R3D-BLAST and other similar RNA structural search tools, such as RNA FRABASE, FASTR3D and RAG-3D. Our experimental results have finally shown that R3D-BLAST2 indeed outperforms R3D-BLAST, as well as RNA FRABASE, FASTR3D and RAG-3D, by searching a larger number of RNA 3D substructures resembling those of the query RNA.
Algorithm of R3D-BLAST2
The structural alphabet of 23 conformational clusters with their associated capital letters and the η and θ pseudo-torsion angles of their center nucleotides
Inevitably, R3D-BLAST2 may return some RNA 3D substructures that do not actually resemble any query RNA substructure. In fact, the E-values of these substructures are usually high. Therefore, we further equipped R3D-BLAST2 with an optional filter, which can screen out returned RNA 3D substructures that do not pass user-defined thresholds of root mean square deviation (RMSD), structural alignment score (SAS) and/or percentage of structural identity (PSI), where SAS equals 100×RMSD/(number of aligned residues) and PSI is defined as the percentage of superimposed residues within 4.0 Å with respect to the length of the shorter of the two aligned structures. To reduce running time, this filter option is not enabled by default in R3D-BLAST2.
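The three filter measures can be written out as a short Python sketch. It assumes that the two substructures have already been superimposed and that `distances` holds the distance in Å between each pair of aligned residues; the superposition itself is not shown. The function names, the use of `<=` for "within 4.0 Å", and the direction of each threshold (upper bounds on RMSD and SAS, a lower bound on PSI) are assumptions made for illustration.

```python
import math

def rmsd(distances):
    """Root mean square deviation over aligned residue pairs (in Å)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

def sas(rmsd_value, n_aligned):
    """Structural alignment score: 100 x RMSD / number of aligned residues."""
    return 100.0 * rmsd_value / n_aligned

def psi(distances, len_a, len_b, cutoff=4.0):
    """Percentage of superimposed residues within `cutoff` Å,
    relative to the length of the shorter structure."""
    close = sum(1 for d in distances if d <= cutoff)
    return 100.0 * close / min(len_a, len_b)

def passes_filter(distances, len_a, len_b,
                  max_rmsd=None, max_sas=None, min_psi=None):
    """Apply whichever thresholds the user supplied; None means 'not set'."""
    r = rmsd(distances)
    if max_rmsd is not None and r > max_rmsd:
        return False
    if max_sas is not None and sas(r, len(distances)) > max_sas:
        return False
    if min_psi is not None and psi(distances, len_a, len_b) < min_psi:
        return False
    return True

# Four aligned residue pairs, each 3.0 Å apart; structure lengths 5 and 8.
example = [3.0, 3.0, 3.0, 3.0]
example_rmsd = rmsd(example)                  # 3.0
example_sas = sas(example_rmsd, len(example)) # 100 * 3.0 / 4
example_psi = psi(example, 5, 8)              # 100 * 4 / 5
```

Normalizing RMSD by the number of aligned residues means that, at equal RMSD, a longer alignment receives a lower (better) SAS than a shorter one.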
Usage of R3D-BLAST2
Results and discussion
A benchmark dataset of eight RNA 3D structures
Search results of RNA FRABASE (version 2), FASTR3D, R3D-BLAST, RAG-3D and R3D-BLAST2
In addition, except for the pseudoknot and ribozyme in the benchmark dataset, R3D-BLAST2 identified more RNA structure hits with 100% query coverage whose entire 3D structures closely resemble the query RNA than RNA FRABASE and FASTR3D did. Recall that RNA FRABASE and FASTR3D were both developed to search for RNAs that have the same 2D structure as the query RNA without any insertions or deletions. As our experimental results demonstrate, they can therefore miss RNAs that possess the same overall 3D structure but different 2D structures and/or lengths. When queried with the pseudoknot and ribozyme in the benchmark dataset, both RNA FRABASE and FASTR3D returned more RNA structure hits than R3D-BLAST2 did at 100% query coverage. In fact, some regions of the RNA 3D structures identified by RNA FRABASE or FASTR3D are not very similar to the query RNA and, as a result, were removed from the RNA structure hits returned by R3D-BLAST2. In other words, all the pseudoknots and ribozymes returned by both RNA FRABASE and FASTR3D can still be found by R3D-BLAST2, but with a query coverage of less than 100%. On the other hand, R3D-BLAST2 returned a large number of other structurally similar RNA substructures that are missed by both RNA FRABASE and FASTR3D.
Comparison of running time (in seconds) for RNA FRABASE (version 2), FASTR3D, R3D-BLAST, RAG-3D and R3D-BLAST2
In this study, we upgraded our previous RNA structural search tool R3D-BLAST to develop a new web server named R3D-BLAST2 based on a newly constructed structural alphabet with a higher capability of representing the most common backbone conformations of RNA nucleotides. In contrast to the previous version, R3D-BLAST2 can now retrieve RNA 3D structures from the PDB data files in the mmCIF format and also allows the user to upload an mmCIF formatted file as an input to search for similar RNA 3D substructures. According to our experimental results on a benchmark dataset, R3D-BLAST2 indeed outperforms its previous version R3D-BLAST and other similar RNA structural search tools (RNA FRABASE, FASTR3D and RAG-3D) by finding a larger collection of RNA 3D substructures resembling a query RNA substructure. It can be expected that R3D-BLAST2 will become a valuable tool for annotating RNA structures and functions, because RNA molecules with the same function usually share similar 3D substructures.
This study was partially supported by Ministry of Science and Technology of Republic of China under grant MOST104-2221-E-007-027-MY2.
The publication costs of this paper were funded by Ministry of Science and Technology of Republic of China under grant MOST104-2221-E-007-027-MY2.
Availability of data and materials
All data analyzed during this study are included in this published article.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 18 Supplement 16, 2017: 16th International Conference on Bioinformatics (InCoB 2017): Bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-18-supplement-16.
CLL conceived of the study, carried out analyses and wrote the manuscript. CYY, JCL and KTC implemented the software, conducted the experiments and analyzed their results. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Gesteland RF, Cech T, Atkins JF. The RNA World: the Nature of Modern RNA Suggests a Prebiotic RNA world, 3rd edn. New York: Cold Spring Harbor Laboratory Press; 2006.
- Rose PW, Prlic A, Bi CX, Bluhm WF, Christie CH, Dutta S, Green RK, Goodsell DS, Westbrook JD, Woo J, Young J, Zardecki C, Berman HM, Bourne PE, Burley SK. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Res. 2015; 43:345–56.
- Coimbatore Narayanan B, Westbrook J, Ghosh S, Petrov AI, Sweeney B, Zirbel CL, Leontis NB, Berman HM. The nucleic acid database: new features and capabilities. Nucleic Acids Res. 2014; 42:114–22.
- Kolodny R, Linial N. Approximate protein structural alignment in polynomial time. Proc Natl Acad Sci USA. 2004; 101:12201–6.
- Popenda M, Blazewicz M, Szachniuk M, Adamiak RW. RNA FRABASE version 1.0: an engine with a database to search for the three-dimensional fragments within RNA structures. Nucleic Acids Res. 2008; 36:386–91.
- Popenda M, Szachniuk M, Blazewicz M, Wasik S, Burke EK, Blazewicz J, Adamiak RW. RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures. BMC Bioinforma. 2010; 11:231.
- Lai CE, Tsai MY, Liu YC, Wang CW, Chen KT, Lu CL. FASTR3D: a fast and accurate search tool for similar RNA 3D structures. Nucleic Acids Res. 2009; 37:287–95.
- Liu YC, Yang CH, Chen KT, Wang JR, Cheng ML, Chung JC, Chiu HT, Lu CL. R3D-BLAST: a search tool for similar RNA 3D substructures. Nucleic Acids Res. 2011; 39:45–9.
- Zahran M, Sevim Bayrak C, Elmetwaly S, Schlick T. RAG-3D: a search tool for RNA 3D substructures. Nucleic Acids Res. 2015; 43:9474–88.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997; 25:3389–402.
- Wadley LM, Keating KS, Duarte CM, Pyle AM. Evaluating and learning from RNA pseudotorsional space: quantitative validation of a reduced representation for RNA structure. J Mol Biol. 2007; 372:942–57.
- Berman HM, Burley SK, Kleywegt GJ, Markley JL, Nakamura H, Velankar S. The archiving and dissemination of biological structure data. Curr Opin Struct Biol. 2016; 40:17–22.
- Bourne PE, Berman HM, McMahon B, Watenpaugh KD, Westbrook JD, Fitzgerald PM. The macromolecular crystallographic information file (mmCIF). Methods Enzymol. 1997; 277:571–90.
- Leontis NB, Zirbel CL. Nonredundant 3D structure datasets for RNA knowledge extraction and benchmarking. In: Leontis NB, Westhof E, editors. RNA 3D Structure Analysis and Prediction. Berlin Heidelberg: Springer; 2012. p. 281–98.
- Frey BJ, Dueck D. Clustering by passing messages between data points. Science. 2007; 315:972–6.
- Henikoff S, Henikoff JG. Amino-acid substitution matrices from protein blocks. Proc Natl Acad Sci USA. 1992; 89:10915–9.
- Karlin S, Altschul SF. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci USA. 1990; 87:2264–8.
- Altschul SF, Bundschuh R, Olsen R, Hwa T. The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Res. 2001; 29:351–61.
- Kolodny R, Koehl P, Levitt M. Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J Mol Biol. 2005; 346:1173–88.
- Capriotti E, Marti-Renom MA. SARA: a server for function annotation of RNA structures. Nucleic Acids Res. 2009; 37:260–5.