FRASS: the web-server for RNA structural comparison
© Kirillova et al; licensee BioMed Central Ltd. 2010
Received: 22 February 2010
Accepted: 16 June 2010
Published: 16 June 2010
The rapid growth in the number of novel RNA structures over the past few years demands automated methods for structure comparison. While many algorithms handle only small motifs, a few recently developed techniques (ARTS, DIAL, SARA, SARSA, and LaJolla) are available for the structural comparison of large and intact RNA molecules.
The FRASS web-server represents an RNA chain by its Gauss integrals and allows one to compare the structures of RNA chains and to find similar entries in a database derived from the Protein Data Bank. We observed that FRASS scores correlate well with the ARTS and LaJolla similarity scores. Moreover, the web-server can also satisfactorily reproduce the DARTS classification of RNA 3D structures and the classification of the SCOR functions that was obtained by the SARA method.
The FRASS web-server can easily be used to detect relationships among RNA molecules and to efficiently scan the rapidly growing structural databases.
Three-dimensional (3D) structures of RNA have been extensively studied because these molecules are involved in several biological processes. Structural analysis is also essential for investigations of molecular evolution and for the development of structure prediction tools.
The flexibility of the RNA backbone, which is described by six torsion angles per nucleotide, is restricted. It was shown [1–3] that RNA molecules can be represented by collections of stable structures or structural motifs. Therefore, most of the 3D tools developed for RNA molecules handle only local structural elements or motifs.
PRIMOS uses the 3D worm representation of an RNA molecule, in which two coordinates correspond to the pseudo-torsions describing the nucleotide conformation and a third coordinate defines the sequence. PRIMOS was used to perform structural motif searches and is limited to the comparison of different conformations of the same molecule. COMPADRES is a modified version of PRIMOS, developed to discover new motifs. The NASSAN method uses the graph representation of an RNA structure and applies the Ullmann subgraph isomorphism algorithm for structural comparison. The efficiency of the Ullmann algorithm decreases rapidly with the number of nodes in the query motif (subgraph), so it is not suitable for large molecules. In the method proposed by Apostolico et al., RNA motifs are represented by the histograms of the distances between all backbone atoms and the centroid of the phosphate atoms. Thus, 3D comparisons are reduced to comparisons of the shapes of the histograms. The FR3D method was also developed for finding local motifs in RNA structures. It applies a base-centered approach in which each base is represented by the position of its glycosidic nitrogen in 3D space and by the rotation matrix that defines its orientation with respect to a common frame. The automated approach for the identification of RNA conformational motifs proposed by Hershkovitz et al. is based on binning the values of the torsion angles that define the nucleotide conformation. This method makes it possible to use alphabetic descriptions of 3D structures. It was later extended, by the RNA Ontology Consortium, into the "consensus modular string nomenclature" for RNA structures, which includes 46 discrete conformers represented by two characters. Although the techniques described above were successfully applied to the comparison, search, and discovery of structural motifs, a unique, universal, and effective method for RNA structural comparison is still lacking.
Thus, the well-known classification of RNA structural motifs, the SCOR database, was carried out manually. This fact can be explained by the irregularity of RNA 3D structures and by discrepancies in the definitions of structural motifs.
The structural analysis of large and intact RNA molecules, and not only of small fragments, has become increasingly common during the last few years [1, 2]. Therefore, methods like DIAL, ARTS, SARSA, SARA, and LaJolla, which are suitable for the comparison of large global folds, are of key importance. DIAL (dihedral alignment) defines dihedral/pseudo-dihedral similarity using six backbone torsions and also includes nucleotide sequence and base-pairing similarities. The ARTS (alignment of RNA tertiary structure) method produces a score consisting of two components that correspond to the numbers of spatially close base pairs and of single nucleotides. The method was applied to create the DARTS database, which is an RNA classification mainly based on global spatial resemblance. SARA produces the structure alignment using a unit-vector strategy and estimates the degree of similarity of sequence, secondary structure and 3D structure. The SARA web-server provides functional annotations according to the SCOR classification. The SARSA and LaJolla algorithms transform 3D structures into 1D strings of characters that can be aligned and compared with faster techniques.
In the past years the number and size of novel RNA structures have dramatically increased. However, comparison methods based on structural alignments are not sufficiently rapid for interactive scanning of large databases. Quite a few alternative algorithms, which do not produce structural alignments, were designed for the rapid comparison of protein structures [20–27]. They are effective for scanning large databases and also provide a quantitative measure of fold similarity that can be used to visualize the structural diversity of proteins on 2D or 3D maps using Principal Component Analysis (PCA) [20, 21]. One of these structural comparison methods is the Gauss-integrals representation of a protein structure [21–23]. The structure is represented only by the positions of the backbone Cα atoms; therefore, the method can also be applied to RNA molecules, with minor modifications.
In this article, we describe a web-server developed for the global comparison of RNA structures using Gauss-integrals. The FRASS web-server is free and open to all users and there is no login requirement. The server allows both pair-wise comparisons between two structures provided by the user and database scanning, in which the user extracts the database entries similar to a query. The correlation between the Gauss-integrals based distances and the ARTS and LaJolla similarity scores was observed to be strong. Moreover, using the DARTS and SCOR classifications of RNA structures as benchmarks, it was shown that the method based on the Gauss-integrals allows one to automatically build satisfactory classifications.
In the Gauss-integrals approach [21–23], the RNA backbone is regarded as a polygonal space curve μ, that is, a series of connected line segments. The segments are specified by a sequence of phosphorus atoms (P1, P2, ..., PN). The first segment is the section between the P1 and P2 atoms, the second segment is the section between P2 and P3, etc. The shape of the polygonal space curve is described by a 30-dimensional vector containing the length of the backbone and the 29 Gauss integrals of the first, second and third order.
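To make the idea of a Gauss integral over a polygonal curve concrete, the following Python sketch computes the writhe, the simplest Gauss integral of a space curve, by summing a closed-form contribution over all pairs of non-adjacent segments. It is an illustration only and not the GI program: the function names are hypothetical, the sign and normalization conventions are assumptions that may differ from those used by FRASS, and the remaining integrals of the 30-dimensional descriptor are not covered.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _unit(v):
    # Returns None for a (numerically) zero vector, e.g. collinear points.
    norm = math.sqrt(_dot(v, v))
    return None if norm < 1e-12 else (v[0] / norm, v[1] / norm, v[2] / norm)

def segment_pair_writhe(p1, p2, p3, p4):
    """Signed contribution of segments p1-p2 and p3-p4 (assumed convention)."""
    r12, r34 = _sub(p2, p1), _sub(p4, p3)
    r13, r14 = _sub(p3, p1), _sub(p4, p1)
    r23, r24 = _sub(p3, p2), _sub(p4, p2)
    normals = [_unit(_cross(r13, r14)), _unit(_cross(r14, r24)),
               _unit(_cross(r24, r23)), _unit(_cross(r23, r13))]
    if any(n is None for n in normals):
        return 0.0  # degenerate geometry contributes nothing
    omega = 0.0
    for k in range(4):
        d = _dot(normals[k], normals[(k + 1) % 4])
        omega += math.asin(max(-1.0, min(1.0, d)))  # clamp rounding errors
    s = _dot(_cross(r34, r12), r13)
    sign = (s > 0) - (s < 0)
    return sign * omega / (2.0 * math.pi)

def writhe(points):
    """Writhe of the open polygonal curve through `points` (list of xyz tuples)."""
    n_segments = len(points) - 1
    total = 0.0
    for i in range(n_segments):
        for j in range(i + 2, n_segments):  # adjacent segments are skipped
            total += segment_pair_writhe(points[i], points[i + 1],
                                         points[j], points[j + 1])
    return total
```

In this sketch a planar curve gives a writhe of zero, and mirroring a curve changes the sign of its writhe, which is the kind of shape information the descriptor is meant to capture.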
The Gauss integrals are defined by the mutual spatial superposition of the segments and provide a mathematical description of the backbone shape of an RNA molecule. The method cannot be used for small molecules because the third-order Gauss integral is defined by the spatial superposition of six segments and can be computed only for molecules containing more than 7 nucleotides. Of the 3,353 RNA structures deposited in the PDB, 2,703 contain more than 7 nucleotides. The distance between two RNA structures is measured by the Euclidean distance between the two 30-dimensional vectors, one for each RNA molecule.
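The distance measure and the size limit described above can be sketched in a few lines of Python. The names `descriptor_distance` and `can_be_described` are hypothetical and chosen only for this illustration; the descriptor vectors are assumed to have been computed beforehand.

```python
import math

DESCRIPTOR_SIZE = 30   # backbone length plus 29 Gauss integrals (from the text)
MIN_NUCLEOTIDES = 8    # the text requires more than 7 nucleotides

def can_be_described(n_nucleotides):
    """True if a chain is long enough for the third-order integrals."""
    return n_nucleotides >= MIN_NUCLEOTIDES

def descriptor_distance(a, b):
    """Euclidean distance between two 30-dimensional descriptor vectors."""
    if len(a) != DESCRIPTOR_SIZE or len(b) != DESCRIPTOR_SIZE:
        raise ValueError("both descriptors must have 30 components")
    return math.dist(a, b)
```

A small distance indicates similar backbone shapes; identical descriptors give a distance of zero.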
The RNA structures were downloaded from the Protein Data Bank. Each monomeric RNA molecule was stored in a single PDB-formatted file and its Gauss integrals were computed. Given that the Protein Data Bank is frequently updated, a program for rapidly updating the RNA structure database was developed. The GI program for Gauss-integral calculations is freely available online (see the GI entry in the reference list). In our modified algorithm, the RNA phosphorus atoms are used for the backbone description. The program that scans the RNA structural database and finds the entries most similar to the query was also written locally. By definition, the Gauss-integrals describe a single polymer chain. At present, the web-server can process only single chains, although a significant percentage of RNA structures are formed by more than one chain. In such cases, each chain should be processed separately.
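As a rough illustration of the backbone extraction step, the Python sketch below collects the phosphorus atom coordinates of one chain from PDB-formatted ATOM records, relying on the fixed column layout of the PDB format. The function names are hypothetical, nucleotides stored as HETATM records are ignored, and `_atom_line` is only a helper that builds well-formed demonstration records; none of this is taken from the FRASS source code.

```python
def phosphorus_coordinates(pdb_lines, chain_id):
    """Return the (x, y, z) coordinates of the P atoms of one chain."""
    coords = []
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[12:16].strip() != "P":      # atom name, columns 13-16
            continue
        if line[21] != chain_id:            # chain identifier, column 22
            continue
        if line[16] not in (" ", "A"):      # keep one alternate location only
            continue
        coords.append((float(line[30:38]),  # x, columns 31-38
                       float(line[38:46]),  # y, columns 39-46
                       float(line[46:54]))) # z, columns 47-54
    return coords

def _atom_line(serial, name, res, chain, resseq, x, y, z):
    # Demonstration helper: builds a minimal fixed-column ATOM record.
    return ("ATOM".ljust(6) + f"{serial:5d}" + " " + name + " "
            + f"{res:>3s}" + " " + chain + f"{resseq:4d}" + " " * 4
            + f"{x:8.3f}{y:8.3f}{z:8.3f}")

example = [
    _atom_line(1, " P  ", "G", "A", 1, 1.0, 2.0, 3.0),
    _atom_line(2, " C4'", "G", "A", 1, 9.0, 9.0, 9.0),
    _atom_line(3, " P  ", "C", "A", 2, 4.5, 5.5, 6.5),
    _atom_line(4, " P  ", "U", "B", 1, 7.0, 8.0, 9.0),
]
backbone_a = phosphorus_coordinates(example, "A")
```

Because each chain is handled on its own, a multi-chain entry would simply be passed through this function once per chain identifier.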
Comparison with ARTS and LaJolla similarity scores
Pearson correlation coefficients are equal to -0.98 in both cases. The correlations are negative since the Gauss-integrals based scores are distances while the ARTS and LaJolla scores are similarities.
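The sign of the correlation can be checked with a minimal Python sketch of the Pearson coefficient. The data in the example are invented purely to show that a distance that grows while a similarity score shrinks gives a negative coefficient; they are not FRASS, ARTS or LaJolla values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented toy data: larger distance, smaller similarity.
toy_distances = [0.5, 1.0, 2.0, 4.0]
toy_similarities = [90.0, 80.0, 60.0, 20.0]
r = pearson(toy_distances, toy_similarities)
```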
Benchmarking against the DARTS and SCOR classifications
An effective way to test the classification ability of a method is to compute a ROC curve. In the present study, the DARTS and SCOR classifications of RNA structures were used as external benchmarks. The DARTS database contains 1,333 experimentally determined RNA structures, which are classified into 94 clusters. Only the 789 single-chain entries of DARTS were retained, since the web-server described in the present manuscript handles only monomeric molecules. Gauss-integrals based distances were thus computed for 310,866 pairs of RNA structures. The publicly available NR95-SCOR dataset contains 60 RNA chains that have more than 20 and fewer than 300 nucleotides and are assigned to SCOR functional classes with SARA. This results in 1,770 unique pairs of RNA structures.
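The pair counts quoted above follow from the number of unordered pairs, n(n - 1)/2, that can be formed from n structures. A short Python check (the function name is hypothetical):

```python
import math

def unique_pairs(n_structures):
    """Number of unordered pairs among n structures: n * (n - 1) / 2."""
    return math.comb(n_structures, 2)

darts_pairs = unique_pairs(789)  # single-chain DARTS entries
scor_pairs = unique_pairs(60)    # NR95-SCOR chains
```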
The ROC curve plots the true positive rate, TP/(TP + FN), against the false positive rate, FP/(FP + TN), where true positives (TP) and false positives (FP) are the numbers of correctly and incorrectly predicted pairs of the same DARTS/SCOR cluster, while true negatives (TN) and false negatives (FN) are the numbers of correctly and incorrectly predicted pairs of different clusters. Different points of the ROC curve are obtained by varying the Gauss-integrals based distance value under which two structures are considered to be similar and to belong to the same DARTS/SCOR cluster.
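The threshold sweep can be sketched in Python as follows, using the standard definitions of the true positive rate, TP/(TP + FN), and the false positive rate, FP/(FP + TN). The function name and the toy data are assumptions made for illustration; a pair is predicted to share a cluster when its distance falls below the threshold.

```python
def roc_points(distances, same_cluster, thresholds):
    """Return one (FPR, TPR) point per distance threshold.

    distances    -- Gauss-integral based distance of each pair
    same_cluster -- True if the benchmark puts the pair in the same cluster
    """
    points = []
    for threshold in thresholds:
        tp = fp = tn = fn = 0
        for distance, same in zip(distances, same_cluster):
            predicted_same = distance < threshold
            if predicted_same and same:
                tp += 1
            elif predicted_same:
                fp += 1
            elif same:
                fn += 1
            else:
                tn += 1
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((fpr, tpr))
    return points

# Invented toy data: two same-cluster pairs, two different-cluster pairs.
toy_points = roc_points([0.5, 1.0, 2.0, 3.0],
                        [True, True, False, False],
                        [0.0, 1.5, 10.0])
```

A very small threshold accepts no pair and a very large one accepts every pair, so the curve runs from (0, 0) to (1, 1).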
The computation of the Gauss integrals is O(n^3) in time, and it was observed that for long molecules (~1,500 nt) it takes about half an hour on a standard PC. Therefore, when dealing with a large database, the Gauss integrals must be pre-computed and stored on a hard disk. In contrast, computing the Euclidean distances between pre-calculated Gauss integrals is extremely fast, and database scanning takes only a few seconds on a standard PC.
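The pre-compute-then-scan strategy can be illustrated with the Python sketch below. The JSON storage format, the function names and the chain identifiers are all assumptions made for this example and do not describe how the FRASS server actually stores its database.

```python
import json
import math

def save_descriptors(path, descriptors):
    """Store pre-computed descriptors (chain id -> list of floats) on disk."""
    with open(path, "w") as handle:
        json.dump(descriptors, handle)

def load_descriptors(path):
    """Load descriptors previously written by save_descriptors."""
    with open(path) as handle:
        return json.load(handle)

def scan(query, descriptors, top=10):
    """Return the `top` entries closest to `query` as (chain id, distance)."""
    ranked = sorted((math.dist(query, vector), chain_id)
                    for chain_id, vector in descriptors.items())
    return [(chain_id, distance) for distance, chain_id in ranked[:top]]

# Invented three-component vectors keep the example short;
# real descriptors would have 30 components.
toy_database = {
    "chain_a": [0.0, 0.0, 0.0],
    "chain_b": [3.0, 4.0, 0.0],
    "chain_c": [1.0, 0.0, 0.0],
}
hits = scan([0.0, 0.0, 0.0], toy_database, top=2)
```

The expensive descriptor calculation is done once per chain, so each scan costs only one distance evaluation per database entry.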
Although FRASS does not produce structural alignments, it must be observed that methods that do generate them are, in general, not well suited to large databases containing long RNA chains: ARTS is O(n^3) in time, DIAL is O(n^2) in time, and LaJolla and SARSA are O(1) in time, but in none of these cases can anything be pre-computed and stored on a hard disk for later use. The SARSA and LaJolla methods, which transform 3D structures into 1D strings, are faster than other techniques. In particular, SARSA was shown to be faster than DIAL, though no quantitative information was published. LaJolla takes about 15 minutes on a standard PC to generate 5,050 alignments of RNA chains (see the datasets described in the paragraph "Comparison with ARTS and LaJolla similarity scores"). In contrast, the computation of 5,050 Gauss-integrals based distances takes only about one second. Moreover, structural alignments and function assignments with the SARA server are limited to RNA structures with fewer than 1,000 nucleotides, since the computations are very demanding.
Global similarity of 23S ribosomal RNAs
The large 23S ribosomal RNA from Haloarcula marismortui, the crystal structure of which was refined at 2.4 Å resolution, was chosen to test the web-server. As a query, we selected chain 0, containing about 2,700 nucleotides, taken from the 1FFK entry of the Protein Data Bank. The most similar structure found in the database using the FRASS web-server is the 23S ribosomal RNA from Deinococcus radiodurans (PDB identification code 3CF5, chain X, about 2,700 nucleotides). The Gauss-integrals distance between the two structures is equal to 1.2, which reveals their high structural similarity because 96% of the distances computed for all pairs in the RNA database are larger than 1.2. The ARTS similarity score, equal to 3,294.00, corresponds to 588 aligned base-pairs and 2,118 aligned nucleotides. The high global similarity detected by both methods supports the similar biological activity of the two molecules, which was also analyzed in recent, detailed comparisons of their structures and functions [28, 29].
The FRASS web-server is an effective tool for the global comparison and classification of RNA structures. The similarity measure, based on the Gauss-integrals, is related to the backbone shape of a single RNA chain, represented by the positions of the phosphorus atoms. It is an alternative and a complement to other similarity scores that consider base-pairs. Given the simplification of the backbone representation, computations are extremely fast. The web-server thus allows database scanning, which can be used to detect relationships among RNA molecules and to assign a function to a newly determined experimental structure on the basis of structural similarity.
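The statement that a given share of all pairwise distances exceeds 1.2 corresponds to a simple empirical fraction, sketched below in Python. The function name and the toy distances are invented for illustration and are not taken from the FRASS database.

```python
def fraction_larger(distances, value):
    """Fraction of the distances that are strictly larger than `value`."""
    if not distances:
        raise ValueError("at least one distance is required")
    return sum(1 for d in distances if d > value) / len(distances)

# Invented toy distances: three of the five values exceed 1.2.
toy_fraction = fraction_larger([0.5, 1.0, 1.5, 2.0, 3.0], 1.2)
```

The larger this fraction, the more unusual (and therefore the more significant) the observed similarity between the two structures.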
Availability and requirements
Project name: FRASS
Project home page: http://sourceforge.net/projects/frass/
Operating systems: Platform independent for web-server, Linux for downloaded software
Programming languages: C and Perl
License: GNU GPL
Funding support from the BIN-III programme of the Austrian GEN-AU is gratefully acknowledged, as well as the kind hospitality at the Vienna University by Prof. Kristina Djinovic.
- Hendrix DK, Brenner SE, Holbrook SR: RNA structural motifs: building blocks of a modular biomolecule. Q Rev Biophys 2005, 38: 221–243. 10.1017/S0033583506004215
- Holbrook SR: RNA structure: the long and the short of it. Curr Opin Struct Biol 2005, 15: 302–308. 10.1016/j.sbi.2005.04.005
- Leontis NB, Lescoute A, Westhof E: The building blocks and motifs of RNA architecture. Curr Opin Struct Biol 2006, 16: 279–287. 10.1016/j.sbi.2006.05.009
- Duarte CM, Wadley LM, Pyle AM: RNA structure comparison, motif search and discovery using a reduced representation of RNA conformational space. Nucleic Acids Res 2003, 31(16):4755–4761. 10.1093/nar/gkg682
- Wadley LM, Pyle AM: The identification of novel RNA structural motifs using COMPADRES: an automated approach to structural discovery. Nucleic Acids Res 2004, 32(22):6650–6659. 10.1093/nar/gkh1002
- Harrison AM, South DR, Willett P, Artymiuk PJ: Representation, searching and discovery of patterns of bases in complex RNA structures. J Comput Aided Mol Des 2003, 17(8):537–549. 10.1023/B:JCAM.0000004603.15856.32
- Ullmann JR: An Algorithm for Subgraph Isomorphism. J Assoc for Comput Machinery 1976, 23: 31–42.
- Apostolico A, Ciriello G, Guerra C, Heitsch CE, Hsiao C, Williams LD: Finding 3D motifs in ribosomal RNA structures. Nucleic Acids Res 2009, 37(4):e29. 10.1093/nar/gkn1044
- Sarver M, Zirbel CL, Stombaugh J, Mokdad A, Leontis NB: FR3D: finding local and composite recurrent structural motifs in RNA 3D structures. J Math Biol 2008, 56(1–2):215–252. 10.1007/s00285-007-0110-x
- Hershkovitz E, Tannenbaum E, Howerton SB, Sheth A, Tannenbaum A, Williams LD: Automated identification of RNA conformational motifs: theory and application to the HM LSU 23S rRNA. Nucleic Acids Res 2003, 31(21):6249–6257. 10.1093/nar/gkg835
- Richardson JS, Schneider B, Murray LW, Kapral GJ, Immormino RM, Headd JJ, Richardson DC, Ham D, Hershkovits E, Williams LD, et al.: RNA backbone: Consensus all-angle conformers and modular string nomenclature (an RNA Ontology Consortium contribution). RNA 2008, 14: 465–481. 10.1261/rna.657708
- Klosterman S, Tamura M, Holbrook SR, Brenner SE: SCOR: a Structural Classification of RNA database. Nucleic Acids Res 2002, 30(1):392–394. 10.1093/nar/30.1.392
- Ferrè F, Ponty Y, Lorenz WA, Clote P: DIAL: a web server for the pairwise alignment of two RNA three-dimensional structures using nucleotide, dihedral angle and base-pairing similarities. Nucleic Acids Res 2007, 35: W659–668. 10.1093/nar/gkm334
- Dror O, Nussinov R, Wolfson H: ARTS: alignment of RNA tertiary structures. Bioinformatics 2005, 21(Suppl 2):ii47-ii53. 10.1093/bioinformatics/bti1108
- Chang YF, Huang YL, Lu CL: SARSA: a web tool for structural alignment of RNA using a structural alphabet. Nucleic Acids Res 2008, 36: W19-W24. 10.1093/nar/gkn327
- Capriotti E, Marti-Renom MA: SARA: a server for function annotation of RNA structures. Nucleic Acids Res 2009, 37(Web-Server-Issue):W260-W265. 10.1093/nar/gkp433
- Bauer RA, Rother K, Moor P, Reinert K, Steinke T, Bujnicki JM, Preissner R: Fast Structural Alignment of Biomolecules Using a Hash Table, N-Grams and String Descriptors. Algorithms 2009, 2(2):692–709. 10.3390/a2020692
- Abraham M, Dror O, Nussinov R, Wolfson HJ: Analysis and classification of RNA tertiary structures. RNA (New York, NY) 2008, 14(11):2274–2289.
- Kedem K, Chew LP, Elber R: Unit-vector RMS (URMS) as a tool to analyze molecular dynamics trajectories. Proteins 1999, 37: 554–564. 10.1002/(SICI)1097-0134(19991201)37:4<554::AID-PROT6>3.0.CO;2-1
- Choi IG, Kwon J, Kim SH: Local feature frequency profile: a method to measure structural similarity in proteins. Proc Natl Acad Sci USA 2004, 101: 3797–3802. 10.1073/pnas.0308656100
- Røgen P, Fain B: Automatic classification of protein structure by using Gauss integrals. Proc Natl Acad Sci USA 2003, 100: 119–124. 10.1073/pnas.2636460100
- Røgen P: Evaluating protein structure descriptors and tuning Gauss integral based descriptors. J Phys Condens Matter 2005, 17: S1523-S1538. 10.1088/0953-8984/17/18/010
- Nielsen BG, Røgen P, Bohr HG: Gauss-integral based representation of protein structure for predicting the fold class from the sequence. Math Comput Model 2006, 43: 401–412. 10.1016/j.mcm.2005.11.014
- FRASS - RNA Structure Comparison [http://protein.bio.unipd.it/frass/]
- GI: Gauss Integrals - a tool for 3D protein structure description, comparison and classification [http://www2.mat.dtu.dk/people/Peter.Roegen/Gauss_Integrals.html]
- Ban N, Nissen P, Hansen J, Moore PB, Steitz TA: The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 2000, 289(11):905–920. 10.1126/science.289.5481.905
- Harms JM, Wilson DN, Schluenzen F, Connell SR, Stachelhaus T, Zaborowska Z, Spahn CM, Fucini P: Translational regulation via L11: molecular switches on the ribosome turned on and off by thiostrepton and micrococcin. Molecular cell 2008, 30(11):26–38. 10.1016/j.molcel.2008.01.009
- Wilson DN, Harms JM, Nierhaus KH, Schlünzen F, Fucini P: Species-specific antibiotic-ribosome interactions: implications for drug development. Biol Chem 2005, 386(12):1239–1252. 10.1515/BC.2005.141
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.