 Research Article
 Open Access
LCS-TA to identify similar fragments in RNA 3D structures
BMC Bioinformatics volume 18, Article number: 456 (2017)
Abstract
Background
In modern structural bioinformatics, the comparison of molecular structures to identify and assess their similarities and differences is one of the most commonly performed procedures. It provides the basis for evaluating models predicted in silico, constitutes a preliminary step in the search for structural motifs, and supports tracing molecular evolution. Faced with an ever-increasing amount of available structural data, researchers need a range of methods that enable comparative analysis of structures from either a global or a local perspective.
Results
Herein, we present a new, superposition-independent method that processes pairs of RNA 3D structures to identify their local similarities. Similarity is considered in the context of structure bending and bond rotation, which are described by torsion angles. In the analyzed RNA structures, the method finds the longest continuous segments that show similar torsion within a user-defined threshold. The length of the segment is provided as the local similarity measure. The method has been implemented as the LCS-TA algorithm (Longest Continuous Segments in Torsion Angle space) and is incorporated into our MCQ4Structures application, freely available for download from http://www.cs.put.poznan.pl/tzok/mcq/.
Conclusions
The presented approach ties a torsion-angle-based method of structure analysis to the idea of local similarity identification by handling continuous 3D structure segments. The first method, implemented in MCQ4Structures, has been successfully utilized in the RNA-Puzzles initiative. The second one, originally applied in Euclidean space, is a component of the LGA (Local-Global Alignment) algorithm commonly used in assessing protein models submitted to CASP. This combination of concepts, implemented in LCS-TA, provides a new perspective on structure quality assessment in its local and quantitative aspects. A series of computational experiments shows the first results of applying our method to the comparison of RNA 3D models. LCS-TA can be used for identifying strengths and weaknesses in the prediction of RNA tertiary structures.
Background
A comparison of the contents of the NCBI Reference Sequence Database (RefSeq) [1] and the Protein Data Bank (PDB) [2] leads to the conclusion that there is a large, ever-widening gap between the numbers of known sequences and known structures of biomolecules. Today, this gap is being filled with computational methods that address the problem of RNA and protein 3D structure prediction. Consequently, it becomes necessary to estimate the quality of computational models and the fidelity of predictors. Since the 1990s, the CASP (Critical Assessment of protein Structure Prediction) experiment has taken up the challenge of assessing protein structure prediction [3]. The RNA-Puzzles initiative, launched in 2011 and drawing on the solutions implemented in CASP, followed to support the RNA community [4, 5]. Both experiments have significantly contributed to the development of measures and methods for validation and assessment of 3D structure models predicted in silico [6]. The resulting algorithms have been applied not only in the evaluation of predicted proteins and RNAs. They are also used for validation and analysis of experimentally solved structures, clustering of 3D models, identification of structure motifs, tracking of conformational changes, exploring the sequence-structure relationship, etc. [6,7,8,9,10,11,12,13,14].
RNA-Puzzles, a collective experiment for blind RNA structure prediction, uses the following approaches to assess submitted RNA 3D models: (i) Root Mean Square Deviation (RMSD), (ii) Interaction Network Fidelity (INF) [15], (iii) Deformation Index (DI), (iv) Clash score by MolProbity [16], and (v) Mean of Circular Quantities (MCQ) [17]. Apart from these, a few other RNA evaluation methods have been developed and applied in various projects [8, 18]. All of them relate to various attributes of the considered RNA 3D structures, but their common feature is that the structures are mainly evaluated globally. Similarly, most structure assessment methods in CASP treat protein models globally, and only a few touch on local similarity. Such an approach is understandable and seems sufficient when many models submitted to the competition have to be evaluated and ranked. However, when analyzing individual structures, finding their strengths and weaknesses, comparing substructures, or identifying motifs, a local assessment is necessary. In such cases, local evaluation of the 3D model complements global analysis and significantly enhances our knowledge of the structure.
So far, one approach has been proposed to enable a local view of a predicted RNA 3D model compared to the target structure. It is based on the concept of spheres built along the RNA backbone, which provide the scene for preview and RMSD-based evaluation of sphere-enclosed atom subsets. It was first implemented as a standalone application named RNAlyzer [8], and later released as the RNAssess webserver [19]. In the case of proteins, Local-Global Alignment (LGA) is one of the most common approaches enabling local analysis [20]. LGA comprises two methods, Longest Continuous Segments (LCS) and Global Distance Test (GDT). The first one identifies the longest continuous fragment within the predicted protein structure which, compared to the target, has an RMSD below a given threshold. The second method computes the percentage of residues that fit below a predefined distance cutoff. LGA is the reference method used to evaluate protein structures in CASP.
The methods mentioned in the previous paragraph operate in Euclidean space, where each structure is represented as a set of atoms with coordinates in the Cartesian system. Like all other approaches that consider molecular structures in Euclidean space and apply RMSD-based evaluation, they have to deal with the computationally demanding problem of optimum 3D structure alignment. This problem can be avoided by switching to the space of torsion angles. The 3D structure of RNA can be represented by a set of eight torsion angles that describe the course of its backbone and the arrangement of the bases. Such a representation makes the comparison of structures independent of their alignment in space and simplifies the computation. This concept has been followed in the MCQ4Structures method [17], which expresses structure similarity as the Mean of Circular Quantities (MCQ).
Here, we propose a new method that integrates the concept of RNA 3D structure comparison in the space of torsion angles [17] with the idea of identifying the longest continuous segments displaying local similarity [20]. Two segments are considered similar if their MCQ value is below a predefined threshold. The method has been implemented as the LCS-TA algorithm (Longest Continuous Segments in Torsion Angle space) and incorporated into the MCQ4Structures software. It is freely available at http://www.cs.put.poznan.pl/tzok/mcq/.
Methods
LCS-TA has been designed as a local similarity measure. It compares two RNA 3D structures, S (structure of the target) and S′ (structure of the model), and identifies similar fragments within them. It runs in either sequence-independent or sequence-dependent mode. In the first mode, the compared structures can have different lengths, and the relationship between their residues can be unknown. Thus, no preliminary analysis of the sequences of S and S′ is required. In the second mode, the method processes structures of the same length. LCS-TA operates in the space of torsion angles, so it is superposition-independent and does not involve finding the optimum alignment of structures. The method scans both structures stepwise along their backbones and uses a moving search window to select segments for comparison. In this routine, a divide and conquer scheme determines the window size in each step. For a pair of window-highlighted segments, LCS-TA computes the MCQ value over the set of torsion angles related to the segments. Next, it checks whether the MCQ value is below the threshold. At the output, LCS-TA provides the length of the longest continuous segment satisfying the similarity condition (i.e., fitting below the threshold) and the segment location (its first and last residue numbers). The resulting segment's length (referred to as LCS) is the measure of local similarity. Both components of the method, that is, the divide and conquer procedure and the MCQ-based measure, are described in the following paragraphs.
Divide and conquer procedure
Divide and conquer (D&C) is a technique that solves a problem by recursively splitting it into smaller subproblems and using their solutions to build the solution of the input problem. In our method, we apply the D&C approach to determine the length of the search window in consecutive steps of the algorithm. An example recursion tree visualizing the divide-and-conquer-driven computation in the LCS-TA algorithm is presented in Fig. 1.
The initial window size in LCS-TA is equal to the number n of residues in the predicted model (WinSize = n). In each iteration, the algorithm checks whether a feasible solution (namely, a continuous segment with MCQ below the threshold) exists for the current window size. If not, WinSize is divided by 2 and rounded up to the nearest integer. Otherwise, it is increased to a value halfway between the current size and the WinSize of the grandparent iteration (i.e., iteration i−2, where i is the index of the current iteration), except in the first iteration, where n−1 is taken as the upper bound of WinSize. Next, the computation runs recursively for both sizes of the search window, thus branching into two subproblems. The algorithm stops if further reduction of the window size is impossible (WinSize = 1) and all possible solutions for that WinSize value have been checked, or if the optimum solution is found. Such a computation pattern, known as binary tree recursion, is one of the most commonly used in implementations of the D&C method. Its time complexity is O(log₂ n), where n is the instance size (in our problem, n is the number of residues in S′, the structure of the predicted model).
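The window-size schedule described above can be illustrated with a minimal sketch. It is not the MCQ4Structures code: it reduces the recursion to a plain binary search over the window size, and it assumes that feasibility is monotone (if a window of size w is feasible, every smaller window is feasible too), which a mean-based measure does not strictly guarantee. The names `longest_feasible_window` and `is_feasible` are hypothetical.

```python
from typing import Callable


def longest_feasible_window(n: int, is_feasible: Callable[[int], bool]) -> int:
    """Return the largest window size in 1..n accepted by is_feasible, or 0.

    Assumption (not from the paper): feasibility is monotone in the window size.
    """
    # First iteration: try the whole model (WinSize = n).
    if n > 0 and is_feasible(n):
        return n
    low = 0        # largest size known to be feasible (0 = none yet)
    high = n - 1   # n - 1 is the upper bound after the first iteration
    while low < high:
        # Halfway between the two bounds, rounded up.
        size = (low + high + 1) // 2
        if is_feasible(size):
            low = size        # feasible: look for a longer segment
        else:
            high = size - 1   # infeasible: look for a shorter segment
    return low


# Hypothetical example: windows of up to 6 residues are feasible in a 10-residue model.
probed = []
result = longest_feasible_window(10, lambda w: probed.append(w) or w <= 6)
print(result, probed)
```

With these made-up inputs the probed sizes follow the pattern from the text: the full length first, then half of it, then a value halfway towards the previous upper bound.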
MCQ-based measure
The MCQ-based distance measure has been developed for the trigonometric representation of a molecule's 3D structure [17]. In this representation, the shape of every RNA residue is described by eight torsion angles from the set T = {α, β, γ, δ, ε, ζ, P, χ}. Each torsion angle in an RNA molecule is defined by an atom quadruple (the details can be found in [17, 21]) and determines the rotation around a particular chemical bond. It is computed as a dihedral angle between two planes defined by a pair of overlapping atom triples. Given a chain of four atoms A, B, C, D, the torsion angle is the angle between the plane passing through A, B, C and the plane passing through B, C, D.
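As an illustration of the dihedral-angle computation, the sketch below uses the standard atan2 formulation on plain coordinate tuples. The function name `torsion_angle` and the helper names are hypothetical, and the sign convention is that of this formula, not necessarily the one used in MCQ4Structures.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(a, b, c, d):
    """Dihedral angle in radians between plane (A, B, C) and plane (B, C, D)."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1 = _cross(b1, b2)  # normal of the plane through A, B, C
    n2 = _cross(b2, b3)  # normal of the plane through B, C, D
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


# Four atoms in one plane, A and D on the same side of the B-C bond (cis): angle 0.
cis = torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
# A and D on opposite sides of the B-C bond (trans): magnitude pi.
trans = torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
print(math.degrees(cis), math.degrees(abs(trans)))
```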
When the RNA structure is composed of n residues, its trigonometric representation is a matrix containing 8n values of torsion angles t_{ij}, where i = 1,...,n, j = 1,...,|T|, and T is the set of torsion angles defined for RNA (t_{ij} is the torsion angle of type j within residue i). To measure the distance between two structures, S and S′, of equal length (n residues), given in trigonometric representations, we apply formula (1) for computing the mean of circular quantities [17]:
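Formula (1) can be written as the circular mean of the per-angle distances, which is the form implied by the two-argument arctan discussed next. The notation below is a reconstruction under that assumption: Δ denotes the distance between two angles, and the normalisation by n|T| follows the usual definition of a circular mean.

```latex
\[
\mathrm{MCQ}(S, S') = \arctan\!\left(
  \frac{1}{n|T|}\sum_{i=1}^{n}\sum_{j=1}^{|T|} \sin \Delta\!\left(t_{ij}, t'_{ij}\right),\;
  \frac{1}{n|T|}\sum_{i=1}^{n}\sum_{j=1}^{|T|} \cos \Delta\!\left(t_{ij}, t'_{ij}\right)
\right) \tag{1}
\]
```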
The two-argument arctan(y, x) is used to distinguish results over the whole range [−π; π). This is possible because the function calculates the angle from the positive X half-axis to the vector between points (0, 0) and (x, y) in a Cartesian coordinate system. In particular, this means that, unlike the one-argument \( \arctan(y/x) \), the two-argument variant is well-defined for x = 0, and in general arctan(y, x) ≠ arctan(−y, −x), which is not true for the one-argument function.
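The difference between the two variants can be checked with Python's standard `math.atan` and `math.atan2`. This is only a small illustration; note that `math.atan2` returns values in [−π, π], so the boundary convention differs slightly from the half-open range given above.

```python
import math

# The one-argument arctan only sees the ratio y/x, so (1, 1) and (-1, -1) collapse.
same_ratio = math.isclose(math.atan(1 / 1), math.atan(-1 / -1))

# The two-argument variant keeps the quadrant of the vector (x, y).
first_quadrant = math.atan2(1, 1)     # pi / 4
third_quadrant = math.atan2(-1, -1)   # -3 * pi / 4

# It is also defined on the Y axis, where y / x would divide by zero.
on_y_axis = math.atan2(1, 0)          # pi / 2

print(same_ratio, first_quadrant, third_quadrant, on_y_axis)
```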
In formula (1), the distance between two angles is obtained with a dedicated function, defined with the help of two auxiliary expressions [17].
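One way to write the angle distance, as we understand the definition given in [17], is shown below. The symbols Δ, diff and mod, and the treatment of undefined angles (for example, a torsion that cannot be computed for a terminal residue), should be read as assumptions to be checked against that reference.

```latex
\[
\Delta(t, t') =
\begin{cases}
0 & \text{if both } t \text{ and } t' \text{ are undefined},\\
\pi & \text{if exactly one of } t,\, t' \text{ is undefined},\\
\min\{\operatorname{diff}(t, t'),\; 2\pi - \operatorname{diff}(t, t')\} & \text{otherwise},
\end{cases}
\]
\[
\operatorname{diff}(t, t') = \left|\operatorname{mod}(t) - \operatorname{mod}(t')\right|,
\qquad
\operatorname{mod}(t) = (t + 2\pi) \bmod 2\pi .
\]
```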
MCQ has been defined as a distance measure, and it shows the dissimilarity of two three-dimensional structures of the same length. Thus, the greater its value, the more the two structures differ. Accordingly, the smaller the MCQ value, the greater the similarity of the compared structures.
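A minimal sketch of an MCQ computation is given below. It assumes that MCQ is the circular mean (via the two-argument arctan) of per-angle distances, that an angle distance is the shorter arc between two angles, and that undefined angles are represented by `None` and penalised with π when only one of the pair is missing. These conventions and the names `angle_distance` and `mcq` are assumptions of this example, not the MCQ4Structures API.

```python
import math
from typing import Optional, Sequence

Angle = Optional[float]  # radians; None marks an undefined torsion angle


def angle_distance(t: Angle, u: Angle) -> float:
    """Shorter arc between two angles, in [0, pi]."""
    if t is None and u is None:
        return 0.0
    if t is None or u is None:
        return math.pi
    diff = abs(t % (2 * math.pi) - u % (2 * math.pi))
    return min(diff, 2 * math.pi - diff)


def mcq(s: Sequence[Sequence[Angle]], s_prime: Sequence[Sequence[Angle]]) -> float:
    """Circular mean of the angle distances between two equal-length structures."""
    if len(s) != len(s_prime):
        raise ValueError("structures must have the same number of residues")
    deltas = [angle_distance(t, u)
              for row, row_prime in zip(s, s_prime)
              for t, u in zip(row, row_prime)]
    if not deltas:
        raise ValueError("no torsion angles to compare")
    sin_mean = sum(math.sin(d) for d in deltas) / len(deltas)
    cos_mean = sum(math.cos(d) for d in deltas) / len(deltas)
    return math.atan2(sin_mean, cos_mean)


# Two hypothetical one-residue structures described by two angles each.
value = mcq([[0.0, math.pi / 2]], [[0.2, math.pi / 2]])
print(math.degrees(value))  # reported in degrees, as the application does
```

Any subset of torsion angles can be passed in the rows, which matches the flexibility of the measure described in the text.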
It should be noted that the set T of torsion angles defined for RNA originally contained eight types of angles. However, MCQ is flexible, and any subset of T can be used to compute it. For example, if the user is interested in the ribose ring only, MCQ can be computed from the pseudotorsion angle P alone (or, alternatively, from the τ_{0}, τ_{1}, τ_{2}, τ_{3}, τ_{4} angles). In the presented version of the algorithm, we use the original set T = {α, β, γ, δ, ε, ζ, P, χ}.
Finally, note that the MCQ value is originally computed in radians. In our application, it is then converted into degrees and presented to the user in that form.
LCS-TA algorithm
The LCS-TA algorithm compares two RNA 3D structures (hereafter referred to as the target and the model) provided in PDB or mmCIF file format. At the input, the user should also specify the MCQ threshold value in degrees and select the mode (sequence-independent or sequence-dependent). At the output, the algorithm provides the longest continuous segment (its location within both structures), its length, and its actual MCQ value. If more than one solution exists, all of them are shown to the user.
LCS-TA applies the divide and conquer approach (Fig. 1) to find the optimum solution, i.e., the longest continuous segment in the model whose MCQ-based distance to a target fragment is below the specified MCQ threshold. The computation proceeds as follows. First, the algorithm computes MCQ between the entire structures. If its value does not exceed the threshold, the whole model structure is returned as the optimum solution. Otherwise, the size of the current search window is determined according to the D&C procedure described in the previous sections. Next, a set of candidate segments is constructed from the model structure: the search window moves along the model from its 5′-end to its 3′-end, and all window-highlighted fragments are put into the candidate set. Thus, the current candidate set contains all segments with length equal to the current window size. After that, for every segment in the candidate set, the algorithm checks whether it is a feasible solution. This part of the algorithm differs between the modes. In the sequence-independent mode, the check is done by positioning the candidate segment stepwise along the target structure, i.e., the candidate segment is shifted along the target structure one residue at a time. In the sequence-dependent mode, the candidate segment is compared only to the corresponding fragment of the target structure. Two sets of torsion angles, one describing the candidate and the other describing the target segment, are computed. Based on them, the MCQ value between the positioned segments is determined. If the MCQ is below the user-defined threshold, the candidate segment is a feasible solution. If a feasible solution exists in the candidate set, the algorithm tries to find a longer segment (the window size is enlarged for the next iteration). Otherwise, shorter segments are considered (the window size is reduced for the next iteration). The procedure iterates until the stopping condition is satisfied.
Below, we show the pseudocode of LCS-TA, focusing on the general steps of the algorithm running in the sequence-independent mode. In the sequence-dependent mode, the comparison of corresponding segments is done within one FOR EACH loop instead of two nested loops.
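The steps described above can be sketched in Python as follows. This is our reading of the sequence-independent mode, not the authors' pseudocode: the window-size search is simplified to a binary search (which assumes that feasibility is monotone in the window size), the MCQ helper ignores undefined angles, and all names (`mcq`, `feasible_segments`, `lcs_ta`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Residue = Sequence[float]        # torsion angles of one residue, in radians
Match = Tuple[int, int, float]   # (start in model, start in target, MCQ)


def mcq(a: Sequence[Residue], b: Sequence[Residue]) -> float:
    """Circular mean of pairwise angle distances (simplified, radians)."""
    sin_sum = cos_sum = 0.0
    for res_a, res_b in zip(a, b):
        for t, u in zip(res_a, res_b):
            diff = abs(t - u) % (2 * math.pi)
            d = min(diff, 2 * math.pi - diff)
            sin_sum += math.sin(d)
            cos_sum += math.cos(d)
    return math.atan2(sin_sum, cos_sum)


def feasible_segments(target, model, size: int, threshold: float) -> List[Match]:
    """All model segments of the given size that match some target position."""
    found = []
    for i in range(len(model) - size + 1):        # candidate segments in the model
        for j in range(len(target) - size + 1):   # positions along the target
            value = mcq(model[i:i + size], target[j:j + size])
            if value < threshold:
                found.append((i, j, value))
    return found


def lcs_ta(target, model, threshold: float) -> Tuple[int, List[Match]]:
    """Return (LCS, matches) for the longest feasible window size."""
    n = min(len(model), len(target))
    if n == 0:
        return 0, []
    whole = feasible_segments(target, model, n, threshold)
    if whole:                     # the full-length comparison is already feasible
        return n, whole
    low, high, best = 0, n - 1, []
    while low < high:
        size = (low + high + 1) // 2
        matches = feasible_segments(target, model, size, threshold)
        if matches:
            low, best = size, matches   # try a longer window next
        else:
            high = size - 1             # try a shorter window next
    return low, best


# Hypothetical data: one angle per residue, model residues 0..2 equal target residues 1..3.
target = [[0.0], [1.0], [2.0], [3.0]]
model = [[1.0], [2.0], [3.0], [0.5]]
length, matches = lcs_ta(target, model, threshold=0.05)
print(length, matches)
```

In the sequence-dependent mode the inner loop over `j` would disappear and each model segment would be compared only with the target segment at the same position.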
The LCS-TA algorithm in the sequence-independent mode runs with a worst-case computational complexity of O(n² log₂ n). In the sequence-dependent mode, the complexity is O(n log₂ n), where n denotes the number of residues in the predicted model. This complexity follows from the O(log₂ n) complexity of D&C and the number of comparisons performed for every candidate segment in a single iteration.
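The bounds can be traced with a short count of segment comparisons per window size. The derivation below is a sketch: w stands for the current window size, m for the target length (a symbol introduced here and assumed comparable to n), and each segment comparison is counted as one unit of work.

```latex
\begin{align*}
\text{sequence-independent:}\quad
  & (n - w + 1)\,(m - w + 1) \;\le\; n\,m \;=\; O(n^2)
    \quad \text{comparisons per window size } w,\\
\text{sequence-dependent:}\quad
  & (n - w + 1) \;\le\; n \;=\; O(n)
    \quad \text{comparisons per window size } w,\\
\text{over } O(\log_2 n) \text{ window sizes:}\quad
  & O(n^2 \log_2 n) \quad \text{and} \quad O(n \log_2 n), \text{ respectively.}
\end{align*}
```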
Accessibility and usage
The LCS-TA algorithm has been implemented as a new functionality of MCQ4Structures [17], which runs as a standalone Java Web Start application. It is freely available for download at http://www.cs.put.poznan.pl/tzok/mcq/.
Results and discussion
In this section, we present the results of experimental LCS-TA runs over selected RNA 3D structures. We analyze the algorithm's output for structures processed in the sequence-independent and sequence-dependent modes, and we observe the impact of the MCQ threshold value on local and global similarity assessment.
For a pair of compared RNA structures, the LCS-TA algorithm provides the following output data: (i) LCS, the length of the optimum solution (the longest continuous segment) measured as the number of residues in the segment, (ii) target structure coverage by the resulting segment, that is, the ratio of segment length to structure length (in percent), (iii) the actual MCQ value of the segment, and (iv) the segment location within the structures (numbers of the first and last residue). If more than one optimum solution exists for two input structures, all of them are given to the user. The data are provided in plain text format and can be downloaded as a CSV file.
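As a small worked example of items (i), (ii) and (iv), the sketch below derives the LCS, the coverage and a 0/1 position string from a segment location. The function name `segment_report`, the dictionary keys and the numbers in the example are hypothetical; the 0/1 string only mirrors the way segment positions are marked in the result tables.

```python
def segment_report(target_length: int, first: int, last: int) -> dict:
    """Summarise a segment given by 1-based, inclusive residue positions."""
    if not 1 <= first <= last <= target_length:
        raise ValueError("segment must lie within the structure")
    lcs = last - first + 1
    # Residues inside the segment are marked with 1, all others with 0.
    mask = "0" * (first - 1) + "1" * lcs + "0" * (target_length - last)
    return {
        "lcs": lcs,
        "coverage_percent": 100.0 * lcs / target_length,
        "mask": mask,
    }


# Made-up example: residues 3..6 of a 10-residue structure.
report = segment_report(10, 3, 6)
print(report)
```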
In the first experiment, we ran the LCS-TA algorithm for two RNA 3D models submitted to RNA-Puzzles challenge 18, which were compared to the target structure of the exonuclease-resistant RNA from Zika virus (PDB id: 5TPY) [22]. Model 1 predicted by RNAComposer [23, 24] in the server category and model 1 submitted by the Chen group [25] in the human category were selected for examination. In the paper, they are referred to as RNAComposer_1 and Chen_1, respectively. Both models were processed by LCS-TA running in two modes, sequence-independent and sequence-dependent. In each mode, we planned to apply the following MCQ threshold values: 5, 10, 15, 20, 25, 30, 35 and 40 degrees. The runs with the MCQ threshold set to 5° returned no solution for either model. On the other hand, for the MCQ threshold equal to 25°, the algorithm output the entire 71-nt-long structure, with an actual MCQ value of 23.48° in the case of RNAComposer_1 and 23.81° for the Chen_1 model. This means that the MCQ of the whole model was below the 25° threshold in both cases. With 25° constituting the breakout point of the experiment, no further increase of the threshold was necessary.
Tables 1 and 2 present the results of processing the RNAComposer_1 and Chen_1 models by LCS-TA with respect to the target structure in the sequence-independent and sequence-dependent mode, respectively. For every MCQ threshold between 10° and 25°, we can see the position of the longest continuous segment within the model (and the target), marked with a value of 1 in the character string, the segment size (LCS), and its actual MCQ value. In every case, the RNAComposer_1 model dominates Chen_1 as far as the LCS value is concerned. In all cases except one, a single optimum solution has been found. Only for the MCQ threshold set to 10° have three segments with LCS = 9 been identified within the RNAComposer_1 model in the sequence-independent mode. A closer look at the results reveals that the greatest diversity in segment length and location within both models is observed for the MCQ threshold equal to 20°. Solutions obtained for this threshold value have been visualized using PyMOL in Figs. 2 and 3. In every figure, the longest continuous segment identified in the model (colored) has been superimposed onto the target structure (grey) at the location of the corresponding target segment. As shown in the figures, different segments have been identified in the considered models.
To complete the similarity analysis in the first experiment, we decided to use another similarity measure for evaluating the LCS-TA results. It can be assumed that two fragments with similar torsion are also similar in the space of atom coordinates. To verify this assumption, we processed the RNAComposer_1 and Chen_1 models using RNAssess [19]. This tool supports the identification of local similarity between two RNA 3D structures in the sequence-dependent mode. RNAssess compares model and target structures using the idea of moving spheres and computing RMSD between the RNA fragments enclosed in the corresponding spheres (one sphere positioned in the model, the other in the target). The results of the comparison are provided in graphical form (line graphs, 2D and 3D maps). To present the results of processing RNAComposer_1 and Chen_1 with reference to the target structure, we have selected 2D maps (see Fig. 4). The value of RMSD computed for a sphere positioned at a particular place along the RNA chain is represented by colour. Dark blue areas represent fragments of high similarity. It can be observed that the location of the fragments identified by LCS-TA (Table 2) coincides with the dark blue areas of the RNAssess maps (Fig. 4). Thus, for our example structures, similarity in torsion angle space is accompanied by similarity in the Euclidean space of atom coordinates. This is true for MCQ thresholds not exceeding 20 degrees (above this threshold LCS-TA returns the whole structure as the result). Our analysis finished with computing RMSD for the identified fragments of the RNAComposer_1 and Chen_1 models. For the fragments found within the RNAComposer_1 model in the sequence-dependent mode, the RMSD values were 0.702 Å for MCQ threshold = 10° and 0.959 Å for MCQ threshold = 15°, while the global RMSD of RNAComposer_1 equals 24.48 Å. For Chen_1, the RMSD of the fragment provided by LCS-TA was 2.011 Å for MCQ threshold = 15° (no feasible solution was found in this model for a smaller threshold), while the global RMSD of the model was only 3.144 Å.
In the second experiment, we investigated multiple models predicted in RNA-Puzzles challenge 18 and challenge 19. Altogether, 53 models were submitted in challenge 18, and 54 in challenge 19. From these sets, we selected one model per participant (namely, model 1) and compared it to the target structure, i.e., the exonuclease-resistant RNA from Zika virus (PDB id: 5TPY) [22] in challenge 18, and the twister sister (TS) ribozyme (PDB id: 5T5A) [26] in challenge 19. Experimental results concerning the selected models are presented in Tables 3–4 and Fig. 5 for challenge 18, and in Tables 5–6 and Fig. 6 for challenge 19. In the tables, one can see the LCS value, i.e., the length of the resulting segment found within each model for different MCQ thresholds, and the actual MCQ of this segment. The best solution (LCS of the longest continuous segment found among all models) in the human and server categories is printed in bold. If several models include a segment with the largest LCS, the one with the smallest actual MCQ is considered the winner. The figures complement the tabular data by showing, for each model and MCQ threshold, the percentage of the target structure covered by the optimum solution.
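The rule for choosing the best model (largest LCS first, smallest actual MCQ as the tie-breaker) can be expressed as a single sort key. The sketch below uses made-up model names and values purely for illustration; `Result` and `best_model` are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Result(NamedTuple):
    model: str
    lcs: int             # length of the longest continuous segment
    mcq_degrees: float   # actual MCQ of that segment


def best_model(results: Iterable[Result]) -> Result:
    """Largest LCS wins; ties are broken by the smallest actual MCQ."""
    return min(results, key=lambda r: (-r.lcs, r.mcq_degrees))


# Hypothetical values, not taken from the tables.
results = [
    Result("model_A", 12, 8.5),
    Result("model_B", 18, 9.1),
    Result("model_C", 18, 7.4),
]
winner = best_model(results)
print(winner.model)
```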
Eleven participants submitted their predictions for challenge 18. Thus, 11 RNA 3D models were selected for the analysis with LCS-TA (Tables 3–4, Fig. 5). This number includes six human predictions (Fig. 5, solid lines) and five server-predicted ones (Fig. 5, dotted lines). In the human category, the Das_1 model wins for all MCQ thresholds. Among server predictions, the RW3D_1 model, generated by the Das server (unpublished), is the best. This is true for both modes of LCS-TA. In the case of sequence-independent analysis and the MCQ threshold set to 10°, RW3D_1 dominates Das_1 (Table 3). However, this relationship does not hold in the sequence-dependent mode (Table 4). A comparison of the results for Das_1 and RW3D_1 with MCQ threshold = 10° in both modes shows that there is one accurately predicted 12-nt-long segment in Das_1, which is identified by LCS-TA in both modes. For RW3D_1, however, the longest segment below the 10° threshold (with LCS = 18) corresponds very well to a different part of the target structure. This influences the overall quality of the RW3D_1 prediction and makes it globally a little worse than that of Das_1. Nevertheless, the accuracy and quality of both models are very high. The MCQ computed for each of these models as a whole does not exceed 20 degrees. Thus, starting from the threshold set to 20°, the optimum solution in both cases covers 100% of the structure (Fig. 5).
Challenge 19 also attracted 11 participants, including six in the human category (Fig. 6, solid lines) and five in the group of servers (Fig. 6, dotted lines). Thus, 11 predicted models were processed with LCS-TA (Tables 5–6 and Fig. 6). The results of this experiment show a greater diversity in the relationship between the models than in the case of challenge 18. In the human category, the situation is similar for both LCS-TA modes. Das_1 proves the best for MCQ threshold = 5°. However, when the threshold value increases to 10, 15, 20, 25 and 30 degrees, RNAComposerH_1 dominates all other models as far as LCS and actual MCQ are concerned. In the server category, the longest segments have been found in the RNAComposer_1 [23, 24], RW3D_1 and simRNA_1 [27] models, depending on the MCQ threshold and LCS-TA mode. This shows that although the considered models seem quite similar globally, the differences on a local level can be significant. Thus, local analysis of the model can indicate the direction for further development and improvement of the prediction approach. From these results, we can also see that a global ranking of models based on the LCS-TA value highly depends on the MCQ threshold.
The molecules selected for the above analysis are medium-size RNA structures. They can be processed by both alignment-based and alignment-free algorithms, although processing is more time-consuming for the first group of methods. The difference between the computing times of the two groups increases significantly with molecule size. The length of the RNA chain can also influence the quality of results generated by alignment-based algorithms, which provide a suboptimal solution. However, this is not the case for alignment-free approaches, including LCS-TA. To show that our algorithm also works for longer RNAs, we applied it to RNA 3D models submitted to RNA-Puzzles challenge 7 and challenge 8. In the first case, we chose one model per participant (namely, model 1) and compared it to the target structure of the Varkud satellite ribozyme (PDB id: 4R4V) [28]. Similarly, the first model submitted by each participant in challenge 8 was selected and analyzed with reference to the target structure of the SAM-I/IV riboswitch (PDB id: 4L81) [29]. Altogether, we processed seven models from challenge 7 and six models from challenge 8. In all cases, the LCS-TA algorithm provided results, finding similar fragments positioned along the entire structure. The results of these experiments are presented in Additional file 1.
Conclusions
In the paper, we have addressed the problem of identifying similar fragments within RNA 3D structures and of assessing tertiary structure similarity on the local level. We have introduced the LCS-TA method, which finds fragments displaying high similarity in torsion angle space. The method has been implemented in Java and added to the MCQ4Structures standalone application, freely available at http://www.cs.put.poznan.pl/tzok/mcq/. We have shown an example application of the method in the processing and analysis of RNA 3D structures predicted within RNA-Puzzles challenges 18 and 19.
Our algorithm is computationally non-demanding and user-friendly. At the input, it requires PDB or mmCIF files with RNA 3D structures and an MCQ threshold value. The results are easy to compare and interpret. Thus, we hope it will be of wide interest in the RNA community.
LCS-TA has the potential to open new avenues in RNA structural bioinformatics, particularly in the evaluation of predicted RNA 3D models, local similarity assessment, and the identification and examination of structure motifs and modules. Our future work will follow this direction. We are going to perform large-scale tests of the method to define reliable MCQ thresholds. We plan to analyze the relationship between LCS-TA results and the secondary structure motifs of the analyzed RNA structures. This kind of analysis can indicate RNA motifs or fragments which are particularly hard (or easy) to predict. Finally, we plan to supplement the algorithm with graphical output.
Abbreviations
CASP: Critical Assessment of protein Structure Prediction
CSV: Comma-Separated Values
D&C: Divide and conquer
GDT: Global Distance Test
INF: Interaction Network Fidelity
LCS: Longest Continuous Segments
LCS-TA: Longest Continuous Segments in Torsion Angle space
LGA: Local-Global Alignment
MCQ: Mean of Circular Quantities
RMSD: Root Mean Square Deviation
References
 1.
Pruitt KD, Tatusova T, Brown GR, Maglott DRNCBI. Reference sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012;40:D130–5.
 2.
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–42.
 3.
Moult J, Pedersen JT, Judson R, Fidelis KA. Largescale experiment to assess protein structure prediction methods. Proteins. 1995;23:ii–v.
 4.
Cruz JA, Blanchet MF, Boniecki M, Bujnicki JM, Chen SJ, Cao S, et al. RNApuzzles: a CASPlike evaluation of RNA threedimensional structure prediction. RNA. 2012;18:610–25.
 5.
Miao Z, Adamiak RW, Antczak M, Batey RT, Becka A, Biesiada M, et al. RNApuzzles round III: 3D RNA structure prediction of five riboswitches and one ribozyme. RNA. 2017;23:655–72.
 6.
Miao Z, Westhof E. RNA structure: advances and assessment of 3D structure prediction. Annu Rev Biophys. 2017;46:483503.
 7.
Blazewicz J, Szachniuk M, Wojtowicz ARNA. Tertiary structure determination: NOE pathway construction by tabu search. Bioinformatics. 2005;21:2356–61.
 8.
Lukasiak P, Antczak M, Ratajczak T, Bujnicki JM, Szachniuk M, Popenda M, Adamiak RW, Blazewicz J. RNAlyzer  novel approach for quality analysis of RNA structural models. Nucleic Acids Res. 2013;41:5978–90.
 9.
Szostak N, Royo F, Rybarczyk A, Szachniuk M, Blazewicz J, del Sol A, FalconPerez JM. Sorting signal targeting mRNA into hepatic extracellular vesicles. RNA Biol. 2014;11:836–44.
 10.
Zok T, Antczak M, Riedel M, Nebel D, Villmann T, Lukasiak P, Blazewicz J, Szachniuk M. Building the library of RNA 3D nucleotide conformations using clustering approach. Int J Appl Math Comp. 2015;25:689–700.
 11.
Rybarczyk A, Szostak N, Antczak M, Zok T, Popenda M, Adamiak RW, Blazewicz J, Szachniuk M. New in silico approach to assessing RNA secondary structures with noncanonical base pairs. BMC Bioinformatics. 2015;16:276.
 12.
Gudanis D, Popenda L, Szpotkowski K, Kierzek R, Gdaniec Z. Structural characterization of a dimer of RNA duplexes composed of 8bromoguanosine modified CGG trinucleotide repeats: a novel architecture of RNA quadruplexes. Nucleic Acids Res. 2016;44:2409–16.
 13.
Wiedemann J, Milostan M. StructAnalyzer  a tool for sequence versus structure similarity analysis. Acta Biochim Pol. 2016;63:753–7.
 14.
Miskiewicz J, Tomczyk K, Mickiewicz A, Sarzynska J, Szachniuk M. Bioinformatics study of structural patterns in plant microRNA precursors. Biomed Res Int. 2017; doi: 10.1155/2017/6783010.
 15.
Parisien M, Cruz JA, Westhof E, Major F. New metrics for comparing and assessing discrepancies between RNA 3D structures and models. RNA. 2009;15:1875–85.
 16.
Chen VB, Arendall WB 3rd, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC. MolProbity: allatom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr. 2010;66:12–21.
 17.
Zok T, Popenda M, Szachniuk M. MCQ4Structures to compute similarity of molecule structures. Cent Eur J Oper Res. 2014;22:457–74.
 18.
Wang J, Zhao Y, Zhu C, Xiao Y. 3dRNAscore: a distance and torsion angle dependent evaluation function of 3D RNA structures. Nucleic Acids Res. 2015;43:e63.
 19.
Lukasiak P, Antczak M, Ratajczak T, Szachniuk M, Popenda M, Adamiak RW, Blazewicz J. RNAssess - a webserver for quality assessment of RNA 3D structures. Nucleic Acids Res. 2015;43:W502–6.
 20.
Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003;31:3370–4.
 21.
Richardson JS, Schneider B, Murray LW, Kapral GJ, Immormino RM, Headd JJ, et al. RNA backbone: consensus all-angle conformers and modular string nomenclature (an RNA ontology consortium contribution). RNA. 2008;14:465–81.
 22.
Akiyama BM, Laurence HM, Massey AR, Costantino DA, Xie X, Yang Y, Shi PY, Nix JC, Beckham JD, Kieft JS. Zika virus produces noncoding RNAs using a multi-pseudoknot structure that confounds a cellular exonuclease. Science. 2016;354:1148–52.
 23.
Popenda M, Szachniuk M, Antczak M, Purzycka KJ, Lukasiak P, Bartol N, et al. Automated 3D structure composition for large RNAs. Nucleic Acids Res. 2012;40:e112.
 24.
Antczak M, Popenda M, Zok T, Sarzynska J, Ratajczak T, Tomczyk K, Adamiak RW, Szachniuk M. New functionality of RNAComposer: an application to shape the axis of miR160 precursor structure. Acta Biochim Pol. 2016;63:737–44.
 25.
Xu X, Zhao P, Chen SJ. Vfold: a webserver for RNA structure and folding thermodynamics prediction. PLoS One. 2014;9:e107504.
 26.
Liu Y, Wilson TJ, Lilley DMJ. The structure of a nucleolytic ribozyme that employs a catalytic metal ion. Nat Chem Biol. 2017;13:508–13.
 27.
Boniecki MJ, Lach G, Dawson WK, Tomala K, Lukasz P, Soltysinski T, Rother KM, Bujnicki JM. SimRNA: a coarse-grained method for RNA folding simulations and 3D structure prediction. Nucleic Acids Res. 2016;44:e63.
 28.
Suslov NB, DasGupta S, Huang H, Fuller JR, Lilley DMJ, Rice PA, Piccirilli JA. Crystal structure of the Varkud satellite ribozyme. Nat Chem Biol. 2015;11:840–6.
 29.
Trausch JJ, Xu Z, Edwards AL, Reyes FE, Ross PE, Knight R, Batey RT. Structural basis for diversity in the SAM clan of riboswitches. PNAS. 2014;111:6624–9.
Acknowledgements
This research was carried out at the European Centre for Bioinformatics and Genomics, Poznan University of Technology (Poznan, Poland) and was supported by the Leading National Research Centre Program (KNOW) granted by the Polish Ministry of Science and Higher Education.
Funding
This work has been supported by the Polish Ministry of Science and Higher Education and by the Institute of Bioorganic Chemistry, PAS, within its intramural financing program. The authors acknowledge partial support from the National Science Center, Poland [2016/23/B/ST6/03931, 2016/23/N/ST6/03779].
Availability of data and materials
All predicted RNA 3D models used in our computational experiments are available at the RNAPuzzles website: http://ahsoka.u-strasbg.fr/rnapuzzlesv2/results/. The target structures can also be accessed via this webpage.
Author information
Contributions
JW, TZ, and MS conceived the study. MM and MS prepared a specification of the project. JW and MM designed the LCSTA algorithm. JW implemented it, supported by TZ, who authored the basic method for MCQ computation. JW carried out the computational tests, which were further analyzed with the aid of MM and MS. MS coordinated the project. JW, MM, and MS drafted the manuscript; JW and MM prepared the figures. All authors were involved in discussions, and read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1: Table S1. LCSTA results for predicted models of 4R4V structure in the sequence-independent mode. Table S2. LCSTA results for predicted models of 4R4V structure in the sequence-dependent mode. Table S3. LCSTA results for predicted models of 4L81 structure in the sequence-independent mode. Table S4. LCSTA results for predicted models of 4L81 structure in the sequence-dependent mode. Figure S1. LCSTA results for predicted models of 4R4V in (a) sequence-independent and (b) sequence-dependent mode. Figure S2. LCSTA results for predicted models of 4L81 in (a) sequence-independent and (b) sequence-dependent mode. Table S5. Longest segments found within example models of 4L81 structure in the sequence-dependent mode. Figure S3. Results of (a) Bujnicki_1, (b) Das_1, and (c) Dokholyan_1 model comparison to the target structure (4L81) by RNAssess. (PDF 465 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Wiedemann, J., Zok, T., Milostan, M. et al. LCSTA to identify similar fragments in RNA 3D structures. BMC Bioinformatics 18, 456 (2017). https://doi.org/10.1186/s12859-017-1867-6
Keywords
 RNA 3D structure
 Structure comparison
 Local similarity
 Torsion angles