Skip to main content

Predicting folding pathways between RNA conformational structures guided by RNA stacks

Abstract

Background

Accurately predicting low energy barrier folding pathways between conformational secondary structures of an RNA molecule can provide valuable information for understanding its catalytic and regulatory functions. Most existing heuristic algorithms guide the construction of folding pathways by free energies of intermediate structures in the next move during the folding. However due to the size and ruggedness of RNA energy landscape, energy-guided search can become trapped in local optima.

Results

In this paper, we propose an algorithm that guides the construction of folding pathways through the formation and destruction of RNA stacks. Guiding the construction of folding pathways by coarse grained movements of RNA stacks can help reduce the search space and make it easier to jump out of local optima. RNAEAPath is able to find lower energy barrier folding pathways between secondary structures of conformational switches and outperforms the existing heuristic algorithms in most test cases.

Conclusions

RNAEAPath provides an alternate approach for predicting low-barrier folding pathways between RNA conformational secondary structures. The source code of RNAEAPath and the test data sets are available at http://genome.ucf.edu/RNAEAPath.

Introduction

RNA molecules play critical roles in the cell. The secondary structures of RNA molecules have been extensively studied because they provide insights into the functionality of RNAs. Native (functional) RNA secondary structures are usually thermodynamically stable and many of them are also the minimum free energy (MFE) structures. Nevertheless, at times, RNA molecules may fold into alternative secondary structures in order to participate in certain biological processes. For example, the SV-11 RNA folds into a metastable conformational structure and acts as a template for its own replication using Qβ replicase [1, 2]. Further, RNA conformational switches can transform between alternative secondary structures dynamically in response to various environmental stimuli (such as heat shock and cold shock) [3–6], and carry out RNA-mediated biological activities, such as switching on or off downstream gene translation activities [7–9], regulating RNA splicing via multiple-state splicesomal conformations [10], and regulating the life cycles of virus [11].

The conformational transformations between alternative structures involve the folding of an RNA molecule into a series of sequential adjacent intermediate structures [12]. RNA folding pathways provide valuable information for understanding the catalytic and regulatory functions of RNAs (such as hok/sok of plasmid R1 [13]). RNA folding pathways may also impact subsequence biological events (such as formation of tertiary structures). Furthermore, prediction algorithms can help the design of RNA switches by providing prescribed structural alternatives.

In this paper, we present a new approach, RNAEAPath, for computing near optimal direct or indirect folding pathways between two secondary structures of an RNA molecule. We guide the search for low energy barrier folding pathways by integrating a variety of strategies for simulating the formation and destruction of RNA stacks in a flexible framework. Benchmark tests on conformational switches show that RNAEAPath produces lower energy barrier folding pathways and outperforms the existing heuristic approaches in most test cases.

Preliminary

Consider an RNA sequence as a string x = x1 ... x n of n letters over alphabet ∑ = {A, U, G, C}. A pair of complementary nucleotides x i and x j , can form hydrogen bonds and interact with each other, denoted by x i · x j . In this paper, we only consider the canonical base pairings (A · U and G · C) and the wobble base pairing (G · U). A secondary structure S of the RNA sequence x is a set of disjoint paired bases (i, j), where 1 ≤ i <j ≤ n. S may be represented by a length n string of dots and brackets, where dots represent unpaired bases and brackets represent paired bases. An RNA structure can comprise of stacks which are lists of consecutive base pairs ({(i, j), (i + 1, j - 1), . . . , (i + w, j - w)} such that x i · x j , . . . , x i+w · x j-w ), and unstacking base pairs. A secondary structure is pseudoknotted if it contains two base pairs (i, j) and (i', j') with i <i' <j <j'. In this paper, we only consider pseudoknot-free structures. A base pair is compatible with a secondary structure if the base pair can be added to the structure without leading to a pseudoknotted structure or pairing a base with more than one partner. A stack is compatible with S if each base pair in the stack is either in S or is compatible with S.

The free energy of a secondary structure S is denoted by E(S). The set of neighboring structures of S consists of all structures that differ from S by an addition or deletion of exactly one base pair. For two secondary structures A and B, the distance between A and B is the number of base pairs in A not in B plus the number of base pairs in B not in A (i.e. |(A - B) ∪ (B - A)|). A folding pathway from A to B is a sequence of intermediate structures A = S0, . . . , S m = B such that for all 0 ≤ i <m, intermediate structure S i +1 is a neighboring structure of S i . A folding pathway is direct if the intermediate structures contain only base pairs in A and B (i.e. S i ⊆ A ∪ B for 1 ≤ i <m) and otherwise is indirect. The saddle point of a pathway is an intermediate structure with the highest energy, and the energy barrier of a pathway is the energy difference between its saddle point and the initial structure. Since the folding of RNA structures is thermodynamically-driven and tends to avoid high-energy intermediate structures, current computational methods aim to find RNA folding pathways with the lowest energy barriers.

Previous studies

A lot of research has been done on predicting low energy barrier folding pathways. Morgan and Higgs proposed a greedy algorithm that employs the Nussinov model [14, 15] for computing direct folding pathways with minimum energy barrier. They also described a heuristic that samples low energy structures from the partition function and glues them together by direct pathways [16]. The Nussinov model is simple and easy to implement, in which base stacking and loop entropies have no energetic contributions. Based on this model, Thachuk et al. [17] developed an exact algorithm, PathwayHunter, which exploits elegant properties of bipartite graphs for finding the globally optimal direct pathways. However, the Nussinov model is not as accurate as the Turner energy model [18, 19] for approximating RNA thermodynamics. An exact solution based on the Turner energy model is also available. BARRIERS [20, 21], exactly computes the globally optimal folding pathways between any two locally optimal secondary structures. BARRIERS reads an energy sorted list of RNA secondary structural conformations produced by RNAsubopt [22] and is able to compute both direct and indirect low energy barrier pathways.

Nevertheless, the above exact solutions are all exponential in time, because the problem itself is NP-hard [23]. Many heuristic algorithms have also been proposed following the seminal work of Morgan and Higgs. Flamm et al. [24] used breadth-first search in their heuristics (in Vienna RNA Package [25]) and kept the best k candidates at each step to bound the search. Voss et al. [26] devised a straightforward strategy for greedily searching direct pathways. Geis et al. [27] described a greedy heuristic to explore the search space of direct pathways and they also integrated look ahead techniques to diminish the search space. Recently, Dotu et al. [28] developed RNATabuPath, a fast heuristic that employs a TABU semi-greedy search to construct near optimal (both direct and indirect) folding trajectories. In addition, other heuristic approaches, by splitting the pathways into shorter pathways and solving each individually, have also been proposed [29, 30]. There are also other formula presented for the prediction of RNA folding kinetics (see Flamm and Hofacker's review [31] for a systematic discussion).

Many of the existing heuristic algorithms start from an initial structure A, and, at each single step i, walk from the intermediate structure S i to one of its neighbors S i +1 until finally the end structure B is reached. The definition of neighborhood relationships as well as the fitness functions can be different. The fitness function of S i is usually defined on the free energy of S i , or the distance from S i to B, or a function of both. In general, greedy algorithms select the 'best' neighbor structure that has the best fitness. In contrast, semi-greedy algorithms may select any one from the top k structures for randomization. RNATabuPath, which is more sophisticated and outperforms other methods [28], keeps a tabu list for saving recently taken moves such that they can not be applied in certain steps until being removed from the tabu list. In general, during the construction of a folding pathway, these heuristic algorithms select the next intermediate structures from a set of neighboring structures that have the top lowest free energy or have the top shortest distance to B (or the combination of both).

Motivations

However, using energy to guide the construction of folding pathways in the above-mentioned heuristic algorithms has its downsides. The RNA energy landscapes can be extremely large and rugged [11, 32] and the ruggedness of RNA energy landscape may cause the energy-guided search to become trapped in a local optimum. Similar to using structural rearrangements for modeling RNA folding kinetics [33], we want to construct candidate folding pathways in a manner that make it easier to jump out of local optima. It has been revealed that stacking base pairs contribute significantly to the stabilization of RNA secondary structures [34, 35]. The dominant RNA folding pathways involve the formation and destruction of the stacks, and the cooperative formation of a stack along with the partial melting of an incompatible stack [36]. In this paper, we propose to guide the construction of pathways by the formation and destruction of stacks (not by free energy or by distance to the end structure). We still select the constructed folding pathways according to their energy barriers. Although the construction of folding pathways is not driven by thermodynamics, the selection of folding pathways is based on energy barriers. Guiding the construction of folding pathways by coarse grained movements of RNA stacks may help reduce the search space and makes it easier to jump out of local optima. In the rest of this paper, the Methods section describes the representation of folding pathways and the detailed strategies employed by RNAEAPath. The Results and Discussion section presents benchmarking results of RNAEAPath against existing methods followed by concluding remarks in the Conclusions section.

Methods

Representation of RNA folding pathways

Given an initial structure A and an end structure B, we use a sequence of actions successively applied to A, rather than a sequence of intermediate structures, to represent a folding pathway from A to B. Representing a pathway by an action chain can avoid cyclic additions and deletions of base pairs and make it easy to simulate the formation and deletion of RNA stacks. A similar representation has also been employed in the previous work of Thachuk et al. [17].

We use two types of actions, add i,j and del i,j in the representation of RNA folding pathways. For an intermediate secondary structure S of an RNA sequence x, the action add i,j denotes the 'add'ition of base pair (i, j) to S (i.e. add i,j (S) = S ∪ {(i, j)}) and del i,j denotes the 'del'etion of base pair (i, j) from S (i.e. del i,j (S) = S - {(i, j)}). An action is direct if it concerns a base pair in A ∪ B and indirect otherwise. The simplest direct pathways from A to B concern sequential deletions of all base pairs in A - B followed by additions of all base pairs in B - A.

Consider an example sequence x = GGGGAAAACCCCUUUU with initial and final structures shown in Figure 1. This simple pathway is obtained by first deleting all GC pairs from A until the RNA is single stranded, and then adding all AU pairs until B is obtained. Note that each intermediate structure S i differs from both its successor and predecessor by exactly one base pair. The actions in the example are all direct actions and the energy barrier is 5.50 - (-6.60) = 12.10 kcal/mol.

Figure 1
figure 1

A simple folding pathway that converts an RNA sequence from structure A to B. A simple folding pathway that converts an RNA sequence from structure A to B. The leftmost column shows a simple direct pathway from A to B, the center column shows the free energies (in kcal/mol) of the intermediate structures, and the rightmost column presents the action chain a1, . . . , a8 for this pathway.

An addition action add i,j (S) conflicts with S if either x i or x j is already paired in S, and it clashes with S if there exists a base pair { ( x i ′ , x j ′ ) ∈ S | i < i ′ < j < j ′   or   i ′ < i < j ′ < j } . A deletion action del i,j (S) conflicts with S if (x i , x j ) ∉ S. An addition or deletion action is valid and can be applied to S properly if it neither conflicts with nor clashes with S.

A pathway from A to B can be represented by an action chain, which is a sequence of valid actions a1, . . . , a m such that S0 = A, S t = a t (S t -1) for 1 ≤ t ≤ m and S m = B. Note that an action chain for A to B implies a sequence of valid actions that can be successively applied to A without introducing conflicts or clashes and produce B. We use the term "action chain" when the sequence is certified to be valid, and the term "sequence of actions" if its validity is not guaranteed.

This representation of a pathway p from A to B has the following important properties. First, every folding pathway can be represented by a unique action chain and every action chain represents a unique folding pathway (note that it is not necessarily true for a sequence of actions). Second, rearranging the order of actions in p results in a new sequence of actions which represents a new folding pathway from A to B when it is valid. (It is an action chain that can be successively applied to A properly and obtain B.) Third, introducing a pair of complementary actions (e.g. add i,j and del i,j ) to p results in a new sequence of actions which also represents a new folding pathway from A to B if it is valid.

In RNAEAPath, folding pathways are represented in the form of action chains, instead of a sequence of intermediate structures. This representation makes the life cycle of a folding pathway transparent to the algorithm and also makes it easier for us to simulate the cooperative formation and destruction of RNA stacks by re-arranging the order of actions or introducing multiple pairs of complementary actions.

Predicting low energy barrier folding pathways

Given an RNA sequence x, an initial structure A and a final structure B, RNAEAPath computes a near optimal low energy barrier folding pathway from A to B in an evolutionary algorithm framework [37]. Figure 2 elucidates the overall paradigm for RNAEAPath. In this algorithm, the population of each generation is comprised of folding pathways ordered by their fitness. The functions M y ( p ) are mutation strategies, each of which takes in a pathway p and produces a set of offspring pathways. These mutation strategies are central to the effectiveness of RNAEAPath and will be discussed in the Mutation strategies subsection. ℓ1, ℓ2, ℓ3, MAX and γ are positive integer control parameters.

Figure 2
figure 2

Overview of RNAEAPath. Overview of RNAEAPath. In this procedure, the input is an RNA sequence x with the start and end structures A and B, and the output is the best folding pathway in k iterations (OPT k ). For notations, k is the number of iterations, ℙ0 is the initial population, and ℙk is the folding pathway population of the kth iteration. T contains all the offspring folding pathways produced by applying mutation strategies M 1 ,…, M Y to each pathway p in the (k - 1)st population. O k is an ordered list of offspring folding pathways of the kth generation, from which, the population for the next iteration ( ℙ k +1) is selected. Folding pathways in ℙ k and O k are sorted based on their fitness and ℙ k [1 . . . ℓ] are the top ℓ best folding pathways in ℙ k .

The initial population of RNAEAPath, ℙ0, is filled with a set of simple pathways. Then, the algorithm goes through several iterations. ℙ k -1 is the population of the k - 1st iteration. In the kth iteration, the algorithm produces O k (an ordered list of pathways) and ℙ k (the population of the kth iteration) from ℙ k -1. O k stores the best ℓ1 pathways in ℙ k -1 and the best ℓ2 pathways produced by each p ∈ ℙ k -1. More specifically, each pathway p ∈ ℙ k -1 produces t y k offsprings through every mutation strategy M y ( 1 ≤ y ≤ Y ) . The resulting offsprings produced by p are stored in a temporary list T, and the top ℓ2 pathways are added to O k . Finally, the best solution of the kth iteration, termed as OPT k , is the best pathway in O k . And, ℙ k (the population of the kth iteration) is composed of the best ℓ3 pathways of O k and will be used in the next iteration to produce ℙ k +1. This helps keep the diversity of the population large, since ℙ k contains at most ℓ2 offsprings produced by each p ∈ ℙ k -1, no matter how many high-qualified offsprings are produced by each pathway. The algorithm terminates when a stopping condition is met, and it returns the best solution of the last iteration. Since O k retains the best ℓ1 pathways from ℙ k -1 in each iteration, the best one ever encountered by the algorithm is retained in lists O k and ℙ k , and stored in OPT k . So, OPT k has no worse fitness when compared to OPT k -1, and RNAEAPath always returns the best action chain it ever discovered.

In the remaining of this section, we discuss details regarding fitness evaluation, initialization of the population, stopping conditions and mutation strategies of RNAEAPath.

Fitness of action chains

The order of folding pathways (valid action chains) is primarily determined by their energy barriers. In case of a tie, the order is determined by the average of energy differences between the initial structure A and intermediate structures. Note that lower energies are preferred in the previous two methods of ordering. If a tie still exists, then shorter action chains are preferred. Action chains are ordered arbitrarily if their relative order can not be determined based on these three criteria.

The initial population of folding pathways

The initial population, â„™0, contains 4 simple pathways from A to B formed by first deleting all base pairs in A - B and then adding those in B - A, similar to the pathway shown in Figure 1. Although we can also arrange base pair deletions and additions in an arbitrary order, we tailor them in a manner that simulates successive degradation and formation of RNA stacks. This is because random deletions and additions of base pairs tend to form additional unpaired loop regions that introduce entropic penalties (see Figure 3 for an illustration). We can degrade or form each stack either from the outmost base pair to the innermost base pair or vice verse. Usually, it yields a lower energy barrier if we degrade a stack from the outmost base pair to the innermost base pair and form a stack from the innermost base pair to the outmost base pair. However, for the sake of simplicity and generosity, we construct 4 simple pathways in â„™0, which degrade all the stacks from the same direction and form all the stacks from the same direction. These simple pathways constitute a diversified and unbiased initial population for the algorithm to start from.

Figure 3
figure 3

Two different folding pathways that form an identical stack. Two different folding pathways that form an identical stack. Left: The stack is formed successively. Right: The stack is constructed by random formation of base pairs. The right pathway yields a higher energy barrier because the randomly introduced base pairs form unpaired loop regions that result in additional entropic penalties.

The number of offsprings produced by each mutation strategy

In each generation, the expected total number of offsprings produced by each individual is a constant positive integer L. The number of offsprings that each individual produces using mutation strategy M y , ( 1 ≤ y ≤ Y ) , in the kth generation, is denoted by ℓ M y k . In the initial generation, ℓ M y 0 is equivalent to L/Y for all the mutation strategies. In the kth generation, ℓ M y k is determined adaptively according to the quality of the offsprings produced using M y in the k - 1st iteration. Let b M y k - 1 be number of offsprings that are both produced through M y and selected to construct ℙk-1, the population of the k - 1st generation. Then, ℓ M y k in the kth generation is computed as follows.

ℓ M y k = max L min ( b y k - 1 / ℓ M y k - 1 ) ∑ y ′ = 1 Y ( b y ′ k - 1 / ℓ M y ′ k - 1 ) L

Mutation strategies that have produced more high quality offsprings in the (k - 1)st iteration are allowed to generate more offsprings in the kth generation. In contrast, mutation strategies that perform poorly in the k - 1st generation, are only allowed to generate a small number (L min , with default value 3) of offsprings. Note that, the sum of ℓ M y k for 1 ≤ y ≤ Y may be greater than ℓ.

Stopping conditions

The algorithm terminates when (1) the current best solution achieves the lowest possible value |E(B) - E(A)|, or (2) when no improvement has been found over γ consecutive iterations (a plateau), or (3) when MAX number of iterations have passed and successive iterations do not discover better results. Note that the algorithm may simulate further than MAX iterations if improvements are made in the very last iteration and it stops immediately if no improvement is made between successive iterations. More specifically, the algorithm stops when any of the following conditions is satisfied:

  1. 1.

    the energy barrier of OPT k is equivalent to |E(B) - E(A)|.

  2. 2.

    k >γ and the fitness of OPT k is equivalent to that of OPT k - γ .

  3. 3.

    k ≥ MAX and the fitness of OPT k is equivalent to that of OPT k -1.

Mutation strategies

In RNAEAPath, the mutation strategies employed to evolve folding pathways can be categorized into three types: (1) rearranging the order of actions, (2) introducing indirect pathways and (3) formation of a single stack or cooperative conversion of a pair of incompatible stacks. In this section, let M 1 ,…, M Y denote the mutation strategies and let p = a1, . . . , a m denote the input pathway A = S0, . . . , S m = B. For each mutation strategy M y ( p ) , we describe the process for generating one new pathway q using each mutation strategy when given p.

Type 1: reordering of actions

As described in the subsection of representation of RNA folding pathways, shuffling the order of actions of the input pathway p can result in a new pathway from A to B. In RNAEAPath, two mutation strategies of this type are employed. M 1 changes the position of an arbitrary action, and M 2 swaps the positions of two arbitrary actions.

M1: Let M 1 t 1 , t 2 ( p ) denote the sequence of actions obtained by first removing an action a t 1 (1 ≤ t1 ≤ m) from p and then inserting it after a t 2 , for all t2 ∈ {0,..., t1 - 1, t1 + 1, . . . , m}. Note that the resulting sequence of actions may not necessarily be a valid action chain. For instance, in Figure 1, M 1 1 , 4 ( p ) = a 2 , a 3 , a 4 , a 1 , a 5 ,…, a 8 and M 1 3 , 2 ( p ) =p are valid action chains, while M 1 8 , 1 ( p ) = a 1 , a 8 , a 2 ,…, a 7 is not.

The procedure for computing M 1 t 1 , t 2 ( p ) is described in the following.

  1. 1.

    Choose t1 uniformly at random from the interval [1, m].

  2. 2.

    Compute the interval [l, u], (t1 <l <u <m), where l is the minimum and u is the maximum such that for all t2 ∈ [l, u] and t2 ≠ t1, M 1 t 1 , t 2 ( p ) is a valid action chain.

  3. 3.

    Choose t2 from the interval [l, u].

    • 3.1. If a t 1 is an addition operation, for all l ≤ t <t' ≤ u and t ≠ t' ≠ t1, the probability of choosing t is greater than that of t'.

    • 3.2. Otherwise (a deletion operation), for all l ≤ t ≤ t' ≤ u and t ≠ t' ≠ t1, the probability of choosing t is less than that of t'

We do not choose t2 (t2 ≠ t1) uniformly at random in [l, u], instead, we tend to place addition operations in the front part of p, and deletion operations in the later part of p. This is because adding base pairs early and deleting them late during the folding may help stabilize the intermediate secondary structures. (Please see Additional file 1 for the detailed description of the discrete probability.)

M 2 : Let M 2 t 1 , t 2 ( p ) denote the sequence of actions obtained by swapping a t 1 with a t 2 . If the resulting sequence of actions is a valid action chain, let it be q; otherwise, restart the process. For example, in Figure 1, M 2 1 , 8 ( p ) is not a valid action chain, while M 2 2 , 4 ( p ) = a 1 , a 4 , a 3 , a 2 , a 5 ,…, a 8 is. t1 and t2 are chosen uniformly at random from {(t1, t2): 1 ≤ t1 <t2 ≤ m}.

Mutation strategies of type 1 provide methods for shuffling the order of actions of an input pathway and generating slightly different new pathways. However, these strategies are not capable of introducing additional (indirect) base pairs, and the offsprings of a direct pathway produced through type 1 strategies are also direct. In the following, we will describe mutation strategies that are able to construct indirect pathways from a direct pathway.

Type 2: introducing indirect pathways by adding a pair of complementary actions

Morgan and Higgs [16] pointed out that the optimal folding paths are generally indirect pathways. This idea was further described by Dotu et al. [28]. The temporary formation of base pairs, especially those base pairs that do not belong to A ∪ B, may lower the energies of intermediate structures and thus render better folding pathways. Similarly, temporary deletion and reformation of a base pair also can create an indirect pathway.

M 3 : Let M 3 t 1 , t 2 , + ( i , j ) ( p ) denote the sequence of actions obtained by introducing an addition action add i,j after a t 1 and its complementary action del i,j after a t 2 . Let M 3 t 1 , t 2 - ( i , j ) ( p ) denote the sequence of actions obtained by introducing a deletion action del i,j after a t 1 and its complementary action add i,j after a t 2 . For example, in Figure 1, M 3 1 , 7 , + ( 1 , 16 ) ( p ) = a 1 ,add1,16, a2, ..., a7, del1,16, a8. The procedures for computing M 3 t 1 , t 2 , + ( i , j ) ( p ) and M 3 t 1 , t 2 , - ( i , j ) ( p ) are similar to each other. In the following, we only describe the procedure for computing M 3 t 1 , t 2 , + ( i , j ) ( p ) .

  1. 1.

    Choose t1 uniformly at random from the interval [1, m], and obtain the associated intermediate structure S t 1 .

  2. 2.

    Find a set of base pairs that neither conflict with nor clash with S t 1 and choose a base pair (i, j) uniformly at random from the set.

  3. 3.

    Compute the interval [l, u], (t1 <l <u <m), where l is the minimum and u is the maximum such that for all values t2 ∈ [l, u] the resulting sequence of actions of M 3 t 1 , t 2 , + ( i , j ) ( p ) is a valid action chain.

  4. 4.

    Choose t2 from the interval [l, u] with the probability of choosing t greater than that of t' for all t >t'. (This is because (i, j) is not likely to be deleted soon after its formation.)

Mutation strategy M 3 is capable of producing an indirect pathway from a direct pathway. In addition, a proper combination of multiple applications of M 3 may result in a pathway which simulates the successive formation and deletion of a temporary stack during the folding. Take the pathway p in Figure 1 as an example, we can construct a pathway q that forms a temporary stack consisting of all the GU base pairs via a multiple application of M 3 ,q= M 3 5 , 7 , + ( 3 , 14 ) ( M 3 3 , 7 , + ( 2 , 15 ) ( M 3 1 , 7 , + ( 1 , 16 ) ( p ) ) ) .

Type 3: formation of a single stack or simultaneous formation and deletion of a pair of incompatible stacks

In this section, we will introduce mutation strategies for producing pathways that involve with formation and deletion of stacks. To perform this type of strategies, we first need to find all possible stacks in an RNA sequence x. We use the algorithm of Bafna et al. [38] to find the set of all possible stacks with more than 3 consecutive base pairs, and denote it by STA(x). There are two strategies in Type 3: formation of a single stack ( M 4 ) and simultaneous formation and destruction of a pair of incompatible stacks ( M 5 ) .

M 4 : Let M 4 t , h ( p ) denote the sequence of actions obtained by forcing the formation of a stack stack h ∈ STA after action a t , where stack h is compatible with S t . The following describes the procedure for computing M 4 t , h ( p ) .

  1. 1.

    Choose t uniformly at random from the interval [1, m], and obtain the associated intermediate structure S t .

  2. 2.

    Find a set of stacks that neither conflict with nor clash with S t , and pick up a stack stack h uniformly at random from the set.

  3. 3.

    Ensure that each base pair (i, j) in {stack h - S t } is sequentially (from the innermost base pair to the outmost base pair) formed after a t .

    • 3.1. If an action add i,j appears in {at+1, . . . , a m }, move it up and place it after a t using strategy M 1 .

    • 3.2. Otherwise, introduce a pair of complementary actions add i,j and del i,j to p after a t using strategy M 3 .

We can introduce additional stacks that are compatible with S t using M 4 by forcing a sequence of addition actions successively forming base pairs in {stack h - S t }, after a t .

M 5 : Let M 5 t , h ( p ) denote the sequence of actions obtained by forcing the formation of a stack stack h ∈ STA which is incompatible with S t , after action a t . Shown on the right side of Figure 4 is a folding pathway which simultaneously destructs and forms a pair of incompatible stacks. Shown on the left side is a simple folding pathway which has exactly the same start and end structures, while it folds into a single stranded structure during the folding. Usually, the pathway on the right has lower energy barrier than the one on the left because it never folds into a single stranded structure. The folding pathway on the right side of Figure 4 can be introduced using strategy M 5 . And, the procedure for computing M 5 t , h ( p ) is as follows:

  1. 1.

    Choose an arbitrary deletion action a t = del i,j from p, and obtain the associated intermediate structure S t .

  2. 2.

    Find a set of stacks which either conflicts with or clashes with S t , and choose a stack stack h uniformly at random from the set.

  3. 3.

    For each base pair (i', j') in {stack h - S t } that is compatible with S t , place add i ', j ' to p after a t using strategy M 4 .

  4. 4.

    For each base pair (i', j') in {stack h - S t } that is incompatible with S t ,

    • 4.1. Find all the base pairs (i*, j*) in S t that are incompatible with (i', j'), and ensure that each base pair (i*, j*) is deleted before the action add i ', j' .

    • 4.2. If a action de l i * , j * appears in {a t +1, . . . , a m }, move it up before add i ', j 'using strategy M 1 .

    • 4.3. Otherwise, introduce a pair of complementary actions de l i * , j * and ad d i * , j * using strategy M 3 .

Using M 5 , we can introduce the simultaneous formation of a stack stack h , which is incompatible with S t , and destruction of existent stacks (or base pairs) that hamper the formation of stack h . Since cooperative formation and destruction of stacks may contribute additional stacking energies for stabilizing the intermediate structures, better folding pathways with lower energy barriers may be rendered.

Figure 4
figure 4

Two different folding pathways with the identical initial and final secondary structures. Two different folding pathways with the identical initial and final secondary structures. Left: Folding pathway 1 destroys a stack completely before an incompatible stack is formed. Right: Folding pathway 2 destructs a stack and forms an incompatible stack simultaneously.

Results and discussion

Benchmark tests

We benchmarked RNAEAPath against existing methods (BARRIEERS [20, 21], PathwayHunter [17], Find-path [24], and RNATabuPath [28]) by predicting low energy barrier folding pathways between two designated RNA secondary structures of 18 conformational switches. All the conformational switches were taken from the work of Dotu et. al [28]. Five of them are riboswitches, including rb1, rb2, rb3, rb4, and rb5. The metastable structures of these riboswitches have been experimentally determined by inline probing [9, 39]. The thirteen remaining cases concern conformational switches, including hok, SL (Spliced leader RNA), s15, s-box leader, thiM leader, ms2, HDV, dsrA, ribD leader, amv, alpha operon and HIV-1 leader. Sequences of these conformational switches can also be obtained from paRNAss web site [40], and some of the metastable secondary structures were computationally determined using RNAbor [41].

We summarize the results computed by PathwayHunter, the results computed by BARRIERS, the results computed by Findpath (with the look ahead parameter k = 10), the best results over 1000 runs found by RNATabuPath, and the best results over 1 run and 5 runs found by RNAEAPath in Table 1 respectively. And we use '-' to mark test cases that methods fail to apply to in the table. For all methods, free energies of the intermediate structures of the folding pathways (including PathwayHunter) are evaluated based on the Turner model using RNAeval (with -d1 option) from the Vienna RNA Package [25]. The default configuration parameters of RNAEAPath are as follows. MAX is 10, γ is 5, L is 100, ℓ1 is 10, ℓ2 is 5 and ℓ3 is 100. Due to the stochastic nature of the evolutionary algorithm, we report the best energy barrier of RNAEAPath found over both 1 run and 5 runs.

Table 1 Benchmarks of BARRIERS, PathwayHunter, Findpath, RNATabuPath, and RNAEAPath for predicting folding pathways between conformational switches on the 18 test cases

BARRIERS is the only exact solution that produces indirect pathways based on the Turner model. BARRIERS has already been compared with existing heuristic algorithms on the same test cases in the work of Dotu et al. [28]. We put the results of BARRIERS in the table just for the sake of comparison. It has been pointed out that BARRIERS gives provably globally optimal pathways in 4 out of 18 cases (i.e. SL, attenuator, s15 and dsrA). BARRIERS can not be directly applied to 5 cases because either the initial or the end structure is not locally optimal (i.e. rb2, sbox leader, ms2, amv and alpha operon), and can not converge in the remaining cases. Possibly due to the fact that both the number of RNA secondary conformations to consider and the computational resources required increase exponentially with the growing length of the RNA sequence and the growing range of energy barrier. PathwayHunter is an exact algorithm capable of producing the optimal direct folding pathways based on the Nussinov model. PathwayHunter can not be directly applied to 10 cases, because it requires the pair of input structures being able to form a 'pairwise-optimal' bipartite conflicting graph (see the work of Thachuk et al. [17] for details). It is not surprising that the performance of the exact algorithm, PathwayHunter, evaluated by free energy (in kcal/mol), is worse than the heuristic algorithms. This is because PathwayHunter is optimized based on the Nussinov model and only produces direct pathways, while the optimal direct pathways predicted based on the Nussinov model may not be the optimal pathways (considering both direct and indirect pathways) based on the Turner model. All the remaining three methods are heuristics capable of producing both direct and indirect pathways based on the Turner model. Findpath produces folding pathways very quickly, however it performs worse than both RNATabuPath and RNAEAPath in most cases. RNATabuPath performs better than Findpath, but produces less optimal pathways than RNAEAPath. The energy barriers predicted by RNAEAPath over 5 runs are exactly the same as RNATabuPath in 5 cases, worse in 1 case, and better in all the remaining 12 cases.

Other heuristic algorithms (including a greedy algorithm of Voss et al. [26], a semi-greedy modification of the greedy algorithm, a greedy algorithm of Morgan, and Higgs [16] for predicting direct pathways and a variant of the Morgan-Higgs greedy algorithm capable of producing indirect pathways), that have been shown to perform considerably worse than RNATabuPath [28], are not listed.

By analyzing the best folding pathways produced by RNAEAPath, we found that most high-quality pathways involve the melting of stacks in the initial structure, the (possibly simultaneous) construction of stacks in the final structure, and the formation of auxiliary temporary stacks for obtaining folding pathways with lower energy barriers. We may take the lowest energy barrier folding pathway of rb2 found by RNAEAPath, shown in Figure 5 as an example. The stack colored in red is an auxiliary temporary stack introducing intermediate structures with lower free energies (which is constructed using M 4 ). Some of the stacks in the initial structure (in blue) are gradually melting, while at the same time, an incompatible stack (in green) is being formed (which is constructed using M 5 ). The stack colored in red is an auxiliary temporary stack introducing intermediate structures with lower free energies. This example convinces us that the advantages of RNAEAPath mainly come from employing mutation strategies that guide the construction of folding pathways by the formation and destruction of stacks and introducing additional stacking interactions that are important for stabilizing the intermediate structures. Detailed low energy barrier folding pathways for all the test cases are available on RNAEAPath web site.

Figure 5
figure 5

The best folding pathway predicted for rb2. The near optimal indirect pathway between the two conformational secondary structures of the adenine riboswitch from V. vulnificus (rb2) predicted RNAEAPath.

Control parameters and performance

In order to evaluate the performance of RNAEAPath with different parameter configurations, we played with several other control parameters, including â„“1, the number of top offsprings preserved in the next generation, varying from 1 to 16, â„“3, the size of population in each generation, varying from 80 to 120 and L, the total number of offsprings each individual is expected to produce, varying from 80 to 120. The detailed results are shown in Additional file 1. In general, RNAEAPath produces pathways of roughly the same quality for most test cases with different control parameters, among which the default parameter setting is the best.

We explored the relationship between the performance of RNAEAPath and the number of generations completed by plotting energy barriers of the best folding pathways produced by RNAEAPath with the default parameters in each generation, as shown in Figure 6. In general, the energy barriers decrease dramatically in the first one or two generations, and then the decrements slow down and finally plateau within 10 generations. For instance, in the case of rb3, the predicted energy barriers of folding pathways in the initial population is 27.3 kcal/mol. It decreases by 7.2 kcal/mol (24.9%) through the first two generations and decreases by 2.5 kcal/mol (9.2%) through the next three generations. Through all the remaining generations, no further improvement is made.

Figure 6
figure 6

Performance of RNAEAPath in each generation. Energy barriers (in kcal/mol) of the best folding pathways of 18 conformational switches by the end of each generation during a typical run of RNAEAPath using the default parameters.

We also evaluated the execution time for each run of RNAEAPath. All the tests were performed on a 32 bit PC with 2.4 GHz Quad-processor and 3.2 GB memory, running Fedora 11. With the default control parameters, RNAEAPath terminates in 1 minute in the best case (rb4), 445 minutes in the worst case (hok), and 43 minutes on average. The detailed running times are shown in Additional file 1. We did not perform direct comparisons between the running time of RNATabuPath and that of RNAEAPath, since RNATabuPath is only accessible via web server.

Conclusions

In conclusion, we have presented a new algorithm, RNAEAPath, for predicting low energy barrier folding pathways between conformational structures. RNAEAPath guides the construction of folding pathways through the destruction and formation of RNA stacks using various types of mutation strategies, and integrates them in a well-established computational framework of evolutionary algorithm. These mutation strategies can help reduce the search space and make it easier to jump out of local optima. By analyzing the results, we confirmed that most of the best folding pathways involve the formation of auxiliary stacks, or involve the cooperative formation and disruption of incompatible stacks. The benchmarking results show that RNAEAPath outperforms the existing heuristics on most test cases. We believe that this is because the construction of folding pathways in RNAEAPath captures important biological findings.

References

  1. Biebricher CK, Diekmann S, Luce R: Structural analysis of self-replicating RNA synthesized by Qbeta replicase. J Mol Biol. 1982, 154: 629-648. 10.1016/S0022-2836(82)80019-3.

    Article  CAS  PubMed  Google Scholar 

  2. Biebricher CK, Luce R: In vitro recombination and terminal elongation of RNA by Q beta replicase. EMBO J. 1992, 11: 5129-5135.

    PubMed Central  CAS  PubMed  Google Scholar 

  3. Bukau B: Regulation of the Escherichia coli heat-shock response. Mol Microbiol. 1993, 9: 671-680. 10.1111/j.1365-2958.1993.tb01727.x.

    Article  CAS  PubMed  Google Scholar 

  4. Liphardt J, Onoa B, Smith SB, Tinoco I, Bustamante C: Reversible unfolding of single RNA molecules by mechanical force. Science. 2001, 292: 733-737. 10.1126/science.1058498.

    Article  CAS  PubMed  Google Scholar 

  5. Narberhaus F: mRNA-mediated detection of environmental conditions. Arch Microbiol. 2002, 178: 404-410. 10.1007/s00203-002-0481-8.

    Article  CAS  PubMed  Google Scholar 

  6. Narberhaus F, Waldminghaus T, Chowdhury S: RNA thermometers. FEMS Microbiol Rev. 2006, 30: 3-16. 10.1111/j.1574-6976.2005.004.x.

    Article  CAS  PubMed  Google Scholar 

  7. Montange RK, Batey RT: Riboswitches: emerging themes in RNA structure and function. Annu Rev Biophys. 2008, 37: 117-133. 10.1146/annurev.biophys.37.032807.130000.

    Article  CAS  PubMed  Google Scholar 

  8. Roth A, Breaker RR: The structural and functional diversity of metabolite-binding riboswitches. Annu Rev Biochem. 2009, 78: 305-334. 10.1146/annurev.biochem.78.070507.135656.

    Article  CAS  PubMed  Google Scholar 

  9. Wakeman CA, Winkler WC, Dann CE: Structural features of metabolite-sensing riboswitches. Trends Biochem Sci. 2007, 32: 415-424. 10.1016/j.tibs.2007.08.005.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  10. Smith DJ, Query CC, Konarska MM: Nought may endure but mutability: spliceosome dynamics and the regulation of splicing. Mol Cell. 2008, 30: 657-666. 10.1016/j.molcel.2008.04.013.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  11. Simon AE, Gehrke L: RNA conformational changes in the life cycles of RNA viruses, viroids, and virus-associated RNAs. Biochim Biophys Acta. 2009, 1789: 571-583.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  12. Mahen EM, Watson PY, Cottrell JW, Fedor MJ: mRNA secondary structures fold sequentially but exchange rapidly in vivo. PLoS Biol. 2010, 8: e1000307-10.1371/journal.pbio.1000307.

    Article  PubMed Central  PubMed  Google Scholar 

  13. Gerdes K, Gultyaev AP, Franch T, Pedersen K, Mikkelsen ND: Antisense RNA-regulated programmed cell death. Annu Rev Genet. 1997, 31: 1-31. 10.1146/annurev.genet.31.1.1.

    Article  CAS  PubMed  Google Scholar 

  14. Nussinov R, Jacobson AB: Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci USA. 1980, 77: 6309-6313. 10.1073/pnas.77.11.6309.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  15. Nussinov R, Pieczenik G, Griggs JR, Kleitman DJ: Algorithms for loop matchings. SIAM Journal on Applied Mathematics. 1978, 35: 68-82. 10.1137/0135006.

    Article  Google Scholar 

  16. Morgan S, Higgs P: Barrier heights between ground states in a model of RNA secondary structure. J Phys A. 1998, 31 (14): 3153-10.1088/0305-4470/31/14/005.

    Article  CAS  Google Scholar 

  17. Thachuk C, Manuch J, Rafiey A, Mathieson LA, Stacho L, Condon A: An algorithm for the energy barrier problem without pseudoknots and temporary arcs. Pac Symp Biocomput. 2010, 108-119.

    Google Scholar 

  18. Mathews DH, Sabina J, Zuker M, Turner DH: Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol. 1999, 288: 911-940. 10.1006/jmbi.1999.2700.

    Article  CAS  PubMed  Google Scholar 

  19. Turner DH, Sugimoto N, Freier SM: RNA structure prediction. Annu Rev Biophys Biophys Chem. 1988, 17: 167-192. 10.1146/annurev.bb.17.060188.001123.

    Article  CAS  PubMed  Google Scholar 

  20. Flamm C, Fontana W, Hofacker IL, Schuster P: RNA folding at elementary step resolution. RNA. 2000, 6: 325-338. 10.1017/S1355838200992161.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  21. Flamm C, Hofacker IL, Stadler P, Wolfinger M: Barrier trees of degenerate landscapes. Z Phys Chem. 2002, 216: 155-173. 10.1524/zpch.2002.216.2.155.

    Article  CAS  Google Scholar 

  22. Wuchty S, Fontana W, Hofacker IL, Schuster P: Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers. 1999, 49: 145-165.

    Article  CAS  PubMed  Google Scholar 

  23. Manuch J, Thachuk C, Stacho L, Condon A: NP completeness of the direct energy barrier problem without pseudoknots. 15th International Conference DNA Computing and Molecular Programming. Edited by: Deaton R, Suyama A. 2009, Berlin, Heidelberg: Springer-Verlag, 106-115. [http://dx.doi.org/10.1007/978-3-642-10604-0_11]

    Chapter  Google Scholar 

  24. Flamm C, Hofacker IL, Maurer-Stroh S, Stadler PF, Zehl M: Design of multistable RNA molecules. RNA. 2001, 7: 254-265. 10.1017/S1355838201000863.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  25. Hofacker IL: Vienna RNA secondary structure server. Nucleic Acids Res. 2003, 31: 3429-3431. 10.1093/nar/gkg599.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  26. Voss B, Meyer C, Giegerich R: Evaluating the predictability of conformational switching in RNA. Bioinformatics. 2004, 20: 1573-1582. 10.1093/bioinformatics/bth129.

    Article  CAS  PubMed  Google Scholar 

  27. Geis M, Flamm C, Wolfinger MT, Tanzer A, Hofacker IL, Middendorf M, Mandl C, Stadler PF, Thurner C: Folding kinetics of large RNAs. J Mol Biol. 2008, 379: 160-173. 10.1016/j.jmb.2008.02.064.

    Article  CAS  PubMed  Google Scholar 

  28. Dotu I, Lorenz WA, Van Hentenryck P, Clote P: Computing folding pathways between RNA secondary structures. Nucleic Acids Res. 2010, 38: 1711-1722. 10.1093/nar/gkp1054.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  29. Bogomolov S, Mann M, Voß B, Podelski A, Backofen R: Shape-based barrier estimation for RNAs. German Conference on Bioinformatics. 2010, 41-50.

    Google Scholar 

  30. Lorenz R, Flamm C, Hofacker IL: 2D projections of RNA folding landscapes. German Conference on Bioinformatics. 2009, 11-20.

    Google Scholar 

  31. Flamm C, Hofacker IL: Beyond energy minimization: Approaches to the kinetic folding of RNA. Monatsh Chem. 2008, 139: 447-457. 10.1007/s00706-008-0895-3.

    Article  CAS  Google Scholar 

  32. Shcherbakova I, Mitra S, Laederach A, Brenowitz M: Energy barriers, pathways, and dynamics during folding of large, multidomain RNAs. Curr Opin Chem Biol. 2008, 12: 655-666. 10.1016/j.cbpa.2008.09.017.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  33. Ndifon W: A complex adaptive systems approach to the kinetic folding of RNA. Biosystems. 2005, 82: 257-265. 10.1016/j.biosystems.2005.08.004.

    Article  CAS  PubMed  Google Scholar 

  34. Sponer J, Leszczynski J, Hobza P: Nature of nucleic acid-base stacking: nonempirical ab initio and empirical potential characterizaion of 10 stacked base dimers. Comparison of stacked and H-bonded base pairs. J Phys Chem. 1996, 100 (13): 5590-5596. 10.1021/jp953306e.

    Article  CAS  Google Scholar 

  35. Yakovchuk P, Protozanova E, Frank-Kamenetskii MD: Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. Nucleic Acids Res. 2006, 34: 564-574. 10.1093/nar/gkj454.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  36. Zhao P, Zhang WB, Chen SJ: Predicting secondary structural folding kinetics for nucleic acids. Biophys J. 2010, 98: 1617-1625. 10.1016/j.bpj.2009.12.4319.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  37. Eiben A: Evolutionary computing: the most powerful problem solver in the universe?. Dutch Mathematical Archive. 2002, 5: 126-131.

    Google Scholar 

  38. Bafna V, Tang H, Zhang S: Consensus folding of unaligned RNA sequences revisited. J Comput Biol. 2006, 13: 283-295. 10.1089/cmb.2006.13.283.

    Article  CAS  PubMed  Google Scholar 

  39. Mandal M, Boese B, Barrick JE, Winkler WC, Breaker RR: Riboswitches control fundamental biochemical pathways in Bacillus subtilis and other bacteria. Cell. 2003, 113: 577-586. 10.1016/S0092-8674(03)00391-X.

    Article  CAS  PubMed  Google Scholar 

  40. paRNAss. [http://bibiserv.techfak.uni-bielefeld.de/parnass/examples.html]

  41. Freyhult E, Moulton V, Clote P: Boltzmann probability of RNA structural neighbors and riboswitch detection. Bioinformatics. 2007, 23: 2054-2062. 10.1093/bioinformatics/btm314.

    Article  CAS  PubMed  Google Scholar 

Download references

Acknowledgements

This article has been published as part of BMC Bioinformatics Volume 13 Supplement 3, 2012: ACM Conference on Bioinformatics, Computational Biology and Biomedicine 2011. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/13/S3.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shaojie Zhang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

SZ and YL conceived and designed the project. YL implemented the program and ran benchmark tests for the paper. SZ and YL both drafted the manuscript and approved the final manuscript.

Electronic supplementary material

12859_2012_5083_MOESM1_ESM.pdf

Additional file 1: Supplementary data for predicting folding pathways between RNA conformational structures guided by RNA stacks. Supplementary data for predicting folding pathways between RNA conformational structures guided by RNA stacks in a PDF file. (PDF 326 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Li, Y., Zhang, S. Predicting folding pathways between RNA conformational structures guided by RNA stacks. BMC Bioinformatics 13 (Suppl 3), S5 (2012). https://doi.org/10.1186/1471-2105-13-S3-S5

Download citation

  • Published:

  • DOI: https://doi.org/10.1186/1471-2105-13-S3-S5

Keywords