Representation of RNA folding pathways
Given an initial structure A and an end structure B, we use a sequence of actions successively applied to A, rather than a sequence of intermediate structures, to represent a folding pathway from A to B. Representing a pathway by an action chain can avoid cyclic additions and deletions of base pairs and make it easy to simulate the formation and deletion of RNA stacks. A similar representation has also been employed in the previous work of Thachuk et al. [17].
We use two types of actions, add_{
i,j
} and del_{
i,j
} in the representation of RNA folding pathways. For an intermediate secondary structure S of an RNA sequence x, the action add_{
i,j
} denotes the 'add'ition of base pair (i, j) to S (i.e. add_{
i,j
}(S) = S ∪ {(i, j)}) and del_{
i,j
} denotes the 'del'etion of base pair (i, j) from S (i.e. del_{
i,j
}(S) = S  {(i, j)}). An action is direct if it concerns a base pair in A ∪ B and indirect otherwise. The simplest direct pathways from A to B concern sequential deletions of all base pairs in A  B followed by additions of all base pairs in B  A.
Consider an example sequence x = GGGGAAAACCCCUUUU with initial and final structures shown in Figure 1. This simple pathway is obtained by first deleting all GC pairs from A until the RNA is single stranded, and then adding all AU pairs until B is obtained. Note that each intermediate structure S_{
i
} differs from both its successor and predecessor by exactly one base pair. The actions in the example are all direct actions and the energy barrier is 5.50  (6.60) = 12.10 kcal/mol.
An addition action add_{
i,j
}(S) conflicts with S if either x_{
i
} or x_{
j
} is already paired in S, and it clashes with S if there exists a base pair \left\{\left({x}_{i}^{\prime},{x}_{j}^{\prime}\right)\in Si<{i}^{\prime}<j<{j}^{\prime}\mathsf{\text{or}}{i}^{\prime}i{j}^{\prime}j\right\}. A deletion action del_{
i,j
}(S) conflicts with S if (x_{
i
}, x_{
j
}) ∉ S. An addition or deletion action is valid and can be applied to S properly if it neither conflicts with nor clashes with S.
A pathway from A to B can be represented by an action chain, which is a sequence of valid actions a_{1}, . . . , a_{
m
} such that S_{0} = A, S_{
t
} = a_{
t
}(S_{
t
}_{1}) for 1 ≤ t ≤ m and S_{
m
} = B. Note that an action chain for A to B implies a sequence of valid actions that can be successively applied to A without introducing conflicts or clashes and produce B. We use the term "action chain" when the sequence is certified to be valid, and the term "sequence of actions" if its validity is not guaranteed.
This representation of a pathway p from A to B has the following important properties. First, every folding pathway can be represented by a unique action chain and every action chain represents a unique folding pathway (note that it is not necessarily true for a sequence of actions). Second, rearranging the order of actions in p results in a new sequence of actions which represents a new folding pathway from A to B when it is valid. (It is an action chain that can be successively applied to A properly and obtain B.) Third, introducing a pair of complementary actions (e.g. add_{
i,j
} and del_{
i,j
}) to p results in a new sequence of actions which also represents a new folding pathway from A to B if it is valid.
In RNAEAPath, folding pathways are represented in the form of action chains, instead of a sequence of intermediate structures. This representation makes the life cycle of a folding pathway transparent to the algorithm and also makes it easier for us to simulate the cooperative formation and destruction of RNA stacks by rearranging the order of actions or introducing multiple pairs of complementary actions.
Predicting low energy barrier folding pathways
Given an RNA sequence x, an initial structure A and a final structure B, RNAEAPath computes a near optimal low energy barrier folding pathway from A to B in an evolutionary algorithm framework [37]. Figure 2 elucidates the overall paradigm for RNAEAPath. In this algorithm, the population of each generation is comprised of folding pathways ordered by their fitness. The functions {\mathbb{M}}_{y}\left(p\right) are mutation strategies, each of which takes in a pathway p and produces a set of offspring pathways. These mutation strategies are central to the effectiveness of RNAEAPath and will be discussed in the Mutation strategies subsection. ℓ_{1}, ℓ_{2}, ℓ_{3}, MAX and γ are positive integer control parameters.
The initial population of RNAEAPath, ℙ_{0}, is filled with a set of simple pathways. Then, the algorithm goes through several iterations. ℙ_{
k
}_{1} is the population of the k  1^{st} iteration. In the k^{th} iteration, the algorithm produces {\mathbb{O}}_{k} (an ordered list of pathways) and ℙ_{
k
} (the population of the k^{th} iteration) from ℙ_{
k
}_{1}. {\mathbb{O}}_{k} stores the best ℓ_{1} pathways in ℙ_{
k
}_{1} and the best ℓ_{2} pathways produced by each p ∈ ℙ_{
k
}_{1}. More specifically, each pathway p ∈ ℙ_{
k
}_{1} produces {t}_{y}^{k} offsprings through every mutation strategy {\mathbb{M}}_{y}\left(1\le y\le Y\right). The resulting offsprings produced by p are stored in a temporary list \mathbb{T}, and the top ℓ_{2} pathways are added to {\mathbb{O}}_{k}. Finally, the best solution of the k^{th} iteration, termed as OPT_{
k
}, is the best pathway in {\mathbb{O}}_{k}. And, ℙ_{
k
} (the population of the k^{th} iteration) is composed of the best ℓ_{3} pathways of {\mathbb{O}}_{k} and will be used in the next iteration to produce ℙ_{
k
}_{+1}. This helps keep the diversity of the population large, since ℙ_{
k
} contains at most ℓ_{2} offsprings produced by each p ∈ ℙ_{
k
}_{1}, no matter how many highqualified offsprings are produced by each pathway. The algorithm terminates when a stopping condition is met, and it returns the best solution of the last iteration. Since {\mathbb{O}}_{k} retains the best ℓ_{1} pathways from ℙ_{
k
}_{1} in each iteration, the best one ever encountered by the algorithm is retained in lists {\mathbb{O}}_{k} and ℙ_{
k
}, and stored in OPT_{
k
}. So, OPT_{
k
} has no worse fitness when compared to OPT_{
k
}_{1}, and RNAEAPath always returns the best action chain it ever discovered.
In the remaining of this section, we discuss details regarding fitness evaluation, initialization of the population, stopping conditions and mutation strategies of RNAEAPath.
Fitness of action chains
The order of folding pathways (valid action chains) is primarily determined by their energy barriers. In case of a tie, the order is determined by the average of energy differences between the initial structure A and intermediate structures. Note that lower energies are preferred in the previous two methods of ordering. If a tie still exists, then shorter action chains are preferred. Action chains are ordered arbitrarily if their relative order can not be determined based on these three criteria.
The initial population of folding pathways
The initial population, ℙ_{0}, contains 4 simple pathways from A to B formed by first deleting all base pairs in A  B and then adding those in B  A, similar to the pathway shown in Figure 1. Although we can also arrange base pair deletions and additions in an arbitrary order, we tailor them in a manner that simulates successive degradation and formation of RNA stacks. This is because random deletions and additions of base pairs tend to form additional unpaired loop regions that introduce entropic penalties (see Figure 3 for an illustration). We can degrade or form each stack either from the outmost base pair to the innermost base pair or vice verse. Usually, it yields a lower energy barrier if we degrade a stack from the outmost base pair to the innermost base pair and form a stack from the innermost base pair to the outmost base pair. However, for the sake of simplicity and generosity, we construct 4 simple pathways in ℙ_{0}, which degrade all the stacks from the same direction and form all the stacks from the same direction. These simple pathways constitute a diversified and unbiased initial population for the algorithm to start from.
The number of offsprings produced by each mutation strategy
In each generation, the expected total number of offsprings produced by each individual is a constant positive integer \mathcal{L}. The number of offsprings that each individual produces using mutation strategy {\mathbb{M}}_{y},\left(1\le y\le Y\right), in the k^{th} generation, is denoted by {\ell}_{{\mathbb{M}}_{y}}^{k}. In the initial generation, {\ell}_{{\mathbb{M}}_{y}}^{0} is equivalent to \mathcal{L}/Y for all the mutation strategies. In the k^{th} generation, {\ell}_{{\mathbb{M}}_{y}}^{k} is determined adaptively according to the quality of the offsprings produced using {\mathbb{M}}_{y} in the k  1^{st} iteration. Let {b}_{{\mathbb{M}}_{y}}^{k1} be number of offsprings that are both produced through {\mathbb{M}}_{y} and selected to construct ℙ_{k1}, the population of the k  1^{st} generation. Then, {\ell}_{{\mathbb{M}}_{y}}^{k} in the k^{th} generation is computed as follows.
{\ell}_{{\mathbb{M}}_{y}}^{k}=\mathrm{max}\left\{\begin{array}{c}{\mathcal{L}}_{\mathit{min}}\hfill \\ \phantom{\rule{0.5em}{0ex}}\\ \hfill \frac{\left({b}_{y}^{k1}/{\ell}_{{\mathbb{M}}_{y}}^{k1}\right)}{\sum _{{y}^{\prime}=1}^{Y}\left({b}_{{y}^{\prime}}^{k1}/{\ell}_{{\mathbb{M}}_{{y}^{\prime}}}^{k1}\right)}\mathcal{L}\hfill \end{array}\right.
Mutation strategies that have produced more high quality offsprings in the (k  1)^{st} iteration are allowed to generate more offsprings in the k^{th} generation. In contrast, mutation strategies that perform poorly in the k  1^{st} generation, are only allowed to generate a small number (\mathcal{L}_{
min
}, with default value 3) of offsprings. Note that, the sum of {\ell}_{{\mathbb{M}}_{y}}^{k} for 1 ≤ y ≤ Y may be greater than ℓ.
Stopping conditions
The algorithm terminates when (1) the current best solution achieves the lowest possible value E(B)  E(A), or (2) when no improvement has been found over γ consecutive iterations (a plateau), or (3) when MAX number of iterations have passed and successive iterations do not discover better results. Note that the algorithm may simulate further than MAX iterations if improvements are made in the very last iteration and it stops immediately if no improvement is made between successive iterations. More specifically, the algorithm stops when any of the following conditions is satisfied:

1.
the energy barrier of OPT_{
k
} is equivalent to E(B)  E(A).

2.
k >γ and the fitness of OPT_{
k
} is equivalent to that of OPT_{
k
}_{}_{
γ
}.

3.
k ≥ MAX and the fitness of OPT_{
k
} is equivalent to that of OPT_{
k
}_{1}.
Mutation strategies
In RNAEAPath, the mutation strategies employed to evolve folding pathways can be categorized into three types: (1) rearranging the order of actions, (2) introducing indirect pathways and (3) formation of a single stack or cooperative conversion of a pair of incompatible stacks. In this section, let {\mathbb{M}}_{1},\dots ,{\mathbb{M}}_{Y} denote the mutation strategies and let p = a_{1}, . . . , a_{
m
} denote the input pathway A = S_{0}, . . . , S_{
m
} = B. For each mutation strategy {\mathbb{M}}_{y}\left(p\right), we describe the process for generating one new pathway q using each mutation strategy when given p.
Type 1: reordering of actions
As described in the subsection of representation of RNA folding pathways, shuffling the order of actions of the input pathway p can result in a new pathway from A to B. In RNAEAPath, two mutation strategies of this type are employed. {\mathbb{M}}_{1} changes the position of an arbitrary action, and {\mathbb{M}}_{2} swaps the positions of two arbitrary actions.
M_{1}: Let {\mathbb{M}}_{1}^{{t}_{1},{t}_{2}}\left(p\right) denote the sequence of actions obtained by first removing an action {a}_{{t}_{1}} (1 ≤ t_{1} ≤ m) from p and then inserting it after {a}_{{t}_{2}}, for all t_{2} ∈ {0,..., t_{1}  1, t_{1} + 1, . . . , m}. Note that the resulting sequence of actions may not necessarily be a valid action chain. For instance, in Figure 1, {\mathbb{M}}_{1}^{1,4}\left(p\right)={a}_{2},{a}_{3},{a}_{4},{a}_{1},{a}_{5},\dots ,{a}_{8} and {\mathbb{M}}_{1}^{3,2}\left(p\right)=p are valid action chains, while {\mathbb{M}}_{1}^{8,1}\left(p\right)={a}_{1},{a}_{8},{a}_{2},\dots ,{a}_{7} is not.
The procedure for computing {\mathbb{M}}_{1}^{{t}_{1},{t}_{2}}\left(p\right) is described in the following.

1.
Choose t_{1} uniformly at random from the interval [1, m].

2.
Compute the interval [l, u], (t_{1} <l <u <m), where l is the minimum and u is the maximum such that for all t_{2} ∈ [l, u] and t_{2} ≠ t_{1}, {\mathbb{M}}_{1}^{{t}_{1},{t}_{2}}\left(p\right) is a valid action chain.

3.
Choose t_{2} from the interval [l, u].

3.1. If {a}_{{t}_{1}} is an addition operation, for all l ≤ t <t' ≤ u and t ≠ t' ≠ t_{1}, the probability of choosing t is greater than that of t'.

3.2. Otherwise (a deletion operation), for all l ≤ t ≤ t' ≤ u and t ≠ t' ≠ t_{1}, the probability of choosing t is less than that of t'
We do not choose t_{2} (t_{2} ≠ t_{1}) uniformly at random in [l, u], instead, we tend to place addition operations in the front part of p, and deletion operations in the later part of p. This is because adding base pairs early and deleting them late during the folding may help stabilize the intermediate secondary structures. (Please see Additional file 1 for the detailed description of the discrete probability.)
{\mathbb{M}}_{2}: Let {\mathbb{M}}_{2}^{{t}_{1},{t}_{2}}\left(p\right) denote the sequence of actions obtained by swapping {a}_{{t}_{1}} with {a}_{{t}_{2}}. If the resulting sequence of actions is a valid action chain, let it be q; otherwise, restart the process. For example, in Figure 1, {\mathbb{M}}_{2}^{1,8}\left(p\right) is not a valid action chain, while {\mathbb{M}}_{2}^{2,4}\left(p\right)={a}_{1},{a}_{4},{a}_{3},{a}_{2},{a}_{5},\dots ,{a}_{8} is. t_{1} and t_{2} are chosen uniformly at random from {(t_{1}, t_{2}): 1 ≤ t_{1} <t_{2} ≤ m}.
Mutation strategies of type 1 provide methods for shuffling the order of actions of an input pathway and generating slightly different new pathways. However, these strategies are not capable of introducing additional (indirect) base pairs, and the offsprings of a direct pathway produced through type 1 strategies are also direct. In the following, we will describe mutation strategies that are able to construct indirect pathways from a direct pathway.
Type 2: introducing indirect pathways by adding a pair of complementary actions
Morgan and Higgs [16] pointed out that the optimal folding paths are generally indirect pathways. This idea was further described by Dotu et al. [28]. The temporary formation of base pairs, especially those base pairs that do not belong to A ∪ B, may lower the energies of intermediate structures and thus render better folding pathways. Similarly, temporary deletion and reformation of a base pair also can create an indirect pathway.
{\mathbb{M}}_{3}: Let {\mathbb{M}}_{3}^{{t}_{1},{t}_{2},+\left(i,j\right)}\left(p\right) denote the sequence of actions obtained by introducing an addition action add_{
i,j
} after {a}_{{t}_{1}} and its complementary action del_{
i,j
} after {a}_{{t}_{2}}. Let {\mathbb{M}}_{3}^{{t}_{1},{t}_{2}\left(i,j\right)}\left(p\right) denote the sequence of actions obtained by introducing a deletion action del_{
i,j
} after {a}_{{t}_{1}} and its complementary action add_{
i,j
} after {a}_{{t}_{2}}. For example, in Figure 1, {\mathbb{M}}_{3}^{1,7,+\left(1,16\right)}\left(p\right)={a}_{1},add_{1,16}, a_{2}, ..., a_{7}, del_{1,16}, a_{8}. The procedures for computing {\mathbb{M}}_{3}^{{t}_{1},{t}_{2},+\left(i,j\right)}\left(p\right) and {\mathbb{M}}_{3}^{{t}_{1},{t}_{2},\left(i,j\right)}\left(p\right) are similar to each other. In the following, we only describe the procedure for computing {\mathbb{M}}_{3}^{{t}_{1},{t}_{2},+\left(i,j\right)}\left(p\right).

1.
Choose t_{1} uniformly at random from the interval [1, m], and obtain the associated intermediate structure {S}_{{t}_{1}}.

2.
Find a set of base pairs that neither conflict with nor clash with {S}_{{t}_{1}} and choose a base pair (i, j) uniformly at random from the set.

3.
Compute the interval [l, u], (t_{1} <l <u <m), where l is the minimum and u is the maximum such that for all values t_{2} ∈ [l, u] the resulting sequence of actions of {\mathbb{M}}_{3}^{{t}_{1},{t}_{2},+\left(i,j\right)}\left(p\right) is a valid action chain.

4.
Choose t_{2} from the interval [l, u] with the probability of choosing t greater than that of t' for all t >t'. (This is because (i, j) is not likely to be deleted soon after its formation.)
Mutation strategy {\mathbb{M}}_{3}is capable of producing an indirect pathway from a direct pathway. In addition, a proper combination of multiple applications of {\mathbb{M}}_{3} may result in a pathway which simulates the successive formation and deletion of a temporary stack during the folding. Take the pathway p in Figure 1 as an example, we can construct a pathway q that forms a temporary stack consisting of all the GU base pairs via a multiple application of {\mathbb{M}}_{3},\phantom{\rule{0.5em}{0ex}}q={\mathbb{M}}_{3}^{5,7,+\left(3,14\right)}\left({\mathbb{M}}_{3}^{3,7,+\left(2,15\right)}\left({\mathbb{M}}_{3}^{1,7,+\left(1,16\right)}\left(p\right)\right)\right).
Type 3: formation of a single stack or simultaneous formation and deletion of a pair of incompatible stacks
In this section, we will introduce mutation strategies for producing pathways that involve with formation and deletion of stacks. To perform this type of strategies, we first need to find all possible stacks in an RNA sequence x. We use the algorithm of Bafna et al. [38] to find the set of all possible stacks with more than 3 consecutive base pairs, and denote it by STA(x). There are two strategies in Type 3: formation of a single stack \left({\mathbb{M}}_{4}\right) and simultaneous formation and destruction of a pair of incompatible stacks \left({\mathbb{M}}_{5}\right).
{\mathbb{M}}_{4}: Let {\mathbb{M}}_{4}^{t,h}\left(p\right) denote the sequence of actions obtained by forcing the formation of a stack stack_{
h
} ∈ STA after action a_{
t
}, where stack_{
h
} is compatible with S_{
t
}. The following describes the procedure for computing {\mathbb{M}}_{4}^{t,h}\left(p\right).

1.
Choose t uniformly at random from the interval [1, m], and obtain the associated intermediate structure S_{
t
}.

2.
Find a set of stacks that neither conflict with nor clash with S_{
t
}, and pick up a stack stack_{
h
} uniformly at random from the set.

3.
Ensure that each base pair (i, j) in {stack_{
h
}  S_{
t
}} is sequentially (from the innermost base pair to the outmost base pair) formed after a_{
t
}.

3.1. If an action add_{
i,j
} appears in {a_{t+1}, . . . , a_{
m
}}, move it up and place it after a_{
t
} using strategy {\mathbb{M}}_{1}.

3.2. Otherwise, introduce a pair of complementary actions add_{
i,j
} and del_{
i,j
} to p after a_{
t
} using strategy {\mathbb{M}}_{3}.
We can introduce additional stacks that are compatible with S_{
t
} using {\mathbb{M}}_{4} by forcing a sequence of addition actions successively forming base pairs in {stack_{
h
}  S_{
t
}}, after a_{
t
}.
{\mathbb{M}}_{5}: Let {\mathbb{M}}_{5}^{t,h}\left(p\right) denote the sequence of actions obtained by forcing the formation of a stack stack_{
h
} ∈ STA which is incompatible with S_{
t
}, after action a_{
t
}. Shown on the right side of Figure 4 is a folding pathway which simultaneously destructs and forms a pair of incompatible stacks. Shown on the left side is a simple folding pathway which has exactly the same start and end structures, while it folds into a single stranded structure during the folding. Usually, the pathway on the right has lower energy barrier than the one on the left because it never folds into a single stranded structure. The folding pathway on the right side of Figure 4 can be introduced using strategy {\mathbb{M}}_{5}. And, the procedure for computing {\mathbb{M}}_{5}^{t,h}\left(p\right) is as follows:

1.
Choose an arbitrary deletion action a_{
t
} = del_{
i,j
} from p, and obtain the associated intermediate structure S_{
t
}.

2.
Find a set of stacks which either conflicts with or clashes with S_{
t
}, and choose a stack stack_{
h
} uniformly at random from the set.

3.
For each base pair (i', j') in {stack_{
h
}  S_{
t
}} that is compatible with S_{
t
}, place add_{
i
}_{',}_{
j
}_{'} to p after a_{
t
} using strategy {\mathbb{M}}_{4}.

4.
For each base pair (i', j') in {stack_{
h
}  S_{
t
}} that is incompatible with S_{
t
},

4.1. Find all the base pairs (i*, j*) in S_{
t
} that are incompatible with (i', j'), and ensure that each base pair (i*, j*) is deleted before the action add_{
i
}_{',}_{
j'
}.

4.2. If a action \mathsf{\text{de}}{\mathsf{\text{l}}}_{{i}^{*},{j}^{*}} appears in {a_{
t
}_{+1}, . . . , a_{
m
}}, move it up before add_{
i
}_{'},_{
j
}_{'}using strategy {\mathbb{M}}_{1}.

4.3. Otherwise, introduce a pair of complementary actions \mathsf{\text{de}}{\mathsf{\text{l}}}_{{i}^{*},{j}^{*}} and \mathsf{\text{ad}}{\mathsf{\text{d}}}_{{i}^{*},{j}^{*}}using strategy {\mathbb{M}}_{3}.
Using {\mathbb{M}}_{5}, we can introduce the simultaneous formation of a stack stack_{
h
}, which is incompatible with S_{
t
}, and destruction of existent stacks (or base pairs) that hamper the formation of stack_{
h
}. Since cooperative formation and destruction of stacks may contribute additional stacking energies for stabilizing the intermediate structures, better folding pathways with lower energy barriers may be rendered.