RNAdualPF: software to compute the dual partition function with sample applications in molecular evolution theory
 Juan Antonio GarciaMartin^{1, 3},
 Amir H. Bayegan^{1},
 Ivan Dotu^{2} and
 Peter Clote^{1}Email author
DOI: 10.1186/s1285901612806
© The Author(s) 2016
Received: 24 May 2016
Accepted: 26 September 2016
Published: 19 October 2016
Abstract
Background
RNA inverse folding is the problem of finding one or more sequences that fold into a userspecified target structure s _{0}, i.e. whose minimum free energy secondary structure is identical to the target s _{0}. Here we consider the ensemble of all RNA sequences that have low free energy with respect to a given target s _{0}.
Results
We introduce the program RNAdualPF, which computes the dual partition function Z ^{∗}, defined as the sum of Boltzmann factors exp(−E(a,s _{0})/RT) of all RNA nucleotide sequences a compatible with target structure s _{0}. Using RNAdualPF, we efficiently sample RNA sequences that approximately fold into s _{0}, where additionally the user can specify IUPAC sequence constraints at certain positions, and whether to include dangles (energy terms for stacked, singlestranded nucleotides). Moreover, since we also compute the dual partition function Z ^{∗}(k) over all sequences having GCcontent k, the user can require that all sampled sequences have a precise, specified GCcontent.
Using Z ^{∗}, we compute the dual expected energy 〈E ^{∗}〉, and use it to show that natural RNAs from the Rfam 12.0 database have higher minimum free energy than expected, thus suggesting that functional RNAs are under evolutionary pressure to be only marginally thermodynamically stable.
We show that C. elegans precursor microRNA (premiRNA) is significantly nonrobust with respect to mutations, by comparing the robustness of each wild type premiRNA sequence with 2000 [resp. 500] sequences of the same GCcontent generated by RNAdualPF, which approximately [resp. exactly] fold into the wild type target structure. We confirm and strengthen earlier findings that precursor microRNAs and bacterial small noncoding RNAs display plasticity, a measure of structural diversity.
Conclusion
We describe RNAdualPF, which rapidly computes the dual partition function Z ^{∗} and samples sequences having low energy with respect to a target structure, allowing sequence constraints and specified GCcontent. Using different inverse folding software, another group had earlier shown that premiRNA is mutationally robust, even controlling for compositional bias. Our opposite conclusion suggests a cautionary note that computationally based insights into molecular evolution may heavily depend on the software used.
C/C++software for RNAdualPF is available at http://bioinformatics.bc.edu/clotelab/RNAdualPF.
Keywords
RNA secondary structure Partition function Boltzmann ensemble RobustnessBackground
In [1], Borenstein and Ruppin define neutrality of an RNA sequence a=a _{1},…,a _{ n } by \(\eta (\mathbf {a}) = 1\frac {\langle d \rangle }{n}\), where in this section 〈d〉 denotes the average, taken over all 3n singlepoint mutants of a, of the base pair distance d _{BP} between the minimum free energy (MFE) structure s _{0} of a and the MFE structures of singlepoint mutants of a. An RNA sequence a is then defined to be robust if η(a) is greater than the average neutrality of 1000 control sequences generated by the program RNAinverse [2], which fold into the same target structure s _{0}. The main finding of [1] is that precursor microRNAs (premiRNA) exhibit a significantly higher level of mutational robustness than random RNA sequences having the same structure. To control for sequence composition bias in their computational study, the authors selected sequences from the output of RNAinverse, whose dinucleotide composition was similar to that of wild type premiRNA (JensenShannon divergence less than 0.01). Since the filtering step required enormous run time and computational resources, the authors restricted their attention to a small set of 211 microRNAs, generating only 100 control sequences per microRNA. Borenstein and Ruppin conclude that robustness of precursor microRNAs is not the byproduct of a base composition bias or of thermodynamic stability.
Subsequently Rodrigo et al. [3] undertook a similar analysis for bacterial small RNAs, also using the program RNAinverse, albeit using somewhat different definitions – precise definitions are given in “Formal definitions of robustness” section. The main finding of [3] was that bacterial sncRNAs are not significantly robust when compared with 1000 sequences having the same structure, as computed by RNAinverse; however, bacterial sncRNAs tend to be significantly plastic, in the sense that the ensemble of low energy structures is structurally diverse. Unlike the case of precursor microRNAs [1], Rodrigo et al. did not control for sequence compositional bias.
This raises the question of whether the control sequences analyzed in [1, 3] are representative or to what extent features shared by sequences output by the program RNAinverse are artifacts of the program used. Indeed, the number of RNA sequences that fold into a given target structure can be astronomically large. Over a few weeks, before we elected to terminate the execution, our stateoftheart inverse folding software RNAiFold [4] generated 273,926,421 many 52nt sequences that fold exactly into the MFE secondary structure s _{0} of HIV1 ribosomal frameshift stimulating signal from the GagPol overlap region AF033819.3/16311682, and which additionally code 17mer peptides in the Gag and Pol reading frames having amino acids that appear in Gag/Pol peptides found in the Los Alamos HIV1 database [5]. The number of 52 nt RNA sequences that fold into target s _{0} without additionally imposing the constraint of coding particular peptides in overlapping Gag/Pol reading frames is certain to dwarf the previous number. Moreover, the number of sequences that fold into the MFE structure of an animal precursor microRNA (length 68 to 91 nt [6]) or into the MFE structure of bacterial sncRNA (length 53436 nt [3]) is certain to be even more daunting.
Different inverse folding algorithms have adopted different strategies to generate sequences that fold into a userspecified target secondary structure s _{0}. For instance, RNAinverse [2, 7] performs an adaptive walk, in one step of which a nucleotide in the current sequence is mutated and subsequently accepted if the base pair distance between the minimum free energy (MFE) structure of the mutated sequence and the target structure s _{0} is reduced. NUPACK Design [8] selects a candidate mutation position with probability proportional to its contribution to the ensemble defect (Boltzmannweighted Hamming distance to the vector representation of s _{0}, where s _{0}[i]=j indicates (i,j)∈s _{0} and s _{0}[i]=i indicates i is unpaired in s _{0}). RNAiFold CPdesign [4, 9] uses constraint programming to systematically explore the search tree of all inverse folding solutions in an order determined by certain heuristics. Accordingly, one cannot claim that the collection of sequences generated by any particular inverse folding algorithm is representative of the astronomically large space of all inverse folding solutions – indeed, each inverse folding algorithm has an inherent but unknown bias.
In this paper, we describe the algorithm RNAdualPF, which generates sequences which have low free energy with respect to a userspecified target structure s _{0} – i.e. the inherent bias of RNAdualPF is known, unlike the situation of other inverse folding algorithms. We show that RNAdualPF is extremely fast software for generating sequences that approximately fold into s _{0}; moreover, in a postprocessing step, one can filter the output of RNAdualPF to select sequences that exactly fold into s _{0}. RNAdualPF additionally allows the user to specify IUPAC codes to constrain certain nucleotide positions as well as to control the GCcontent of all generated sequences. Sampling is performed in a manner distinct but somewhat analogous to that by which Sfold [10] and RNAsubopt p [2] sample representative secondary structures from the Boltzmann ensemble of all structures of a given sequence. Using RNAdualPF, we perform a pilot study that is similar, though not identical, to that of [1, 3] for two classes of RNA: 250 C. elegans precursor microRNA from miRBase [11] and the bacterial small noncoding RNAs previously analyzed in [3].
Finally, it should be noted that, although RNAdualPF was developed entirely independently of the work of Reinharz et al. [12], one can view our Cprogram as an extension of Python program IncaRNAtion [12] to the full Turner energy model, where additionally GCcontent is rigorously handled. This point will be discussed further in the Conclusion.
Formal definitions of robustness
Let a=a _{1},…,a _{ n } denote an arbitrary RNA sequence, where \(a_{i} \in \mathcal {N} = \{\text {A,\,U,\,G,\,C}\}\), a secondary structure s of a is a set of base pairs (i,j) satisfying the following conditions: (1) If (i,j)∈s then a _{ i },a _{ j } constitute a WatsonCrick or GU wobble pair, i.e. \(ij \in \mathcal {B}\) which is the set {AU,UA,GC,CG,GU,UG}. (2) If (i,j)∈s then i+θ<j, where θ=3 (a minimum assumed for steric hindrance). (3) If (i,j)∈s and (k,ℓ)∈s, then either i<k<ℓ<j or k<i<j<ℓ or i<j<k<ℓ or k<ℓ<i<j. The collection of all secondary structures of the RNA sequence a is denoted \({\mathbb {S}\mathbb {S}}(\mathbf {a})\), and the free energy [13] of s is denoted by E(a,s), or simply by E(s) provided that the sequence a is clear from context. The Boltzmann probability p(s)=p _{ a }(s) for structure s of a is defined by exp(−E(a,s)/RT)/Z, where the partition function \(Z = Z(\mathbf {a}) = \sum _{s \in {\mathbb {S}\mathbb {S}}(\mathbf {a})} \exp (E(\mathbf {a},s)/RT)\). Given two secondary structures s,t of a, the base pair distance d _{BP}(s,t) between s and t is defined to be the size of the symmetric difference of s,t, i.e. s−t+t−s.
where 〈d _{1}(a,a’)〉 denotes the average value of d _{1}(a,a’) taken over all single point mutants a’ of a. Since d _{1}(a,a’) is not a true metric, we replace it by the metric D _{V}(a,b) in our computation of mutational robustness. Clearly both notions are closely related.
Implementation
for an RNA sequence a=a _{1},…,a _{ n }, where the sum is taken over all secondary structures \(\mathbb {S}\mathbb {S}(\mathbf {a})\) of a, E(a,s) denotes the free energy for the structure s of a with respect to the Turner energy parameters [13], R denotes the universal gas constant and T is absolute temperature. Subsequently Ding and Lawrence [16] described how to use the partition function together with a simple backtracking strategy to sample secondary structures of a from the Boltzmann ensemble of low energy structures.
where the sum is taken over all RNA sequences a=a _{1},…,a _{ n } of length n that are compatible with structure s _{0}, i.e. a _{ i },a _{ j } constitute a WatsonCrick or wobble pair for each base pair (i,j)∈s _{0}. The set of all RNA sequences that are compatible with s _{0} is denoted by \(\mathbb {AA}(s_{0})\). Note that if a sequence a is not compatible with the target structure s _{0}, then the energy E(a,s _{0}) is infinite, so the corresponding Boltzmann factor exp(−E(a,s _{0})/RT) is zero and the sum in Eq. (7) could have been written over all sequences of the same length as s _{0}. Here we describe the efficient software RNAdualPF to compute the dual partition function Z ^{∗} and to sample from the low energy ensemble of sequences that are compatible with a given secondary structure s _{0}.
Dual partition function
If s is a secondary structure on sequence a=a _{1},…,a _{ n }, then the length of s, denoted by ℓ(s), is equal to n, while the size of s, denoted by s, is the number of base pairs belonging to s. Similarly, if secondary structure s is restricted to the interval [i,j], where 1≤i≤j≤n, then the length of the restriction of s to [i,j], denoted by ℓ(s[i,j]), is equal to j−i+1, while the size of the restriction of s to [i,j], denoted by s[i,j], is the number of base pairs (X,Y) of s that satisfy i≤x<y≤j.
Given an RNA sequence a=a _{1},…,a _{ n }, the McCaskill algorithm [15] computes the partition function Z(a) defined in Eq. (6). When a is clear from context, Z(a) is usually denoted by Z.
Given a target secondary structure s _{0}, we describe below an algorithm to compute the dual partition function Z ^{∗}(s _{0}), defined as the sum of all Boltzmann factors exp(−E(a,s _{0})), where the sum is taken over all RNA sequences \(\mathbf {a} \in \mathbb {A}\mathbb {A}(s_{0})\). Unlike the McCaskill algorithm, which requires time that is cubic in the length of a, the algorithm presented below requires time that is (essentially) linear^{1} in the length of s _{0}. Our algorithm is motivated by the initialization step of the algorithm INFORNA [17], in which a sequence is determined, for which the free energy with respect to target structure s _{0} is a minimum – i.e. INFORNA determines argmin_{ a } E(a,s _{0}).
Base pair dual partition function table. Given the target structure with sequence constraints depicted in Fig. 2, RNAdualPF computes and stores all the partial dual partition function values for the substructures enclosed by each base pair
Index  i  j  Type  AU  CG  GC  UA  GU  UG  Z ^{∗}(i,j) 

1  18  23  Tetraloop  0.000  0.000  0.364  0.000  0.000  0.000  0.364 
2  17  24  Stack  10.977  17.859  76.923  10.977  10.977  3.525  131.238 
3  16  26  R. bulge  11.690  70.834  184.603  12.771  13.347  3.915  297.160 
4  6  10  Triloop  0.004  0.010  0.010  0.004  0.004  0.004  0.038 
5  5  11  Stack  0.750  3.022  5.234  0.899  0.960  0.256  11.120 
6  3  13  Int. loop  109.842  256.875  424.976  108.653  117.851  108.132  1126.330 
7  2  14  Stack  10853.104  86208.448  170643.321  12575.544  13285.398  3647.077  297212.891 
8  1  27  Multiloop  1558.575  7895.583  7895.583  1558.575  1558.575  1558.575  22025.464 
9  1  28  S _{0}  –  –  –  –  –  –  88101.856 
From this ordering, we assign a base pair index to each base pair (i,j), which is defined to be the rank of (i,j) in the total ordering.
The following definitions correspond to the Turner nearest neighbor energy model [13], which is an additive loop model where a loop closed by external base pair (i,j) is designated as a kloop, if the loop contains k base pairs interior to (i,j). Therefore, hairpin loops are 0loops; base pair stacks, bulge loops and internal loops are 1loops; and multiloops are kloops for k≥2 (also called (k+1)way junctions), where the additional count is due to the outer component adjacent to (i,j) [18].
This AUpenalty is applied only if (i,j) is a base pair adjacent to a triloop, a bulge, an internal loop or a multiloop, or if it is the outermost base pair of an external loop in target structure s _{0}, and (i,j) is instantiated by one of the pairs AU, UA, GU, UG. When basepaired positions i,j are clear from the context, we write e _{ AU }(X,Y).
Here, we assume that in parsing the input target structure, a list BPcloseELorML has been created of those base pairs (i,j), which close either an external loop or a multiloop. Let I be the indicator function, it follows that if (i,j) closes an external loop or multiloop, then \(\exp \left (\frac {I[(i,j) \in BPcloseELorML] \cdot e_{AU}(X,Y)}{RT} \right)\) is the Boltzmann factor for a special AUpenalty, otherwise this factor equals 1. For clarity in the notation, this factor is denoted by \(e^{(\frac {e^{I}_{AU}(X,Y)}{RT})}\). Note that this term is distinct from the factor \(\exp (\frac {e_{AU}(X,Y)}{RT})\) applied to base pairs adjacent to a triloop, a bulge or an internal loop, which does not depend on the indicator function.
Hairpins
where hairpinE(j−i−1) designates the hairpin free energy obtained from table lookup, when j−i−1≤30.
Triloop
Tetraloop
Hairpin size exceeds four and is different than six
Stacked base pairs, bulges and internal loops
Here, we consider the case of a 1loop, which comprises the case of stacked base pairs, bulges and internal loops. The following cases correspond to each possibility.
Stacked base pair
Bulge loop
Internal loop
External loop
Despite the fact that, by following the total order on base pairs defined in Eq. (10), the dual partition function of multiloops is always computed before the dual partition function of the external loop, the computation of the dual partition function of multiloops will be easier to understand if the dual partition function of the external loop is defined in advance.
In order to improve speed, some implementations of RNA thermodynamicsbased algorithms ignore the contribution of dangling positions, which corresponds to Vienna RNA Package d0 flag. RNAdualPF also includes this option, which dramatically increases the speed of the algorithm. The reason behind this difference of performance is clear from the following definitions.
where \(\ell = n\sum \limits _{r = 1,\ldots,k}{(j_{r}i_{r}+1)}\) and n is the length of the target structure s _{0}.
Depending on the target structure s _{0}, it can happen that the second sum of Eq. (24) must be restricted to range over strictly less than 4^{2k } many RNA sequences. This is explained as follows. If i _{1}=1 [resp. j _{ r }=n] then there is no position for a 5^{′} [resp. 3^{′}] dangle, and hence the nucleotide sequences considered in the second summation would have length strictly less than 2k. Moreover, certain 5^{′} dangled positions could be identical to 3^{′} dangle positions, which arises for instance when j _{ k }+2=i _{ k+1}; alternatively, certain dangled positions could be identical with basepaired positions, which arises for instance when j _{ k }+1=i _{ k+1}. In such situations, instantiations of the 3^{′}dangle on (i _{ k },j _{ k }) and the 5^{′}dangle on (i _{ k+1},j _{ k+1}) are not independent, thus leading to a restriction of the range of the second summation in Eq. (24). A similar restriction is implicitly assumed in the treatment of external loops in this section and of multiloops in the next section.
The algorithm performance can be improved by dividing the external loop into groups of components having interdependently constrained dangling positions, as just explained. Define two base pairs (X,Y),(x ^{′},y ^{′}) as adjacent if x<y<x ^{′}<y ^{′} and x ^{′}−y≤2 – i.e. dangling positions of the base pairs (X,Y),(x ^{′},y ^{′}) are constrained. Let G denote a maximal collection of adjacent base pairs belonging to H=[(i _{1},j _{1}),…,(i _{ k },j _{ k })], together with their associated dangle positions in D=[i _{1}−1,j _{1}+1,…,i _{ k }−1,j _{ k }+1]. It is important to note that H∪D is thus partitioned into a collection of g disjoint groups \(\mathcal {G} = [G_{1},\ldots,G_{g}]\). Therefore, we can divide an external loop of k helices into a collection groups \(\mathcal {G}\) of size g≤k, and p unpaired positions that are external to every base pair of s _{0} and not adjacent to any base pair.
where the range of the second summation can be constrained by the overlap among positions in D(G) and between positions in D(G) and H(G), as explained for Eq. (24).
Multiloop
The notation we use to define the dual partition function of multiloops with dangling positions is similar to that described for external loops. However, some modifications are required in the previously given definitions, since we have to take into account the flanking positions of the base pair (i,j) closing the multiloop. Let H=[(i _{1},j _{1}),…,(i _{ k },j _{ k }),(i,j)] be the collection of k base pairs closing one of the k components of the multiloop, and the base pair (i,j) closing the multiloop, and define the multiset D=[a _{1},b _{1},…,a _{ k+1},b _{ k+1}]⊆[i _{1}−1,j _{1}+1,⋯,i _{ k }−1,j _{ k }+1,i+1,j−1] of nucleotide positions adjacent to the base pairs in H. Due to the possible overlap with the base pair closing the multiloop and its flanking positions, there are additional constraints in the ordered multiset [a _{1},b _{1},…,a _{ k+1},b _{ k+1}], so that (for instance) if a _{1}=i _{1}−1, and i _{1}=i+1, then a _{1}=a _{ k+1} and any nucleotide value that is assigned to a _{1} must simultaneously be assigned to a _{ k+1}. Moreover, there can also be an overlap between the list of base paired positions [i _{1},j _{1},…,i _{ k },j _{ k },i,j] and the multiset [a _{1},b _{1},…,a _{ k+1},b _{ k+1}]. If (for instance) i=i _{1}−1, then a _{ k+1}=i _{1} and a _{1}=i.
As explained for Eq. (24), it can happen that the second summation must be restricted to range over strictly less than 4^{2k } many RNA sequences.
A decomposition similar that for external loops can be performed to improve the performance in the computation of the dual partition function of a multiloop. In a multiloop, in addition to the adjacency definition given for external loops, we consider the base pair (i,j) that closes the multiloop as adjacent to a base pair (X,Y) that closes a component of the multiloop, where i<x<y<j, if either x≤i+2 or y≥j−2. Then, let G denote a maximal collection of adjacent base pairs belonging to H=[(i _{1},j _{1}),…,(i _{ k },j _{ k }),(i,j)], together with their associated dangle positions in D=[i _{1}−1,j _{1}+1,…,i _{ k }−1,j _{ k }+1,i+1,j−1]. This decomposition produces a collection \(\mathcal {G}\) of g disjoint groups G _{1},…,G _{ g }, one of which, designated the closing group G _{ c } contains the closing base pair (i,j) of the multiloop, and g−1 of which, designated as nonclosing groups G _{ nc }, do not contain the base pair (i,j).
Nonclosing groups have the same composition as those defined for external loops – i.e. a collection of h base pairs H(G _{ nc })=[(κ _{1},λ _{1}),…,(κ _{ h },λ _{ h })] and a set of dangling positions D(G _{ nc })=[α _{1},β _{1},…,α _{ h },β _{ h }]⊆[κ _{1}−1,λ _{1}+1,⋯,κ _{ h }−1,λ _{ h }+1]. Therefore, we can compute the dual partition function Z(G _{ gc }) of a nonclosing group as described in Eq. (25). In addition, the collection of nonclosing groups of size g−1 of a multiloop of k components is denoted by \(\mathcal {G}_{nc}\), where 0≤(g−1)≤k.
Therefore, a multiloop of k components and ℓ unpaired positions can be decomposed into one closing group G _{ c }, a collection of nonclosing groups \(\mathcal {G}_{nc}\), and p unpaired positions that are not adjacent to any base pair, with 0≤p≤ℓ.
In the same way as in Eq. (24), the values of the second summation are constrained to the possible choices among overlapping positions.
Sampling
Due to the Turner energy model, it is necessary to determine nucleotide positions whose instantiation influences the energy (hence Boltzmann probability) of other positions, and subsequently all mutually dependent positions must be instantiated simultaneously. Figure 1 illustrates the mutual dependencies that must be considered when sampling different types of elements, where the base pair (i,j) to be sampled is highlighted in red, positions whose sampling probability is dependent on the instantiation of (i,j) are highlighted in blue, and positions that are mutually dependent, but independent of the instantiation of (i,j), are highlighted in green.
Since the dynamic programming algorithm for the dual partition function proceeds from inner to outer base pairs, using the total ordering ≺ in Eq. (10), the sampling order of base pairs proceeds from outer to inner positions, i.e. from largest base pair index to smallest. In order to account for mutual dependencies in the sampling step, we define the function sample(k,T,i,j,X,Y) for each base pair (i,j) in S _{0}, where k indicates the base pair index defined from Eq. (10), T indicates the type of structural element closed by base pair (i,j) in the target RNA secondary structure, as shown in Table 1, and X,Y are the instantiated nucleotides at positions (i,j). Due to the mutual dependencies, sampling a base pair with base pair indexk closing an mloop, for m>0, forces the instantiation of all inner closing base pairs of the mloop, and the base pair index of each such inner base pair is strictly less than k. For this reason, except in the case of external loops, the outermost base pair (i,j) has been always instantiated before sample(k,T,i,j,X,Y) is called, and therefore the instantiation X,Y is given as a parameter of the sampling function.
The Boltzmann probability of each possible instantiation of mutually dependent positions can be computed on the fly in the backward step. However, in order to improve the speed of the algorithm, in the forward step RNAdualPF stores (for each base pair) the conditional dual partition function values of instantiations of interdependent positions. These tables are used by the sampling function, since each value corresponds to the dual partition function conditional on a specific instantiation of the positions to be sampled by sample(k,T,i,j,X,Y). Since the sampling procedure depends on the type T of element, we describe the function sample(k,T,i,j,X,Y) for each type of element – hairpin, stacked base pair, internal loop (which also comprises left and right bulge), multiloop and external loop as depicted in Fig. 1. For each of these cases, the values are stored in the conditional dual partition function table associated with the closing base pair (i,j).
Hairpins
Then, remaining unpaired positions are uniformly sampled, since the nucleotide choice does not change the final free energy. Triloops, tetraloops and hexaloops are exceptions to this rule, since there are special loops that contribute to or penalize the free energy. In those cases, we have to account for the special loops, as defined in “Hairpins” section.
Although it could seem to be a waste of space to store a different conditional dual partition function table for each base pair (i,j), even for two different hairpins of the same size in the target structure, one should note that RNAdualPF allows sequence constraints, and thus Z ^{∗}(i,j) could possibly differ from Z ^{∗}(i ^{′},j ^{′}) when (i,j) and (i ^{′},j ^{′}) close hairpins of the same size.
Stacking base pairs
Internal loops
For internal loops of sizes (1×1, 1×2, 2×1, 1×N and N×1) similar conditional dual partition function tables are computed following the definitions in “Stacked base pairs, bulges and internal loops” section.
Other internal loops: When there are no shared flanking positions between the two base pairs that close an internal loop, as depicted in Fig. 1 d, the energy contribution of innermost base pair and its respective flanking positions is independent of those of the outermost base pair.
Finally, since the remaining unpaired position does not contribute to the free energy, it is uniformly sampled.
Multiloops and external loops
As explained in “External loop” section, if dangling positions are not included in the computation, sampling an external base pair or the closing base pair (i,j) of a multiloop from Z ^{∗}(i,j) is trivial. On the other hand, by including dangling positions in the sampling, there is a dramatic increase in the space complexity of RNAdualPF, albeit the space used is only a constant factor larger. However, the decompositions into groups described in “External loop” and “External loop” sections allow to sample the positions of each group independently.
The example shown in Fig. 1 e depicts a multiloop with two groups: a nonclosing group G _{ nc } highlighted in green, and a closing group G _{ c } highlighted in red and blue, where the closing base pair of the multiloop (i,j) is marked in red.
Recall that the base pairs in H(G _{ nc }) are adjacent. Therefore, due the constraints given by the overlapping positions within D(G _{ nc }), and between D(G _{ nc }) and H(G _{ nc }), explained in “External loop” section, the number of possible instantiations \(\mathcal {U},\mathcal {W}\) of H(G _{ nc }),D(G _{ nc }) is ≤(6^{ h }·4^{ h+1}).
As a final remark, we would like to recall that all the conditional dual partition function values are computed and stored in the forward step at the same time as the dual partition function. Therefore, despite the consequent increase of space complexity in the algorithm, the computation of the values required for correct sampling does not involve a greater time complexity.
Scaling
Controlling GCcontent
where s _{0}[i,j] denotes the restriction of target structure s _{0} to the interval [i,j]. We describe two particular subcases, to provide the idea of how modifications need to be undertaken.
Triloop
Multiloop and external loop
Since the modification required in the remaining cases follows similar reasoning as in the treatment of the hairpin and external loop just described, the details for these remaining cases are not given.
An additional challenge of computing the dual partition function with GCcontent control is the combinatorial problem of efficiently counting the number N of instantiations of the external loop, consisting of all positions external to every base pair, with GCcontent k, where the user can stipulate that certain positions are constrained to contain nucleotides consistent with IUPAC codes. To this end, we implemented the combinatorial algorithm defined in Supplementary Information.
Sampling with GCcontent
The implementation of sampling with GCcontent is performed in a similar manner as described in “Sampling” section, with some notable differences.
First, the sampling function is redefined by sample(k,T,i,j,X,Y,α), where k indicates the base pair index in the ordering defined by Eq. (10) for the base pair (i,j) that is already instantiated by nucleotide pair XY, and T designates the type of structural element closed by base pair (i,j) in the target RNA secondary structure, as shown in Table 1. The function sample(k,T,i,j,X,Y,α) instantiates all positions of the loop having outer closing base pair (i,j), including its inner closing base pair(s) and which returns the GCcontent of the sampled loop. Moreover, the GCcontent of the subsequence a[i+1,j−1]=(a _{ i+1},…,a _{ j−1}) will be α once the entire sequence a _{1},…,a _{ n } is sampled.
Second, RNAdualPF stores a conditional dual partition function table for each base pair (i,j) and GCcontent 0 to ji1. The function sample(k,T,i,j,X,Y,α) samples from the conditional dual partition function of those sequences which have exactly α Gs and Cs strictly between the positions i and j, thus guaranteeing a GCcontent of α for the subsequence a[i+1,j−1] once the entire sequence a _{1},…,a _{ n } is sampled. Note that sample(k,T,i,j,X,Y,α) samples only the loop closed by the already instantiated outer base pair (i,j), and that α is the GCcontent of the entire subsequence a[i+1,j+1]=a _{ i+1},…,a _{ j−1} once the algorithm terminates. Only in the case that base pair (i,j) closes a hairpin loop will it generally happen that the GCcontent of the loop closed by (i,j) is equal to α.
Let α be the userdesignated GCcontent of sequences a=a _{1},…,a _{ n } to be sampled from a target secondary structure having ℓ base pairs. The following pseudocode describes how to sample sequences a=a _{1},…,a _{ n }, whose GCcontent exactly equals α. Here, an external loop with m components means that there are m exterior base pairs (i _{1},j _{1}),…,(i _{ m },j _{ m }) such that all positions exterior to these base pairs are unpaired; i.e. each position \(r \in \{ 1,\ldots,n\}  \cup _{c=1}^{m} \{ i_{c},\ldots,j_{c} \}\) is unpaired in the target structure.
To clarify how the GCcontent is sampled in a statistically rigorous manner, suppose that the user has specified the GCcontent to be α, and that L is the external loop of the target structure s _{0} having m components, where the cth component has external closing base pair (i _{ c },j _{ c }). In computing the dual partition function, for all possible choices of nonnegative integers α _{1},…,α _{ m },β that sum to α and all 6^{ m } possible assignments of WatsonCrick or wobble nucleotide pairs X _{1},Y _{1},…,X _{ m },Y _{ m } to the base pairs (i _{1},j _{1}),…,(i _{ m },j _{ m }), the software RNAdualPF has computed the sum of \(\sum _{c=1}^{m} Z^{*}_{c}(i_{c},j_{c};X_{c},Y_{c};\alpha _{c})\) plus the Boltzmann factor of the external loop with GCcontent β. Since the dual partition function Z ^{∗}(s _{0};α) is the sum, taken over all values of α _{1},…,α _{ m },β and all WatsonCrick and wobble pair assignments to the external base pairs, RNAdualPF can then use the roulette wheel method to sample values α _{1},…,α _{ m },β and X _{1},Y _{1},…,X _{ m },Y _{ m } in a statistically rigorous manner. Multiloops, and other structural elements, which contain unpaired regions whose sequence does not contribute to the free energy of the structure, are handled in a analogous manner.
Results
Robustness and plasticity of C. elegans miRNAs and E. coli sncRNAs
In [1] Borenstein and Ruppin used version 1.4 of the Vienna RNA Package [7] to generate 1000 RNA sequences per wild type precursor microRNA (premiRNA) extracted from the database Rfam 1.0 [19], with the property that each of the 1000 control sequences folded into the wild type premiRNA structure – i.e. the minimum free energy (MFE) structure of each of the 1000 control sequences was identical to the MFE structure of the wild type premiRNA. Based on these computational experiments, Borenstein and Ruppin asserted that the “structure of miRNA precursor stem–loops exhibits a significantly high level of mutational robustness in comparison with random RNA sequences with similar stem–loop structures”. Noting that the Vienna RNA Package inverse folding program RNAinverse does not control for GCcontent or other sequence compositional bias, the authors performed a second computational experiment, in which control sequences not only folded into the target wild type structure, but also had similar dinucleotide composition to that of wild type premiRNA (JensenShannon divergence less than 0.01). Since the filtering step required enormous run time and computational resources, the authors restricted their attention to a small set of 211 microRNAs, and generated only 100 control sequences per microRNA – note here that RNAinverse cannot control for GCcontent. Borenstein and Ruppin concluded that robustness of precursor microRNAs was not the byproduct of a base composition bias or of thermodynamic stability.
Subsequently Rodrigo et al. [3] undertook a similar analysis for bacterial small noncoding RNAs (sncRNA), also using the program RNAinverse, albeit using somewhat different definitions – see precise definitions in “Formal definitions of robustness” section. The main finding of [3] was that bacterial sncRNAs are not significantly robust when compared with 1000 sequences having the same structure, as computed by RNAinverse; however, the authors found that bacterial sncRNAs tend to be significantly plastic, in the sense that the ensemble of low energy structures are structurally diverse. Unlike the case of precursor microRNAs [1], Rodrigo et al. did not control for sequence compositional bias.
Using RNAdualPF, we performed similar computational experiments on 250 precursor microRNAs of C. elegans from miRBase 20 [11] and for the bacterial small noncoding RNAs of [3]. Below, we discuss each case separately.
Analysis of C. elegans precursor microRNA from the database miRBase 20 [11]
MEASURE  Def. Exact GC  Def. No GC  MFE Exact GC  MFE No GC  WT 

BP DIST TARGET  0.06 ±0.03  0.09 ±0.04  0 ±0  0 ±0  0 ±0 
ENERGY MFE  –0.53 ±0.11  –0.85 ±0.12  –0.48 ±0.12  –0.79 ±0.14  –0.38 ±0.11 
ENERGY TARGET  –0.46 ±0.12  –0.78 ±0.14  –0.48 ±0.12  –0.79 ±0.14  –0.38 ±0.11 
ENSEMBLE DEFECT  0.12 ±0.04  0.14 ±0.06  0.05 ±0.02  0.05 ±0.02  0.08 ±0.05 
EXP BP DIST  0.07 ±0.03  0.1 ±0.04  0.03 ±0.01  0.03 ±0.01  0.05 ±0.03 
PROP NAT CONTACT  0.93 ±0.04  0.9 ±0.06  0.96 ±0.02  0.96 ±0.02  0.92 ±0.05 
POS ENTROPY  0.14 ±0.05  0.14 ±0.05  0.13 ±0.05  0.12 ±0.05  0.2 ±0.11 
GC CONTENT  42.88 ±9.14  82.07 ±3.5  42.9 ±9.15  80.92 ±4.09  42.88 ±9.14 
LN DUAL PROB  –95.4 ±21.03  –51.43 ±11.98  –102.73 ±22.81  –59.52 ±14.64  –117.11 ±25.94 
LN PROB  10.81 ±4.05  –10.95 ±4.68  –1.38 ±0.55  –0.96 ±0.45  –2.02 ±0.97 
MH STR DIV  0.08 ±0.03  0.08 ±0.03  0.07 ±0.03  0.06 ±0.03  0.11 ±0.06 
VIENNA STR DIV  0.05 ±0.02  0.05 ±0.02  0.04 ±0.02  0.04 ±0.02  0.07 ±0.04 
Analysis of bacterial RNAs [3]
MEASURE  GC 5 %  Exact GC  No GC  WT 

BP DIST TARGET  0.08 ±0.04  0.08 ±0.04  0.14 ±0.06  0 ±0 
ENERGY MFE  –0.44 ±0.1  –0.44 ±0.1  –0.63 ±0.14  –0.29 ±0.1 
ENERGY TARGET  –0.39 ±0.12  –0.39 ±0.12  –0.54 ±0.16  –0.29 ±0.1 
ENSEMBLE DEFECT  0.16 ±0.08  0.16 ±0.08  0.23 ±0.1  0.14 ±0.09 
EXP BP DIST  0.09 ±0.04  0.09 ±0.04  0.15 ±0.06  0.09 ±0.06 
PROP NAT CONTACT  0.89 ±0.09  0.89 ±0.09  0.8 ±0.12  0.83 ±0.15 
POS ENTROPY  0.19 ±0.08  0.19 ±0.08  0.23 ±0.09  0.35 ±0.18 
GC CONTENT  48.33 ±7.02  48.34 ±7.02  74.75 ±5.55  48.34 ±7.02 
LN DUAL PROB  –94.59 ±27.22  –94.54 ±27.19  –66.37 ±16.96  –117.22 ±33.37 
LN PROB  –10.12 ±4.04  –10.09 ±4.03  –13.71 ±5.16  –2.34 ±0.95 
MH STR DIV  0.1 ±0.04  0.1 ±0.04  0.13 ±0.05  0.18 ±0.09 
VIENNA STR DIV  0.06 ±0.03  0.06 ±0.03  0.09 ±0.03  0.11 ±0.06 
Analysis of the Rfam 12.0 database
MEASURE  GC 5 %  Exact GC  No GC  WT 

BP DIST TARGET  0.1 ±0.05  0.1 ±0.05  0.16 ±0.07  0 ±0 
ENERGY MFE  –0.43 ±0.13  –0.43 ±0.13  –0.64 ±0.16  –0.28 ±0.13 
ENERGY TARGET  –0.36 ±0.14  –0.36 ±0.14  –0.54 ±0.19  –0.28 ±0.13 
ENSEMBLE DEFECT  0.18 ±0.07  0.18 ±0.07  0.25 ±0.11  0.16 ±0.12 
EXP BP DIST  0.11 ±0.05  0.11 ±0.05  0.17 ±0.07  0.1 ±0.08 
PROP NAT CONTACT  0.87 ±0.09  0.87 ±0.09  0.78 ±0.14  0.81 ±0.17 
POS ENTROPY  0.22 ±0.09  0.21 ±0.09  0.25 ±0.1  0.4 ±0.25 
GC CONTENT  46.27 ±10.91  46.27 ±10.91  75.12 ±5.89  46.27 ±10.91 
LN DUAL PROB  –110.35 ±54.12  –110.34 ±54.12  –73.5 ±33.32  –136.7 ±65.62 
LN PROB  –13.94 ±7.74  –13.9 ±7.71  –18.11 ±10.59  –2.83 ±1.71 
MH STR DIV  0.12 ±0.05  0.12 ±0.05  0.13 ±0.05  0.2 ±0.12 
VIENNA STR DIV  0.07 ±0.03  0.07 ±0.03  0.09 ±0.04  0.13 ±0.08 
Turning now to the analysis of bacterial small noncoding RNAs, we find that sncRNAs are not significantly robust (Zscore 0.0±1.4, pvalue 0.98) when GCcontent is not controlled, confirming a result from Rodrigo et al. [3]. However, when GCcontent of the sequences sampled from RNAdualPF is required to be identical to that of wild type sncRNA, bacterial sncRNAs are see to be significantly nonrobust (Zscore −2.58±3.87, pvalue of 4.4×10^{−7}). Note here that Rodrigo et al. used RNAinverse in their computational experiments, hence could not consider the case with control over GCcontent. The left panels of Figs. 3 a and 4 summarize our findings that precursor microRNAs [resp. bacterial sncRNAs] are significantly nonrobust [resp. not significantly robust] with respect to a control set of 2000 [resp. 1000] sequences generated by RNAdualPF with identical GCcontent to that of the wild type sequence.
Finally, in our analysis of plasticity, the right panels of Figs. 3 and 4 show that both C. elegans and bacterial small noncoding RNAs exhibit more plasticity when compared with control sequences for which GCcontent is not controlled, as well as when compared with control sequences for which GCcontent is identical to that of wild type sequences.
Structural RNA has higher free energy than expected
In Figure 4 of [9], we showed that the free energy E _{0} of the minimum free energy (MFE) structure s _{0} of E. coli valtRNA (accession RV1600 from Sprinzl database [25] tdbR00000454 from tRNAdb [26]), is much higher (less favorable) than the average free energy 〈E〉 of over four million RNAs having the same MFE structure s _{0} as that of E. coli valtRNA. Here, E. coli valtRNA RV1600 was selected, because its MFE structure s _{0} is identical to the Rfam consensus structure for tRNA family RF00005. This preliminary result suggests that naturally occurring transfer RNAs may be under selective pressure to be only marginally thermodynamically stable. Since it took a number of days for RNAiFold [4, 9] to return over four million solutions of the inverse folding problem for the tRNA target structure, we now describe how RNAdualPF can be used to compute the Boltzmann expected free energy of literally all sequences a _{1},…,a _{ n } with respect to an arbitrary target structure s _{0}. In this manner, we confirm our preliminary finding concerning E. coli valtRNA, and show that the folding energy of structural RNA from the Rfam database is much higher (less favorable) than expected. Before presenting results, we need some definitions.
where the sum is taken over all RNA sequences a=a _{1},…,a _{ n } of length n. Note that T _{0} indicates the (table) temperature at which the energy of a structure s _{0} and nucleotide sequence a is evaluated using the Turner parameters, while all other occurrences of the temperature variable are designated by T, which we call formal temperature. The distinction between formal and table temperature is made to allow us to use finite difference approximations to derivatives with respect to the formal temperature when when we compute dual expected energy and dual conformational entropy below (see [27] for more explanation). When table temperature T _{0} equals formal temperature T, and the temperature is clear from the context, we write Z ^{∗}(s _{0}); if the target structure s _{0} is also clear from the context, then we write Z ^{∗}. A similar remark applies to the other thermodynamic functions p ^{∗},G ^{∗},〈E ^{∗}〉,S ^{∗}, which we now define.
Programs to compute dual expected energy 〈E ^{∗}〉, dual conformational entropy S ^{∗}, and dual heat capacity \(C^{*}_{p}\) are provided at our web site. We do not elaborate further on dual entropy or dual heat capacity, since at the present time we have found no compelling applications.
In this fashion, we can exactly compute the dual expected energy 〈E ^{∗}(s _{0},k)〉 of all sequences having GCcontent k which approximately fold into target structure s _{0}. Tables 2, 3 and 4 analyze what we mean by approximately folding into the target structure – i.e. sequences a are preferentially sampled when free energy E(a,s _{0}) is low, hence have large dual Boltzmann probability. RNAdualPF, even when exact GCcontent is controlled, is faster than inverse folding programs by orders of magnitude, hence providing an effective alternative manner of solving inverse folding.
Conclusion
In this paper we describe the algorithm and software RNAdualPF, which computes the dual partition function Z ^{∗}, defined as the sum of Boltzmann factors exp(−E(a,s _{0})/RT) of all sequences a with respect to the target structure s _{0}. Using RNAdualPF, we efficiently sample RNA sequences that (approximately) fold into s _{0}, where additionally the user can specify IUPAC sequence constraints at certain positions, and whether to include dangles (energy terms for stacked, singlestranded nucleotides). Moreover, the user can require that all sampled sequences have a precisely specified GCcontent, since, optionally, we compute the dual partition function Z ^{∗}(k) simultaneously for all values k=G+C. This sampling strategy is complementary to the use of RNAiFold [4], since it allows the study of the properties of long RNA structures whose number of solutions for the inverse folding problem is astronomically large.
We use RNAdualPF to corroborate previous studies [1] using RNAinverse [2], by confirming that precursor microRNAs are significantly mutationally robust when GCcontent is not controlled. However, in contrast to [1], we find that precursor microRNAs are significantly nonrobust when GCcontent is controlled. We confirm and extend previous findings [3] that bacterial small noncoding RNAs display plasticity (structural diversity) and are not statistically robust, when GCcontent is not controlled. Additionally, we obtain the new finding that when when GCcontent is controlled, bacterial small noncoding RNAs are significantly nonrobust, as in the case of precursor microRNAs. One possible reason for the discrepancy between our results and those of [1] could be related with the fact that the energy parameters of Vienna RNA Package 1.4 (Turner 1999 parameters used in the computational experiments of [1]) differ from those of Vienna RNA Package 2.1.9 (Turner 2004 parameters used in the current study with RNAdualPF). Another possible reason is that the inverse folding solutions returned by the program RNAinverse used in [1] show a different bias than sequences returned by RNAdualPF (in this context, we mean the inverse folding solutions filtered from the sequences returned by RNAdualPF).
As mentioned in the Introduction, there is a relation between our C program RNAdualPF and the Python program IncaRNAtion [12], although our work is independent of that of Reinharz et al. [12]. IncaRNAtion is a weighted sampling algorithm that computes the dual partition function for a simple energy model, which only considers base stacking free energies – unlike RNAdualPF, the program IncaRNAtion includes no energy contributions for hairpins, bulges, internal loops, multiloops, dangles, or mismatches. If the user specifies a desired GCcontent α, then IncaRNAtion does not compute the dual partition function for GCcontent, but rather applies an adjustable heuristic so that after a suitable burnin period, sequences tend to approximately have GCcontent α. See Table 6.2 of [28] for benchmarking results on RNAdualPF and IncaRNAtion, which show conclusively that RNAdualPF is not only faster, but its sequences have a higher probability of folding into the target structure, its sequences have a smaller GCcontent in default mode, where GCcontent is not controlled, etc.
Additionally, we have shown that natural RNAs from the Rfam 12.0 database have higher minimum free energy than expected, thus supporting our results in [9] which suggest that functional RNAs are under evolutionary pressure to be only marginally thermodynamically stable. The applications described in this paper demonstrate that RNAdualPF is a useful and extremely fast software tool for evolutionary and synthetic biology.
Endnote
^{1} When dangling positions are not included in the computation (d0), the algorithm clearly requires linear time. When dangling positions are included (d2), run time is exponential in the number of components of the largest multilooop; however, in practice the algorithm is extremely fast, and it is possible to modify the algorithm to always run in linear time.
Abbreviations
 MFE:

Minimum free energy
 PLMVd:

Peach latent mosaic viroid
 sncRNA:

Small noncoding RNA
Declarations
Acknowledgements
We would like to thank the anonymous reviewers for helpful comments.
Funding
Funding for this research was provided by National Science Foundation grant DBI1262439. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Availability of data and materials
Source code for OSX and Linux of the C/C++software RNAdualPF is available at http://bioinformatics.bc.edu/clotelab/RNAdualPF under GPL3 licence.
Authors’ contributions
Project design PC. Algorithm design PC, ID, JAGM. Implementation JAGM. Software testing and benchmarking JAGM using a program to test structural diversity implemented by AHB. Manuscript preparation PC, JAGM. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Borenstein E, Ruppin E. Direct evolution of genetic robustness in microRNA. Proc Natl Acad Sci. 2006; 103(17):6593–598. doi:10.1073/pnas.0510600103.View ArticlePubMedPubMed CentralGoogle Scholar
 Lorenz R, Bernhart SH, Honer Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL. ViennaRNA Package 2.0. Algorithms Mol Biol. 2011; 6:26. doi:10.1186/17487188626.View ArticlePubMedPubMed CentralGoogle Scholar
 Rodrigo G, Fares MA. Describing the structural robustness landscape of bacterial small RNAs. BMC Evol Biol. 2012; 12(1):1–12. doi:10.1186/147121481252 21481252View ArticleGoogle Scholar
 GarciaMartin JA, Dotu I, Clote P. RNAiFold 2.0: a web server and software to design custom and rfambased RNA molecules. Nucleic Acids Res. 2015; 43(W1):513–21. doi:10.1093/nar/gkv460.View ArticleGoogle Scholar
 Los Alamos HIV database. 2015. http://www.hiv.lanl.gov/. Accessed 30 Dec 2015.
 Krol J, Sobczak K, Wilczynska U, Drath M, Jasinska A, Kaczynska D, Krzyzosiak WJ. Structural features of microRNA (miRNA) precursors and their relevance to mirna biogenesis and small interfering RNA/short hairpin RNA design. J Biol Chem. 2004; 279(40):42230–2239. doi:10.1074/jbc.M404931200.View ArticlePubMedGoogle Scholar
 Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P. Fast folding and comparison of RNA secondary structures. Monatsch Chem. 1994; 125:167–88.View ArticleGoogle Scholar
 Zadeh JN, Wolfe BR, Pierce NA. Nucleic acid sequence design via efficient ensemble defect optimization. J Comput Chem. 2011; 32(3):439–52.View ArticlePubMedGoogle Scholar
 GarciaMartin JA, Clote P, Dotu I. RNAiFold: a constraint programming algorithm for RNA inverse folding and molecular design. J Bioinform Comput Biol. 2013; 11(2):1350001. doi:10.1142/S0219720013500017.View ArticlePubMedGoogle Scholar
 Ding Y, Chan CY, Lawrence CE. Sfold web server for statistical folding and rational design of nucleic acids. Nucleic Acids Res. 2004; 32:0.View ArticleGoogle Scholar
 Kozomara A, GriffithsJones S. mirbase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014; 42(Database):68–73. doi:24275495.View ArticleGoogle Scholar
 Reinharz V, Ponty Y, Waldispuhl J. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution. Bioinformatics. 2013; 29(13):308–15.View ArticleGoogle Scholar
 Turner DH, Mathews DH. NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res. 2010; 38(Database):280–2. doi:10.1093/nar/gkp892.View ArticleGoogle Scholar
 Gruber AR, Bernhart SH, Hofacker IL, Washietl S. Strategies for measuring evolutionary conservation of RNA secondary structures. BMC Bioinforma. 2008; 9(1):1–19. doi:10.1186/147121059122.View ArticleGoogle Scholar
 McCaskill JS. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers. 1990; 29:1105–1119. doi:10.1002/bip.360290621.View ArticlePubMedGoogle Scholar
 Ding Y, Lawrence CE. A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res. 2003; 31:7280–301.View ArticlePubMedPubMed CentralGoogle Scholar
 Busch A, Backofen R. INFORNA, a fast approach to inverse RNA folding. Bioinformatics. 2006; 22(15):1823–31. doi:10.1093/bioinformatics/btl194.View ArticlePubMedGoogle Scholar
 Zuker M, Mathews DH, Turner DH. Algorithms and thermodynamics for RNA secondary structure prediction: A practical guide In: Barciszewski J, Clark BFC, editors. RNA Biochemistry and Biotechnology. NATO ASI Series. Dordrecht: Kluwer Academic Publishers: 1999. p. 11–43.Google Scholar
 GriffithsJones S, Bateman A, Marshall M, Khanna A, Eddy SR. Rfam: an RNA family database. Nucleic Acids Res. 2003; 31(1):439–41.View ArticlePubMedPubMed CentralGoogle Scholar
 Dirks RM, Lin M, Winfree E, Pierce NA. Paradigms for computational nucleic acid design. Nucleic Acids Res. 2004; 32(4):1392–1403. doi:10.1093/nar/gkh291.View ArticlePubMedPubMed CentralGoogle Scholar
 Pei S, Anthony JS, Meyer MM. Sampled ensemble neutrality as a feature to classify potential structured RNAs. BMC Genomics. 2015; 16(1):1–12. doi:10.1186/s1286401412038.View ArticleGoogle Scholar
 Huynen M, Gutell R, Konings D. Assessing the reliability of RNA folding using statistical mechanics. J Mol Biol. 1997; 267(5):1104–12. doi:10.1006/jmbi.1997.0889.View ArticlePubMedGoogle Scholar
 Morgan SR, Higgs PG. Barrier heights between ground states in a model of RNA secondary structure. J Phys A: Math Gen. 1998; 31:3153–170. doi:10.1088/03054470/31/14/005.View ArticleGoogle Scholar
 Nawrocki EP, Burge SW, Bateman A, Daub J, Eberhardt RY, Eddy SR, Floden EW, Gardner PP, Jones TA, Tate J, Finn RD. Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 2015; 43(D1):130–7. doi:10.1093/nar/gku1063.View ArticleGoogle Scholar
 Sprinzl M, Horn C, Brown M, Ioudovitch A, Steinberg S. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 1998; 26(1):148–53. doi:10.1093/nar/26.1.148.View ArticlePubMedPubMed CentralGoogle Scholar
 Juhling F, Morl M, Hartmann RK, Sprinzl M, Stadler PF, Putz J. tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res. 2009; 37(Database):159–62.View ArticleGoogle Scholar
 GarciaMartin JA, Clote P. RNA Thermodynamic Structural Entropy. PLoS ONE. 2015; 10(11):0137859. doi:10.1371/journal.pone.0137859.View ArticleGoogle Scholar
 GarciaMartin JA. RNA inverse folding and synthetic design. Ph.D. dissertation in Biology, Boston College. 2016. Dissertation made available on June 28, 2016 and will remain accessible indefinitely: http://hdl.handle.net/2345/bcir:106989.