- Methodology Article
- Open Access
Multi-objective optimization for RNA design with multiple target secondary structures
- Akito Taneda^{1}Email author
- Received: 23 November 2014
- Accepted: 17 August 2015
- Published: 3 September 2015
Abstract
Background
RNAs are attractive molecules as the biological parts for synthetic biology. In particular, the ability of conformational changes, which can be encoded in designer RNAs, enables us to create multistable molecular switches that function in biological circuits. Although various algorithms for designing such RNA switches have been proposed, the previous algorithms optimize the RNA sequences against the weighted sum of objective functions, where empirical weights among objective functions are used. In addition, an RNA design algorithm for multiple pseudoknot targets is currently not available.
Results
We developed a novel computational tool for automatically designing RNA sequences which fold into multiple target secondary structures. Our algorithm designs RNA sequences based on multi-objective genetic algorithm, by which we can explore the RNA sequences having good objective function values without empirical weight parameters among the objective functions. Our algorithm has great flexibility by virtue of this weight-free nature. We benchmarked our multi-target RNA design algorithm with the datasets of two, three, and four target structures and found that our algorithm shows better or comparable design performances compared with the previous algorithms, RNAdesign and Frnakenstein. In addition to the benchmarks with pseudoknot-free datasets, we benchmarked MODENA with two-target pseudoknot datasets and found that MODENA can design the RNAs which have the target pseudoknotted secondary structures whose free energies are close to the lowest free energy. Moreover, we applied our algorithm to a ribozyme-based ON-switch which takes a ribozyme-inactive secondary structure when the theophylline aptamer structure is assumed.
Conclusions
Currently, MODENA is the only RNA design software which can be applied to multiple pseudoknot targets. Successful design results for the multiple targets and an RNA device indicate usefulness of our multi-objective RNA design algorithm.
Keywords
- RNA switch
- Artificial riboswitch
- Multi-objective genetic algorithm
- Pseudoknot
Background
In synthetic biology, biological systems are treated as a circuit composed of biomolecular parts such as nucleic acids and proteins. Since not only natural biomolecules but also artificially-constructed ones can be used as the molecular parts for constructing biological circuits, various efforts have been made to design novel biomolecules which have a desired function. In this context, synthetic RNA devices utilizing a conformational change have intensively been investigated and applied to control biological processes such as gene expression [1–4]. Since we can design various synthetic RNAs by combining the switching ability with a variety of natural RNA functions including enzyme [5], molecular recognition [6], thermometer [7], guide sequence [8], and scaffold [9], artificial RNA sequences with structural changes give a promising platform for creating biomolecular devices which control biological functions in accordance with the designer’s purpose.
We have to take secondary structure into account when designing an artificial RNA whose function needs a specific secondary structure. To date, manual/experimental approaches [1, 10] and computational designs [2–4, 11–14] have been proposed for the rational design of functional RNAs. In case that an automated tool for rational design does not exist for a desired functional RNA, secondary structure prediction method has been utilized in a trial-and-error manner [15]. Since such a trial-and-error approach can be a time-consuming process, a more automated tool for the rational design is important for the efficient development of RNA devices. Biomolecular design algorithms which find a biological sequence folding into a prescribed target structure are called ‘inverse folding’. The inverse folding of RNA can be formulated as a combinatorial optimization problem, in which a discrete space is explored to find an RNA sequence folding into a specified secondary structure [16–23]. The inverse folding algorithms of RNA can be classified into two categories: those for a single target and for multiple targets. Multi-target inverse folding designs the RNA sequences which fold into user-prescribed multiple secondary structures. Since conformational changes can be encoded in such multiple target structures, multi-target inverse folding is particularly useful for designing the RNA sequences with structural changes. For example, synthetic riboswitches and RNA devices are important targets of such designs. So far, the multi-target inverse folding methods have utilized single-objective optimization frameworks, where a weighted sum of objective functions (OFs) is optimized to obtain desired RNA sequences [3, 24–26]. However, the choice of weight parameter values can become rather empirical and can be a tedious task. An in silico selection pipeline has also been used to design synthetic riboswitches [4], where multiple criteria are used in a step-by-step manner to filter randomly generated RNA sequences. In the present study, we propose a multi-target inverse folding algorithm for RNA, which is based on multi-objective genetic algorithm (MOGA) [27]. MOGA is a framework suitable for the optimization problem with multiple OFs. Our multi-target design algorithm for RNA has been developed as a new version of our previous single-target RNA design algorithm, MODENA [28], which is based on MOGA. By using the multi-target version of MODENA, we can explore the optimal multistable RNA sequences without empirical weight parameters among multiple OFs. In addition, it is noteworthy that MODENA is the first inverse folding software which can perform the RNA design for multiple pseudoknot targets. In the rest of the present paper, we will describe our algorithm for multi-target RNA design in detail. Then we will show the design performance and usefulness of our algorithm through the benchmarks for multiple target structures and a design example of an RNA device which is taken from recent literature.
Methods
Optimization technique
We denote an RNA sequence of length N by S = s _{1}..s _{ i }..s _{ N }, where s _{ i }∈{A, C, G, U}. An RNA secondary structure, θ, is defined as a set of base pairs, where a base pair is defined as a pair, (i, j), of nucleotide positions. We consider only canonical (AU, GC) and wobble (GU) base pairs. An RNA sequence which can form a target secondary structure is called a ‘compatible’ RNA sequence [16].
Multi-target RNA design such as RNA device design is an inherently multi-objective problem since not a single but multiple requirements, e.g. a structure stability and a structure similarity with a target structure, can be needed to specify a desired function of RNAs. We define multi-target and multi-objective RNA sequence design problem as follows: finding an RNA sequence with a length of N which is compatible with prescribed multiple target secondary structures θ _{ i } (i=1,…,n _{target}) and is Pareto optimal with respect to given OFs f _{ i } (i=1,…,n _{OF}), where the OFs can be a minimum free energy, the energy difference between two secondary structures, or other predicted values. Moreover, formulae include such predicted values can also be used as the OFs.
Usually, there are trade-offs among the OFs of practical multi-objective problems. In such cases, a single optimal solution does not exist and the best solutions we can expect are Pareto optimal solutions [27]. Pareto optimal solutions are a set of solutions which are not dominated by any other solutions, where solution A is said to dominate solution B if \(f_{i}^{\mathrm {A}}\) is superior or equal to \(f_{i}^{\mathrm {B}}\) for all i and a j (1≥j≥n _{OF}) which satisfies \(f_{j}^{\mathrm {A}} \ne f_{j}^{\mathrm {B}}\) exists. So far, inverse folding methods which do not utilizing multi-objective optimization techniques have solved the RNA sequence design problems by using a weighted sum of OFs. Since the optimal solutions obtained as a result of such a weighted sum of OFs are included in Pareto optimal solutions [27], multi-objective optimization corresponds to simultaneously exploring multiple solutions which can be obtained by optimizing various weighted sums of OFs. By utilizing the framework of multi-objective optimization, we can explore RNA sequences with complex characteristics without tuning empirical weights among OFs.
Objective functions
Methods and their properties available in MODENA. Method names are mainly taken from those of corresponding structure prediction methods
Method | Property | Str. | Package |
---|---|---|---|
RNAfold | MFE,SIM | y | |
RNAfold-p^{a} | MFE,EFE,PB,SIM | y | |
RNAeval | FE | - | |
FindPath^{b} | BAR | - | |
Fold | MFE,SIM | y | RNAstructure [42] |
fold^{c} | FE | - | RNAstructure [42] |
EnsembleEnergy | EFE | - | RNAstructure [42] |
CentroidFold | FE^{d},SIM | y | |
centroidfold^{c} | FE | - | |
IPknot^{e} | GCPAIR,SIM | y | |
mfe^{e} | MFE,SIM | y | NUPACK [35] |
pfunc^{e} | EFE,PF | - | NUPACK [35] |
energy^{e} | FE | - | NUPACK [35] |
prob^{e} | PB | - | NUPACK [35] |
defect^{e} | DEF,NDEF | - | NUPACK [35] |
UNAFold | MFE,SIM | y | UNAFold [46] |
pknotsRG^{e} | MFE,SIM | y | RNA studio [47] |
HotKnots^{e} | MFE,SIM | y | RNAsoft [48] |
GC^{e} | CONT | - | - |
In addition to the structure prediction programs, GC content is also included in the available properties. Since biased GC content can easily appear in the designed sequences if we do not take GC content into account [22], |r(GC:CONT)−ρ _{target}| was used in one of the OFs in the present study, where r(X:Y) indicates the value for the property Y of method X, and ρ _{target} is a user-specified target GC content. The r(GC:CONT), or a GC content (%), is calculated by counting the number of Gs and Cs in the designed sequence and dividing the count by the nucleotide length of the sequence.
In MODENA, target GC content can be taken into account through an OF. For this reason, the constraint for GC content is not exact but an approximate one. Since there can be a trade-off between a GC content and another OF value, e.g. a minimum free energy, an OF including a GC content can interfere with another OF during the design.
It is noted that increasing the number of OFs usually makes the design more difficult. In the present study, at most we used five OFs (see the ‘An example of RNA device design’ subsection).
Sequence and structure constraints
To fix functional motifs during the sequence design process, the sequence constraints in the IUPAC nucleotide code are available in MODENA. In addition to the sequence constraints, we can specify secondary structure constraints for each secondary structure prediction method if the prediction method can use secondary structure constraints (e.g. the Fold program of RNAstructure, CentroidFold, and RNAfold provide such a function). While the constraint sequences are never changed during a design run, the structure constraints are applied only when the prediction method with the structure constraints is invoked, so that we can define and use different structure constraints for each method. A typical usage of the structure constraints is modelling of the ligand-binding state of an aptamer. If a ligand exists, the aptamer domain binds the ligand and forms a characteristic secondary structure. In inverse folding, this ligand-binding state can be modelled by using the structure constraints which specify the characteristic secondary structure of the aptamer domain [4]. Such a structure-constraint secondary structure prediction gives the secondary structure which has the lowest energy in the set of all the secondary structures with the constraint secondary structure. This corresponds to the lowest energy structure of when the ligand binds to the aptamer.
Nucleotide assignment algorithm
In the GA initialization and reproduction steps of the inverse folding, we generate RNA sequences which are compatible with the prescribed target structures. The generation of compatible random sequences in single-target inverse folding is easy even when sequence constraint is imposed on. In the case of the RNA sequence design with multiple targets, however, the more complex base-complementarity relationship among nucleotide positions, called the dependency graph [24, 26], has to be taken into account, since a nucleotide position which forms base-pairs with multiple other positions corresponds to a vertex with a degree > 1 and such nucleotide positions cause a network-like relationship. In MODENA, we do not use the ear decomposition of RNAdesign [26] which is a sophisticated graph coloring algorithm and guarantees uniform sampling of RNA sequences compatible with the target structures. Instead, to generate RNA sequences, we use a naive ‘nucleotide assignment algorithm’ described below.
A dependency graph G=(V,E) is the graph composed of vertices, V={1,…,N}, representing nucleotide positions and edges, \(E=\cup _{i=1}^{n_{\text {target}}} \theta _{i}\), corresponding to the base pairs in target structures. In the two-target problem, each connected component c _{ i } (i=1,…,n _{c}) in the dependency graph belongs to one of isolated vertex, path, and cycle [24]. In addition, more complex graph structures can appear in the dependency graph of the multi-target inverse folding with n _{target}≥3 [26]. To generate RNA sequences compatible with all the target structures, we have to find a nucleotide code assignment to V, by which all base paring relationships specified by E are satisfied. As described in the generalized intersection theorem [26], if G is bipartite, at least one nucleotide code assignment compatible with G exists; if G is not bipartite, we cannot assign compatible nucleotide codes to G since the bipartiteness is also a necessary condition for the latter.
where k indicates the position along the path, i.e. k=1 corresponds to the start vertex and the k for the end vertex is equal to the length of the path; s indicates a nucleotide assigned to the start vertex (s∈{A, C, G, U}); χ(1,s,s)=1 if a nucleotide s is allowed at the start vertex, otherwise 0. A χ(k,x,y) gives the number of combinations such that a nucleotide x is assigned to the k-th vertex in the path when a nucleotide y is assigned to the start vertex. For example, if a U and A are assigned to the start and end vertices of path m, respectively, and such assignments are compatible for the path, χ(L _{ m },A,U) becomes larger than 0, where L _{ m } is the length of path m. It is noted that if a nucleotide x is not allowed at the k-th vertex due to a sequence constraint, χ(k,x,s) is set to 0 during the computation of the recursion.
If a sequence constraint does not exist in the connected component, the nucleotide assignment algorithm uniformly samples the nucleotides assuming AU or GC base pair alone. It is noted that, when a sequence constraint exists, there is no guarantee that the nucleotide assignment algorithm can sample the whole sequence space of a given nucleotide length since we use the biased sampling method in MODENA.
GA operators
In the reproduction step of GA, individuals in a population are modified by using ‘GA operators’ to generate the next GA population. The multi-target version of MODENA uses four new GA operators: point mutation, negative design, positive design, and crossover operators. Whereas the point mutation and crossover operators are straight forward extensions of the previous GA operators used in the single-target version of MODENA [28], positive and negative design operators are newly introduced operators in the multi-target version. The details of the four GA operators are described below.
Point mutation for multiple targets
Mutation is one of the most fundamental operations in heuristic optimization algorithms, since it corresponds to a local move in a search space. In the point mutation, we scan the RNA sequence from the 5’ side to the 3’ side to randomly select a nucleotide position, i _{mut}, in accordance with a mutation probability p _{ M } (a default value p _{ M } = 0.05); at each nucleotide position, we generate a random number, r _{n}, where 0.0 ≤r _{n}<1.0, and compare r _{n} and p _{ M }; if r _{n}≤p _{ M }, we try to mutate the nucleotide position, otherwise we do nothing and move on to the next nucleotide position. If the selected position belongs to a loop nucleotide, the selected nucleotide is simply changed to a nucleotide different from the original one with an equal probability; if the selected position forms a base pair in the target secondary structures, we randomly select and apply one of transversion and transition operators to the selected nucleotide. If the transversion or transition operator failed (this can occur due to a sequence constraint), the nucleotides of the connected component including the selected nucleotide are not changed.
If we introduce a transversion at a nucleotide position belonging to a connected component, all the other positions in the connected component must change their nucleotides to repair the compatibility, since any connected component is bipartite and all nucleotide positions belonging to each partition must have the same nucleotide type (purine or pyrimidine). This repair process is performed in accordance with the ‘nucleotide assignment algorithm’.
In the case of transition operator, there exists a case where one nucleotide change (i.e. a change in the randomly selected position) alone is adequate. For example, the A of an AU base pair can be changed to a G, leading to a GU base pair and this change does not destroy the compatibility of the original base pair. In other cases, however, more nucleotide changes can be required to guarantee the compatibility. Therefore, we traverse the spanning tree of the connected component in order of the breadth-first search to repair the compatibility, where the vertex of nucleotide position i _{mut} is the root vertex of the spanning tree. Similar to the case of two-target inverse folding [24], the traversal up to depth one is adequate to repair any connected component of multiple targets.
In the point mutation, one of the transversion and transition operators is randomly selected with the equal probability (= 0.5) and applied to each nucleotide position selected with the p _{ M }.
Negative design operator
Positive design operator
Since positive design operator tries to increase the GC content of the base-paired positions in the target structures, positive design operator may cause a slow convergence in the GA when the user designs the RNA sequences with a low GC content. Positive design operator can be turned off by option “-opPos 0”.
Crossover for multiple targets
Crossover operator combines subsequences taken from two ‘parent’ individuals to generate a new one and can give a ‘long jump’ in the search space in contrast to the local move of the mutation operator. Let us consider two parent individuals (called parent L and R). First, a nucleotide position (a crossover point) p is selected at random (1≤p≤N−1), and then we split each parent individual into the 5’ half (s _{1}..s _{ p }) and the 3’ half (s _{ p+1}..s _{ N }) of the original sequence at the selected nucleotide position. Then, we try to splice the 5’ half of parent L and the 3’ half of parent R to generate a child individual. If there exists a connected component whose nucleotides distribute both in the 5’ half and 3’ half, the nucleotides belonging to the connected component are copied from one of the parents to the child (in the current version of MODENA, those of parent L are copied).
Mutation of undesired sequence motifs
To avoid undesired functions, we implemented simple operators for mutating nucleotide tracts and user-prescribed sequence motifs. Switching on/off of these operators can be specified through options. These operators are invoked just after (i) the processing by each GA operator and (ii) the GA initialization of each individual. If a nucleotide tract or specified sequence motif is found in the designed sequence, the point mutation is performed at a nucleotide position in the detected region to change the sequence.
Results and discussion
To show the optimization performance of the multi-target version of MODENA, we performed computational RNA design for the various sets of multiple targets and sequence constraints. Throughout the rest of the present paper, we use a population size of 100, the maximum number of generations of 200, and target GC content ρ _{target} = 50 (%) for MODENA if the other values are not mentioned. For performance comparison, we ran RNAdesign with option “-n 500 –thin 200 -b 100 –scale 1” and extracted top 100 RNA sequences as the results of RNAdesign; we used Frnakenstein with option “-s 100” to design 100 RNA sequences. When the Vienna RNA Package is required in the present study (for sequence design and performance evaluation), MODENA and Frnakenstein used Vienna RNA Package 1.8.3, and RNAdesign used Vienna RNA Package 2.1.1. NUPACK 3.0 and RNAstructure 5.3 were also used for pseudoknotted RNA design and RNA device design by MODENA, respectively.
Two-target design
As design examples of the two-target inverse folding without sequence constraint, we designed RNA sequences which fold into the metastable structures of SV11 [33], the 17 sets of metastable structures taken from the dataset (command_linesNew.faa) used in the paper of RNAtabupath [34], and two-target pseudoknot datasets (datasets generated by NUPACK:subopt [35] and that based on the natural pseudoknots taken from Pseudobase [36]). For the pseudoknot-free two-target designs, we minimized the following three objective functions computed by RNAfold and RNAeval with option -d2: \(f_{1}=\sum _{i}^{n_{\text {target}}}(E(\theta _{i})-G)\), \(f_{2}=\sum _{i < j}|E(\theta _{i})-E(\theta _{j})|,f_{3}=|r(\text {GC:CONT})-\rho _{\text {target}}|\), where G and E(θ _{ i }) are the ensemble free energy and the free energy of target structure θ _{ i }, respectively. In the multi-target pseudoknot designs of the present study, instead of RNAfold and RNAeval, we used the ‘pfunc’ and ‘energy’ of NUPACK with option -pseudo to evaluate f _{1}, f _{2}, and f _{3}.
The results of two-target designs
MODENA | RNAdesign | Frnakenstein | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RNA | l | δ e _{1} | δ e _{2} | n _{1} | n _{2} | δ e _{1} | δ e _{2} | n _{1} | n _{2} | δ e _{1} | δ e _{2} | n _{1} | n _{2} |
SV11 | 115 | 0.00 | 0.00 | 100 | 19 | 0.00 | 0.20 | 11 | 0 | 0.00 | 0.50 | 84 | 0 |
alpha operon | 130 | 0.10 | 0.60 | 0 | 0 | 0.10 | 0.40 | 3 | 0 | 0.00 | 0.60 | 34 | 0 |
amv | 145 | 0.60 | 0.70 | 0 | 0 | 1.30 | 2.20 | 0 | 0 | 0.80 | 2.40 | 1 | 0 |
attenuator | 73 | 0.00 | 0.80 | 33 | 0 | 3.50 | 3.80 | 0 | 0 | 0.30 | 0.60 | 0 | 0 |
dsrA | 85 | 0.00 | 0.00 | 100 | 7 | 0.00 | 0.10 | 15 | 0 | 0.00 | 0.80 | 99 | 0 |
HDV | 153 | 0.00 | 0.00 | 100 | 30 | 0.00 | 0.10 | 11 | 0 | 0.00 | 0.50 | 100 | 0 |
HIV-1 leader | 280 | 0.50 | 0.70 | 0 | 0 | 1.20 | 1.50 | 0 | 0 | 0.50 | 1.30 | 0 | 0 |
ms2 | 73 | 0.40 | 0.40 | 44 | 0 | 1.70 | 1.80 | 0 | 0 | 0.20 | 0.40 | 0 | 0 |
rb1 | 148 | 0.00 | 0.04 | 75 | 0 | 0.20 | 0.40 | 1 | 0 | 0.00 | 0.66 | 100 | 0 |
rb2 | 113 | 0.00 | 0.00 | 100 | 89 | 0.10 | 0.20 | 8 | 0 | 0.00 | 0.10 | 93 | 0 |
rb3 | 141 | 0.10 | 0.10 | 52 | 0 | 1.00 | 1.50 | 0 | 0 | 0.00 | 0.00 | 39 | 2 |
rb4 | 146 | 2.10 | 5.60 | 0 | 0 | 3.30 | 3.70 | 0 | 0 | 3.21 | 5.31 | 0 | 0 |
rb5 | 201 | 0.46 | 0.50 | 0 | 0 | 0.80 | 1.30 | 0 | 0 | 0.20 | 0.44 | 0 | 0 |
ribD leader | 304 | 0.80 | 1.20 | 0 | 0 | 3.90 | 4.20 | 0 | 0 | 0.10 | 2.60 | 0 | 0 |
s-box leader | 247 | 0.70 | 0.80 | 0 | 0 | 1.60 | 1.80 | 0 | 0 | 0.30 | 1.10 | 0 | 0 |
s15 | 74 | 0.10 | 0.40 | 0 | 0 | 0.00 | 0.40 | 15 | 0 | 0.00 | 0.20 | 98 | 0 |
SL | 56 | 0.00 | 0.00 | 100 | 28 | 0.00 | 0.00 | 17 | 1 | 0.00 | 0.00 | 86 | 6 |
thiM leader | 165 | 0.60 | 1.00 | 0 | 0 | 2.70 | 3.60 | 0 | 0 | 0.36 | 1.56 | 0 | 0 |
mean | - | 0.38 | 0.76 | - | - | 1.26 | 1.58 | - | - | 0.35 | 1.09 | - | - |
median | - | 0.10 | 0.50 | - | - | 1.00 | 1.50 | - | - | 0.10 | 0.60 | - | - |
The design results for the 17 sets of two target structures are also tabulated in Table 2. As shown in the n _{1} column of Table 2, for the eight sets of the 17 sets, MODENA designed the RNA sequences such that at least one of the two target structures has the lowest free energy. RNAdesign designed a less number (seven sets of the 17 sets) of such RNA sequences; Frnakenstein did nine sets of the 17 sets, which are slightly better results compared to those obtained by MODENA. Moreover, as can be seen in the n _{2} column of Table 2, MODENA successfully designed, for four sets of the 17 sets, the RNA sequences in which both target structures have the lowest free energy, while RNAdesign and Frnakenstein obtained such completely multistable RNA sequences for only one and two sets of the 17 sets, respectively. These results indicate that MODENA has a better design ability for the two-target inverse folding of RNA compared with RNAdesign, and has a design ability comparable to Frnakenstein.
We performed Wilcoxon’s signed rank tests by using R version 3.0.3 for the δ e _{1} and δ e _{2} in Table 2, where δ e _{1} is the energy difference between the ground state and the lowest energy target structure (δ e _{2} is that for the highest energy target structure). As a result of the comparison between MODENA and RNAdesign, we obtained P = 7.324×10^{−4} and P = 7.996×10^{−3} for δ e _{1} and δ e _{2}, respectively. When we compared the design results of MODENA and Frnakenstein, the Ps for δ e _{1} and δ e _{2} were 0.4639 and 3.076×10^{−2}, respectively. If we use the Bonferroni correction for these four tests, the Ps for the comparisons between MODENA and RNAdesign are statistically significant (< 0.05 /4). Since we tested four comparisons (a comparison between δ e _{1}s of MODENA and RNAdesign, δ e _{2}s of MODENA and RNAdesign, δ e _{1}s of MODENA and Frnakenstein, and δ e _{2}s of MODENA and Frnakenstein),“0.05 divided by four” was used here as the level of statistical significance for each comparison.
At the bottom of Table 2, the means and medians for δ e _{1} and δ e _{2} are also shown. Better values of the means and medians for genetic algorithms (MODENA and Frnakenstein) imply that better optimization techniques are effective for these design problems. A representative designed sequence for each design problem is tabulated in Additional file 1: Table S1.
As utilized in the computational design of ribozymes by Dotu et al. [14], the Boltzmann probability of each target structure can be a useful measure for evaluating designed RNAs. In the paper by Dotu et al. [14], the designed RNA sequences are classified in accordance with whether the single target structure has a Boltzmann probability ≥ 0.4 or not. In the case of multistable RNA design, it is desirable that all target structures have the same large probability. The Boltzmann probabilities of the RNA sequences designed for the 17 sets of two targets are shown in Additional file 3: Figure S1. As can be seen from the figure, MODENA successfully designed the RNA sequences with (the sum of the Boltzmann probabilities of two target structures) ≥ 0.4 for four target sets (dsrA, HDV, rb2, and SL), while Frnakenstein and RNAdesign designed such RNA sequences only for one (SL) and no set, respectively.
Two-target pseudoknot design
MODENA can design pseudoknotted RNA sequences if the structure prediction method can predict a pseudoknot [32]. For this reason, the pseudoknot classes which can be designed by MODENA are dependent on those of the structure prediction method. To our knowledge, this is the first report on the inverse folding algorithm for multiple target pseudoknots. As the feasible design problems for the design performance test, by using NUPACK:subopt, we generated 50 and 30 sets of two target structures with a length of 60 and 80 nucleotides, respectively. Hereafter, we call the 50 and 30 sets as PK60 and PK80 dataset, respectively. The PK60 and PK80 datasets were constructed as follows. First we randomly generate an RNA sequence, then perform NUPACK:subopt to obtain two suboptimal structures. We filter the set of two structures by examining the following criteria. (i) (the Hamming distance between the two structures in bracket notation)/(target structure length) ≥ 0.1 to avoid very similar structures. (ii) (the number of base-paired nucleotides of the two structures)/(the sum of the lengths of the two structures) ≥ 0.2 to avoid too many loop nucleotides in the target structures. (iii) At least one pseudoknot is included in one of the two structures. If the set of two structures satisfied these criteria, we added the set to our dataset, otherwise rejected the set. We continued this target set generation process until 50 or 30 sets of two structures were obtained for the PK60 and PK80 datasets, respectively.
In these designs of the pseudoknotted RNAs with multiple targets, we used smaller parameter values, a population size of 30 and the maximum number of generations of 50, to limit the computational times since the pseudoknot prediction can take a very long computational time. In these design benchmarks, MODENA successfully designed completely multistable RNA sequences for a number of the sets of target structures; for the PK60 dataset, we obtained 45 sets (90 % of the 50 sets) in which at least one target structure has the lowest free energy and obtained 27 sets (54 %) which have a completely multistable RNA sequence; in the PK80 dataset, we successfully designed 27 sets (90 % of the 30 sets) in which one of the two targets has the lowest free energy and the 15 sets (50 %) having a completely multistable RNA sequence. Designed sequences and detailed results of the two-target pseudoknotted RNAs are tabulated in Additional file 1: Tables S4 and S5.
In addition to the randomly generated datasets, we constructed a two-target pseudoknot dataset based on natural RNA pseudoknots taken from Pseudobase [36]. The two-target ‘Pseudobase’ dataset (we call LE80 dataset) was constructed as follows: Step 1) we set i = 1; Step 2) we pick up the i-th pseudoknot from the Pseudobase dataset for single-target design [32, 36]; Step 3) we compare the i-th pseudoknot with j-th pseudoknot (i<j) one by one; if the i-th and j-th pseudoknots have the same nucleotide length and have a structure similarity = ([target structure length] - [the Hamming distance between the i-th and j-th pseudoknots in bracket notation])/(target structure length) < 0.8, we add a set of the i-th and j-th pseudoknots to the LE80 dataset and mark the i-th and j-th pseudoknots in the Pseudobase dataset for single-target design (marked pseudoknots are not used at a subsequent processing, therefore each pseudoknot in the Pseudobase dataset for single-target design can appear only once in the LE80 dataset); Step 4) we increment i by one; if we reach the last pseudoknot, stop the processing; otherwise go to Step 2. It is noted that, in the procedure from Step 1 to Step 4, we consider the pseudoknots with a length of ≥ 40 and ≤ 80 nucleotides.
The design results for the two-target Pseudobase dataset (the LE80 dataset)
Pseudobase ID | l | δ e _{1} | δ e _{2} | n _{1} | n _{2} |
---|---|---|---|---|---|
PKB00002_PKB00004 | 50 | 0.20 | 0.20 | 5 | 0 |
PKB00005_PKB00015 | 41 | 0.50 | 0.50 | 6 | 0 |
PKB00008_PKB00031 | 40 | 0.00 | 0.70 | 6 | 0 |
PKB00010_PKB00066 | 40 | 0.00 | 0.00 | 22 | 3 |
PKB00012_PKB00268 | 40 | 0.00 | 0.20 | 13 | 0 |
PKB00030_PKB00045 | 41 | 0.20 | 0.90 | 9 | 0 |
PKB00047_PKB00069 | 61 | 4.00 | 4.40 | 0 | 0 |
PKB00048_PKB00265 | 61 | 0.50 | 1.20 | 0 | 0 |
PKB00050_PKB00128 | 59 | 0.00 | 0.00 | 12 | 1 |
PKB00052_PKB00107 | 52 | 0.00 | 0.10 | 7 | 0 |
PKB00057_PKB00072 | 67 | 3.60 | 3.90 | 0 | 0 |
PKB00068_PKB00129 | 68 | 4.80 | 4.90 | 0 | 0 |
PKB00070_PKB00244 | 55 | 0.00 | 0.50 | 3 | 0 |
PKB00078_PKB00106 | 62 | 0.00 | 0.40 | 8 | 0 |
PKB00080_PKB00132 | 49 | 0.20 | 0.20 | 13 | 0 |
PKB00088_PKB00127 | 62 | 0.20 | 0.30 | 2 | 0 |
PKB00098_PKB00232 | 62 | 0.80 | 1.40 | 0 | 0 |
PKB00131_PKB00205 | 48 | 0.00 | 0.01 | 29 | 0 |
PKB00139_PKB00141 | 70 | 1.30 | 1.90 | 0 | 0 |
PKB00142_PKB00231 | 71 | 0.10 | 0.60 | 0 | 0 |
PKB00143_PKB00233 | 71 | 2.50 | 2.60 | 0 | 0 |
PKB00148_PKB00218 | 72 | 3.90 | 4.90 | 0 | 0 |
PKB00175_PKB00259 | 57 | 1.60 | 1.60 | 3 | 0 |
PKB00179_PKB00280 | 68 | 1.50 | 1.70 | 3 | 0 |
PKB00180_PKB00212 | 64 | 0.10 | 0.30 | 4 | 0 |
PKB00190_PKB00266 | 47 | 0.00 | 0.00 | 29 | 7 |
PKB00207_PKB00213 | 45 | 0.00 | 0.00 | 12 | 2 |
PKB00211_PKB00239 | 80 | 0.30 | 0.40 | 3 | 0 |
PKB00222_PKB00305 | 80 | 2.10 | 3.20 | 1 | 0 |
PKB00224_PKB00281 | 43 | 0.00 | 0.10 | 10 | 0 |
PKB00230_PKB00273 | 48 | 0.00 | 0.40 | 7 | 0 |
PKB00248_PKB00257 | 66 | 0.40 | 2.10 | 0 | 0 |
PKB00263_PKB00270 | 62 | 0.20 | 0.60 | 0 | 0 |
PKB00269_PKB00272 | 66 | 1.40 | 1.40 | 0 | 0 |
mean | - | 0.89 | 1.22 | - | - |
median | - | 0.20 | 0.55 | - | - |
Three- and four-target designs
The results of three- and four-target designs
MODENA | RNAdesign | Frnakenstein | ||||
---|---|---|---|---|---|---|
3str | 4str | 3str | 4str | 3str | 4str | |
mean (δ e _{1}) | 0.27 | 0.84 | 0.35 | 1.63^{a} | 0.39 | 0.92 |
median (δ e _{1}) | 0.00 | 0.39 | 0.05 | 0.70^{a} | 0.10 | 0.55 |
mean (δ e _{2}) | 0.54 | 1.78 | 0.53 | 2.31^{a} | 0.96 | 1.89 |
median (δ e _{2}) | 0.30 | 1.40 | 0.30 | 1.50^{a} | 0.80 | 1.60 |
Multi-target RNA design with sequence constraints
To test the design performance with sequence constraints, we re-performed the RNA sequence designs with the sets of the two, three, and four pseudoknot-free targets, where a randomly selected subsequence with a length of 10 % of the original sequence is used as sequence constraints in each design problem. As a result, for the two-target designs, we obtained higher means and medians than those of the designs without the constraints, while results comparable to the designs without the constraints were obtained for the three- and four-target designs. Data of these designs are included in Additional file 1: Tables S7 - S9.
Single-target design
Since the multi-target version of MODENA can also perform single-target RNA design, the results for single-target designs without sequence constraints are also included in Additional file 2: Tables S10 – S15, where the results obtained for the RNAiFold datasets (including the EteRNA dataset) [20, 38] and the Pseudobase dataset [32, 36] are shown. Input files, in which the objective functions and the methods used for each single-target design are described, are available from the MODENA website.
An example of RNA device design
As an example of RNA device design, we re-designed the ribozyme-based RNA device proposed by Win and Smolke [39]. This RNA device, as described in [15] in detail, utilizes the sequence motifs of sTRSV hammerhead ribozyme and theophylline aptamer to realize an ON switch which can be used by embedding it in the 3’-UTR of a mRNA. When a ligand does not exist, this ON switch folds into the ribozyme structure and catalyzes self-cleavage, leading to the low expression of the gene in the mRNA. This design problem becomes a two-target RNA design since ribozyme-active and -inactive conformations specify the function. This RNA device design has the following features: (i) The ligand-binding state of the RNA device is approximately modelled as the minimum free energy (MFE) structure predicted with the secondary structure constraints of the aptamer domain. Such a modelling is realized by utilizing the structure constraint function provided by a secondary structure prediction method. (ii) The target structure of the ribozyme-inactive state is specified by the loop nucleotides which disrupt the stem II of the ribozyme. (iii) Some target secondary structures (other than the ribozyme structure, the aptamer domain structure, and the loop disrupting the stem II) are specified by a wild card; the secondary structures of such regions are automatically determined during the design computation. As objective functions, we minimized F _{1}=E(Θ _{active})−G, F _{2}=|E(Θ _{inactive})−E(Θ _{active})−1.0|, F _{3}=−(σ _{inactive}+σ _{active})/2, F _{4}=|r(GC:CONT)−50|, where Θ _{inactive} and Θ _{active} indicate the MFE structures of the inactive and active state, respectively; σ _{inactive} and σ _{active} are the structure similarities with the inactive- and active-target structures, respectively. RNAstructure 5.3 package was used to evaluate these objective functions. We ran MODENA to design the RNA device mentioned above with option “-conv 0”, which guarantees that the run does not stop until the specified maximum number of generations (in this case, 200) is reached.
Since the energy barrier height between target structures can be one of the important features of RNA device, in addition to the RNA device design mentioned above, we designed RNA sequences by minimizing five OFs, i.e. F _{1}, F _{2}, F _{3}, F _{4} and the energy barrier height F _{5} between two secondary structures, where the two secondary structures are the MFE structures predicted by RNAfold with and without the structure constraints for the aptamer domain. In this test design, we minimized F _{5}; this can increase the transition probabilities between ribozyme-active and -inactive conformations when the ligand does not exist. We used the Vienna RNA package instead of RNAstructure package in this RNA design taking energy barrier height into account; this is because the barrier height prediction program, accessFindPath.py [34] called in MODENA with a look-ahead parameter of 1000, utilizes findpath.c [24] which is taken from the Vienna RNA package. We used the GA parameter settings similar to the RNA device design mentioned above except for a population size of 200 and the maximum number of generations of 400. By using MODENA, we successfully obtained two designed RNA sequences which have F _{2} = 0.0 and F _{3} = -1.0. Both of these designed RNA sequences have a predicted energy barrier height of 5.2 kcal/mol, which is a much smaller value than that (11.9 kcal/mol) predicted for the RNA sequence (shown in Fig. 8) designed by not taking energy barrier height into account. The designed sequences and their structures are shown in Additional file 3: Figure S2. It is noted that the predicted barrier height only gives an upper limit of the lowest barrier height since the algorithm of findpath.c is a heuristic algorithm.
Computational time
Computational times
MODENA | ||||||
---|---|---|---|---|---|---|
Dataset | l (nt) | pop. size | #threads | t _{MODENA} (s) | t _{Frnakenstein} (s) | t _{RNAdesign} (s) |
two targets | 149 | 100 | 4 | 446^{a} | 13,988^{a} | 2,924^{c} |
three targets | 100 | 100 | 4 | 339^{a} | 1,924^{b} | 1,249^{c} |
four targets | 100 | 100 | 4 | 385^{a} | 2,848^{b} | 1,653^{c} |
RNA device^{d} | 95 | 100 | 4 | 2,428^{a} | - | - |
RNA device^{e} | 95 | 200 | 6 | 15,460^{b} | - | - |
PK60 | 60 | 30 | 3 | 1,997^{b} | - | - |
PK80 | 80 | 30 | 3 | 7,780^{b} | - | - |
LE80 | 61 | 30 | 3 | 2,140^{b} | - | - |
In the present study, we did not tune the GA parameters. The parameter values used in the present study were determined based on our past experience in the GA studies including RNA sequence alignment [31] and single-target RNA inverse folding [28, 32]. Some parameter tuning could improve the benchmark results shown in the present paper.
Effect of the crossover for multiple targets
To examine the effect of the crossover for multiple targets, we performed the two-target designs of the 17 RNAtabupath target sets without the crossover for multiple targets (option “-opCr 0” was used for the purpose). Aa a result, we obtained the means for δ e _{1} and δ e _{2} (0.46 and 0.85, respectively), which are slightly worse values compared with the designs with the crossover operator (as can be seen in Table 2, 0.38 and 0.76, respectively). When we designed without the crossover operator, the medians for δ e _{1} and δ e _{2} were 0.1 and 0.3, respectively. The median of δ e _{2} is slightly better than that of the designs with the crossover operator (0.5). Wilcoxon’s signed rank tests for these comparisons (P = 0.9097 and 0.3296) indicate that the RNA sequences designed with and without the crossover are comparable with respect to δ e _{1} and δ e _{2}. In Additional file 3: Figure S3, the Boltzmann probabilities of the RNA sequences designed with and without the crossover operator are shown. As can be seen from the figure, MODENA with the crossover designed the RNA sequences with the Boltzmann probabilities larger than MODENA without the crossover in many target sets; it is noteworthy that the designs with the crossover gave larger probabilities compared to those without the crossover in all the target sets with long lengths (> 200 nucleotides; HIV-1 leader, rb5, ribD leader, and s-box leader), while both are comparable for short sequences (< 100 nucleotides; dsrA, ms2, s15, and SL).
Differences between the multi-target and previous versions of MODENA
Main differences between the new multi-target version and the previous single-target versions are as follows: (i) The previous versions cannot perform the RNA inverse folding with multiple target structures. The previous versions can accept only one target structure as an input. To perform the RNA inverse folding with multiple target structures, it is necessary to take the dependency graph into account in the initialization, mutation, and crossover as implemented in the multi-target version. (ii) The error diagnosis operator of the previous versions has been replaced by negative and positive mutation operators in the multi-target version. (iii) The ‘Mutation of undesired sequence motifs’ operator has been introduced in the new version. (iv) Objective functions more than two can be used in the multi-target version. More sequence properties including a predicted energy barrier height and a GC content can be used in the OFs in the multi-target version. In addition, not only sequence property values themselves, but also a function of sequence property values can be utilized in an objective function in the new version. The previous versions of MODENA can use only the two objective functions (a stability score and structure similarity score). (v) The secondary structure constraint, which is useful for modelling the ligand-binding state of an aptamer, has been introduced in the new version. (vi) The new version explores Pareto optimal solutions in the default setting, while the previous versions do weak Pareto optimal solutions. (vii) A simple parallelization of the evaluation part in the GA has been introduced in the new version.
Differences between genetic algorithms (MODENA and Frnakenstein)
Important differences between MODENA and Frnakenstein are as follows. First, MODENA adopts a multi-objective genetic algorithm which can obtain the approximate Pareto optimal solutions at one run, while Frnakenstein designs RNA sequences based on the approach utilizing a weighted sum of objective functions (which includes the case of a single objective function) which explores one of the Pareto optimal solutions as the single optimal solution. MODENA uses multiple OFs such as f _{1}, f _{2}, and f _{3}, whereas Frnakenstein uses a single OF as the default setting for multi-target RNA design. Due to this algorithmic difference, the stopping criterion (the number of continuous GA generations which have the same number of non-dominated solutions) of MODENA is also different from that of Frnakenstein.
Secondly, in the default setting of Frnakenstein, RNA sequences are initialized by RNAinverse where a target structure is randomly selected from multiple targets. This initialization does not guarantee the compatibility with multiple target structures. MODENA initializes RNA sequences with the nucleotide assignment algorithm which guarantees the compatibility.
Thirdly, MODENA has various functions which are useful for practical RNA device design and are not implemented in Frnakenstein: sequence and structure constraints, pseudoknot design function, target GC content, mutation of undesired sequence motifs, and OpenMP parallelization. Although, in principle, it may be possible to add these functions to Frnakenstein, such a version of Frnakenstein is currently not available.
Fourthly, MODENA is written in C and partially in C++ and Frnakenstein is a python program.
Finally, MODENA uses originally-developed GA operators different from those of Frnakenstein. As the default setting, the GA operators of Frnakenstein use base pairing probabilities, whereas the GA operators of MODENA do not utilize base paring probabilities. RNA folding with base paring probability computation takes a longer time than that without base paring probabilities, this difference could be one of the reasons for the difference between the computational times of MODENA and Frnakenstein.
Average pairwise structure distance of target structures
To know how diverse structures are included in each set of target structures, we calculated the average pairwise structure distance (APSD) for each set of target structures. The calculated APSD is displayed in the filename of each set of target structures, e.g. ‘alpha_operon-apsd13.inp’ indicates a set of target structures with an APSD of 0.13. The target structures (input files) used in the benchmarks can be downloaded from the MODENA website.
Availability
Academic users can use MODENA from our website (http://rna.eit.hirosaki-u.ac.jp/modena/multi).
Conclusion
We have presented a novel multi-objective genetic algorithm useful for multi-target RNA design. To enable us to design the RNA sequences in which multiple target structures have the free energies close to that of the ground state, we developed novel initialization, mutation, and crossover procedures which take the dependency graph into account. We examined the design performance of the present algorithm with the various sets of two, three, and four target structures and obtained good design performances compared with the other state-of-the-art multi-target RNA design algorithms. As an example of the practical RNA device design with sequence and structure constraints, we presented successfully designed RNA sequences which have the characteristic sequence and structural features specifying the function of a ribozyme-based RNA device. Although we showed only one example of RNA device design, MODENA can design RNA devices responding to various ligands by changing the secondary structure and sequence motif of the aptamer domain. Development of design templates (input files for MODENA) for various RNA devices, including RNA devices requiring four target structures (e.g. AND and OR logic gates [2, 11]), is currently in progress.
By virtue of the modular nature of genetic algorithm in which a solution evaluation part and the other optimization parts are clearly separated as different algorithmic parts, MODENA can perform multi-target RNA sequence design based on various widely-used RNA secondary structure prediction methods. It is noteworthy that we can use MODENA to design multistable pseudoknots, whereas the previous inverse folding methods cannot design multistable pseudoknots.
The current version of MODENA has several issues to be improved. For example, addition of RNA-RNA interaction prediction algorithms to the methods invoked in MODENA will make MODENA possible to design interacting structured RNA sequences. Such a design tool is useful for automatic design of RNA-RNA interaction circuits [3]. Since the computational speed of the current version is not so fast and the parallelization is very simple, more sophisticated parallelization will improve the speed of MODENA. Computational speed is important not only for standalone use, but also for the webserver for designing RNA devices.
From our experience, we can say that reducing the number of GU base pairs is important to obtain a good design result. For this reason, avoiding a subset of the sequence space having many GU base pairs is desirable for RNA sequence design. Since our sequence sampling approach, the nucleotide assignment algorithm, is rather empirical, development of a more sophisticated GU base pair sampling method is an interesting direction for future research. It is noteworthy that, by introducing more random nature into the nucleotide assignment algorithm, we can give non-zero sampling probabilities to all compatible nucleotide assignments (some weights will be needed to reduce the number of GU pairs).
It is noted that the present algorithm does not guarantee the desired in vitro/in vivo function of the designed RNA sequences. The functions of designed RNAs have to be determined by experiment. We hope that our multi-objective optimization approach gives an effective guide for developing novel RNA devices which work in vitro/in vivo and leads to fruitful collaboration between informatics and experiment in the field of biomolecular device design.
Declarations
Acknowledgements
This work was partially supported by KAKENHI (24500358).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
- Tang J, Breaker RR. Rational design of allosteric ribozymes. Chem Biol. 1997; 4:453–9.View ArticlePubMedGoogle Scholar
- Win MN, Liang JC, Smolke CD. Frameworks for programming biological function through RNA parts and devices. Chem Biol. 2009; 16:298–310.View ArticlePubMedPubMed CentralGoogle Scholar
- Rodrigo G, Landrain TE, Jaramillo A. De novo automated design of small RNA circuits for engineering synthetic riboregulation in living cells. Proc Natl Acad Sci U S A. 2012; 109:15271–6.View ArticlePubMedPubMed CentralGoogle Scholar
- Wachsmuth M, Findeiß S, Weissheimer N, Stadler PF, Mörl M. De novo design of a synthetic riboswitch that regulates transcription termination. Nucleic Acids Res. 2012; 41:2541–551.View ArticlePubMedPubMed CentralGoogle Scholar
- Perreault J, Weinberg Z, Roth A, Popescu O, Chartrand P, Ferbeyre G, et al. Identification of hammerhead ribozymes in all domains of life reveals novel structural variations. PLoS Comput Biol. 2011; 7:1002031.View ArticleGoogle Scholar
- Chushak Y, Stone MO. In silico selection of RNA aptamers. Nucleic Acids Res. 2009; 37:87.View ArticleGoogle Scholar
- Kortmann J, Narberhaus F. Bacterial RNA thermometers: molecular zippers and switches. Nat Rev Microbiol. 2012; 10:255–65.View ArticlePubMedGoogle Scholar
- Lowe TM, Eddy SR. A computational screen for methylation guide snoRNAs in yeast. Science. 1999; 283:1168–71.View ArticlePubMedGoogle Scholar
- Zappulla DC, Cech TR. RNA as a flexible scaffold for proteins: yeast telomerase and beyond. Cold Spring Harb Symp Quant Biol. 2006; 71:217–24.View ArticlePubMedGoogle Scholar
- Schultes Ea. One sequence, two ribozymes: Implications for the emergence of new ribozyme folds. Science (80-.) 2000; 289:448–52.View ArticleGoogle Scholar
- Penchovsky R, Breaker RR. Computational design and experimental validation of oligonucleotide-sensing allosteric ribozymes. Nat Biotechnol. 2005; 23:1424–33.View ArticlePubMedGoogle Scholar
- Choi HMT, Chang JY, Trinh La, Padilla JE, Fraser SE, Pierce Na, et al. Programmable in situ amplification for multiplexed imaging of mRNA expression. Nat Biotechnol. 2010; 28:1208–12.View ArticlePubMedPubMed CentralGoogle Scholar
- Hochrein LM, Schwarzkopf M, Shahgholi M, Yin P, Pierce NA. Conditional Dicer substrate formation via shape and sequence transduction with small conditional RNAs. J Am Chem Soc. 2013; 135:17322–30.View ArticlePubMedPubMed CentralGoogle Scholar
- Dotu I, Garcia-Martin JA, Slinger BL, Mechery V, Meyer MM, Clote P, et al. Complete RNA inverse folding: computational design of functional hammerhead ribozymes. Nucleic Acids Res. 2014; 42:11752–62.View ArticlePubMedPubMed CentralGoogle Scholar
- Liang JC, Smolke CD. Rational design and tuning of ribozyme-based devices. Methods Mol Biol. 2012; 848:439–54.View ArticlePubMedGoogle Scholar
- Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P, et al. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie - Chem Mon Chemie Chem Mon. 1994; 125:167–88.View ArticleGoogle Scholar
- Andronescu M, Fejes AP, Hutter F, Hoos HH, Condon A. A new algorithm for RNA secondary structure design. J Mol Biol. 2004; 336:607–24.View ArticlePubMedGoogle Scholar
- Busch A, Backofen R. INFO-RNA–a fast approach to inverse RNA folding. Bioinformatics. 2006; 22:1823–31.View ArticlePubMedGoogle Scholar
- Zadeh JN, Wolfe BR, Pierce NA. Nucleic acid sequence design via efficient ensemble defect optimization. J Comput Chem. 2011; 32:439–52.View ArticlePubMedGoogle Scholar
- Garcia-Martin JA, Clote P, Dotu I. RNAiFOLD: a constraint programming algorithm for RNA inverse folding and molecular design. J Bioinform Comput Biol. 2013; 11:1350001.View ArticlePubMedGoogle Scholar
- Esmaili-Taheri A, Ganjtabesh M, Mohammad-Noori M. Evolutionary solution for the RNA design problem. Bioinformatics. 2014; 30:1250–8.View ArticlePubMedGoogle Scholar
- Reinharz V, Ponty Y, Waldispühl J. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution. Bioinformatics. 2013; 29:308–15.View ArticleGoogle Scholar
- Matthies MC, Bienert S, Torda AE. Dynamics in sequence space for RNA secondary structure design. J Chem Theory Comput. 2012; 8:3663–670.View ArticlePubMedGoogle Scholar
- Flamm C, Hofacker IL, Maurer-Stroh S, Stadler PF, Zehl M. Design of multistable RNA molecules. RNA. 2001; 7:254–65.View ArticlePubMedPubMed CentralGoogle Scholar
- Lyngsø RB, Anderson JWJ, Sizikova E, Badugu A, Hyland T, Hein J. Frnakenstein: multiple target inverse RNA folding. BMC Bioinforma. 2012; 13:260.View ArticleGoogle Scholar
- Höner Zu Siederdissen C, Hammer S, Abfalter I, Hofacker IL, Flamm C, Stadler PF. Computational design of RNAs with complex energy landscapes. Biopolymers. 2013; 99:1124–36.PubMedGoogle Scholar
- Deb K. Multi-objective optimization using evolutionary algorithms. Chichester: John Wiley and Sons; 2001.Google Scholar
- Taneda A. MODENA : a multi-objective RNA inverse folding. Adv Appl Bioinforma Chem. 2011; 4:1–12.Google Scholar
- Deb K, Pratap a, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput. 2002; 6:182–97.View ArticleGoogle Scholar
- Handl J, Kell DB, Knowles J. Multiobjective optimization in bioinformatics and computational biology. IEEE/ACM Trans Comput Biol Bioinform. 2007; 4:279–92.View ArticlePubMedGoogle Scholar
- Taneda A. Multi-objective pairwise RNA sequence alignment. Bioinformatics. 2010; 26:2383–390.View ArticlePubMedGoogle Scholar
- Taneda A. Multi-objective genetic algorithm for pseudoknotted RNA sequence design. Front Genet. 2012; 3:36.View ArticlePubMedPubMed CentralGoogle Scholar
- Biebricher CK, Luce R. In vitro recombination and terminal elongation of RNA by Q beta replicase. EMBO J. 1992; 11:5129–135.PubMedPubMed CentralGoogle Scholar
- Dotu I, Lorenz Wa, Van Hentenryck P, Clote P. Computing folding pathways between RNA secondary structures. Nucleic Acids Res. 2010; 38:1711–22.View ArticlePubMedGoogle Scholar
- Zadeh JN, Steenberg CD, Bois JS, Wolfe BR, Pierce MB, Khan AR, et al. NUPACK: Analysis and design of nucleic acid systems. J Comput Chem. 2011; 32:170–3.View ArticlePubMedGoogle Scholar
- van Batenburg FH, Gultyaev AP, Pleij CW, Ng J, Oliehoek J. PseudoBase: a database with RNA pseudoknots. Nucleic Acids Res. 2000; 28:201–4.View ArticlePubMedPubMed CentralGoogle Scholar
- Giegerich R, Voss B, Rehmsmeier M. Abstract shapes of RNA. Nucleic Acids Res. 2004; 32:4843–851.View ArticlePubMedPubMed CentralGoogle Scholar
- Lee J, Kladwang W, Lee M, Cantu D, Azizyan M, Kim H, et al. RNA design rules from a massive open laboratory. Proc Natl Acad Sci U S A. 2014; 111:2122–127.View ArticlePubMedPubMed CentralGoogle Scholar
- Win MN, Smolke CD. A modular and extensible RNA-based gene-regulatory platform for engineering cellular function. Proc Natl Acad Sci U S A. 2007; 104:14283–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Darty K, Denise A, Ponty Y. VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinformatics. 2009; 25:1974–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Lorenz R, Bernhart SH, Höner Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, et al. ViennaRNA package 2.0. Algorithms Mol Biol. 2011; 6:26.View ArticlePubMedPubMed CentralGoogle Scholar
- Reuter JS, Mathews DH. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinforma. 2010; 11:129.View ArticleGoogle Scholar
- Hamada M, Kiryu H, Sato K, Mituyama T, Asai K. Prediction of RNA secondary structure using generalized centroid estimators. Bioinformatics. 2009; 25:465–73.View ArticlePubMedGoogle Scholar
- Asai K, Kiryu H, Hamada M, Tabei Y, Sato K, Matsui H, et al. Software.ncrna.org: web servers for analyses of RNA sequences. Nucleic Acids Res. 2008; 36:75–8.View ArticleGoogle Scholar
- Sato K, Kato Y, Hamada M, Akutsu T, Asai K. IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. Bioinformatics. 2011; 27:85–93.View ArticleGoogle Scholar
- Markham NR, Zuker M. UNAFold: software for nucleic acid folding and hybridization. Methods Mol Biol. 2008; 453:3–31.View ArticlePubMedGoogle Scholar
- Reeder J, Giegerich R. Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinforma. 2004; 5:104.View ArticleGoogle Scholar
- Andronescu M, Aguirre-Hernández R, Condon A, Hoos HH. RNAsoft: A suite of RNA secondary structure prediction and design software tools. Nucleic Acids Res. 2003; 31:3416–422.View ArticlePubMedPubMed CentralGoogle Scholar
- Voss B, Meyer C, Giegerich R. Evaluating the predictability of conformational switching in RNA. Bioinformatics. 2004; 20:1573–82.View ArticlePubMedGoogle Scholar
- Wakeman CA, Winkler WC, Dann CE. Structural features of metabolite-sensing riboswitches. Trends Biochem Sci. 2007; 32:415–24.View ArticlePubMedPubMed CentralGoogle Scholar
- Mandal M, Boese B, Barrick JE, Winkler WC, Breaker RR. Riboswitches control fundamental biochemical pathways in bacillus subtilis and other bacteria. Cell. 2003; 113:577–86.View ArticlePubMedGoogle Scholar