 Research Article
 Open access
 Published:
Biobjective integer programming for RNA secondary structure prediction with pseudoknots
BMC Bioinformatics volume 19, Article number: 13 (2018)
Abstract
Background
RNA structure prediction is an important field in bioinformatics, and numerous methods and tools have been proposed. Pseudoknots are specific motifs of RNA secondary structures that are difficult to predict. Almost all existing methods are based on a single model and return one solution, often missing the real structure. An alternative approach would be to combine different models and return a (small) set of solutions, maximizing its quality and diversity in order to increase the probability that it contains the real structure.
Results
We propose here an original method for predicting RNA secondary structures with pseudoknots, based on integer programming. We developed a generic biobjective integer programming algorithm allowing to return optimal and suboptimal solutions optimizing simultaneously two models. This algorithm was then applied to the combination of two known models of RNA secondary structure prediction, namely MEA and MFE. The resulting tool, called BiokoP, is compared with the other methods in the literature. The results show that the best solution (structure with the highest F_{1}score) is, in most cases, given by BiokoP. Moreover, the results of BiokoP are homogeneous, regardless of the pseudoknot type or the presence or not of pseudoknots. Indeed, the F_{1}scores are always higher than 70% for any number of solutions returned.
Conclusion
The results obtained by BiokoP show that combining the MEA and the MFE models, as well as returning several optimal and several suboptimal solutions, allow to improve the prediction of secondary structures. One perspective of our work is to combine better monocriterion models, in particular to combine a model based on the comparative approach with the MEA and the MFE models. This leads to develop in the future a new multiobjective algorithm to combine more than two models. BiokoP is available on the EvryRNA platform: https://EvryRNA.ibisc.univevry.fr.
Background
RNAs are involved in numerous pathologies such as cancer and neurodegenerative diseases. Determining the structure of an RNA is an important step in the understanding of its biological and biochemical function, its classification and its interaction with other molecules. In this paper, we are interested in the prediction of the secondary structure of RNAs with pseudoknots. Pseudoknots can have important roles in the translation process. For example, some studies have shown that the interaction of a pseudoknot with the ribosome induces a break of the ribosome during the translation, by causing a deformation of the tRNA in the P site [1].
Predicting the secondary structure with pseudoknots of an RNA sequence is a subject which is heavily studied in the literature. In fact, this problem was proved to be NPhard for various energy models [2, 3] and, as the current provided tools are not satisfactory, it is still an open subject. Two main approaches exist for predicting RNA structures (with or without pseudoknots): the thermodynamic approach and the comparative approach. The thermodynamic approach consists in, either computing the structure of minimum free energy (MFE) according to a set of thermodynamic parameters, or computing the structure of maximum expected accuracy (MEA) with a partition function. The comparative approach consists in finding a conserved RNA structure between several species. This approach needs therefore several (homologous) sequences as inputs, unlike the first approach where only one sequence is needed.
Many tools have been proposed in the literature for predicting RNA pseudoknots. We can cite for instance tools based on MFE models [4–9], tools based on MEA models [10, 11] and tools based on the comparative approach [12, 13]. However, the results of a single given model can only approach the real structure. For example, it is now established that the real structure has a very low energy, but not necessarily the minimum one (indeed, many factors are involved, such as the environment). Approaches able to combine different models are therefore interesting. To our knowledge, very few tools have been proposed to combine different models for the prediction of secondary structures of RNAs with pseudoknots. Combination has been used for the prediction of a consensus structure of several homologous sequences, as performed in ILM [13] which combines the comparative approach with an MFE model, and in IPknot [10] which combines the comparative approach with an MEA model. An algebraic dynamic programming method [14] has also been proposed to combine the MEA and the MFE models. However, no dedicated tool is available. Moreover, very few tools, namely pKiss [4], McGenus [5] and Tfold [12], have been proposed to return several solutions of secondary structures with pseudoknots. Proposing a unique solution, the optimal one according to a given model, is restrictive, for the reasons given above. It is important to consider also suboptimal solutions. Our goal is to develop a method combining different models and returning both several optimal and several suboptimal solutions. In this paper, we are interested in the thermodynamic approach, as we consider a single RNA sequence of interest as input.
The majority of RNA secondary structure prediction tools were developed using the dynamic programming methodology [4, 5, 7, 11]. In [6] and [10], another approach was proposed: integer programming. An integer program is a mathematical formalization of a problem. It consists in an objective function to optimize on a set of integer variables, subject to a set of linear constraints. This approach is very flexible, allowing to model mathematically a large range of problems. It has been applied to various domains, from economy to industry. To our knowledge, only one team has used integer programming for RNA secondary structure prediction with pseudoknots. First they developed an integer program [6] to find the structure of MFE using the stacking energy parameters of Mfold 3.0 [15]. Then they provided the IPknot software [10] based on an MEA model using base pair probabilities computed with different models like the McCaskill [16] or the Dirks and Pierce [8] models. This team also used integer programming to predict RNARNA interactions [17]. Note that integer programming has also been employed in related domains such as multiple RNA sequencestructure alignment [18] or 3D RNA structure by inserting local 3D motifs in RNA secondary structure [19].
In this paper, we propose an original method based on biobjective integer programming minimizing two criteria for the prediction of RNA secondary structures with pseudoknots. This approach allows us to combine two thermodynamic models into a single biobjective integer program (BOIP), from which we can get the set of optimal secondary structures having the best tradeoff between the two criteria. Note that a method to find biobjective optimal solutions for the RNA folding problem, combining also two thermodynamic models, namely the MEA and the MFE models, was also developed [14]. This method defines a binary Pareto product operator using algebraic dynamic programming and studies different implementations of this operator. The authors showed that this combination generates Pareto sets with some diversified structures with their variations. As stated before, suboptimal solutions are equally of great interest from a biological point of view. We therefore propose an algorithm to retrieve the kbest (sub)optimal solutions for any BOIP and apply it to our specific issue. In this work, we consider a first model based on the MEA model proposed in [10], to which we will refer as Mod1. A second model, based on the MFE model proposed in [6], will be refered as Mod2. We have thus performed the following steps:

We developed an original generic algorithm, that allows to return several optimal and several suboptimal solutions for any BOIP.

We combined the two thermodynamic models Mod1 and Mod2 for prediction of RNA secondary structure with pseudoknots into one BOIP.

We implemented this BOIP with our generic algorithm to predict several optimal and several suboptimal RNA secondary structures. The tool is called BiokoP (Biobjective programming pseudoknot Prediction) and is available on our EvryRNA platform.
We evaluated our algorithm on a dataset of 198 pseudoknotted RNA sequences from PseudoBase++ [20]. The first observation is that the real structure is often given by a suboptimal solution, which confirms the need of returning suboptimal solutions. BiokoP was then compared with other tools proposing several solutions for pseudoknotted RNA secondary structure prediction. To our knowledge, only two tools are available in the literature, namely pKiss [4] and McGenus [5]. BiokoP was also compared to IPknot [10], in the case where one solution is returned. Considering the dataset of pseudoknotted secondary structures, BiokoP gives better F_{1}scores than the other tools. The results in function of the type of pseudoknots show that BiokoP gives homogeneous results regardless of the pseudoknot type. Indeed, the F_{1}scores are always higher than 70% for any number of solutions returned, contrary to those of pKiss and McGenus. The results also show that BiokoP is more likely to return the best structure (according to the F_{1}score) among the optimal solutions than the other tools. We also experimented BiokoP on a dataset of pseudoknotfree RNA sequences from RNA STRAND [21]. We compared BiokoP on this dataset with the other tools and with RNAsubopt [22]. RNAsubopt is able to predict pseudoknotfree structures and suboptimal solutions. The results show that BiokoP is able to predict pseudoknotfree secondary structures with F_{1}scores close to those of RNAsubopt and better than those of pKiss and McGenus.
The paper is organized as follows: in the “Methods” section, we start by giving some fundamental definitions in multiobjective optimization. We present our algorithm, which aims to compute several solutions (optimal and suboptimal), for any BOIP. Then, we present how we combined the two models Mod1 and Mod2 into a single BOIP to predict RNA secondary structures with pseudoknots. The “Results” section is devoted to the experimental evaluation of our method. Finally, we discuss about our results in the “Discussion” section and we conclude and give some perspectives in the “Conclusion” section.
Methods
Our work is based on integer programming which consists in optimizing an objective function according to linear constraints over a set of integer decision variables [23]. It allows to model very different problems. Integer programming is usually used to obtain an optimal solution, but here, the purpose is to obtain also several suboptimal solutions.
We are interested in optimizing several objective functions, corresponding here to different models for RNA secondary structure prediction. We thus have a biobjective integer program, and the set of optimal solutions is called the Pareto set. As said before, regarding our biological context, we are interested in finding optimal and suboptimal solutions. In a multicriteria setting, it means to compute suboptimal Pareto sets, namely the kbest Pareto sets for k≥1. Hence, we present a new method to generate those sets for a generic biobjective integer program (BOIP). We would like to stress out that this is a totally new problem to our knowledge, this should not be confused with the traditional problem of finding approximate Pareto sets. Indeed, in the latter approach, one wants to find an approximation of the exact Pareto set, whereas in our method we find the exact (sub)optimal Pareto sets.
The biobjective integer programming
A multiobjective integer program (IP) is an IP with more than one objective function. In the sequel, we consider the case where there are only two objective functions, denoted by f_{1} and f_{2}, and one wants to minimize them. In that case we say that we have a BOIP. Given a BOIP, we denote by \(\mathcal X\) its set of feasible solutions, i.e., the set of solutions satisfying all constraints. Let x and x^{′} in \(\mathcal X\) be two solutions. We say that x dominates x^{′}, denoted by x≻x^{′}, if and only if f_{1}(x)≤f_{1}(x^{′}) and f_{2}(x)≤f_{2}(x^{′}), where at least one inequality is strict. Since, in general, there does not exist a solution dominating all other solutions, we are looking for a tradeoff. A solution \(x\in \mathcal X\) is Pareto efficient if and only if there does not exist a solution \(x'\in \mathcal X\) such that x^{′}≻x. The Pareto set is \({\mathcal P} := \{x\in {\mathcal X}: x\ \text {is Pareto efficient}\}\). It is the set of solutions which are not dominated by other solutions. The Pareto front is \({\mathcal F} := \left \{\left (f_{1}(x),f_{2}(x)\right) : x\in {\mathcal P} \right \}\). Figure 1a illustrates those definitions.
Many methods exist to solve multiobjective combinatorial optimization problems and BOIP. There are methods for finding the exact Pareto front [24–28] or an approximation of it [29, 30]. A first difference of our approach with the majority of the above works is that we are rather interested in finding the Pareto set instead of the Pareto front, and in case there are several solutions with the same values for each objective function, we want to find them all. Another more fundamental difference is that we are also interested in computing suboptimal Pareto sets, namely the kbest Pareto sets with k≥1. For example, the second best Pareto set corresponds to the best tradeoff when the solutions belonging to the first Pareto set have been removed. In other words, when the first Pareto set is removed, the remaining nondominated solutions form the 2best Pareto set. Figure 1b shows several kbest Pareto sets.
Algorithm for finding the kbest Pareto sets
In this section, we present an original generic algorithm we developed to compute the kbest Pareto sets for any BOIP:
min f_{1}(x)
min f_{2}(x)
subject to:
The constraints are described here as linear functions g_{ k } of x.
For the clarity of the presentation let us assume first that all the variables in the BOIP are binary ones. In that case, given a set F of forbidden solutions, we denote by P_{1}(λ_{ min },λ_{ max },F) the following IP:
min f_{1}(x)
subject to:
In this IP the first objective function f_{1} to be minimized stays the same. The second objective function f_{2} is introduced by two constraints which will maintain its value between λ_{ min } and λ_{ max }.
For each solution s in F, a constraint DIFF (s), also present in [31], is added. This constraint forbids to find the solution in F again. The constraint is defined in the following way. Let assume we have found a solution \(x^{*}=\left (x^{*}_{1},x^{*}_{2},\dots,x^{*}_{n}\right) \in F\) of a binary IP. Let define \(B:=\left \{ i  x^{*}_{i}=1\right \}\) and \(N :=\left \{ i  x^{*}_{i}=0\right \}\). The DIFF (x^{∗}) constraint is: \(\sum _{i\in B}(1x_{i}) + \sum _{i\in N}{x_{i}} \geq 1\). This constraint ensures that the (Hamming) distance between any feasible solution s and the solution x^{∗} is at least one. Therefore, there must be at least one variable x_{ i } which takes a different value from \(x^{*}_{i}\).
For the more general case, i.e. for BOIP with integer decision variables, this time, several binary and continuous variables together with several constraints must be added to the IP, leading to a mixed linear program [32]. For each solution \(x^{*} = \left (x^{*}_{1},x^{*}_{2},\dots,x^{*}_{n}\right)\in F\), we create the n binary variables α_{ i }∈{0,1} for 1≤i≤n, and the n+1 continuous variables, W_{ i }≥0, (1≤i≤n) and 0≤θ≤1, together with the following constraints (M being a large constant):
Of course, these modifications do not change the main algorithm, their aim is to forbid the solutions in F. In the following, we denote again by P_{1}(λ_{ min },λ_{ max },F) the resulting mixed linear program.
We denote by P_{2} the following IP:
max f_{2}(x)
subject to:
The general idea of our algorithm is to recursively perform a dichotomic search in the areas above and below each new solution found. We denote by nb the number of Pareto sets seeked. At the end of the algorithm, the set \(\mathcal R\) will contain all the solutions belonging to the kth Pareto sets, for 1≤k≤nb. For each solution s found during the execution of the algorithm, we have a label, denoted by l(s), indicating the index of the set this solution belongs to, i.e., l(s)=k iff the solution s belongs to the kth Pareto set.
Our algorithm, called FindKParetoSets works as follows. First, we find a (leftmost) solution L, minimizing the f_{1} criterion. We set its label to 1, l(L):=1, and this solution is added to the set \(\mathcal R\). Notice that since there can exist several solutions minimizing f_{1} with different f_{2} values, this solution does not necessarily belong to the first Pareto set. In that case, its correct label will be set during the remaining execution of the algorithm. Then, we compute the solution U maximizing the f_{2} criterion. An f_{1} value of a solution s is noted as s_{1}, and in the same manner, s_{2} defines the f_{2} value. In the following, U_{2} will serve as an upper bound for the recursive search. Finally the localPareto() procedure is called and performs the recursive search, first below L, between −∞ and L_{2}−ε according to the f_{2} criterion, and then above L, between L_{2} and U_{2}. Here ε is a very small constant such that for any pair of solutions s,s^{′} one has either f_{2}(s)=f_{2}(s^{′}) or f_{2}(s)−f_{2}(s^{′})>ε.
The localPareto() procedure is described below. Each search, corresponding to the computation of a portion of a Pareto set, is done between two values, denoted by λ_{ min } and λ_{ max }, that are taken as two arguments. The set F represents a set of solutions previously found between λ_{ min } and λ_{ max }, that we could find again by solving P_{1}. To avoid it, the solutions of F are forbidden as explained before. If the IP P_{1}(λ_{ min },λ_{ max },F) has a solution s (lines ????), by default its label is set to 1 (line ??). Then, the label of s must be computed according to lines ????. If the label is inferior or equal to nb+1, the solution s is added to \({\mathcal R}\). If necessary, the labels of some previously found solutions of \(\mathcal R\) are updated (lines ?? to ??). Finally, the localPareto() procedure is called to search below s (between λ_{ min } and s_{2}−ε) and above s (between s_{2} and λ_{ max }) if the label is inferior to nb.
Example We show an example of an execution of the algorithm FindKParetoSets to find three Pareto sets. We solve the BOIP presented in the following section, with the PKB101 RNA from the satellite tobacco mosaic virus. Figure 2 shows the three Pareto sets obtained and summarizes the recursive search.
The first step of our algorithm is to find the solution denoted L, by solving the BOIP (line ??), and add it to the set \({\mathcal R}\) (line ??). Then a maximum threshold U_{2} is found by solving P_{2} (line ??) to search above the first solution L. A search below the solution L is done (line ??) and the solution s^{1} is found. In the localPareto() procedure, the solution s^{1} obtains the label of the previous solution L. A search below s^{1} is done, but no solution is found. The search above s^{1} is done and s^{2} is found. The recursive search continues until no additional solution is found.
Biobjective integer programming for predicting RNA secondary structures with pseudoknots
In this paper, we propose a method for predicting RNA secondary structures with pseudoknots using the algorithm presented based on a BOIP. Our method allows to return several optimal and several suboptimal solutions, optimizing two objectives related to an MEA model and an MFE model. The MEA model, to which we will refer as Mod1, is based on the model proposed in [10] and uses the Dirks and Pierce set of thermodynamic parameters [8]. The MFE model, to which we will refer as Mod2, is based on the model proposed in [6]. Mod1 and Mod2 can describe all kinds of pseudoknots. In the following, we present first how an RNA structure with pseudoknots can be modeled. Then we describe how we combine Mod1 and Mod2 into one BOIP.
Modeling RNA secondary structures with pseudoknots
In Mod1 and Mod2, the RNA secondary structures are modeled in the following way. An RNA sequence s is composed of n nucleotides or bases which can be A, U, G or C. Each base can be paired according to the WatsonCrick (AU and GC) or the Wobble (GU) pairings. To take into account the pseudoknots, it is assumed that a secondary structure can be decomposed into m pseudoknotfree substructures y^{1}, y^{2},…, y^{m}, called levels. The levels are disjoint sets meaning that a base pair belongs to exactly one level. From experimental data, it is generally assumed that two levels are sufficient to describe most known RNA structures. Then, in the following, m=2.
A base pair between the bases i and j in level p is represented by a binary variable \(y_{ij}^{p}\) equal to 1, with i=1,…,n and j=i+1,…,n. If there is no base pair between i and j, \(y_{ij}^{p}\) is equal to zero.
The possible types of base pairs correspond to integer values 1,…,6: AU has the value 1, CG the value 2, GC the value 3, GU the value 4, UG the value 5 and UA the value 6.
The possible stacks of two base pairs (i,j) and (i−1,j+1) in level p are defined with binary variables \(x_{ij}^{klp}\), with k and l representing the possible types of base pairs. If \(x_{ij}^{klp}\) is equal to 1, then the bases i and j, and the bases i+1 and j−1 are paired, and in the case where \(x_{ij}^{klp}\) is equal to zero, there is either one base pair or no base pair at all.
Predicting RNA secondary structures with pseudoknots by combining two models
In the BOIP, we combine Mod1 and Mod2. The objective of Mod1 is to find the MEA structure with none pseudoknot or with one or several pseudoknots of any type.
The MEA structure is found by the computation of base pair probabilities with the Dirks and Pierce model [8]. We set as f_{1} the approximation of the expected accuracy:
where β^{p} are constants for each level p, fixed to β^{p}=1/m, p_{ ij } are the base pair probabilities computed with the Dirks and Pierce model and θ^{p} is a threshold aiming to ignore the lower base pair probabilities.
The objective of Mod2 is to seek the MFE structure. The MFE function consists in the sum of the energies of each stack \(x_{ij}^{klp}\) of two base pairs:
with e_{ kl } the energy given in [6], depending on the types k and l of the two base pairs.
For the need of the algorithm, the sign of the function f_{1}(y) is changed to have two objective functions to minimize.
The constraints of the BOIP enforce that any feasible solution corresponds to a feasible folding configuration of a secondary structure of RNA. They define basic rules (Fig. 3) such as making impossible for a base i to be paired with several bases, forbidding the presence of pseudoknots on the same level and forbidding isolated base pairs. Also, adding pseudoknots in the structure is penalized since they are rare, according to the known structures. The DIFF constraints will be added for any solution in F. This constraint adapted to our BOIP is:
with \(B^{p}=\left \{ ij  y_{ij}^{*p}=1\right \}\) and \(N^{p} =\left \{ ij  y_{ij}^{*p}=0\right \}\).
In our BOIP, the pseudoknot levels can be inverted, causing the generation of different solutions (that have not necessarily the same objective values) corresponding to the same structure. To avoid this redundancy, the following constraint is added:
This constraint corresponds to the previous constraint but the levels of the sets B and N are inverted. Then, the base pairs of the level 1 are forbidden in level 2 and vice versa.
Results
The BOIP presented for predicting RNA secondary structures with pseudoknots is implemented using the CPLEX Optimizer V12.6.3 solver [33]. Our algorithm is implemented with ε=0.001 and m=2. The obtained tool, called BiokoP, is available on our EvryRNA platform.
In the following, we first present the datasets we use for the evaluation of BiokoP, then the experiments showing the distribution of real structures found over the generated solutions. The next section is devoted to a statistical analysis of structures predicted by BiokoP and by other tools from the literature. We end by giving some information on the execution time of BiokoP.
Datasets
We evaluate our approach on a dataset of pseudoknotted RNAs we built from the PseudoBase ++ database [20]. This dataset gathers 198 sequences whose lengths range from 21 to 128 nucleotides.
PseudoBase++ classifies the sequences by the pseudoknot types. We recovered five types of pseudoknots: H (Htype), HHH (kissing hairpin), HLout, HLin and LL. The types, described in Fig. 4, are defined in function of the topology of the pseudoknot. In our dataset, there are 154 pseudoknotted RNAs of type H, 3 of type HHH, 26 of type HLout, 4 of type HLin and 11 of type LL. All the RNAs of type H come from the dataset of 168 sequences built by Huang et al. [34] from PseudoBase ++. This dataset excludes redundant sequences. The remaining RNAs were recovered on the database by requests according to the type of pseudoknots.
We also built a second dataset of pseudoknotfree RNAs from the RNA STRAND database [21]. It gathers 145 nonredundant sequences whose lengths range from 10 to 97 nucleotides.
These datasets are available on the EvryRNA platform.
Distribution of real structures over the returned solutions
In this section we study the ability of BiokoP to find the real structures. The purpose is to analyze where the real structures are found, over the Pareto sets or in function of the number of solutions returned. This section is also devoted to a comparison between BiokoP, Mod1 and Mod2 in order to determine the contribution of BiokoP.
Distribution of real structures over the Pareto sets
We study the distribution of real structures returned by BiokoP on our dataset of pseudoknotted RNAs over the Pareto sets. The real structure is the structure that corresponds exactly to the referenced structure for a given RNA.
To study the distribution of real structures, as the number of solutions of a Pareto set can not be predicted, note that in order to have 30 solutions per RNA, the mean number of Pareto sets to compute is 5.2. The distribution of real structures found are displayed in Table 1. Around the half of real structures found are in the first Pareto set (45 over 83). These structures are the optimal ones, showing the relevance of combining these MEA and MFE models. The real structures corresponding to suboptimal solutions are distributed in the first suboptimal Pareto sets, mainly the second (15) and the third (13). The remaining solutions are scattered in the remaining Pareto sets. The position of these suboptimal solutions supports with the fact that the real structure is often a suboptimal solution. This suggests that the suboptimal solutions returned by BiokoP are diversified and that our approach finding the kbest Pareto sets allows to find pertinent suboptimal solutions. Finally, it appears that the first Pareto sets are more useful for this combination of models than the last Pareto sets which do not guarantee to find the real structure. Indeed, the quality of solutions decreases when the number of computed Pareto sets increases. Hence, we recommend to the users to compute three Pareto sets in mean to obtain a relevant set of solutions.
Distribution of real structures in function of the number of solutions returned
This section is devoted to the distribution of real structures found by BiokoP in function of the number of solutions returned on our dataset of pseudoknotted RNAs, and to the comparison with Mod1 and Mod2, in order to show the pertinence of combining these two models on one hand, and to return several solutions on another hand. We extended Mod1 and Mod2 so that they return the kbest solutions, using the constraint [31] presented in the “Algorithm for finding the kbest Pareto sets” section. We refer to these extensions as Mod1^{so} and Mod2^{so} (so stands for suboptimal). The results are reported in Fig. 5.
BiokoP is made to return sets of solutions and all the solutions belonging to one Pareto set are not comparable. Then, this experiment requires to rank the solutions of the Pareto sets returned by BiokoP in order to compare the solutions one against the others. The solutions of each Pareto set are ranked in the following manner: the solutions optimizing equally the two objectives, i.e., the solutions closer to the diagonal, are better ranked.
The results on the dataset of pseudoknotted RNA show that, as expected, BiokoP predicts more real structures than Mod1 and Mod2 (corresponding respectively to Mod1^{so} and Mod2^{so} for one solution returned). Indeed, BiokoP, Mod1 and Mod2 return the real structure for respectively 32, 25 and 23 RNAs. We observe that, in these sets of real structures returned by Mod1 and Mod2, 12 RNAs are identical. Those RNAs also show up in the set of real structures returned by BiokoP. In the remaining real structures found by BiokoP, 6 are neither found by Mod1 nor by Mod2. This shows clearly the pertinence of combining Mod1 and Mod2. Besides, we note that BiokoP finds all the real structures found by Mod1. Some real structures found by Mod2 are not found by BiokoP when one solution is returned but they are all found by BiokoP in the first Pareto set. The real structures found by Mod1 and Mod2 are all returned by BiokoP as optimal solutions, showing that our algorithm succeeds to take benefit from both models.
The more there are solutions, the more BiokoP is likely to find the real structure, and with a fast increase in probability. We observe that after about 20 solutions returned (for about 2 or 3 Pareto sets), the number of real structures found seems to be stable, which supports the results of the previous section. In case of Mod1^{so} and Mod2^{so}, the number of real structures found quickly reaches a plateau amounting to 7 8 solutions returned. This is due to the lack of diversity of the suboptimal solutions. Indeed, the suboptimal solutions are essentially similar to the optimal one: they are derived from the optimal solution by removing only very few base pairs. When the optimal solution is close to the real structure, the real structure can be found quickly as a suboptimal solution, explaining the increase of the curve for a small number of returned solutions.
Finally, this experiment shows that the optimal and suboptimal solutions returned by BiokoP are more likely to contain the real structure compared to those of Mod1^{so} and Mod2^{so}.
Comparison of BiokoP with the literature
Considered software
To evaluate the performances of BiokoP, we compare it with other methods predicting pseudoknotted RNA secondary structures that are able to return several solutions. To our knowledge, only two methods are available in the literature, namely pKiss [4] and McGenus [5]. The principle of pKiss is to decompose the RNA sequence into every possible subwords and to compute the MFE secondary structure of the decompositions. To reduce the search space, pKiss is based on the canonical rules which reduce the number of possible predicted pseudoknots (only certain canonical and kissing pseudoknots) and the redundancy thanks to a nonambiguous dynamic programming algorithm. McGenus is based on a Monte Carlo algorithm which search for a minimum score which includes the energy and the genus of the secondary structure. The genus expresses the complexity of a pseudoknot. McGenus performs a stochastic search that allows to find various types of pseudoknots.
We also compare BiokoP with IPknot [10] and RNAsubopt from the ViennaRNA package [22]. RNAsubopt predicts pseudoknotfree RNA secondary structures using an MFE algorithm to compute all the suboptimal structures in an energy range.
For the evaluation, we consider the first solution returned by IPknot and the 30 first solutions returned by BiokoP, pKiss, McGenus and RNAsubopt. IPknot (version 0.0.4) was executed with the Dirks and Pierce set of thermodynamic parameters and with the options g 2 and g 4. pKiss (version 2.2.12) was executed with the default parameters. We used the option relativeDeviation to obtain up to 30 solutions for each RNA. McGenus (version 7.0) was also executed with the default parameters, with the option nsuboptimal to obtain 30 solutions. We executed RNAsubopt (version 2.3.3) with the option e to obtain 30 solutions and with the option s to sort the solutions by energy.
For pKiss, McGenus and RNAsubopt, the solutions are ranked in the returned order, i.e., in the ascending order of energies. For BiokoP, as the solutions belonging to the same Pareto set are returned in an arbitrary order and are not comparable, we adopt the same ranking as in the previous section. We consider that the best solutions are the ones that optimize equally the two objectives, and are therefore closer to the diagonal.
Statistics used
To evaluate the quality of a predicted structure, the statistics usually used are the sensitivity, the positive predictive value (PPV) and the F_{1}score. The sensitivity measures the ability of finding positive base pairs, while the PPV measures the ability of not finding false positive base pairs. The F_{1}score is the harmonic mean between the sensitivity and the PPV. The three measures are calculated as follows:
where TP is the number of true positive base pairs, FN is the number of false negative base pairs, FP is the number of false positive base pairs, and TN is the number of true negative base pairs. These statistics allow to measure the quality of one solution regarding a structure of reference. In our case, we study methods returning several solutions; therefore, these statistics should be adapted to be able to measure the quality of a set of n solutions regarding a structure of reference. Here we propose to calculate these measures as follows:
where M, a measure corresponding for instance to the F_{1}score of a set of solutions, is calculated in function of the measure M(s_{ i }) corresponding to the F_{1}score of a solution s_{ i }, weighted by the rank i of the solution. Of course, the more the rank of a solution is low, the more the solution is important, since the corresponding criteria are optimized.
Overall results
In this section are presented the results obtained on the dataset of 198 pseudoknotted RNAs. Table 2 reports the weighted means of sensitivities and PPVs in function of the number of solutions returned for BiokoP, pKiss and McGenus. We observe that BiokoP has better sensitivities than pKiss and McGenus and that, when the number of returned solutions increases, the gap between the sensitivity of BiokoP and the one of the other tools increases. Regarding the weighted means of PPVs, we observe that BiokoP outperforms McGenus.
In Fig. 6 we present the weighted means of F_{1}scores obtained by each tool, in function of the number of solutions returned. BiokoP has higher F_{1}scores than pKiss and McGenus. The F_{1}scores of BiokoP are quite stable. There is only a decrease of 10 points going from 1 to 30 returned solutions, whereas there is a decrease of 15 and 18 points for pKiss and McGenus. This suggests that the quality of predicted structures of BiokoP, unlike pKiss and McGenus, is stable when the quantity of returned solutions increases.
For one solution returned, BiokoP gives similar results to IPknot (IPknot gives a mean sensitivity of 80.6%, a mean PPV of 75.1% and a mean F_{1}score of 77,0%).
Results over optimal solutions
The purpose of this section is to complete and to precise the results given by the previous statistics. It is not obvious to compare the optimal solution returned either by pKiss, McGenus or IPknot with only one solution obtained by an arbitrary ranking of the solutions of the optimal Pareto set given by BiokoP. Indeed, the solutions of a Pareto set are not comparable. We thus focus here on the comparison of the first solutions returned, i.e. the optimal solutions of BiokoP (the first Pareto set) and the optimal (one) solution returned by the other tools. Figure 7 reports the F_{1}score results for the optimal solutions of BiokoP versus pKiss, McGenus and IPknot, for each RNA of the dataset of pseudoknotted RNAs. The RNAs are sorted according to the ascending order of the maximum F_{1}score of BiokoP. For BiokoP, we report the maximum and minimum F_{1}scores of the set of solutions for each RNA. BiokoP finds a better solution than pKiss for 84 RNAs (among 198) and than McGenus for 103 RNAs while the optimal solutions found by pKiss and McGenus are better than the optimal solutions of the set generated by BiokoP for respectively 54 and 39 RNAs. The results show that BiokoP returns 61 better solutions compared to IPknot, while IPknot does not return better solutions compared to BiokoP. Returning several optimal solutions allows BiokoP to obtain the best solution more times than the other tools.
Finally, we observe that the gap between the minimum and the maximum F_{1}scores of BiokoP can be important. This shows that BiokoP returns a diversified set of optimal solutions.
Results by pseudoknot types
Figure 8 reports the F_{1}score results in function of pseudoknot types and of the number of solutions returned. The results for the Htype pseudoknots are very similar to the results of the entire dataset, which is not surprising since the Htype is largely represented in it (154 among 198 RNAs). The HHH and HLout pseudoknot types are better predicted by McGenus, with weighted means of F_{1}scores around 84 and 79% respectively. However, for the HLin and LL types, BiokoP outperforms pKiss and McGenus with weighted means of F_{1}scores around 70% (HLin) and 75% (LL), whereas the weighted means of F_{1}scores of pKiss and McGenus are around 60 and 50% respectively for the HLin type and around 70%, for both tools, for the LL type. The results show that compared to IPknot, BiokoP obtains better F_{1}scores for the HHH and the LL pseudoknot types, and similar F_{1}scores for the other types when considering one solution returned.
The BOIP of BiokoP has been modeled to be able to predict any kind of pseudoknots. This is confirmed by the results obtained that are very homogeneous. Indeed, the F_{1}scores of BiokoP are never lower than 70% for any number of solutions returned. This is not the case for pKiss and McGenus, for which we can observe that the results depend greatly on the pseudoknot type. In particular, they obtain F_{1}scores around 50% for the HLin type. Since the datasets of some pseudoknot types are small (3 HHH, 26 HLout, 4 HLin, 11 LL and 154 H), further experiments need to be done to confirm the results.
Finally, when one wants to predict a secondary structure of an RNA, there is generally no information about the pseudoknot type. Therefore, it is better to use BiokoP, which allows to predict structures as close as possible to the real ones, regardless of the pseudoknot type.
Results on pseudoknotfree RNAs
In order to determine if BiokoP is able to predict pseudoknotfree structures of RNAs, we evaluated it on our dataset of pseudoknotfree RNAs and compared the obtained results with those obtained by pKiss, McGenus, IPknot and also RNAsubopt which is a tool dedicated to pseudoknotfree RNA prediction.
In Fig. 9, we present the weighted means of F_{1}scores obtained by BiokoP, pKiss, McGenus, RNAsubopt and IPknot on the pseudoknotfree RNAs, in function of the number of returned solutions. BiokoP, pKiss and RNAsubopt give comparable results, showing that BiokoP and pKiss are both able to predict pseudoknotfree structures, unlike McGenus. We expected a bigger difference with RNAsubopt since this tool is made to generate only pseudoknotfree structures, but finally, it seems that BiokoP and pKiss do not suffer from a bias due to their purpose of returning pseudoknotted structures.
Illustration on some examples of RNAs
In order to give an overview of the predictions of BiokoP, we tested it on two known RNAs, for 30 solutions returned. We evaluated BiokoP with the pseudoknot region PK4 (H type) [35] of the Legionella Pneumophila tmRNA. The referenced structure of this RNA and the best solution according to the F_{1}score returned by BiokoP, pKiss, McGenus, RNAsubopt and IPknot are presented in Fig. 10a. The views are obtained with forna [36], the RNA visualization tool from the ViennaRNA Web Service. We observe that the exact referenced structure is found by BiokoP at rank 15, in the second Pareto set. McGenus and pKiss find the global structure, but miss in part or completely the pseudoknot. RNAsubopt finds the same structure than pKiss. IPknot predicts the exact pseudoknot and the exact referenced structure except for one different base pair. All the tools have a high F_{1}score for this RNA, but BiokoP and IPknot are the only ones that succeed in predicting the exact pseudoknot. Furthermore, BiokoP is the only one that predicts the exact referenced structure.
We also study the results of BiokoP on a pseudoknotfree hammerhead ribozyme [37] (Fig. 10b). This particular structure, composed of three helices, is much studied for the understanding of the structurefunction relation of RNAs. The best solution returned by BiokoP has a F_{1}score of 81.3% and is returned at rank 9, in the second Pareto set. BiokoP finds the two first helices and some base pairs of the third one, but predicts an extra pseudoknot of two base pairs. pKiss and McGenus find the two first helices of the RNA. However, pKiss predicts an hairpin that does not exist in the referenced structure and McGenus does not find any base pair of the third helix. RNAsubopt finds the two first helices and some base pairs of the third one. The structure predicted by IPknot is identical to the one predicted by BiokoP, but its pseudoknot possesses an additional base pair. This example supports the previous results on the pseudoknotfree RNA dataset. As RNAsubopt predicts structures without pseudoknots, its results are better than the ones obtained by the other tools which can predict extra pseudoknots. However, the best solution of RNAsubopt is predicted at rank 25 which is far compared to the one of BiokoP.
Execution time
The complexity of solving optimally integer programs is exponential with respect to the number of variables, which depends mainly on the length of the sequence. Moreover, we deal here with multiobjective programming which increases the number of BOIPs to solve and their number of variables. As estimation of the time of BiokoP, the prediction of 30 secondary structures for an RNA of 30 nucleotides on Debian OS with 24 Intel Xeon CPU X5660 (2.8 GHz) and 64Go RAM took, in average, 22 s. For the pseudoknot region PK4 [35] of the Legionella Pneumophila tmRNA (PKB70) of 55 nucleotides, BiokoP took 5min30.
Discussion
We developed an original generic algorithm to solve BOIPs and to return several optimal solutions (the exact Pareto set) and several suboptimal solutions (the suboptimal Pareto sets). We proposed a BOIP that combines two models of RNA secondary structure prediction with pseudoknots, namely the MEA and the MFE models. The implementation of our algorithm with the proposed BOIP has led to a tool called BiokoP (Biobjective programming pseudoknot Prediction).
Since our method is based on multiobjective optimization, BiokoP returns several sets of solutions (the kbest Pareto sets). Those Pareto sets are returned in an optimal order. However, all the solutions belonging to a Pareto set are not comparable and hence cannot be meaningfully ordered within the set, whereas the tools of the literature return solutions which are ordered optimally. Hence, comparing the results of BiokoP and of the tools of the literature raises some difficulties and, in multiobjective optimization, defining pertinent performance metrics for the Pareto sets is a subject of research [38]. We have chosen to rank the solutions within each Pareto sets assuming that a solution is better if it optimizes equally the two objectives, i.e., if the solution is close to the diagonal. This ranking allows us to show the benefits of combining Mod1 and Mod2. Indeed, the experiments show that for the first ranked solution returned, BiokoP is more likely to return the real structure than Mod1 and Mod2 (Fig. 5). The experiments also show that this combination allows to obtain homogeneous F_{1}scores compared to the other tools regardless of the number of solutions returned, the type of pseudoknots and the presence or not of pseudoknots (Figs. 6, 8, and 9).
However, this ranking shows some limitations. It is illustrated by the results detailed by pseudoknot types, where we observe that the solution giving the best F_{1}score is not always the first one (Fig. 8). To have a better idea of the global quality of the overall first Pareto set, we decided to compare it versus the optimal solution of the other tools, by studying the best and the worst structures (according to the F_{1}score) of the first Pareto set. The results show that in most cases, BiokoP finds more often better solutions than the other tools (Fig. 7).
The numerous experiments we performed show:

that the combination of Mod1 and Mod2 is relevant: BiokoP returns all the real structures found by Mod1 and Mod2 as optimal solutions and is able to find more real structures (in optimal or in suboptimal Pareto sets) than the two models alone;

and that BiokoP, compared to the literature, achieves the best compromise between the number of times the best solution (with the highest F_{1}score) is found and the global quality of predictions.
Conclusion
In this paper, we provide an original approach for predicting RNA secondary structures with pseudoknots that is based on the computation of several optimal and several suboptimal solutions with respect to two different models. Our method is based on integer programming approach, that presents the advantage to be more flexible compared to the dynamic programming approach usually used in the RNA secondary structure prediction. In order to combine the two models, we developed a biobjective integer program (BOIP), and proposed a generic novel algorithm to compute the sets of optimal and suboptimal solutions for any BOIP. We applied it to the prediction of RNA secondary structures with pseudoknots, obtaining the software BiokoP. The experimental tests performed have confirmed the importance of considering suboptimal solutions in addition to optimal ones. We have shown that our method successfully combines Mod1 and Mod2, taking benefit from the two models.
BiokoP was compared with other tools from the literature for RNA secondary structure prediction with pseudoknots that propose several solutions, namely pKiss [4] and McGenus [5]. BiokoP was also compared to IPknot [10]. IPknot predicts pseudoknotted structure, but no suboptimal solutions. Finally, BiokoP was compared to RNAsubopt [22] which can predict only pseudoknotfree structures, but with suboptimal solutions. Considering a set of 198 pseudoknotted secondary structures gathered from Pseudobase++ [20], BiokoP gives better F_{1}scores than pKiss and McGenus. We also show that BiokoP returns more time the best solution (with the higher F_{1}score) than pKiss, McGenus and IPknot, considering the optimal solutions. Even if BiokoP is made to predict pseudoknotted RNAs, the F_{1}score results for the dataset of pseudoknotfree RNAs are close to the results of RNAsubopt. The results of BiokoP are widely homogeneous regardless of the pseudoknot type or the presence or not of pseudoknots. Indeed, the F_{1}scores are always higher than 70% and regardless of the number of solutions returned, unlike pKiss and McGenus.
A drawback of our approach is the time complexity due to the need to optimally solve several integer programs. As stated before, this complexity is exponential, with respect to the length of the sequence. A perspective to decrease the execution time is to use parallelism. Indeed, due to its recursive shape, the generic algorithm we propose to solve BOIP is suitable to parallelization. Another idea to decrease the complexity of the BOIP would be to add constraints when some information on the structure are known: base pairs, presence or not of pseudoknots, type of pseudoknots.
In this work, we combined two models inspired from the literature. A perspective is to combine better monocriterion models in order to raise the global quality of the solutions found. We would first try to model the comparative approach with integer programming to propose a combination with one of the thermodynamic model. This would take benefit from the additional information brought by homologous sequences. Finally, another future work will consist in developing multiobjective algorithms in order to combine more than two models. Moving from two objectives to three will require to rethink the current algorithm in three dimensions. This would represent the first step to develop a multiobjective generic algorithm able to return suboptimal Pareto sets for any IP with n objective functions.
References
Namy O, Moran SJ, Stuart DI, Gilbert RJ, Brierley I. A mechanical explanation of RNA pseudoknot function in programmed ribosomal frameshifting. Nature. 2006; 441(7090):244–7.
Sheikh S, Backofen R, Ponty Y. Impact of the energy model on the complexity of RNA folding with pseudoknots. In: Annual Symposium on Combinatorial Pattern Matching. Berlin: Springer: 2012. p. 321–33.
Lyngsø RB, Pedersen CN. RNA pseudoknot prediction in energybased models. J Comput Biol. 2000; 7(34):409–27.
Janssen S, Giegerich R. The RNA shapes studio. Bioinformatics. 2014; 31(3):423–425. Oxford Univ Press.
Bon M, Micheletti C, Orland H. McGenus: a Monte Carlo algorithm to predict RNA secondary structures with pseudoknots. Nucleic Acids Res. 2012; 41(3):1895–1900. Oxford Univ Press.
Poolsap U, Kato Y, Akutsu T. Prediction of RNA secondary structure with pseudoknots using integer programming. BMC Bioinformatics. 2009; 10(Suppl 1):38.
Reeder J, Giegerich R. Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinformatics. 2004; 5(1):104.
Dirks RM, Pierce NA. A partition function algorithm for nucleic acid secondary structure including pseudoknots. J Comput Chem. 2003; 24(13):1664–77.
Rivas E, Eddy SR. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J Mol Biol. 1999; 285(5):2053–68.
Sato K, Kato Y, Hamada M, Akutsu T, Asai K. IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. Bioinformatics. 2011; 27(13):85–93.
Bellaousov S, Mathews DH. ProbKnot: fast prediction of RNA secondary structure including pseudoknots. RNA. 2010; 16(10):1870–80.
Engelen S, Tahi F. Tfold: efficient in silico prediction of noncoding RNA secondary structures. Nucleic Acids Res. 2010; 38(7):2453–66.
Ruan J, Stormo GD, Zhang W. An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots. Bioinformatics. 2004; 20(1):58–66.
Saule C, Giegerich R. Pareto optimization in algebraic dynamic programming. Algoritm Mol Biol. 2015; 10(1):10–22.
Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003; 31(13):3406–15.
McCaskill JS. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers. 1990; 29(67):1105–19.
Kato Y, Sato K, Hamada M, Watanabe Y, Asai K, Akutsu T. RactIP: fast and accurate prediction of RNARNA interaction using integer programming. Bioinformatics. 2010; 26(18):460–6.
Bauer M, Klau GW, Reinert K. An exact mathematical programming approach to multiple RNA sequencestructure alignment. Algorithmic Oper Res. 2008; 3(2):130–146.
Reinharz V, Major F, Waldispühl J. Towards 3D structure prediction of large RNA molecules: an integer programming framework to insert local 3D motifs in RNA secondary structure. Bioinformatics. 2012; 28(12):207–14.
Taufer M, Licon A, Araiza R, Mireles D, Van Batenburg F, Gultyaev AP, Leung MY. PseudoBase++: an extension of PseudoBase for easy searching, formatting and visualization of pseudoknots. Nucleic Acids Res. 2009; 37(suppl 1):127–35.
Andronescu M, Bereg V, Hoos HH, Condon A. RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinforma. 2008; 9(1):340.
Lorenz R, Bernhart SH, Zu Siederdissen CH, Tafer H, Flamm C, Stadler PF, Hofacker IL. Viennarna package 2.0. Algoritm Mol Biol. 2011; 6(1):26.
Williams HP. Model building in mathematical programming. West Sussex: Wiley; 1999.
Özlen M, Burton BA, MacRae CA. Multiobjective integer programming: An improved recursive algorithm. J Optim Theory Appl. 2014; 160(2):470–82.
Özlen M, Azizoğlu M. Multiobjective integer programming: a general approach for generating all nondominated solutions. Eur J Oper Res. 2009; 199(1):25–35.
Bérubé JF, Gendreau M, Potvin JY. An exact εconstraint method for biobjective combinatorial optimization problems: Application to the Traveling Salesman Problem with Profits. Eur J Oper Res. 2009; 194(1):39–50.
Ralphs TK, Saltzman MJ, Wiecek MM. An improved algorithm for solving biobjective integer programs. Ann Oper Res. 2006; 147:43–70.
Ulungu EL, Teghem J. The twophases method: An efficient procedure to solve biobjective combinatorial optimization problems. Fondations Comput Decis Sci. 1995; 20:149–65.
Bazgan C, Jamain F, Vanderpooten D. Approximate pareto sets of minimal size for multiobjective optimization problems. Oper Res Lett. 2015; 43(1):1–6.
Vassilvitskii S, Yannakakis M. Efficiently computing succint tradeoff curves. Theor Comput Sci. 2005; 348(2–3):334–56.
Balas E, Jeroslow R. Canonical cuts on the unit hypercube. SIAM J Appl Math. 1972; 23(1):61–9.
Tsai JF, Lin MH, Hu YC. Finding multiple solutions to general integer linear programs. Eur J Oper Res. 2008; 184(2):802–9.
IBM. CPLEX Optimizer V12.6.3. http://www03.ibm.com/software/products/fr/ibmilogcpleoptistud. Accessed 09 Jan 2018.
Huang X, Ali H. High sensitivity rna pseudoknot prediction. Nucleic Acids Res. 2006; 35(2):656–63.
Williams KP, Bartel DP. Phylogenetic analysis of tmRNA secondary structure. Rna. 1996; 2(12):1306–10.
Kerpedjiev P, Hammer S, Hofacker IL. Forna (forcedirected RNA): Simple and effective online RNA secondary structure diagrams. Bioinformatics. 2015; 31(20):3377–3379. Oxford Univ Press.
GriffithsJones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A. Rfam: annotating noncoding RNAs in complete genomes. Nucleic Acids Res. 2005; 33(suppl_1):121–4.
Cheng S, Shi Y, Qin Q. On the performance metrics of multiobjective optimization. Adv Swarm Intell. 2012;504.
Acknowledgements
Fariza Tahi would like to thank the Genopole for the financial support given for the platform EvryRNA.
Funding
The platform EvryRNA is partially supported by the Genopole.
Availability of data and materials
The developed software, BiokoP, and the datasets are available on our EvryRNA platform (https://EvryRNA.ibisc.univevry.fr).
Author information
Authors and Affiliations
Contributions
AL developed the algorithm and tested it. AL recovered the data and analyzed the results. EA contributed to the integer programming part of this work. FT was involved in the RNA secondary structure part and supervised the work. The three authors participated in the writing of the manuscript and approved the final version of the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Legendre, A., Angel, E. & Tahi, F. Biobjective integer programming for RNA secondary structure prediction with pseudoknots. BMC Bioinformatics 19, 13 (2018). https://doi.org/10.1186/s1285901820077
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1285901820077