Figure 6 | BMC Bioinformatics

Figure 6

From: Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction

Comparison of different MFE prediction programs.
Dataset: we use the 147 sequences from the DARTS set, except pdb1ajt1B, pdb1kod1A, pdb1koc1A, pdb1lpw1B and pdb1t4x1B, which crashed under UNAFOLD, leaving 142 sequences. Together, the corresponding "PDB" structures contain 1,614 base pairs, and the "gold" structures contain 1,593 base pairs.
Distance: one base pair set, i.e. secondary structure, is the reference (R: table columns), the other is the prediction (P: table rows). The traditional base pair distance is defined as |R \ P| + |P \ R|. Following [34], we allow additional base pairs in the prediction as long as they are compatible with the reference, i.e. both bases are unpaired in the reference and the additional base pair does not introduce a pseudoknot in the reference. The prediction with these compatible base pairs removed is P^{-c} = P \ {(a, b) | (a, b) ∉ R ∧ (a, b) compatible with R}. Our asymmetric base pair distance is then |R \ P| + |P^{-c} \ R|. Table values are the sums of base pair distances over all 142 sequences. In the case of co-optimal results, the one with the smallest distance to the reference is chosen. Our distance function is rather strict and does not allow base pair slippage: if a gold base pair (i, j) is mispredicted as (i + 1, j), this contributes a distance of 2.
Programs: for each RNA sequence we called the programs with the following command line options. RNAFOLD (version 1.8.5): echo sequence | RNAfold -noPS -noLP -dX, where X is 0, 1 or 2. UNAFOLD (version 3.8): hybrid-ss-min --suffix=DAT --mfold --NA=RNA --tmin=37 --tinc=1 --tmax=37 --sodium=1 --magnesium=0 --noisolate --nodangle tmpseqfile >/dev/null && ct2b.pl tmpseqfile.ct, with and without the --nodangle switch, where "tmpseqfile" is a FASTA file containing the sequence and "ct2b.pl" is a small Perl script from the Vienna Package that converts RNA structures from "connect" to "dot-bracket" format.
CENTROIDFOLD (version 0.0.9): centroid_fold --engine=X tmpseqfile, where X selects the source of base pair probabilities, computed either by RNAFOLD (McCaskill) or by CONTRAFOLD. Our ADP implementations of the four grammars "NoDangle", "OverDangle", "MicroState" and "MacroState" take the sequence as their sole input. The binaries can be built from the source code in additional file 3 using the Bellman's GAP compiler.
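
The asymmetric base pair distance described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (parse_dot_bracket, is_compatible, asymmetric_distance, best_cooptimal_distance) are hypothetical, positions are assumed to be 0-based, and structures are assumed to be given as pseudoknot-free dot-bracket strings.

```python
def parse_dot_bracket(structure):
    """Return the set of base pairs (i, j), 0-based with i < j."""
    stack, pairs = [], set()
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket at %d" % pos)
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced opening bracket at %d" % stack[-1])
    return pairs


def is_compatible(pair, reference):
    """True if `pair` could be added to `reference`: both bases are unpaired
    in the reference and the pair crosses no reference pair (no pseudoknot)."""
    a, b = pair
    paired_positions = {pos for ref_pair in reference for pos in ref_pair}
    if a in paired_positions or b in paired_positions:
        return False
    for i, j in reference:
        if i < a < j < b or a < i < b < j:  # crossing pairs
            return False
    return True


def asymmetric_distance(reference, prediction):
    """|R \\ P| + |P^{-c} \\ R| for two sets of base pairs."""
    missed = reference - prediction
    # Extra predicted pairs only count when they are incompatible with R.
    extra = {p for p in prediction - reference
             if not is_compatible(p, reference)}
    return len(missed) + len(extra)


def best_cooptimal_distance(reference, predictions):
    """Among co-optimal predictions, take the smallest distance to R."""
    return min(asymmetric_distance(reference, p) for p in predictions)


if __name__ == "__main__":
    gold = parse_dot_bracket("(....).")
    slipped = parse_dot_bracket(".(...).")
    print(asymmetric_distance(gold, slipped))  # slippage example: 2
```

In this sketch the slippage case from the caption costs 2 because the gold pair is missed and the shifted pair reuses a base that is already paired in the reference, so it is not compatible. A prediction that only adds a compatible pair, such as "((...))" against the reference "(.....)", has distance 0, whereas the traditional distance would be 1.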
