Fig. 1 | BMC Bioinformatics

Fig. 1

From: Mirnacle: machine learning with SMOTE and random forest for improving selectivity in pre-miRNA ab initio prediction

General view of the Mirnacle approach (adapted from Tempel and Tahi [14]). Given the input DNA sequence (a), a sliding window extracts subsequences whose length is close to the expected pre-miRNA length. For each subsequence, a triangular base-pairing matrix (shown here for the sequence CAGAUUUACUAGUACGUAAUUUG) is constructed and analyzed in three stages. In the first stage (b and c), long exact stems (series of positive numbers along the diagonals) are sought and classified. In the second stage (d and e), the diagonal of each positively classified exact stem is searched to form a non-exact stem (series of positive numbers interspersed with series of 0’s), which also passes through a classification procedure. In the last stage (f and g), a complete hairpin is produced from each non-exact stem that passed the previous filter, using the originally identified exact stem as the starting point for a further search in the matrix. In this search, other diagonals are tried so that secondary structures with asymmetrical internal loops are also considered. The resulting hairpins are then classified with a third ML model, and only those predicted as positive are given as the final output (h).
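
The first stage described in the caption can be illustrated with a minimal Python sketch. It is not the Mirnacle implementation: the function names `pairing_matrix` and `exact_stems`, the scoring (1 for any Watson-Crick or G-U wobble pair, 0 otherwise), and the thresholds `min_len` and `min_loop` are all assumptions made for illustration, and the classification steps are omitted. Stems are read along lines of constant `i + j`, which is one way to interpret the "diagonals" in the caption.

```python
# Illustrative sketch only; scoring and thresholds are assumptions,
# not values taken from Mirnacle or from Tempel and Tahi.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
         ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_matrix(seq):
    """Upper-triangular matrix: m[i][j] = 1 if bases i and j can pair (i < j)."""
    n = len(seq)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if (seq[i], seq[j]) in PAIRS:
                m[i][j] = 1
    return m


def exact_stems(seq, min_len=4, min_loop=3):
    """Return (i, j, length) for uninterrupted runs of pairs
    (i, j), (i+1, j-1), ... of at least min_len pairs.

    min_loop is the assumed minimum number of unpaired bases
    enclosed by the innermost pair.
    """
    n = len(seq)
    m = pairing_matrix(seq)
    stems = []
    for s in range(1, 2 * n - 2):          # s = i + j identifies one diagonal
        i = max(0, s - (n - 1))            # outermost pair on this diagonal
        j = s - i
        run_start, run_len = None, 0
        while j - i - 1 >= min_loop:
            if m[i][j]:
                if run_len == 0:
                    run_start = (i, j)
                run_len += 1
            else:
                if run_len >= min_len:
                    stems.append((*run_start, run_len))
                run_len = 0
            i += 1
            j -= 1
        if run_len >= min_len:             # run reached the loop limit
            stems.append((*run_start, run_len))
    return stems


if __name__ == "__main__":
    for stem in exact_stems("CAGAUUUACUAGUACGUAAUUUG"):
        print(stem)
```

Under these assumptions, the caption's example sequence contains an exact stem of six pairs starting at positions (0, 22) (0-based), which ends where U at position 6 faces U at position 16. A non-exact stem in the sense of the second stage would continue along the same diagonal past such mismatches.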
