Table 4 Summary of the adopted feature sets.

From: Predicting microRNA precursors with a generalized Gaussian components based density estimation algorithm

| Feature | Description |
|---|---|
| Set 1 | |
| AA, AC, ..., UU | Frequencies of 16 dinucleotide pairs |
| %G+C | Percentage of nitrogenous bases which are either G or C |
| Set 2 | |
| mfe2 | Ratio of dG to the number of stems |
| mfe1 | Ratio of dG to %G+C |
| dP | Adjusted base pairing propensity. dP is the number of base pairs observed in the secondary structure divided by the sequence length. |
| dG | Adjusted minimum free energy of folding. dG is the minimum free energy (MFE) divided by the sequence length. |
| dQ | Adjusted Shannon entropy. dQ measures the entropy of the base pairing probability distribution (BPPD). |
| dD | Adjusted base pair distance. dD measures the average distance between all base pairs of structures inferred from the sequence. |
| dF | Compactness of the tree-graph representation of the sequence. |
| Set 3 | |
| zG, zQ, zD, zP, zF | 5 normalized variants of dP, dG, dQ, dD and dF |
| Set 4 | |
| lH | Hairpin length |
| lL | Loop length |
| lC | Consecutive base-pairs |
| %L | Ratio of loop length to hairpin length |

  1. Features are listed in their order within each feature set. For example, the fifth feature in the second feature set is dQ.
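
Several of the simpler features in Table 4 follow directly from their descriptions, so a minimal Python sketch may help make the definitions concrete. It is not the authors' implementation. The function names are hypothetical, and the sketch assumes that dinucleotide frequencies are fractions of all overlapping dinucleotides, that the secondary structure is supplied in dot-bracket notation, and that the MFE (in kcal/mol) comes from an external folding tool that is not invoked here. mfe2 is omitted because the table does not define how stems are counted.

```python
from itertools import product

BASES = "ACGU"


def dinucleotide_frequencies(seq):
    """Fraction of each of the 16 overlapping dinucleotides (AA, AC, ..., UU).

    Assumption: counts are normalized by the number of overlapping
    dinucleotides made up of A, C, G and U only.
    """
    seq = seq.upper().replace("T", "U")
    counts = {a + b: 0 for a, b in product(BASES, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous symbols such as N
            counts[pair] += 1
            total += 1
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def gc_percent(seq):
    """%G+C: percentage of bases that are either G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def count_base_pairs(structure):
    """Number of base pairs in a dot-bracket string (one '(' per pair)."""
    return structure.count("(")


def set2_simple_features(seq, structure, mfe):
    """dP, dG and mfe1 as described in Table 4.

    Assumptions: `structure` is dot-bracket notation for `seq`, and `mfe`
    is the minimum free energy reported by a folding tool run elsewhere.
    """
    n = len(seq)
    dP = count_base_pairs(structure) / n   # base pairs / sequence length
    dG = mfe / n                           # MFE / sequence length
    gc = gc_percent(seq)
    mfe1 = dG / gc if gc else float("nan")  # ratio of dG to %G+C
    return {"dP": dP, "dG": dG, "mfe1": mfe1}


if __name__ == "__main__":
    # Toy hairpin with a made-up MFE value, for illustration only.
    seq, structure, mfe = "GGGAAACCC", "(((...)))", -3.6
    print(dinucleotide_frequencies(seq))
    print(gc_percent(seq))
    print(set2_simple_features(seq, structure, mfe))
```

The dictionary returned by `dinucleotide_frequencies` preserves the AA, AC, ..., UU ordering of the table, which matters if the features are later packed into a vector in the order the footnote describes.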