Table 1 Datasets characteristics

From: HuntMi: an efficient and taxon-specific approach in pre-miRNA identification

Name         #Positives  #Negatives  Imbalance
human             1 406      81 228       57.8
arabidopsis         231      28 359      122.8
animal            7 053     218 154       30.9
plant             2 172     114 929       52.9
virus               237         839        3.5
microPred           691       9 248       13.4

  1. Characteristics of the biological datasets used in the experiments. Imbalance is defined as the ratio of #Negatives to #Positives. We limited dataset imbalance to several tens for practical reasons, even though the proportions of miRNAs to non-miRNAs in genomes are more extreme. In the case of the virus dataset, the imbalance is exceptionally low because we wanted to know how the methods perform on moderately imbalanced problems. In addition, it is difficult to create a representative dataset for viruses, as their genomes differ significantly in size and most of them do not contain miRNAs.
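
The imbalance column can be recomputed from the two count columns using the definition in the note above. The following is a minimal sketch for checking the table, not code from the paper; the names `DATASETS` and `imbalance` are hypothetical, and the counts are copied from Table 1.

```python
# Counts copied from Table 1: name -> (#Positives, #Negatives).
DATASETS = {
    "human": (1406, 81228),
    "arabidopsis": (231, 28359),
    "animal": (7053, 218154),
    "plant": (2172, 114929),
    "virus": (237, 839),
    "microPred": (691, 9248),
}


def imbalance(positives, negatives):
    """Return the ratio of negatives to positives, as defined in the table note."""
    return negatives / positives


# Round to one decimal place to match the precision used in the table.
ratios = {name: round(imbalance(p, n), 1) for name, (p, n) in DATASETS.items()}

for name, ratio in ratios.items():
    print(f"{name:<12}{ratio:>7.1f}")
```

Rounded to one decimal place, the recomputed ratios agree with the Imbalance column for all six datasets.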