Data sets used
In the current research work, six publicly available miRNA expression data sets with accession numbers GSE17681, GSE17846, GSE21036, GSE24709, GSE28700, and GSE31408 are used, which are downloaded from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/).
GSE17681
This data set has been generated to detect specific patterns of miRNAs in peripheral blood samples of lung cancer patients. As controls, blood from donors without any known disease has been tested. The numbers of miRNAs, samples, and classes in this data set are 866, 36, and 2, respectively [38].
GSE17846
This data set represents the analysis of miRNA profiles in peripheral blood samples of multiple sclerosis patients and of normal donors. It contains 864 miRNAs, 41 samples, and 2 classes [39].
GSE21036
This data set contains miRNA expression profiles of 218 prostate tumor samples with primary or metastatic prostate cancer and a median of 5 years of clinical follow-up. The numbers of miRNAs and samples are 373 and 141, respectively [40].
GSE24709
It analyzes peripheral blood miRNA profiles of patients with lung diseases. The miRNA expression profiling has been done for patients with lung cancer, patients with chronic obstructive pulmonary disease, and normal controls. It contains a total of 863 miRNAs, 71 samples, and 3 classes.
GSE28700
This data set contains expression profiles of miRNAs from 22 paired gastric cancer and normal tissues. It contains a total of 44 samples and 470 miRNAs. The samples are grouped into 2 classes [41].
GSE31408
It analyzes miRNA expression profiles of cutaneous T-cell lymphomas and benign inflammatory skin conditions. It consists of a total of 705 miRNAs, 148 samples, and 2 classes [42].
Method
Hypercuboid equivalence partition matrix
Let $\mathbb{U}=\{x_1,\dots,x_i,\dots,x_n\}$ be the set of $n$ objects or samples and $\mathbb{C}=\{\mathcal{A}_1,\dots,\mathcal{A}_i,\dots,\mathcal{A}_j,\dots,\mathcal{A}_m\}$ denote the set of $m$ attributes or miRNAs of a given microarray data set $\mathcal{T}=\{w_{ij} \mid i=1,\dots,m;\ j=1,\dots,n\}$, where $w_{ij}\in\Re$ is the measured expression value of the miRNA $\mathcal{A}_i$ in the sample $x_j$. Let $\mathbb{D}$ be the set of class labels or sample categories of the $n$ samples. In rough set theory, the attribute sets $\mathbb{C}$ and $\mathbb{D}$ are termed the condition and decision attribute sets in $\mathbb{U}$, respectively.
If $\mathbb{U}/\mathbb{D}=\{\beta_1,\dots,\beta_i,\dots,\beta_c\}$ denotes the $c$ equivalence classes or information granules of $\mathbb{U}$ generated by the equivalence relation induced from the decision attribute set $\mathbb{D}$, then $c$ equivalence classes of $\mathbb{U}$ can also be generated by the equivalence relation induced from each condition attribute $\mathcal{A}_k\in\mathbb{C}$. If $\mathbb{U}/\mathcal{A}_k=\{\delta_1,\dots,\delta_i,\dots,\delta_c\}$ denotes the $c$ equivalence classes or information granules of $\mathbb{U}$ induced by the condition attribute $\mathcal{A}_k$ and $n$ is the number of objects in $\mathbb{U}$, then the $c$-partitions of $\mathbb{U}$ are the sets of $(cn)$ values $\{h_{ij}(\mathcal{A}_k)\}$ that can be conveniently arrayed as a $(c \times n)$ matrix $\mathbb{H}(\mathcal{A}_k)=[h_{ij}(\mathcal{A}_k)]$. The matrix $\mathbb{H}(\mathcal{A}_k)$ is denoted by
$$\mathbb{H}(\mathcal{A}_k)=\begin{pmatrix} h_{11}(\mathcal{A}_k) & h_{12}(\mathcal{A}_k) & \cdots & h_{1n}(\mathcal{A}_k)\\ h_{21}(\mathcal{A}_k) & h_{22}(\mathcal{A}_k) & \cdots & h_{2n}(\mathcal{A}_k)\\ \vdots & \vdots & \ddots & \vdots\\ h_{c1}(\mathcal{A}_k) & h_{c2}(\mathcal{A}_k) & \cdots & h_{cn}(\mathcal{A}_k) \end{pmatrix}$$
(1)
$$\text{where}\quad h_{ij}(\mathcal{A}_k)=\begin{cases} 1 & \text{if } \mathrm{L}_i \le x_j(\mathcal{A}_k) \le \mathrm{U}_i\\ 0 & \text{otherwise.} \end{cases}$$
(2)
The tuple $[\mathrm{L}_i, \mathrm{U}_i]$ represents the interval of the $i$th class $\beta_i$ according to the decision attribute set $\mathbb{D}$. The interval $[\mathrm{L}_i, \mathrm{U}_i]$ is the value range of the condition attribute $\mathcal{A}_k$ with respect to class $\beta_i$. It is spanned by the objects with the same class label $\beta_i$. That is, the value of each object $x_j$ with class label $\beta_i$ falls within the interval $[\mathrm{L}_i, \mathrm{U}_i]$. This can be viewed as a supervised granulation process, which utilizes class information.
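As an illustration of Eqs. (1)-(2), the matrix $\mathbb{H}(\mathcal{A}_k)$ can be built directly from the per-class value ranges. The following is a minimal sketch, not the authors' implementation; NumPy, the helper name `equivalence_partition_matrix`, and the toy data are assumptions:

```python
import numpy as np

def equivalence_partition_matrix(values, labels):
    """Hypercuboid equivalence partition matrix H(A_k) of Eqs. (1)-(2).

    values : (n,) expression values of one miRNA A_k over n samples.
    labels : (n,) integer class labels.
    Returns a (c, n) 0/1 matrix with h_ij = 1 iff L_i <= x_j(A_k) <= U_i,
    where [L_i, U_i] spans the values of the samples in class i.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    H = np.zeros((classes.size, values.size), dtype=int)
    for i, cls in enumerate(classes):
        v = values[labels == cls]
        H[i] = (values >= v.min()) & (values <= v.max())  # class interval [L_i, U_i]
    return H

# Toy example: five samples, two classes along one miRNA.
x = np.array([0.1, 0.3, 0.5, 0.6, 0.9])
y = np.array([0, 0, 0, 1, 1])
H = equivalence_partition_matrix(x, y)
```

For this toy vector the two class intervals are $[0.1, 0.5]$ and $[0.6, 0.9]$, giving the two rows of $\mathbb{H}$.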
Generally, an $m$-dimensional hypercuboid or hyperrectangle is defined in the $m$-dimensional Euclidean space, where the space is spanned by the $m$ variables measured for each sample or object. In geometry, a hypercuboid or hyperrectangle is the generalization of a rectangle to higher dimensions, formally defined as the Cartesian product of orthogonal intervals. A $d$-dimensional hypercuboid with $d$ attributes as its dimensions is thus the Cartesian product of $d$ orthogonal intervals. It encloses a region in the $d$-dimensional space, where each dimension corresponds to a certain attribute. The value domain of each dimension is the value range or interval that corresponds to a particular class.
The $(c \times n)$ matrix $\mathbb{H}(\mathcal{A}_k)$ is termed the hypercuboid equivalence partition matrix of the condition attribute $\mathcal{A}_k$. It represents the $c$-hypercuboid equivalence partitions of the universe generated by an equivalence relation. Each row of the matrix $\mathbb{H}(\mathcal{A}_k)$ is a hypercuboid equivalence partition or class. Here $h_{ij}(\mathcal{A}_k)\in\{0,1\}$ represents the membership of object $x_j$ in the $i$th equivalence partition or class $\beta_i$, satisfying the following two conditions:
$$1 \le \sum_{j=1}^{n} h_{ij}(\mathcal{A}_k) \le n, \quad \forall i;$$
(3)
$$1 \le \sum_{i=1}^{c} h_{ij}(\mathcal{A}_k) \le c, \quad \forall j.$$
(4)
The above axioms should hold for every equivalence partition; they correspond to the requirement that an equivalence class be non-empty. However, in real data analysis, uncertainty arises due to overlapping class boundaries. Hence, such a granulation process does not necessarily result in a compatible granulation, in the sense that any two class hypercuboids or intervals may intersect with each other. The intersection of two hypercuboids also forms a hypercuboid, which is referred to as an implicit hypercuboid. The implicit hypercuboids encompass the misclassified samples or objects that belong to more than one class. The degree of dependency of the decision attribute set or class label on the condition attribute set depends on the cardinality of the implicit hypercuboids: the degree of dependency increases as this cardinality decreases. Hence, the degree of dependency of the decision attribute on a condition attribute set is evaluated by finding the implicit hypercuboids that encompass misclassified objects. Using the concept of the hypercuboid equivalence partition matrix, the misclassified objects of the implicit hypercuboids can be identified based on the confusion vector defined next:
$$\mathbb{V}(\mathcal{A}_k)=\left[\mathrm{v}_1(\mathcal{A}_k),\dots,\mathrm{v}_j(\mathcal{A}_k),\dots,\mathrm{v}_n(\mathcal{A}_k)\right]$$
(5)
$$\text{where}\quad \mathrm{v}_j(\mathcal{A}_k)=\min\left\{1, \sum_{i=1}^{c} h_{ij}(\mathcal{A}_k) - 1\right\}.$$
(6)
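The confusion vector of Eq. (6) is a one-liner once $\mathbb{H}(\mathcal{A}_k)$ is available. A minimal sketch, assuming NumPy and the hypothetical name `confusion_vector`:

```python
import numpy as np

def confusion_vector(H):
    """Confusion vector V(A_k) of Eqs. (5)-(6): v_j = min{1, sum_i h_ij - 1}.

    v_j = 1 marks a sample covered by more than one class hypercuboid,
    i.e. a sample inside an implicit hypercuboid."""
    return np.minimum(1, H.sum(axis=0) - 1)

# Toy partition matrix: sample with index 2 lies in both class intervals.
H = np.array([[1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1]])
```

Here only the third column sums to 2, so only that sample gets $\mathrm{v}_j = 1$.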
According to rough set theory, if an object $x_j$ belongs to the lower approximation of any class $\beta_i$, then it does not belong to the lower or upper approximations of any other class and $\mathrm{v}_j(\mathcal{A}_k)=0$. On the other hand, if the object $x_j$ belongs to the boundary region of more than one class, then it is encompassed by an implicit hypercuboid and $\mathrm{v}_j(\mathcal{A}_k)=1$. Hence, the hypercuboid equivalence partition matrix and the corresponding confusion vector of the condition attribute $\mathcal{A}_k$ can be used to define the lower and upper approximations of the $i$th class $\beta_i$ of the decision attribute set $\mathbb{D}$.
Let $\beta_i \subseteq \mathbb{U}$. $\beta_i$ can be approximated using only the information contained within $\mathcal{A}_k$ by constructing the $A$-lower and $A$-upper approximations of $\beta_i$:
$$\underline{A}(\beta_i)=\{x_j \mid h_{ij}(\mathcal{A}_k)=1 \ \text{and}\ \mathrm{v}_j(\mathcal{A}_k)=0\};$$
(7)
$$\overline{A}(\beta_i)=\{x_j \mid h_{ij}(\mathcal{A}_k)=1\};$$
(8)
where the equivalence relation $A$ is induced from the attribute $\mathcal{A}_k$. The boundary region of $\beta_i$ is then defined as
$$\mathit{BN}_A(\beta_i)=\{x_j \mid h_{ij}(\mathcal{A}_k)=1 \ \text{and}\ \mathrm{v}_j(\mathcal{A}_k)=1\}.$$
(9)
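Eqs. (7)-(9) read off directly from $\mathbb{H}$ and $\mathbb{V}$. A minimal sketch under the same toy data as above; the helper name `class_approximations` is an assumption:

```python
import numpy as np

def class_approximations(H, V, i):
    """A-lower and A-upper approximations and boundary of class beta_i, Eqs. (7)-(9).

    H is the (c, n) hypercuboid equivalence partition matrix and V the
    confusion vector; returns sample indices for each region."""
    lower = [j for j in range(H.shape[1]) if H[i, j] == 1 and V[j] == 0]
    upper = [j for j in range(H.shape[1]) if H[i, j] == 1]
    boundary = [j for j in range(H.shape[1]) if H[i, j] == 1 and V[j] == 1]
    return lower, upper, boundary

# Sample 2 falls in both class intervals, so it sits in each class's boundary.
H = np.array([[1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1]])
V = np.minimum(1, H.sum(axis=0) - 1)      # confusion vector, Eq. (6)
lower, upper, boundary = class_approximations(H, V, 0)
```

For class 0, the lower approximation is $\{x_0, x_1\}$, the upper approximation adds the boundary sample $x_2$.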
Dependency
Combining (1), (5), and (7), the dependency between the condition attribute $\mathcal{A}_k$ and the decision attribute set $\mathbb{D}$ can be defined as follows:
$$\gamma_{\mathcal{A}_k}(\mathbb{D})=\frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n} h_{ij}(\mathcal{A}_k) \cap \left[1-\mathrm{v}_j(\mathcal{A}_k)\right],$$
(10)
$$\text{that is,}\quad \gamma_{\mathcal{A}_k}(\mathbb{D})=1-\frac{1}{n}\sum_{j=1}^{n}\mathrm{v}_j(\mathcal{A}_k),$$
(11)
where $0 \le \gamma_{\mathcal{A}_k}(\mathbb{D}) \le 1$. If $\gamma_{\mathcal{A}_k}(\mathbb{D})=1$, $\mathbb{D}$ depends totally on $\mathcal{A}_k$; if $0<\gamma_{\mathcal{A}_k}(\mathbb{D})<1$, $\mathbb{D}$ depends partially on $\mathcal{A}_k$; and if $\gamma_{\mathcal{A}_k}(\mathbb{D})=0$, then $\mathbb{D}$ does not depend on $\mathcal{A}_k$. The quantity $\gamma_{\mathcal{A}_k}(\mathbb{D})$ is also termed the relevance of the attribute $\mathcal{A}_k$ with respect to the class $\mathbb{D}$.
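Eq. (11) makes the relevance a one-line computation from the confusion vector. A minimal sketch (the function name `relevance` and the toy matrix are assumptions):

```python
import numpy as np

def relevance(H):
    """Degree of dependency gamma_{A_k}(D) of Eq. (11):
    one minus the mean of the confusion vector of Eq. (6)."""
    V = np.minimum(1, H.sum(axis=0) - 1)   # confusion vector
    return 1.0 - V.mean()

# One of five samples lies in the implicit hypercuboid, so gamma = 1 - 1/5.
H = np.array([[1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1]])
```

Here `relevance(H)` evaluates to 0.8, i.e. $\mathbb{D}$ depends partially on this attribute.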
Significance
Given two condition attributes $\mathcal{A}_k$ and $\mathcal{A}_l$, the $(c \times n)$ hypercuboid equivalence partition matrix corresponding to the set $\mathbb{A}=\{\mathcal{A}_k,\mathcal{A}_l\}$ can be calculated from the two $(c \times n)$ hypercuboid equivalence partition matrices $\mathbb{H}(\mathcal{A}_k)$ and $\mathbb{H}(\mathcal{A}_l)$ as follows:
$$\mathbb{H}(\{\mathcal{A}_k,\mathcal{A}_l\})=\mathbb{H}(\mathcal{A}_k)\cap\mathbb{H}(\mathcal{A}_l);$$
(12)
$$\text{where}\quad h_{ij}(\{\mathcal{A}_k,\mathcal{A}_l\})=h_{ij}(\mathcal{A}_k)\cap h_{ij}(\mathcal{A}_l).$$
(13)
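The intersection of Eqs. (12)-(13) is an element-wise logical AND of the two 0/1 matrices. A minimal sketch with hypothetical partition matrices of two miRNAs over the same five samples:

```python
import numpy as np

# Hypothetical partition matrices of two miRNAs (toy values, not real data).
Hk = np.array([[1, 1, 1, 0, 0],
               [0, 0, 1, 1, 1]])
Hl = np.array([[1, 1, 0, 0, 0],
               [0, 1, 1, 1, 1]])

# Eq. (13): element-wise intersection (bitwise AND of the 0/1 entries).
H_joint = Hk & Hl
```

Note that the joint matrix has every column summing to exactly 1 here, so the two attributes together leave no sample in an implicit hypercuboid, even though each attribute alone confuses one sample.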
The change in dependency when an attribute is removed from the set of condition attributes is a measure of the significance of that attribute. In other words, the significance of an attribute quantifies the extent to which it contributes to the dependency on the decision attribute. The significance of the attribute $\mathcal{A}_k$ with respect to the condition attribute set $\mathbb{A}=\{\mathcal{A}_k,\mathcal{A}_l\}$ is given by
$$\sigma_{\mathbb{A}}(\mathbb{D},\mathcal{A}_k)=\frac{1}{n}\sum_{j=1}^{n}\left[\mathrm{v}_j(\mathbb{A}\setminus\{\mathcal{A}_k\})-\mathrm{v}_j(\mathbb{A})\right];$$
(14)
where $0 \le \sigma_{\{\mathcal{A}_k,\mathcal{A}_l\}}(\mathbb{D},\mathcal{A}_k) \le 1$. Hence, the higher the change in dependency, the more significant the attribute $\mathcal{A}_k$ is. If the significance is 0, then the attribute is dispensable.
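Eq. (14) can be sketched as the drop in confusion obtained by adding $\mathcal{A}_k$ to $\mathcal{A}_l$; with $\mathbb{A}=\{\mathcal{A}_k,\mathcal{A}_l\}$, the set $\mathbb{A}\setminus\{\mathcal{A}_k\}$ is simply $\{\mathcal{A}_l\}$. The helper names and toy matrices below are illustrative assumptions:

```python
import numpy as np

def conf(H):
    """Confusion vector of Eq. (6)."""
    return np.minimum(1, H.sum(axis=0) - 1)

def significance(Hk, Hl):
    """sigma_{ {A_k, A_l} }(D, A_k) of Eq. (14): average per-sample drop in
    confusion when A_k is added, i.e. v({A_l}) - v({A_k, A_l})."""
    n = Hk.shape[1]
    return float((conf(Hl) - conf(Hk & Hl)).sum()) / n

# Toy matrices: A_l alone confuses one sample; together they confuse none.
Hk = np.array([[1, 1, 1, 0, 0],
               [0, 0, 1, 1, 1]])
Hl = np.array([[1, 1, 0, 0, 0],
               [0, 1, 1, 1, 1]])
```

Here `significance(Hk, Hl)` evaluates to $1/5 = 0.2$: adding $\mathcal{A}_k$ resolves the one sample that $\mathcal{A}_l$ left in a boundary region.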
μHEM: proposed miRNA selection method
Let $\gamma_{\mathcal{A}_i}(\mathbb{D})$ be the relevance of the miRNA $\mathcal{A}_i$ with respect to the class labels $\mathbb{D}$, and let $\sigma_{\{\mathcal{A}_i,\mathcal{A}_j\}}(\mathbb{D},\mathcal{A}_i)$ be the significance of the miRNA $\mathcal{A}_i$ with respect to another miRNA $\mathcal{A}_j\in\mathbb{S}$, where $\mathbb{S}$ is the set of selected miRNAs. The average relevance of all selected miRNAs is, therefore, given by
$$\mathcal{J}_{\text{relev}}=\frac{1}{|\mathbb{S}|}\sum_{\mathcal{A}_i\in\mathbb{S}}\gamma_{\mathcal{A}_i}(\mathbb{D}),$$
(15)
while the average significance among the selected miRNAs is as follows
$$\mathcal{J}_{\text{signf}}=\frac{1}{|\mathbb{S}|(|\mathbb{S}|-1)}\sum_{\mathcal{A}_i\ne\mathcal{A}_j\in\mathbb{S}}\left\{\sigma_{\{\mathcal{A}_i,\mathcal{A}_j\}}(\mathbb{D},\mathcal{A}_i)+\sigma_{\{\mathcal{A}_i,\mathcal{A}_j\}}(\mathbb{D},\mathcal{A}_j)\right\}.$$
(16)
Therefore, the problem of selecting a set $\mathbb{S}$ of relevant and significant miRNAs from the whole miRNA set $\mathbb{C}$ is equivalent to maximizing $\mathcal{J}_{\text{relev}}$ and $\mathcal{J}_{\text{signf}}$, that is, to maximizing the objective function $\mathcal{J}$, where
$$\mathcal{J}=\omega\mathcal{J}_{\text{relev}}+(1-\omega)\mathcal{J}_{\text{signf}}$$
(17)
where ω is a weight parameter. To solve the above problem, the following greedy algorithm is used.

1.
Initialize $\mathbb{C}\leftarrow\{\mathcal{A}_1,\dots,\mathcal{A}_i,\dots,\mathcal{A}_m\}$, $\mathbb{S}\leftarrow\varnothing$.

2.
Generate the hypercuboid equivalence partition matrix $\mathbb{H}(\mathcal{A}_i)$ and the corresponding confusion vector $\mathbb{V}(\mathcal{A}_i)$ for each miRNA $\mathcal{A}_i\in\mathbb{C}$ using (1) and (5), respectively.

3.
Calculate the relevance $\gamma_{\mathcal{A}_i}(\mathbb{D})$ of each miRNA $\mathcal{A}_i\in\mathbb{C}$ using (11).

4.
Select the miRNA $\mathcal{A}_i$ with the highest relevance value $\gamma_{\mathcal{A}_i}(\mathbb{D})$ as the most relevant miRNA. In effect, $\mathcal{A}_i\in\mathbb{S}$ and $\mathbb{C}=\mathbb{C}\setminus\{\mathcal{A}_i\}$.

5.
Repeat the following two steps until $\mathbb{C}=\varnothing$ or the desired number of miRNAs is selected.

6.
Repeat the following four steps for each of the remaining miRNAs of $\mathbb{C}$.

(a)
Generate the hypercuboid equivalence partition matrix $\mathbb{H}(\{\mathcal{A}_i,\mathcal{A}_j\})$ using (12) between each selected miRNA $\mathcal{A}_i\in\mathbb{S}$ and each candidate miRNA $\mathcal{A}_j\in\mathbb{C}$.

(b)
Generate the corresponding confusion vector $\mathbb{V}(\{\mathcal{A}_i,\mathcal{A}_j\})$ for the two miRNAs $\mathcal{A}_i$ and $\mathcal{A}_j$ using (5).

(c)
Calculate the significance of each miRNA $\mathcal{A}_j\in\mathbb{C}$ with respect to each of the already selected miRNAs of $\mathbb{S}$ using (14).

(d)
Remove $\mathcal{A}_j$ from $\mathbb{C}$ if it has a zero significance value with respect to any one of the selected miRNAs. In effect, $\mathbb{C}=\mathbb{C}\setminus\{\mathcal{A}_j\}$.

7.
From the remaining miRNAs of $\mathbb{C}$, select the miRNA $\mathcal{A}_j$ that maximizes the following condition:
$$\omega\gamma_{\mathcal{A}_j}(\mathbb{D})+\frac{(1-\omega)}{|\mathbb{S}|}\sum_{\mathcal{A}_i\in\mathbb{S}}\sigma_{\{\mathcal{A}_i,\mathcal{A}_j\}}(\mathbb{D},\mathcal{A}_j).$$
(18)
As a result, $\mathcal{A}_j\in\mathbb{S}$ and $\mathbb{C}=\mathbb{C}\setminus\{\mathcal{A}_j\}$.

8.
Stop.
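The steps above can be sketched compactly as follows. This is an illustrative reconstruction under the definitions of Eqs. (1), (6), (11), (14), and (18), not the authors' implementation; the function names (`epm`, `conf`, `mu_hem`) and the toy expression matrix are assumptions:

```python
import numpy as np

def epm(values, labels, classes):
    # Hypercuboid equivalence partition matrix of one miRNA, Eqs. (1)-(2).
    H = np.zeros((classes.size, values.size), dtype=int)
    for i, cls in enumerate(classes):
        v = values[labels == cls]
        H[i] = (values >= v.min()) & (values <= v.max())
    return H

def conf(H):
    # Confusion vector, Eq. (6).
    return np.minimum(1, H.sum(axis=0) - 1)

def mu_hem(X, y, d, omega=0.5):
    """Greedy mu-HEM sketch: X is an (m, n) miRNA-by-sample expression matrix,
    y the class labels, d the number of miRNAs to select, omega the weight."""
    m, n = X.shape
    classes = np.unique(y)
    Hs = [epm(X[i], y, classes) for i in range(m)]          # Step 2
    gamma = np.array([1.0 - conf(H).mean() for H in Hs])    # Step 3, Eq. (11)
    selected = [int(np.argmax(gamma))]                      # Step 4
    candidates = set(range(m)) - set(selected)
    while candidates and len(selected) < d:                 # Step 5
        best, best_score = None, -np.inf
        for j in sorted(candidates):                        # Step 6
            sig = [float((conf(Hs[i]) - conf(Hs[i] & Hs[j])).sum()) / n
                   for i in selected]                       # Eq. (14)
            if min(sig) == 0.0:                             # Step 6(d): dispensable
                candidates.discard(j)
                continue
            score = omega * gamma[j] + (1 - omega) * np.mean(sig)  # Eq. (18)
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break
        selected.append(best)                               # Step 7
        candidates.discard(best)
    return selected

# Toy data: three hypothetical miRNAs over six samples in two classes.
y = np.array([0, 0, 0, 1, 1, 1])
X = np.array([[0.0, 0.2, 0.5, 0.4, 0.8, 1.0],
              [0.5, 0.6, 0.7, 1.0, 1.1, 0.55],
              [0.3, 0.3, 0.3, 0.3, 0.3, 0.3]])
```

On this toy matrix, miRNA 0 is the most relevant ($\gamma = 2/3$), miRNA 2 is constant and therefore dispensable (zero significance), and miRNA 1 resolves one of the two samples that miRNA 0 confuses, so `mu_hem(X, y, 2)` returns `[0, 1]`.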
Computational complexity
The proposed μHEM method has low computational complexity with respect to the number of miRNAs, samples, and classes. Prior to computing the relevance or significance of a miRNA, the hypercuboid equivalence partition matrix and confusion vector for each miRNA must be generated first, which is carried out in Step 2 of the proposed algorithm. The computational complexity to generate a $(c \times n)$ hypercuboid equivalence partition matrix is $\mathcal{O}(cn)$, where $c$ and $n$ represent the number of classes and objects in the data set, respectively, while the generation of the confusion vector also has $\mathcal{O}(cn)$ time complexity. In effect, the computation of the relevance of a miRNA has $\mathcal{O}(cn)$ time complexity. Hence, the total complexity to compute the relevance of $m$ miRNAs, which is carried out in Step 3 of the proposed algorithm, is $\mathcal{O}(mcn)$. The selection of the most relevant miRNA from the set of $m$ miRNAs, which is carried out in Step 4, has a complexity $\mathcal{O}(m)$.
There is only one loop in Step 5 of the proposed miRNA selection method, which is executed $(d-1)$ times, where $d$ represents the number of selected miRNAs. The computation of the significance of a candidate miRNA with respect to another miRNA also has complexity $\mathcal{O}(cn)$. If $\acute{m}$ represents the cardinality of the already selected miRNA set, the total complexity to compute the significance of $(m-\acute{m})$ candidate miRNAs, which is carried out in Step 6, is $\mathcal{O}((m-\acute{m})cn)$. The selection of a miRNA from the $(m-\acute{m})$ candidate miRNAs by maximizing relevance and significance, which is carried out in Step 7, has a complexity $\mathcal{O}(m-\acute{m})$. Hence, the total complexity to execute the loop $(d-1)$ times is $\mathcal{O}((d-1)((m-\acute{m})+(m-\acute{m})cn))=\mathcal{O}(dcn(m-\acute{m}))$.
In effect, the selection of a set of $d$ relevant and significant miRNAs from the whole set of $m$ miRNAs using the proposed hypercuboid equivalence partition matrix based first order incremental search method has an overall computational complexity of $\mathcal{O}(mcn)+\mathcal{O}(m)+\mathcal{O}(dcn(m-\acute{m}))=\mathcal{O}(dnm)$ as $c,\acute{m} \ll m$.
B.632+ error rate
In order to minimize the variability and bias of the derived results, the so-called B.632+ bootstrap approach [37] is used, which is defined as follows:
$$\mathrm{B}.632+=(1-\tilde{\omega})\mathit{AE}+\tilde{\omega}\,B1$$
(19)
where $\mathit{AE}$ denotes the proportion of the original training samples misclassified, termed the apparent error rate, and $B1$ is the bootstrap error, defined as follows:
$$B1=\frac{1}{n}\sum_{j=1}^{n}\left(\frac{\sum_{k=1}^{M} I_{jk}Q_{jk}}{\sum_{k=1}^{M} I_{jk}}\right)$$
(20)
where $n$ is the number of original samples and $M$ is the number of bootstrap samples. If the sample $x_j$ is not contained in the $k$th bootstrap sample, then $I_{jk}=1$, otherwise $I_{jk}=0$. Similarly, if $x_j$ is misclassified, $Q_{jk}=1$, otherwise $Q_{jk}=0$. The weight parameter $\tilde{\omega}$ is given by
$$\tilde{\omega}=\frac{0.632}{1-0.368\,r};$$
(21)
$$\text{where}\quad r=\frac{B1-\mathit{AE}}{\gamma-\mathit{AE}};$$
(22)
$$\text{and}\quad \gamma=\sum_{i=1}^{c} p_i(1-q_i);$$
(23)
where $c$ is the number of classes, $p_i$ is the proportion of the samples from the $i$th class, and $q_i$ is the proportion of them assigned to the $i$th class. Also, $\gamma$ is termed the no-information error rate that would apply if the distribution of the class-membership label of the sample $x_j$ did not depend on its feature vector.
Support vector machine
In the current study, the support vector machine (SVM) [43] is used to evaluate the performance of the proposed μHEM algorithm as well as several other feature selection algorithms. The SVM is a margin classifier that draws an optimal hyperplane in the feature vector space; this defines a boundary that maximizes the margin between data samples in different classes, therefore leading to good generalization properties. A key factor in the SVM is the use of kernels to construct nonlinear decision boundaries. In the present work, linear kernels are used. The source code of the SVM has been downloaded from the Library for Support Vector Machines (http://www.csie.ntu.edu.tw/~cjlin/libsvm/).
To compute the different types of error rates obtained using the SVM, the bootstrap approach is performed on each miRNA expression data set. For each training set, a set of differential miRNAs is first generated, and the SVM is then trained with the selected miRNAs. After training, the miRNAs that were selected for the training set are used to generate the test set, and the class label of each test sample is predicted using the SVM. For each data set, the fifty top-ranked miRNAs are selected for the analysis.
In order to calculate the B.632+ error rate, the apparent error (AE) is first calculated. This error is obtained when the same original data set is used to both train and test a classifier. After that, the B1 error is computed from M bootstrap samples. Next, the no-information error (γ) is calculated by randomly perturbing the class labels of a given data set: the mutated data set is used for miRNA selection, the selected miRNA set is used to build the SVM, and the trained SVM is then used to classify the original data set. The error generated by this procedure is known as the γ rate. Finally, the B.632+ error rate is computed from the AE, B1, and γ errors using (19).
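The indicator matrix $I$ of Eq. (20) needed by this procedure can be produced by standard bootstrap resampling with replacement. A minimal sketch of that resampling step only (the helper name and the use of a fixed seed are assumptions for reproducibility, not part of the original protocol):

```python
import random

def bootstrap_indicator(n, M, seed=0):
    """I matrix of Eq. (20): draw M bootstrap replicates of size n with
    replacement and mark which original samples are LEFT OUT of each one."""
    rng = random.Random(seed)
    I = [[1] * M for _ in range(n)]        # start: every sample left out
    for k in range(M):
        for _ in range(n):
            I[rng.randrange(n)][k] = 0     # drawn into replicate k -> not left out
    return I

I = bootstrap_indicator(n=6, M=3)
```

On average about 36.8% of the samples are left out of each replicate; it is these out-of-bag samples over which the per-replicate misclassifications $Q_{jk}$ are averaged in Eq. (20).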