
Table 2 Process of stepwise character selection based on RF

From: Extracting predictors for lung adenocarcinoma based on Granger causality test and stepwise character selection

Algorithm: Stepwise Character Selection

Input: Ranked independent gene list G, number of candidate genes n, gene expression microarray \(D \in \mathbb{R}^{n \times d}\), threshold ε.

Output: Predictor set P, prediction accuracy ACC, P_ACC.

Step 1: Initialization: P = ∅, candidate gene set C = G.

  Step 1.1: Calculate \({ACC}_{i}\), \({SN}_{i}\), \({SP}_{i}\) and \({MCC}_{i}\) by Eqs. 1, 2, 3 and 4 for each \(c_{i} \in C\), with each \(c_{i}\) acting as a single predictor in RF with 5-fold cross validation.

  Step 1.2: \(P \leftarrow \{p\}\), where \(p \leftarrow \mathop {argmax}\limits _{c_{i} \in C}{ACC}_{i}\), \(P\_ACC \leftarrow \mathop {max}\limits _{i\in \{1, \cdots, n\}}{ACC}_{i}\), \(C \leftarrow C \setminus \{p\}\).

Step 2: Character selection.

  While P_ACC − ACC_max > −ε and C ≠ ∅, do

   Step 2.1: ACC_max ← P_ACC.

   Step 2.2: Add a member to P.

    1. \(P_{add\_i} \leftarrow P \cup \{c_{i}\}\), \(c_{i} \in C\); calculate \({ACC}_{add\_i}\) using \(P_{add\_i}\) as predictors in RF with 5-fold cross validation, \(i=1,\cdots,n\).

    2. \(P \leftarrow P_{add\_I_{add}}\), where \(I_{add} \leftarrow \mathop {argmax}\limits _{i=1,\cdots,n}({ACC}_{add\_i})\), \(P\_ACC \leftarrow \mathop {max}\limits _{i=1,\cdots,n}({ACC}_{add\_i})\), \(C \leftarrow C \setminus P\).

   Step 2.3: Try to remove members from P.

    If P_ACC − ACC_max > −ε, do

    1. Define \(n_{remove}\) as the length of P.

    2. \(P_{remove\_i} \leftarrow P \setminus \{p_{i}\}\), \(p_{i} \in P\); calculate \({ACC}_{remove\_i}\) using \(P_{remove\_i}\) as predictors in Random Forest with 5-fold cross validation, \(i=1,\cdots,n_{remove}\).

    3.\(P\leftarrow \{p_{i}|{ACC}_{remove\_i}>P\_ACC,i=1,\cdots,n_{remove}\}\).

    \(C \leftarrow C \cup \{p_{i}|{ACC}_{remove\_i} \leq P\_ACC,i=1,\cdots,n_{remove}\}\).

    End if.

   Step 2.4: Calculate accuracy.

    Calculate P_ACC using P as predictors in RF with 5-fold cross validation.

   End while.
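
The three scoring operations the table relies on (single-gene initialization in Step 1, one greedy addition in Step 2.2, and the leave-one-out accuracies of Step 2.3) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `score` is a hypothetical callable standing in for "accuracy of RF with 5-fold cross validation on this gene subset", `toy_score` and the gene names `g1` to `g4` are invented for the demonstration, and ties are broken by taking the first maximum. The loop control and the keep/return rule of Step 2.3 are deliberately left to the table.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical scorer: maps a gene subset to an accuracy. In the table's
# setting this would be RF accuracy under 5-fold cross validation.
Score = Callable[[FrozenSet[str]], float]


def init_predictor(candidates: Sequence[str], score: Score) -> Tuple[str, float]:
    """Step 1: score every gene alone; return the best gene and its accuracy."""
    singles = [(score(frozenset([g])), g) for g in candidates]
    best_acc, best_gene = max(singles, key=lambda t: t[0])  # first maximum wins ties
    return best_gene, best_acc


def add_best(P: List[str], C: List[str], score: Score) -> Tuple[List[str], List[str], float]:
    """Step 2.2: evaluate P plus each candidate c_i and keep the best extension."""
    trials = [(score(frozenset(P + [c])), c) for c in C]
    best_acc, best_gene = max(trials, key=lambda t: t[0])
    return P + [best_gene], [c for c in C if c != best_gene], best_acc


def removal_scores(P: List[str], score: Score) -> List[Tuple[str, float]]:
    """Step 2.3, item 2: accuracy of P with each member left out in turn."""
    return [(p, score(frozenset(q for q in P if q != p))) for p in P]


def toy_score(genes: FrozenSet[str]) -> float:
    """Invented scorer: g1 and g3 help, any other gene hurts slightly."""
    informative = {"g1", "g3"}
    return 0.5 + 0.2 * len(genes & informative) - 0.05 * len(genes - informative)


G = ["g1", "g2", "g3", "g4"]                 # ranked gene list (toy data)
p, acc = init_predictor(G, toy_score)        # Step 1.1 and 1.2
P, C = [p], [g for g in G if g != p]
P, C, acc = add_best(P, C, toy_score)        # one pass of Step 2.2
loo = removal_scores(P, toy_score)           # inputs to the Step 2.3 decision
```

With real expression data, one way to supply `score` (again an assumption, not something the table specifies) would be to wrap scikit-learn's `cross_val_score` with `cv=5` around a `RandomForestClassifier` restricted to the columns of the chosen genes.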