Gene selection methods
We employed fourteen published gene selection methods in this article. In machine learning, feature selection methods can be classified into three categories: filters, wrappers and embedded methods. Filter methods select a subset of features prior to classifier training according to some measure of relevance for class membership, e.g. mutual information . Wrapper methods systematically assess the prediction performance of feature subsets, e.g. recursive feature elimination (RFE) , and embedded methods perform feature selection within the process of classifier training. The methods we employed in this article cover all three categories. Furthermore, feature selection methods can be classified according to whether or not they incorporate biological network knowledge (conventional vs. network-based approaches).
As one of the most basic approaches, we considered a combination of significance analysis of microarrays (SAM)  as a filter prior to SVM or Naïve Bayes classifier learning. More specifically, only genes with FDR < 5% (Benjamini-Hochberg method)  were considered differentially expressed. As further classical gene selection methods we considered prediction analysis for microarrays (PAM) , an embedded method, and recursive feature elimination (SVM-RFE) , an SVM-based wrapper algorithm. Moreover, we included SCAD-SVMs  and elastic-net penalty SVMs (HHSVM)  as more recently proposed embedded approaches that specifically take correlations in gene expression data into account. In this article we used SAM+SVM (significant gene SVM), SAM+NB (significant gene Naïve Bayes classifier), PAM, SCAD-SVM, HHSVM and SVM-RFE as conventional feature selection methods that do not employ network knowledge.
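To illustrate the filter-plus-classifier idea, the following minimal R sketch substitutes an ordinary per-gene t-test with Benjamini-Hochberg adjustment for the actual SAM statistic (X, y and the helper function name are hypothetical placeholders; the real analysis used SAM as described above):

library(e1071)

## Filter-then-classify sketch: X is a samples x genes matrix, y a
## two-level factor of class labels (both hypothetical inputs).
filter_then_svm <- function(X, y, fdr = 0.05) {
  pvals <- apply(X, 2, function(g) t.test(g ~ y)$p.value)  # per-gene t-test
  qvals <- p.adjust(pvals, method = "BH")                  # Benjamini-Hochberg FDR
  selected <- which(qvals < fdr)                           # genes with FDR < 5%
  model <- svm(X[, selected, drop = FALSE], y, kernel = "linear")
  list(model = model, selected = selected)
}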
The following network-based approaches for integrating network or pathway knowledge into gene selection algorithms were investigated: mean expression profiles of member genes within KEGG pathways (aveExpPath) , graph diffusion kernels for SVMs (graphK; diffusion kernel parameter δ = 1) , p-step random walk kernels for SVMs (graphKp; parameters p = 3, α = 2, as suggested by Gao et al.) , pathway activity classification (PAC) , gradient boosting (PathBoost)  and network-based SVMs (parameter sd. cutoff = 0.8 for pre-filtering of probesets according to their standard deviation) . In the case of aveExpPath, whole KEGG pathways were selected or not selected based on their average differential expression between patient groups, using a SAM test with FDR cutoff 5% (see above). In the case of the diffusion and p-step random walk kernels, the SVM-RFE algorithm was adopted for gene selection using the implementation in the R-package pathClass . Furthermore, pathClass was used to calculate the diffusion kernel. This implementation is directly based on  and keeps only the 20% smallest eigenvalues and corresponding eigenvectors of the normalized graph Laplacian to compute the kernel matrix.
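For concreteness, both graph kernels can be written down directly from the normalized graph Laplacian. The R sketch below is our own illustration (A denotes the adjacency matrix of the gene network; the pathClass implementation may differ in numerical details), using the parameter values stated above:

## Normalized graph Laplacian L = I - D^(-1/2) A D^(-1/2)
graph_kernels <- function(A, delta = 1, alpha = 2, p = 3, frac = 0.2) {
  d <- rowSums(A)
  Ds <- diag(1 / sqrt(pmax(d, 1e-10)))
  L <- diag(nrow(A)) - Ds %*% A %*% Ds
  eig <- eigen(L, symmetric = TRUE)            # eigenvalues in decreasing order
  k <- ceiling(frac * nrow(A))                 # keep the 20% smallest eigenvalues
  idx <- seq(nrow(A) - k + 1, nrow(A))
  V <- eig$vectors[, idx]
  K_diff <- V %*% diag(exp(-delta * eig$values[idx])) %*% t(V)  # diffusion kernel exp(-delta*L)
  B <- alpha * diag(nrow(A)) - L               # p-step random walk kernel: (alpha*I - L)^p
  K_p <- B
  for (i in seq_len(p - 1)) K_p <- K_p %*% B
  list(diffusion = K_diff, pstep = K_p)
}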
PAC and PathBoost come with their own mechanisms for selecting relevant genes. PathBoost incorporates network knowledge directly into the gradient boosting procedure to perform gene selection, whereas PAC first selects genes within each KEGG pathway based on a t-test and then summarizes gene expression in each pathway into a pathway activity score. Following the original paper by Lee et al. , only the top 10% of pathways with the highest differences in activity between sample groups were selected. Recently, Taylor et al.  found that differentially expressed hub proteins in a protein-protein interaction network could be related to breast cancer disease outcome. We applied their approach (called HubClassify here) as follows: the random permutation test proposed in Taylor et al.  was used to select differentially expressed hub genes with FDR cutoff 5%. Hubs were defined as those genes whose node degree fell into the top 5% of the degree distribution of our protein interaction network. Afterwards an SVM was trained using only these differential hub genes. Finally, we considered the recently proposed Reweighted Recursive Feature Elimination (RRFE) algorithm , which combines GeneRank  and SVM-RFE as implemented in the pathClass package . In summary, average pathway expression (aveExpPath), graph diffusion kernels for SVMs (graphK), p-step random walk graph kernels for SVMs (graphKp), PAC, PathBoost, networkSVM, HubClassify and RRFE are considered in our comparison of network-based gene selection methods.
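The hub definition can be made explicit in a few lines of R (our own sketch using the igraph package; g is a hypothetical igraph object representing the protein interaction network, and the permutation test for differential expression is omitted):

library(igraph)

## Hubs: genes whose node degree exceeds the 95th percentile of the
## network's degree distribution (i.e. the top 5%).
find_hubs <- function(g, top = 0.05) {
  deg <- degree(g)
  cutoff <- quantile(deg, probs = 1 - top)
  names(deg)[deg >= cutoff]
}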
For all SVM classifiers used in this study the soft-margin parameter C was tuned in the range 10^-3, 10^-2, ..., 10^3 on the training data. For that purpose the pathClass package was employed, which uses the span-bound for SVMs as a computationally attractive and probably accurate alternative to cross-validation . For elastic-net SVMs and SCAD-SVMs we used the R-package penalizedSVM , which allows for tuning of hyperparameters (elastic net: λ1 ∈ [2^-8, 2^14], λ2 set in a fixed ratio to λ1 according to ; SCAD-SVM: λ ∈ [2^-8, 2^14]) based on the generalized approximate cross-validation (GACV) error as another computationally attractive alternative to cross-validation. The EPSGO algorithm described in  was used for finding optimal hyper-parameter values within the defined ranges. Note that in all cases only the training data were used for hyper-parameter tuning.
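As a plain cross-validated analogue of the grid search over C (pathClass itself replaces cross-validation by the span bound; Xtrain and ytrain are hypothetical training data), the tuning could be sketched as follows:

library(e1071)

## Grid search for the soft-margin parameter C in {10^-3, ..., 10^3},
## evaluated by internal cross-validation on the training data only.
tuned <- tune.svm(Xtrain, ytrain, kernel = "linear", cost = 10^(-3:3))
best_C <- tuned$best.parameters$cost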
It should be mentioned that for the conventional approaches all probesets on the chip were considered. This is in agreement with a typical, purely data-driven approach with no extra side information. Note that an a priori restriction to probesets that can be mapped to a pre-defined network would already introduce a certain level of extra background knowledge with corresponding assumptions.
Classification performance and stability of a signature
In order to assess the prediction performance of the tested gene selection methods, we performed 10 times repeated 10-fold cross-validation. That is, the whole data set was randomly split into 10 folds, and each fold was left out once for testing while the rest of the data was used for training and optimizing the classifier (including gene selection via filtering methods, standardization of expression values for each gene to mean 0 and standard deviation 1, etc.). The whole process was repeated 10 times. It should be emphasized that standardization of gene expression data was likewise performed on each training set separately, and the corresponding scaling parameters were then applied to the test data.
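One repetition of this scheme, including training-only standardization, can be sketched in R as follows (X, y, fit and predict_score are hypothetical placeholders for the data and for any of the classifiers above):

set.seed(1)
folds <- sample(rep(1:10, length.out = nrow(X)))  # random assignment to 10 folds

for (k in 1:10) {
  train <- folds != k
  mu  <- colMeans(X[train, ])                      # scaling parameters estimated
  sdv <- apply(X[train, ], 2, sd)                  # on the training fold only ...
  Xtr <- scale(X[train, ],  center = mu, scale = sdv)
  Xte <- scale(X[!train, ], center = mu, scale = sdv)  # ... then applied to the test fold
  ## model <- fit(Xtr, y[train])
  ## scores[!train] <- predict_score(model, Xte)
}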
The area under the receiver operating characteristic curve (AUC)  was used to measure prediction accuracy; the AUC was calculated with the R-package ROCR . To assess the stability of feature selection methods, we computed the selection frequency of each gene within the 10 times repeated 10-fold cross-validation procedure. That means a particular gene could be selected at most 100 times.
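With ROCR the AUC computation reduces to two calls, and the selection frequency is a simple tabulation (scores, labels and sel_list are hypothetical placeholders; sel_list would hold one vector of selected gene names per cross-validation fold, i.e. 100 vectors in total):

library(ROCR)

pred <- prediction(scores, labels)                 # classifier scores vs. true labels
auc  <- performance(pred, measure = "auc")@y.values[[1]]

## Selection frequency: how often each gene appears across the 100 folds
freq <- table(unlist(sel_list))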