1. Data collection
 Collect raw gene expression datasets, which may come from previous experiments and/or a systematic search of online repositories.
2. Data preparation |
 (i) Preprocess each raw gene expression dataset individually (i.e., normalization, background correction, log2 transformation).
 (ii) Divide the D available gene expression datasets into three sets: D-2 datasets used to derive a gene signature list (SET1), one dataset used to train classification models (SET2), and one dataset used to validate the models (SET3).
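A minimal plain-Python sketch of Step 2, assuming each dataset is already a samples x probesets matrix (a list of rows) with a common probeset order. Quantile normalization stands in here as one common normalization choice, not necessarily the one intended; background correction is platform-specific and is omitted. All function names are hypothetical.

```python
import math


def log2_transform(matrix, offset=1.0):
    """Log2-transform a samples x probesets matrix given as a list of rows.

    The offset (an assumption) avoids log2(0) for zero intensities.
    """
    return [[math.log2(value + offset) for value in row] for row in matrix]


def quantile_normalize(matrix):
    """Quantile-normalize samples so they share one intensity distribution.

    Ties are ranked in arbitrary (stable) order, a simplification.
    """
    n_probes = len(matrix[0])
    sorted_rows = [sorted(row) for row in matrix]
    # Reference distribution: mean of the r-th smallest value over all samples.
    reference = [sum(r[rank] for r in sorted_rows) / len(matrix) for rank in range(n_probes)]
    normalized = []
    for row in matrix:
        order = sorted(range(n_probes), key=lambda j: row[j])
        new_row = [0.0] * n_probes
        for rank, j in enumerate(order):
            new_row[j] = reference[rank]  # probeset j takes the reference value of its rank
        normalized.append(new_row)
    return normalized


def split_datasets(dataset_ids, train_id, validation_id):
    """Assign D datasets: D-2 to SET1, one to SET2 (training), one to SET3 (validation)."""
    if train_id == validation_id:
        raise ValueError("SET2 and SET3 must be different datasets")
    set1 = [d for d in dataset_ids if d not in (train_id, validation_id)]
    return set1, train_id, validation_id
```

Which dataset becomes SET2 and which becomes SET3 is left to the caller, since the workflow does not prescribe a rule for choosing them.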
3. Meta-analysis for gene selection |
 (i) For each probeset, combine the per-dataset effect sizes computed from the expression values in SET1 via random-effects meta-analysis to obtain a signature list.
 (ii) Record the significant probesets (hereafter referred to as informative probesets).
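Step 3 does not specify the effect-size measure, the between-dataset variance estimator, or the significance rule. The sketch below assumes Hedges' g between two sample classes as the per-dataset effect size, the DerSimonian-Laird estimator for the between-dataset variance tau^2, and Benjamini-Hochberg false discovery rate control for calling probesets significant. These are common choices, not necessarily the ones used in this workflow, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(group1, group2):
    """Bias-corrected standardized mean difference (Hedges' g) and its approximate variance."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    d = (mean(group1) - mean(group2)) / math.sqrt(pooled_var)
    g = (1 - 3 / (4 * (n1 + n2) - 9)) * d  # small-sample correction factor
    var_g = 1 / n1 + 1 / n2 + g ** 2 / (2 * (n1 + n2))  # one common approximation
    return g, var_g


def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling of one probeset across datasets.

    Returns (pooled effect, standard error, tau^2, two-sided p-value).
    """
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0  # between-dataset variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    p = math.erfc(abs(pooled / se) / math.sqrt(2))  # two-sided normal p-value
    return pooled, se, tau2, p


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of probesets significant at false discovery rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank  # largest rank passing the step-up threshold
    return sorted(order[:cutoff])
```

In practice one would loop over probesets: compute `hedges_g` in every SET1 dataset, pool the results with `random_effects`, then pass all pooled p-values to `benjamini_hochberg` to obtain the informative probesets.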
4. Predictive modeling |
 (i) In SET2, keep only the informative probesets identified in Step 3.
 (ii) Divide the samples in SET2 into a learning set and a testing set.
 (iii) Use cross-validation on the learning set to build and tune the classification models.
 (iv) Evaluate the optimal predictive models on the testing set.
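Step 4 does not name a classifier. As a stand-in, the plain-Python sketch below uses k-nearest neighbours with k chosen by cross-validation on the learning set only; the classifier, the split fraction, the fold count, and all function names are assumptions for illustration.

```python
import math
import random
from collections import Counter


def subset_features(samples, informative_idx):
    """Keep only the informative probesets (column indices) from Step 3."""
    return [[row[j] for j in informative_idx] for row in samples]


def learning_testing_split(n_samples, test_fraction=0.3, seed=0):
    """Randomly split sample indices into a learning set and a testing set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n_samples * test_fraction))
    return idx[n_test:], idx[:n_test]


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    neighbours = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test samples whose k-NN prediction matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def cross_validate_k(x, y, ks, n_folds=5, seed=0):
    """Pick k by n-fold cross-validation on the learning set only."""
    if len(y) < n_folds:
        raise ValueError("need at least one sample per fold")
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    best_k, best_acc = None, -1.0
    for k in ks:
        fold_accs = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            fold_accs.append(accuracy([x[i] for i in train], [y[i] for i in train],
                                      [x[i] for i in fold], [y[i] for i in fold], k))
        mean_acc = sum(fold_accs) / len(fold_accs)
        if mean_acc > best_acc:
            best_k, best_acc = k, mean_acc
    return best_k, best_acc
```

The learning set drives both model fitting and the choice of k, while the testing set is used only once, in `accuracy`, for the final evaluation; keeping the two apart is what makes the test accuracy an honest estimate.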
5. External validation |
 (i) In SET3, keep only the informative probesets identified in Step 3.
 (ii) Scale the gene expression values in SET3 using SET2 as the reference.
 (iii) Apply the classification models from Step 4 to the scaled gene expression data in SET3 to validate them.
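The workflow does not say exactly how SET3 is scaled against SET2. One plausible reading, assumed here, is to match each informative probeset's mean and standard deviation in SET3 to those it has in SET2, so that models trained on SET2 receive values on a familiar scale; other choices (e.g. quantile normalization against SET2) are equally possible. The `predict` argument stands in for any fitted model from Step 4, and both function names are hypothetical.

```python
from statistics import mean, stdev


def scale_to_reference(target, reference):
    """Rescale each probeset of `target` (SET3) to the mean and SD it has in `reference` (SET2).

    Both inputs are samples x probesets lists with the same informative probesets
    in the same column order.
    """
    scaled_columns = []
    for j in range(len(reference[0])):
        ref_col = [row[j] for row in reference]
        tgt_col = [row[j] for row in target]
        ref_mean, ref_sd = mean(ref_col), stdev(ref_col)
        tgt_mean, tgt_sd = mean(tgt_col), stdev(tgt_col)
        if tgt_sd == 0:
            # A constant probeset has no spread to rescale; map it to the reference mean.
            scaled_columns.append([ref_mean] * len(tgt_col))
        else:
            scaled_columns.append([(v - tgt_mean) / tgt_sd * ref_sd + ref_mean for v in tgt_col])
    return [list(row) for row in zip(*scaled_columns)]  # back to samples x probesets


def validate(predict, scaled_samples, labels):
    """Accuracy of a fitted model from Step 4 on the scaled SET3 samples."""
    hits = sum(predict(sample) == label for sample, label in zip(scaled_samples, labels))
    return hits / len(labels)
```

The scaling uses only expression values from SET2 and SET3, never the SET3 class labels, so the labels remain untouched until the validation itself.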