Meta-analysis approach as a gene selection method in class prediction: does it improve model performance? A case study in acute myeloid leukemia

Table 1 An approach to building and validating classification models using meta-analysis as a gene selection technique

1. Data collection
Collect raw gene expression datasets, which may come from previous experiments and/or a systematic search of online repositories.
2. Data preparation
(i) Preprocess each raw gene expression dataset individually (i.e., normalization, background correction, log2 transformation).
(ii) Divide the D available gene expression datasets into three sets: D-2 datasets used to derive a gene signature list (SET1), one dataset used to train classification models (SET2), and one dataset used to validate the models (SET3).
3. Meta-analysis for gene selection
(i) For each probeset, aggregate the expression values across the datasets in SET1 via random-effects meta-analysis to obtain a gene signature list.
(ii) Record the significant probesets (also referred to as informative probesets).
4. Predictive modeling
(i) In SET2, include only the informative probesets identified in Step 3.
(ii) Divide the samples in SET2 into a learning set and a testing set.
(iii) Perform cross-validation on the learning set to build the classification models.
(iv) Evaluate the optimal predictive models on the testing set.
5. External validation
(i) In SET3, include only the informative probesets from Step 3.
(ii) Scale the gene expression values in SET3 using SET2 as the reference.
(iii) Validate the classification models from Step 4 on the scaled gene expression data in SET3.
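Step 3 of Table 1 does not state which per-dataset statistic is pooled or how significance is declared. The Python sketch below is one possible reading, not the authors' code. It assumes a two-class comparison in every SET1 dataset, uses Hedges' g (a bias-corrected standardized mean difference) as the per-dataset effect size, and pools it across datasets with the DerSimonian-Laird random-effects estimator. As an additional assumption, it controls the false discovery rate with the Benjamini-Hochberg procedure at a hypothetical threshold `alpha=0.05`. All function names and the input layout are illustrative.

```python
import math
from statistics import mean, stdev


def hedges_g(class_a, class_b):
    """Bias-corrected standardized mean difference (Hedges' g) and its approximate variance."""
    n1, n2 = len(class_a), len(class_b)
    pooled_sd = math.sqrt(((n1 - 1) * stdev(class_a) ** 2 + (n2 - 1) * stdev(class_b) ** 2) / (n1 + n2 - 2))
    d = (mean(class_a) - mean(class_b)) / pooled_sd
    g = (1 - 3 / (4 * (n1 + n2) - 9)) * d  # small-sample correction factor J
    var_g = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var_g


def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling; returns (mu, se, tau2, two-sided p)."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]  # weights include between-study variance
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    p = math.erfc(abs(mu / se) / math.sqrt(2))  # two-sided normal p-value
    return mu, se, tau2, p


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed multiple-testing step)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_informative(set1, alpha=0.05):
    """set1 maps probeset ID -> [(class_a_values, class_b_values), ...], one pair per SET1 dataset."""
    names, pvals = [], []
    for probeset, studies in set1.items():
        stats = [hedges_g(a, b) for a, b in studies]
        _, _, _, p = random_effects([g for g, _ in stats], [v for _, v in stats])
        names.append(probeset)
        pvals.append(p)
    adjusted = benjamini_hochberg(pvals)
    return [name for name, q in zip(names, adjusted) if q < alpha]


if __name__ == "__main__":
    # Toy SET1 with two datasets; values are hypothetical log2 expressions
    toy_set1 = {
        "PS_A": [([5.1, 5.3, 5.0, 5.2], [3.0, 3.1, 2.9, 3.2]),
                 ([6.0, 6.2, 5.9, 6.1], [4.1, 3.9, 4.0, 4.2])],
        "PS_B": [([4.0, 4.2, 3.9, 4.1], [4.1, 3.9, 4.2, 4.0]),
                 ([5.0, 5.1, 4.9, 5.2], [5.1, 5.0, 5.2, 4.9])],
    }
    print(select_informative(toy_set1))  # ['PS_A']
```

In the toy data, `PS_A` differs consistently between classes in both datasets and is kept, while `PS_B` shows no class difference and is dropped. Pooling standardized effect sizes rather than raw expression values means datasets from different platforms or labs contribute on a common scale, and the between-study variance `tau2` widens the pooled standard error when the datasets disagree.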
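Step 4 of Table 1 names neither the classifier nor the cross-validation scheme. As a hedged illustration only, the sketch below uses a nearest-centroid classifier as a stand-in (not necessarily a model used in the study), a stratified split of SET2 into learning and testing sets, and stratified k-fold cross-validation on the learning set to choose one hypothetical tuning parameter: how many of the leading informative probesets to keep. It assumes each sample is a list of expression values already restricted to the Step 3 probesets and ordered by their meta-analysis ranking; all names are illustrative.

```python
import random
from statistics import mean


def split_learning_testing(labels, test_fraction=1 / 3, seed=0):
    """Stratified split of SET2 sample indices into a learning set and a testing set."""
    rng = random.Random(seed)
    learning, testing = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        testing += idx[:n_test]
        learning += idx[n_test:]
    return learning, testing


def fit_centroids(X, y):
    """Mean expression profile per class (hypothetical stand-in classifier)."""
    return {cls: [mean(col) for col in zip(*[x for x, lab in zip(X, y) if lab == cls])]
            for cls in sorted(set(y))}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids, key=lambda cls: sum((a - b) ** 2 for a, b in zip(x, centroids[cls])))


def stratified_folds(y, k, seed=0):
    """Distribute sample indices over k folds, class by class, so each fold keeps the class mix."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cv_accuracy(X, y, k=5, seed=0):
    """Fraction of samples predicted correctly when each fold is held out in turn."""
    correct = 0
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(y)


def choose_n_probesets(X, y, candidates, k=5):
    """Return the candidate number of leading probesets with the best CV accuracy."""
    scores = {n: cv_accuracy([x[:n] for x in X], y, k) for n in candidates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical SET2: 6 "case" and 6 "control" samples, 2 informative probesets
    X = [[5.0 + 0.1 * i, 4.8 + 0.1 * i] for i in range(6)] + [[1.0 + 0.1 * i, 1.2 + 0.1 * i] for i in range(6)]
    y = ["case"] * 6 + ["control"] * 6
    learning, testing = split_learning_testing(y)
    best_n, _ = choose_n_probesets([X[i] for i in learning], [y[i] for i in learning], [1, 2], k=2)
    model = fit_centroids([X[i][:best_n] for i in learning], [y[i] for i in learning])
    test_acc = sum(predict(model, X[i][:best_n]) == y[i] for i in testing) / len(testing)
    print(best_n, test_acc)  # 1 1.0
```

The testing set is touched only once, after cross-validation on the learning set has fixed the model, which keeps the reported testing-set performance from being tuned to those samples.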
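Step 5(ii) of Table 1 does not say which scaling is applied to SET3. One simple possibility, assumed here and not taken from the source, is a per-probeset location-and-scale adjustment: each informative probeset in SET3 is standardized and then given the mean and standard deviation it has in SET2, so that models trained on SET2 receive SET3 values on a comparable scale. Quantile-based alignment to SET2 would be another option. The function name and the dictionary layout are hypothetical.

```python
from statistics import mean, stdev


def scale_to_reference(target, reference, probesets):
    """Rescale each informative probeset in `target` (SET3) to its mean and SD in `reference` (SET2).

    Both inputs map probeset ID -> list of expression values across samples.
    """
    scaled = {}
    for ps in probesets:  # Step 5(i): keep only the informative probesets
        values, ref = target[ps], reference[ps]
        mu_t, sd_t = mean(values), stdev(values)
        mu_r, sd_r = mean(ref), stdev(ref)
        # Standardize within SET3, then map onto the SET2 location and spread
        scaled[ps] = [mu_r + (x - mu_t) / sd_t * sd_r for x in values]
    return scaled


if __name__ == "__main__":
    set2 = {"PS_A": [1.0, 3.0], "PS_X": [0.0, 1.0]}
    set3 = {"PS_A": [10.0, 20.0, 30.0], "PS_X": [5.0, 6.0, 7.0]}
    print(scale_to_reference(set3, set2, ["PS_A"]))
```

Because the adjustment uses SET3's own mean and standard deviation, it implicitly assumes the class composition of SET3 is not drastically different from that of SET2; a strongly unbalanced validation set would shift the estimated location and distort the rescaled values.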

ISSN: 1471-2105