Fig. 1 | BMC Bioinformatics

Fig. 1

From: A comparison of RNA-Seq data preprocessing pipelines for transcriptomic predictions across independent studies

Flow chart of the data preprocessing, machine learning, and evaluation approaches. Large-scale, freely available RNA-Seq datasets were obtained from TCGA, GTEx, ICGC, and GEO and assigned to training and test sets. The original datasets correspond to data preprocessing combination #1 and serve as the 'baseline' for comparison. The transformed datasets of data preprocessing combinations #2 through #16 are produced by applying different combinations of normalization (Unnormalized, Quantile Normalization, Quantile Normalization with Target, or Feature Specific Quantile Normalization), batch effect correction (No batch effect correction or Batch effect correction), and data scaling (Unscaled or Scaled) procedures to the original datasets. Each data preprocessing combination is used to build an associated machine learning classifier.
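
The count of 16 follows from crossing the three choices named in the caption: 4 normalization options × 2 batch effect correction options × 2 scaling options. The Python sketch below enumerates that grid. The caption only fixes combination #1 as the untransformed baseline; the numbering of #2 through #16 shown here is an assumption made for illustration, and the names `NORMALIZATION`, `BATCH_CORRECTION`, `SCALING`, and `preprocessing_combinations` are hypothetical, not taken from the article.

```python
from itertools import product

# Option labels are copied from the figure caption.
NORMALIZATION = [
    "Unnormalized",
    "Quantile Normalization",
    "Quantile Normalization with Target",
    "Feature Specific Quantile Normalization",
]
BATCH_CORRECTION = ["No batch effect correction", "Batch effect correction"]
SCALING = ["Unscaled", "Scaled"]


def preprocessing_combinations():
    """Return a dict mapping a combination number to its three settings.

    Assumption: numbering follows itertools.product order, which places the
    untransformed baseline first. The article may order #2-#16 differently.
    """
    grid = product(NORMALIZATION, BATCH_CORRECTION, SCALING)
    return {number: combo for number, combo in enumerate(grid, start=1)}


combinations = preprocessing_combinations()

for number, (normalization, batch, scaling) in combinations.items():
    print(f"#{number}: {normalization} | {batch} | {scaling}")
```

Under this ordering, combination #1 is (Unnormalized, No batch effect correction, Unscaled), which matches the baseline described in the caption.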
