Table 2 Comparison between five batch correction methods in predicting pathway activity and drug efficacy

From: Alternative empirical Bayes models for adjusting for batch effects in genomic studies

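| Batch correction method | Common genes (cell line vs. TCGA), n (%) | Correlation with EGFR protein expression: cell line | Correlation with EGFR protein expression: TCGA | Correlation with drug response in cell lines: erlotinib | Correlation with drug response in cell lines: GSK1120212 |
|---|---|---|---|---|---|
| Original ComBat | 20 (40%) | 0.316 | 0.132 | 0.360 | 0.401 |
| Mean-only ComBat | 44 (88%) | 0.331 | -0.042 | 0.294 | 0.407 |
| Reference-batch ComBat | 50 (100%) | 0.442 | 0.299 | 0.415 | 0.520 |
| Frozen SVA | 50 (100%) | 0.115 | 0.092 | -0.09 | -0.131 |
| RUV | 40 (80%) | 0.287 | 0.182 | 0.332 | 0.145 |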

  1. We combined the oncogenic signature dataset separately with the cell line data and with the TCGA patient data. This adjusted for batch effects and let us transfer the EGFR signature from the oncogenic data to each test set. Original ComBat (40% of signature genes in common), mean-only ComBat (88%), and RUV (80%) all produced set bias: the EGFR signature genes depended on which test set the oncogenic data were combined with. Reference-batch ComBat and frozen SVA kept the same signature genes (100%), because their treatment of the oncogenic data does not depend on which test set is added. Reference-batch ComBat also gave the highest correlations of prediction scores with both EGFR protein expression and drug response among all five batch correction methods. These results support the benefit of using reference-batch ComBat in this context.
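The two kinds of quantities in the table, the overlap between the cell-line and TCGA signatures and the correlation of prediction scores with a response, can be computed along the following lines. This is a minimal sketch, not the authors' code, and it rests on several assumptions. Each signature is a list of 50 gene names (50 is inferred from 20 genes = 40%), and percentages are taken relative to that size. The correlation is Pearson's, since the table does not state the measure used. The helper names `common_genes` and `pearson` and the gene names are hypothetical.

```python
import math


def common_genes(sig_a, sig_b):
    """Count genes shared by two signatures; percentage is relative to len(sig_a)."""
    shared = set(sig_a) & set(sig_b)
    return len(shared), 100.0 * len(shared) / len(sig_a)


def pearson(x, y):
    """Pearson correlation between prediction scores x and a response y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical 50-gene signatures derived after combining with each test set.
    sig_cell_line = [f"g{i}" for i in range(50)]   # g0 .. g49
    sig_tcga = [f"g{i}" for i in range(30, 80)]    # g30 .. g79
    print(common_genes(sig_cell_line, sig_tcga))   # (20, 40.0)
```

With these toy signatures the overlap is g30 to g49, which reproduces the "20 (40%)" pattern of the original ComBat row. The numbers are illustrative only.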
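A simplified location/scale adjustment shows why a reference batch keeps the signature fixed while a pooled target does not. This sketch illustrates the idea under stated assumptions and is not ComBat itself. It omits empirical Bayes shrinkage of the batch parameters as well as covariates, and the function names `reference_adjust` and `pooled_adjust` are hypothetical. `reference_adjust` maps each non-reference batch onto the reference batch's per-gene mean and standard deviation and returns the reference batch untouched. `pooled_adjust` maps every batch onto the grand mean and pooled within-batch standard deviation, a rough stand-in for the target used by original ComBat. Under that target, the oncogenic data themselves shift depending on which test set they are combined with.

```python
import math
import statistics

# Hypothetical data layout: batch name -> list of genes, each gene a list of
# sample values. All batches list the same genes in the same order, and each
# gene has at least two non-constant values per batch (so stdev > 0).


def reference_adjust(batches, ref):
    """Map each non-reference batch onto the reference batch's per-gene mean/SD."""
    ref_genes = batches[ref]
    adjusted = {}
    for name, genes in batches.items():
        if name == ref:
            # The reference batch is returned unchanged.
            adjusted[name] = [list(g) for g in genes]
            continue
        adjusted[name] = []
        for vals, ref_vals in zip(genes, ref_genes):
            m_b, s_b = statistics.fmean(vals), statistics.stdev(vals)
            m_r, s_r = statistics.fmean(ref_vals), statistics.stdev(ref_vals)
            adjusted[name].append([(v - m_b) / s_b * s_r + m_r for v in vals])
    return adjusted


def pooled_adjust(batches):
    """Map every batch onto the grand mean and pooled within-batch SD per gene."""
    n_genes = len(next(iter(batches.values())))
    adjusted = {name: [] for name in batches}
    for j in range(n_genes):
        all_vals = [v for genes in batches.values() for v in genes[j]]
        grand_mean = statistics.fmean(all_vals)
        # Squared deviations of each sample from its own batch mean.
        ss = sum(
            (v - statistics.fmean(genes[j])) ** 2
            for genes in batches.values()
            for v in genes[j]
        )
        pooled_sd = math.sqrt(ss / len(all_vals))
        for name, genes in batches.items():
            m_b, s_b = statistics.fmean(genes[j]), statistics.stdev(genes[j])
            adjusted[name].append(
                [(v - m_b) / s_b * pooled_sd + grand_mean for v in genes[j]]
            )
    return adjusted


if __name__ == "__main__":
    onco = [[1.0, 2.0, 3.0]]  # one gene, three samples (toy values)
    tests = {"cell_line": [[10.0, 12.0, 14.0]], "tcga": [[20.0, 20.5, 21.0]]}
    for test_name, test in tests.items():
        data = {"oncogenic": onco, test_name: test}
        print(test_name,
              reference_adjust(data, "oncogenic")["oncogenic"],
              pooled_adjust(data)["oncogenic"])
```

In this toy run, the oncogenic values are identical under `reference_adjust` whichever test set is added, but they differ under `pooled_adjust`. A signature derived from the adjusted oncogenic data would therefore be stable in the first case and test-set dependent in the second.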