Fig. 5 | BMC Bioinformatics

Fig. 5

From: Alternative empirical Bayes models for adjusting for batch effects in genomic studies

Cluster assignment of the 200 genes using the k-means algorithm with k = 2. Color bars show the 200 genes from top to bottom, in the same order as the gene labels in Fig. 4. The red and blue bars represent signature and control genes, respectively. During batch adjustment, true activation levels are included as covariates, as opposed to using no covariates in both versions of ComBat (Additional file 1: Figure S6). In the batch-adjusted data, we first clustered genes into 2 groups without specifying the group sizes or labels. The two clusters were then labeled signature and control according to whichever labeling best agreed with the original separation. (a) In batch 1, genes are correctly separated, but combining batch 2 with batch 1 without ComBat adjustment changes the signature / non-signature separation: only 58.5% of genes remain in the same group in the combined dataset. (b) Reference-batch ComBat gives cluster assignments that are more consistent with the true separation than original ComBat in batch 1 only, in batch 2 only, and in the combined dataset of batches 1 and 2. These results suggest that the original ComBat breaks the similarity between genes in the same group (signature or control), where similarity is measured by the Euclidean distance. Only reference-batch ComBat is able to preserve this similarity.
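
The clustering-then-labeling step described in the caption can be illustrated with a minimal sketch. This is not the authors' code: the function names `kmeans2` and `align_to_truth`, the deterministic initialization, and the eight toy "genes" are all assumptions made for illustration. It clusters expression profiles into 2 groups by Euclidean distance without using the labels, then names the clusters signature (1) or control (0) by whichever labeling agrees best with the known separation.

```python
import math

def kmeans2(points, n_iter=100):
    """Lloyd's k-means with k=2 on a list of equal-length tuples.

    Assumption: centers start at the first point and the point farthest
    from it, which keeps this toy example deterministic.
    """
    first = points[0]
    far = max(points, key=lambda p: math.dist(p, first))
    centers = [first, far]
    labels = None
    for _ in range(n_iter):
        # Assign each point to the nearer center (Euclidean distance).
        new_labels = [
            0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its current members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def align_to_truth(labels, truth):
    """Name the two clusters so they agree best with the true 0/1 labels.

    Returns the relabeled assignment and the fraction of genes that match.
    """
    same = sum(lab == t for lab, t in zip(labels, truth))
    flipped = len(truth) - same
    if flipped > same:
        labels = [1 - lab for lab in labels]
        same = flipped
    return labels, same / len(truth)

# Hypothetical toy data: 4 signature genes (label 1) and 4 control genes
# (label 0), each measured in two samples.
profiles = [(5.0, 5.1), (4.8, 5.2), (5.1, 4.9), (5.2, 5.0),
            (0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2)]
truth = [1, 1, 1, 1, 0, 0, 0, 0]

clusters = kmeans2(profiles)
assigned, agreement = align_to_truth(clusters, truth)
print(assigned, agreement)
```

The agreement fraction returned by `align_to_truth` plays the role of the percentage quoted in the caption: it is the share of genes whose cluster-derived label matches the true signature/control label.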
