Risk-conscious correction of batch effects: maximising information extraction from high-throughput genomic datasets

Table 2 The varying nature of batch effects in the three datasets as detected by Harman

PC indices	1	2	3	4	5	6	7	8
A. Correction Vector (Hn-.95)
Dataset 1	0.26	0.33	0.51	0.9	0.44	0.85	0.74	1
Dataset 2	0.42	1	0.93	1	0.99	1	1	0.95
Dataset 3	0.76	1	0.35	0.69	1	1	1	1
B. % of data variance explained by PC
Dataset 1	43.4 %	9.5 %	4.8 %	4.3 %	2.7 %	2.4 %	2.2 %	2.0 %
Dataset 2	19.1 %	11.5 %	6.9 %	4.6 %	4.3 %	4.0 %	3.6 %	3.6 %
Dataset 3	33.9 %	17.2 %	16.0 %	8.6 %	5.8 %	4.5 %	3.7 %	3.3 %

(A) Shows the ‘correction vector’ spanning the first eight principal components (PCs) for the three datasets, as produced by Harman (.95). No correction, or only negligible correction, was detected for the remaining PCs. A score of 1 means no correction, whereas a score of 0 means maximum correction within the confines of Harman. (B) Shows the relative proportion of overall variance explained by each of the first eight PCs for the three datasets.
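
As a reading aid, the following minimal Python sketch cross-references panels A and B: for each dataset it lists the PCs whose correction score is below 1 and sums the percentage of variance those PCs explain. The values are transcribed from Table 2. The dictionary and function names are illustrative assumptions and are not part of Harman. The summed percentage is the variance carried by the PCs that received some correction, not the amount of variance removed, and the sketch does not reproduce the correction procedure itself.

```python
# Values transcribed from Table 2 (first eight PCs per dataset).
# All names below are illustrative; none of them belong to the Harman software.
CORRECTION = {
    "Dataset 1": [0.26, 0.33, 0.51, 0.9, 0.44, 0.85, 0.74, 1],
    "Dataset 2": [0.42, 1, 0.93, 1, 0.99, 1, 1, 0.95],
    "Dataset 3": [0.76, 1, 0.35, 0.69, 1, 1, 1, 1],
}
VARIANCE_PCT = {
    "Dataset 1": [43.4, 9.5, 4.8, 4.3, 2.7, 2.4, 2.2, 2.0],
    "Dataset 2": [19.1, 11.5, 6.9, 4.6, 4.3, 4.0, 3.6, 3.6],
    "Dataset 3": [33.9, 17.2, 16.0, 8.6, 5.8, 4.5, 3.7, 3.3],
}


def corrected_pcs(scores):
    """Return the 1-based indices of PCs with a score below 1 (some correction)."""
    return [i for i, s in enumerate(scores, start=1) if s < 1]


def variance_in_corrected_pcs(scores, variance_pct):
    """Sum the % of variance explained by PCs that received some correction."""
    return sum(v for s, v in zip(scores, variance_pct) if s < 1)


for name in CORRECTION:
    pcs = corrected_pcs(CORRECTION[name])
    share = variance_in_corrected_pcs(CORRECTION[name], VARIANCE_PCT[name])
    print(f"{name}: corrected PCs {pcs}, explaining {share:.1f} % of variance")
```

Read this way, the table shows that the corrected PCs differ between datasets: seven of the first eight PCs in Dataset 1, against PCs 1, 3 and 4 only in Dataset 3.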

ISSN: 1471-2105