Table 2 The varying nature of batch effects in the three datasets as detected by Harman

From: Risk-conscious correction of batch effects: maximising information extraction from high-throughput genomic datasets

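A. Correction Vector (Hn-.95)

| PC index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Dataset 1 | 0.26 | 0.33 | 0.51 | 0.9 | 0.44 | 0.85 | 0.74 | 1 |
| Dataset 2 | 0.42 | 1 | 0.93 | 1 | 0.99 | 1 | 1 | 0.95 |
| Dataset 3 | 0.76 | 1 | 0.35 | 0.69 | 1 | 1 | 1 | 1 |

B. % of data variance explained by PC

| PC index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Dataset 1 | 43.4 % | 9.5 % | 4.8 % | 4.3 % | 2.7 % | 2.4 % | 2.2 % | 2.0 % |
| Dataset 2 | 19.1 % | 11.5 % | 6.9 % | 4.6 % | 4.3 % | 4.0 % | 3.6 % | 3.6 % |
| Dataset 3 | 33.9 % | 17.2 % | 16.0 % | 8.6 % | 5.8 % | 4.5 % | 3.7 % | 3.3 % |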

  1. (A) shows the ‘correction vector’ across the first eight principal components for the three datasets, as produced by Harman (.95). No or only negligible correction was detected for the remaining PCs. A score of 1 means no correction, whereas a score of 0 means maximum correction within the confines of Harman. (B) shows the relative proportion of overall variance explained by each of the first eight PCs for the three datasets.
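
As a reading aid, the two panels can be combined to see which PCs received any correction (score below 1) and how much of the overall variance those PCs carry. The sketch below is illustrative only: the values are transcribed from Table 2, while the helper names `corrected_pcs` and `variance_in_corrected_pcs` and the summed-variance summary are assumptions made here, not part of Harman or of the authors' analysis.

```python
# Values transcribed from Table 2; list position i corresponds to PC i + 1.
correction = {
    "Dataset 1": [0.26, 0.33, 0.51, 0.9, 0.44, 0.85, 0.74, 1],
    "Dataset 2": [0.42, 1, 0.93, 1, 0.99, 1, 1, 0.95],
    "Dataset 3": [0.76, 1, 0.35, 0.69, 1, 1, 1, 1],
}
variance_pct = {
    "Dataset 1": [43.4, 9.5, 4.8, 4.3, 2.7, 2.4, 2.2, 2.0],
    "Dataset 2": [19.1, 11.5, 6.9, 4.6, 4.3, 4.0, 3.6, 3.6],
    "Dataset 3": [33.9, 17.2, 16.0, 8.6, 5.8, 4.5, 3.7, 3.3],
}


def corrected_pcs(scores):
    """Return the 1-based indices of PCs whose score is below 1 (some correction)."""
    return [i + 1 for i, score in enumerate(scores) if score < 1]


def variance_in_corrected_pcs(scores, variances):
    """Sum the % variance of PCs that received any correction (hypothetical summary)."""
    total = sum(v for score, v in zip(scores, variances) if score < 1)
    return round(total, 1)


# Per-dataset summary: (corrected PC indices, % variance carried by those PCs).
summary = {
    name: (
        corrected_pcs(correction[name]),
        variance_in_corrected_pcs(correction[name], variance_pct[name]),
    )
    for name in correction
}

for name, (pcs, pct) in summary.items():
    print(f"{name}: corrected PCs {pcs}, carrying {pct} % of variance")
```

The summed percentage says only where correction was applied, not how strongly. A PC with a score of 0.99 counts the same as one with a score of 0.26, so it should be read alongside the scores in panel A.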