Fig. 2

From: ERStruct: a fast Python package for inferring the number of top principal components from whole genome sequencing data

A flowchart showing the three parts of the ERStruct Python package for whole genome sequencing data analysis. As an example, we use the 1000 Genomes Project sequencing data set [10], from which genetic markers with MAF less than 5% are removed. The real input data are processed in Eigens.py and then passed to TopPCs.py to obtain the sample eigenvalue ratios \(r_i\) (plotted in the lower left panel). In parallel, GOE matrices are simulated in GOE.py and passed to TopPCs.py to calculate the critical values \(\xi_{\alpha,1}, \ldots, \xi_{\alpha,\hat{K}_c}\). Finally, these critical values are used to infer the number of top principal components following Eq. 4 (plotted in the lower right panel).
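The two data streams in the flowchart can be illustrated with a minimal NumPy sketch. It is not the ERStruct implementation: the function names, the toy data matrix, and the matrix sizes are hypothetical, and the ratio is assumed to be that of consecutive eigenvalues in descending order. The preprocessing in Eigens.py, the mapping from simulated GOE eigenvalues to the critical values \(\xi_{\alpha,i}\), and the decision rule of Eq. 4 are not reproduced here.

```python
import numpy as np


def eigenvalue_ratios(eigvals):
    """Return r_i = l_(i+1) / l_i for eigenvalues sorted in descending order."""
    ordered = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return ordered[1:] / ordered[:-1]


def sample_goe_eigenvalues(n, n_rep, rng):
    """Eigenvalues (descending) of n_rep simulated n x n GOE matrices."""
    out = np.empty((n_rep, n))
    for b in range(n_rep):
        a = rng.standard_normal((n, n))
        # Symmetrize: variance 1 off the diagonal, variance 2 on the diagonal.
        goe = (a + a.T) / np.sqrt(2.0)
        # eigvalsh returns ascending order, so reverse it.
        out[b] = np.linalg.eigvalsh(goe)[::-1]
    return out


rng = np.random.default_rng(0)

# Toy stand-in for a processed data matrix
# (hypothetical sizes: 20 individuals, 500 markers).
x = rng.standard_normal((20, 500))
sample_eigvals = np.linalg.eigvalsh(x @ x.T / x.shape[1])
r = eigenvalue_ratios(sample_eigvals)

# Simulated GOE spectra, the raw material for critical values.
goe_eigvals = sample_goe_eigenvalues(n=20, n_rep=50, rng=rng)
print(r.shape, goe_eigvals.shape)
```

In this sketch each row of `goe_eigvals` holds one simulated spectrum. A package would then turn such replicates into level-\(\alpha\) critical values and compare them with `r`, which is the step the caption attributes to TopPCs.py.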