Fig. 2 | BMC Bioinformatics

Fig. 2

From: ERStruct: a fast Python package for inferring the number of top principal components from whole genome sequencing data


Flowchart of the three parts of the ERStruct Python package for whole genome sequencing data analysis. As an example, we use the 1000 Genomes Project sequencing data set [10], from which genetic markers with MAF less than 5% have been removed. The real input data are processed in Eigens.py and passed to TopPCs.py to obtain the sample eigenvalue ratios \(r_i\) (plotted in the lower left panel). In parallel, GOE.py simulates GOE matrices, and the simulation output is passed to TopPCs.py to calculate the critical values \(\xi_{\alpha,1}, \ldots, \xi_{\alpha,\hat{K}_c}\). Finally, these critical values are used to infer the number of top principal components following Eq. 4 (plotted in the lower right panel).
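
The two inputs that the caption says are passed to TopPCs.py can be illustrated with a minimal NumPy sketch. It is not the ERStruct package's code: the function names `sample_eigenvalues`, `eigenvalue_ratios`, and `sample_goe` are hypothetical, the column standardization and the GOE normalization are assumptions, and the ratio is assumed to be \(r_i = \lambda_{i+1}/\lambda_i\) for eigenvalues sorted in descending order. The step that turns simulated GOE eigenvalues into critical values, and the rule in Eq. 4, are deliberately left out because the caption does not specify them.

```python
import numpy as np


def sample_eigenvalues(genotypes):
    """Eigenvalues (descending) of the n x n Gram matrix of a standardized
    n-samples x p-markers genotype matrix. The standardization used here is
    an assumption, not necessarily what Eigens.py does."""
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]                # drop constant markers
    z = (g - g.mean(axis=0)) / g.std(axis=0)   # center and scale each marker
    gram = z @ z.T / z.shape[1]
    return np.linalg.eigvalsh(gram)[::-1]      # eigvalsh returns ascending order


def eigenvalue_ratios(eigvals, k):
    """First k ratios r_i = lambda_{i+1} / lambda_i of the sorted eigenvalues
    (assumed definition of the sample eigenvalue ratios)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return lam[1:k + 1] / lam[:k]


def sample_goe(n, rng):
    """One n x n GOE draw: symmetric, off-diagonal variance 1, diagonal
    variance 2 (one common normalization, assumed here)."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy genotypes coded 0/1/2; a stand-in for real sequencing data.
    toy = rng.binomial(2, 0.3, size=(50, 500))
    print(eigenvalue_ratios(sample_eigenvalues(toy), k=10))
    # Largest eigenvalues of one simulated GOE matrix.
    print(np.linalg.eigvalsh(sample_goe(50, rng))[::-1][:3])
```

With real data, the toy matrix would be replaced by the MAF-filtered genotype matrix, and many GOE draws would be needed before any quantile-based critical value could be estimated.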
