Figure 5 | BMC Bioinformatics

Figure 5

From: KRLMM: an adaptive genotype calling method for common and low frequency variants

Summary of the KRLMM algorithm. Analysis begins with either idat files or output from GenomeStudio. Data are then processed by the genotype.Illumina function to generate genotype calls and call confidence scores (A). A key part of the KRLMM algorithm is determining the number of clusters (k = 1, 2, or 3) for each SNP (B). For each SNP, three sets of variables are calculated from the k-means cluster assignments obtained with different values of k: the tightness of a particular clustering (R_ik for k = 1, 2, 3), the amount of bias present in the estimated cluster positions (D_ik for k = 1, 2, 3), and agreement with Hardy-Weinberg proportions (H_ik for k = 2, 3). In each plot, points have been numbered and colored according to the k-means clustering results for a given k. Regression coefficients for each of these variables are pre-determined (and saved in platform-specific annotation packages) by fitting a logistic regression model to a training data set made up of 10,000 randomly chosen SNPs from a HapMap data set in which the genotypes (and true k) are known in advance. This model is applied to the SNP-specific predictors to determine the best k to use in the k-means clustering that produces the genotype calls.
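
The selection of k described in the caption can be illustrated with a minimal Python sketch. It is not the KRLMM implementation: the tightness measure below (within-cluster over total sum of squares) is only an illustrative stand-in for R_ik, the bias and Hardy-Weinberg predictors are omitted, the softmax scoring is an assumed form of the logistic model, and TOY_COEFFICIENTS is invented for the example rather than taken from any annotation package. All function names are hypothetical.

```python
import math

def kmeans_1d(values, k, iters=50):
    """Lloyd's k-means in one dimension, started from evenly spaced quantiles."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic starting centers (an assumption, not the paper's choice).
    centers = [xs[int((2 * j + 1) * n / (2 * k))] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def tightness(values, labels, centers):
    """Within-cluster / total sum of squares; an illustrative stand-in for R_ik."""
    mean = sum(values) / len(values)
    total = sum((v - mean) ** 2 for v in values)
    within = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return within / total if total > 0 else 0.0

def predictors_for_snp(values):
    """Per-k predictor vectors [intercept term, tightness] for k = 1, 2, 3."""
    preds = {}
    for k in (1, 2, 3):
        labels, centers = kmeans_1d(values, k)
        preds[k] = [1.0, tightness(values, labels, centers)]
    return preds

def choose_k(predictors, coefficients):
    """Score each k linearly, convert scores to probabilities, return the best k."""
    scores = {k: sum(c * x for c, x in zip(coefficients[k], predictors[k]))
              for k in predictors}
    top = max(scores.values())
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    norm = sum(exps.values())
    probs = {k: e / norm for k, e in exps.items()}
    return max(probs, key=probs.get), probs

# Invented coefficients: penalize larger k and looser clusterings.
TOY_COEFFICIENTS = {1: [0.0, -10.0], 2: [-1.0, -10.0], 3: [-2.0, -10.0]}

# Toy one-dimensional intensity summaries for a single SNP across nine samples.
snp_values = [-2.1, -2.0, -1.9, -0.1, 0.0, 0.1, 1.9, 2.0, 2.1]
best_k, probs = choose_k(predictors_for_snp(snp_values), TOY_COEFFICIENTS)
print(best_k, probs)
```

Because this toy scorer uses tightness alone, it will rarely prefer k = 1; the additional bias and Hardy-Weinberg predictors named in the caption are what a fuller model would rely on to separate those cases.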
