Application of machine learning in SNP discovery

BMC Bioinformatics

Table 4 Algorithm for haplotype variation factor determination

N is the total number of polymorphic positions
For each polymorphic position i = 1 to N
    b1(i) and b2(i) are the lists of chromatograms having the major allele b1 and the minor allele b2 at position i, respectively
    Set Sum(HapVariationFactor) to zero
    For each polymorphic position j = 1 to N with j ≠ i
        b1(j) and b2(j) are the lists of chromatograms having the major allele b1 and the minor allele b2 at position j, respectively
        c(i,j) is the number of elements (chromatograms) common to b2(i) and b2(j), and t is the number of elements in b2(j)
        Sum(HapVariationFactor) += c(i,j)/t
    End of For loop
    HaplotypeFactor = Sum(HapVariationFactor)/N
End of For loop

The haplotype variation factor is defined as a measure of the co-variance observed in the same chromatogram across different SNP loci. For each SNP locus, the fraction of co-variances (minor alleles observed at a different SNP locus on the same chromatogram) relative to the total number of minor alleles observed is calculated first. These values are then summed over all positions, and the mean value (the haplotype variation factor) is obtained by dividing the sum by the total number of polymorphisms.
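The procedure in Table 4 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The function name `haplotype_variation_factors`, the representation of each b2(i) as a set of chromatogram identifiers, and the choice to skip positions with an empty minor-allele list are all assumptions made for this example. The sketch follows the table literally, returning one factor per polymorphic position i, with the sum reset for each i and divided by N.

```python
from typing import List, Set


def haplotype_variation_factors(minor: List[Set[str]]) -> List[float]:
    """Return one haplotype variation factor per polymorphic position.

    minor[i] is the set of chromatogram IDs carrying the minor allele
    at polymorphic position i (b2(i) in Table 4). Hypothetical helper.
    """
    n = len(minor)  # N: total number of polymorphic positions
    factors = []
    for i in range(n):
        total = 0.0  # Sum(HapVariationFactor), reset for each position i
        for j in range(n):
            if j == i:
                continue
            t = len(minor[j])  # number of chromatograms in b2(j)
            if t == 0:
                # Assumption: a position with no minor-allele chromatograms
                # contributes nothing (avoids division by zero).
                continue
            c = len(minor[i] & minor[j])  # chromatograms common to b2(i), b2(j)
            total += c / t
        factors.append(total / n)  # HaplotypeFactor for position i
    return factors


# Toy input: positions 0 and 1 share their minor-allele chromatograms,
# position 2 shares none.
example = [{"c1", "c2"}, {"c1", "c2"}, {"c3"}]
print(haplotype_variation_factors(example))
```

In this toy input, the first two positions vary together on the same chromatograms and receive a non-zero factor, while the third position, whose minor allele appears on an unrelated chromatogram, receives a factor of zero.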

ISSN: 1471-2105