alleHap: an efficient algorithm to reconstruct zero-recombinant haplotypes from parent-offspring pedigrees

Background

Haplotype inference is an essential stage in genetic linkage analysis, and estimation methods are also frequently used to reconstruct haplotypes in current genetic association studies. Most association studies focus on phasing haplotypes from recombinant DNA regions of unrelated individuals and use likelihood-based methods, which infer the alleles present at several loci with very time-consuming probabilistic algorithms.

So far, the literature has not addressed haplotype analysis with deterministic techniques, and there are hardly any alternative methods for constructing haplotypes from non-recombinant DNA regions, even though computational inference by probabilistic models may produce a large number of incorrect inferences.

Description and results

We have developed an algorithm called alleHap that imputes alleles in parent-offspring pedigree databases with missing family members and then constructs the corresponding unambiguous haplotypes.

The alleHap algorithm is based on a preliminary analysis of all possible combinations that may exist in the genotyping of a family, considering that each member, due to meiosis, must carry exactly two alleles at each locus, one from each parent. The analysis distinguishes seven cases, as described in [1], some of which are further divided into up to three variants, each representing a different combination of alleles among the family members (Table 1).
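The inheritance rule stated above (one allele from each parent) can be illustrated with a minimal Python sketch for a single marker. This is not the alleHap implementation and does not reproduce its cases and variants; the function names `transmitted_by_missing_parent` and `impute_missing_parent` are hypothetical, and the sketch assumes one genotyped parent, one missing parent, and error-free genotypes.

```python
from typing import Optional

Genotype = tuple[str, str]  # two alleles at one marker (assumed representation)


def transmitted_by_missing_parent(known_parent: Genotype, child: Genotype) -> set[str]:
    """Alleles the missing parent could have passed to this child."""
    a, b = child
    candidates = set()
    if a in known_parent:  # a came from the known parent, so b came from the missing one
        candidates.add(b)
    if b in known_parent:  # b came from the known parent, so a came from the missing one
        candidates.add(a)
    return candidates


def impute_missing_parent(known_parent: Genotype,
                          children: list[Genotype]) -> list[Optional[str]]:
    """Collect the alleles that the missing parent must carry.

    Returns two entries; None marks an allele that cannot be determined.
    """
    certain: set[str] = set()
    for child in children:
        candidates = transmitted_by_missing_parent(known_parent, child)
        if not candidates:
            raise ValueError(f"child {child} is incompatible with parent {known_parent}")
        if len(candidates) == 1:  # only one origin is possible, so the allele is certain
            certain |= candidates
    if len(certain) > 2:
        raise ValueError("more than two alleles attributed to one parent")
    alleles: list[Optional[str]] = sorted(certain)
    return alleles + [None] * (2 - len(alleles))


# Two children reveal both alleles of the missing parent.
print(impute_missing_parent(("A", "C"), [("A", "G"), ("C", "T")]))  # ['G', 'T']
# A child identical to the heterozygous known parent is uninformative.
print(impute_missing_parent(("A", "G"), [("A", "G")]))  # [None, None]
```

The sketch only records alleles whose parental origin is forced; ambiguous children contribute nothing, which mirrors the idea of unequivocal imputation without claiming to match the published case analysis.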

Table 1 Possible allelic combinations in a parent-offspring pedigree

The classification by cases and variants allows the algorithm to impute missing values efficiently in the loaded database and then to construct the corresponding unambiguous haplotypes. Furthermore, the algorithm places no limit on the number of SNPs, so haplotypes spanning more than two SNPs can be constructed.
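To make the idea of building a haplotype across several SNPs concrete, the following Python sketch phases one child against two fully genotyped parents under the zero-recombination assumption. It is a simplified illustration rather than the alleHap procedure: the function name `phase_child` and the data layout (one allele pair per marker) are assumptions, and markers whose parental origin cannot be decided from this trio alone are left as None.

```python
from typing import Optional

Genotype = tuple[str, str]  # two alleles at one marker (assumed representation)
Haplotype = list[Optional[str]]


def phase_child(father: list[Genotype],
                mother: list[Genotype],
                child: list[Genotype]) -> tuple[Haplotype, Haplotype]:
    """Split a child's genotypes into paternal and maternal haplotypes.

    Each argument holds one genotype per marker, in the same marker order.
    """
    paternal: Haplotype = []
    maternal: Haplotype = []
    for f, m, (a, b) in zip(father, mother, child):
        options = set()
        if a in f and b in m:  # a inherited from the father, b from the mother
            options.add((a, b))
        if b in f and a in m:  # b inherited from the father, a from the mother
            options.add((b, a))
        if not options:
            raise ValueError(f"genotype {(a, b)} is incompatible with parents {f} and {m}")
        if len(options) == 1:
            p, q = next(iter(options))
        else:  # both origins are possible, so this marker stays unphased
            p = q = None
        paternal.append(p)
        maternal.append(q)
    return paternal, maternal


father = [("A", "A"), ("C", "T"), ("G", "T")]
mother = [("G", "G"), ("C", "C"), ("G", "T")]
child = [("A", "G"), ("C", "T"), ("G", "T")]
print(phase_child(father, mother, child))
# (['A', 'T', None], ['G', 'C', None])
```

In the third marker all three individuals share the same heterozygous genotype, so the trio alone cannot decide the origin; additional genotyped siblings would be one way to resolve such positions.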

By analyzing all possible combinations in a parent-offspring pedigree in which parents may be missing but at least one child has been genotyped, we find that, in theory, three possible parental haplotypes can be imputed unequivocally in 92.3% of cases even when one parent is missing. When neither parent has been genotyped, at least two haplotypes can be constructed in 36.4% of cases. Regarding offspring allele imputation with both parents fully genotyped, at least one haplotype per child can be reconstructed in 6.1% of possible cases.

Evaluation of the results (Figure 1) shows that the three alleHap computational tasks, namely Simulation, Imputation and Reconstruction, perform well. Their execution times remain low even for a large number of families (up to 2000) and SNPs (up to 50).

Figure 1. Representation of computing times according to the number of families (left) and the number of SNPs (right).

Figure 2 shows that our algorithm achieves high allele imputation rates (about 65%) even when the probability of missing parents in each family is high (>50%). Haplotype reconstruction rates show an almost linear relationship with the number of missing individuals per family. This is because alleHap relies mainly on the information contained in the offspring, so the more children are missing, the more difficult it is to reconstruct the family haplotypes.

Figure 2. Representation of allele imputation rates (left) and haplotype reconstruction rates (right).

Conclusions

alleHap has been tested with simulations and with the Type 1 Diabetes Genetics Consortium [2] database. The algorithm is very robust against inconsistencies in the genotypic data and runs quickly even on large amounts of data. Imputing the missing data may improve results in numerous epidemiological and/or genetic linkage studies.

Our algorithm could be a useful instrument for information retrieval and knowledge discovery in genetics, since it would allow epidemiological specialists to discover new intergenic patterns by studying zero-recombinant haplotypes with a larger number of SNPs from family-based databases.

References

  1. Berger-Wolf TY: Reconstructing sibling relationships in wild populations. Bioinformatics. 2007, 23: i49-i56. 10.1093/bioinformatics/btm219.

  2. Rich SS: The Type 1 Diabetes Genetics Consortium. Ann N Y Acad Sci. 2006, 1079: 1-8. 10.1196/annals.1375.001.

Author information

Corresponding author

Correspondence to Nathan Medina-Rodríguez.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Medina-Rodríguez, N., Santana, A., Wägner, A.M. et al. alleHap: an efficient algorithm to reconstruct zero-recombinant haplotypes from parent-offspring pedigrees. BMC Bioinformatics 15, A6 (2014). https://doi.org/10.1186/1471-2105-15-S3-A6

Keywords

  • Reconstruction Rate
  • Genetic Linkage Analysis
  • Haplotype Inference
  • Genetic Linkage Study
  • Missing Data Imputation