Genotyping microsatellites in next-generation sequencing data

Dashnow, Harriet; Tan, Susan; Das, Debjani; Easteal, Simon; Oshlack, Alicia

doi:10.1186/1471-2105-16-S2-A5

Volume 16 Supplement 2

Highlights from the Tenth International Society for Computational Biology (ISCB) Student Council Symposium 2014

Meeting abstract
Open access
Published: 28 January 2015

Harriet Dashnow¹,²,³, Susan Tan⁴, Debjani Das⁴, Simon Easteal⁴ & Alicia Oshlack²,³

BMC Bioinformatics volume 16, Article number: A5 (2015)

Background

Microsatellites are short (2-6 bp) DNA sequences repeated in tandem that make up approximately 3% of the human genome [1]. These loci mutate frequently and are highly polymorphic, with estimated mutation rates of 10⁻⁶ to 10⁻² events per locus per generation, orders of magnitude higher than in other parts of the genome [2]. Dozens of neurological and developmental disorders have been attributed to microsatellite expansions [3]. Microsatellites have also been implicated in a range of functions, including DNA replication and repair, chromatin organisation and regulation of gene expression [4].
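To make the definition concrete, the following minimal sketch locates perfect tandem repeats of 2-6 bp units in a DNA string using a regular expression. The function name `find_microsatellites` and the `min_repeats` cut-off are assumptions made for illustration; they are not taken from the abstract, and real repeat finders also handle interrupted and compound repeats.

```python
import re

def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=3):
    """Return (start, unit, repeat_count) for each perfect tandem repeat.

    min_repeats is a hypothetical cut-off chosen for this example.
    """
    # A unit of min_unit..max_unit bases (shortest unit preferred, hence the
    # lazy quantifier) followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # Skip homopolymer runs such as AAAAAA, which are not 2-6 bp motifs.
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

print(find_microsatellites("TTGCACACACACAGGT"))
```

Here the sequence contains five copies of the dinucleotide `CA` starting at offset 3, so the call reports a single hit for that unit.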

Traditionally, microsatellite variation has been measured using capillary gel electrophoresis [5]. In addition to being time-consuming and expensive, this method fails to reveal the full complexity of these loci because it does not sequence the fragment directly; it only measures the number of bases in the repeat.

Next-generation sequencing has the potential to address these problems. However, determining microsatellite lengths from next-generation sequencing data is difficult. In particular, polymerase slippage during PCR amplification introduces stutter noise. A small number of software tools have been written to genotype simple microsatellites in next-generation sequencing data [6–8]; however, they do not address SNPs or compound repeats, and in some cases provide only approximate genotypes.
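Stutter means that some reads show a repeat count that differs from the true allele, often by one repeat unit. The sketch below shows one naive way to call a diploid genotype from per-read repeat counts while being stricter about peaks adjacent to the main allele. It is a simple peak-picking heuristic written for illustration, not the algorithm described in this abstract or any of the cited tools, and the names `call_genotype`, `min_fraction` and `stutter_fraction` and their threshold values are hypothetical.

```python
from collections import Counter

def call_genotype(read_repeat_counts, min_fraction=0.3, stutter_fraction=0.6):
    """Naive diploid call from the repeat count observed in each read.

    Both thresholds are hypothetical values chosen only for this example.
    """
    counts = Counter(read_repeat_counts)
    if not counts:
        raise ValueError("no informative reads")
    # Rank alleles by read support, breaking ties by repeat count.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    first, first_support = ranked[0]
    for allele, support in ranked[1:]:
        # A peak one unit from the main allele is more likely to be stutter,
        # so it must reach a higher share of the main allele's support.
        needed = stutter_fraction if abs(allele - first) == 1 else min_fraction
        if support >= needed * first_support:
            return tuple(sorted((first, allele)))
    # No second allele passed its threshold: call a homozygote.
    return (first, first)

homozygous_reads = [12] * 80 + [11] * 15 + [13] * 5
heterozygous_reads = [10] * 45 + [14] * 40 + [9] * 8 + [13] * 7
print(call_genotype(homozygous_reads), call_genotype(heterozygous_reads))
```

A fixed threshold like this cannot separate a true heterozygote whose alleles differ by one unit from heavy stutter, which is one reason model-based approaches are used in practice.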

We have begun to develop a microsatellite genotyping algorithm that addresses these issues, providing high accuracy as well as more detailed analysis of microsatellite loci. We have validated it using high-depth amplicon sequencing data from microsatellites near the AVPR1A gene.

Results

We found high concordance between our algorithm and repeat lengths obtained by electrophoresis, manual inspection and Mendelian inheritance (Table 1). By subsampling the reads, we found that our model is accurate to within one repeat unit down to coverages that we would expect in standard exome sequencing (Figure 1). Additionally, we detected polymorphic single nucleotide changes within some microsatellites.
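Validation by Mendelian inheritance rests on a simple rule: a child's genotype is consistent if one allele can come from the mother and the other from the father. A minimal sketch of that check follows, assuming genotypes are stored as pairs of repeat counts; the function name `mendelian_consistent` and the data layout are assumptions for illustration, and the check ignores new mutations, which are relatively common at these loci.

```python
def mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be split between the parents.

    Each genotype is a pair of repeat counts, for example (10, 12).
    """
    first, second = child
    return (first in mother and second in father) or (
        second in mother and first in father
    )

# Hypothetical trios: (child, mother, father).
trios = [
    ((10, 12), (10, 11), (12, 14)),  # 10 from mother, 12 from father
    ((10, 13), (10, 11), (12, 14)),  # 13 is present in neither parent
]
for child, mother, father in trios:
    print(child, mendelian_consistent(child, mother, father))
```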

Table 1 Concordance of microsatellite variant calls with three validation methods: electrophoresis, manual inspection and Mendelian inheritance.

Conclusions

On high-depth sequencing data, the algorithm called exactly the same genotype as the validation methods in approximately 95% of cases. When it did call a genotype incorrectly, the call differed by only one repeat unit. The algorithm is approximately 90% accurate to within one repeat unit with as few as 20 informative reads, and reaches almost 100% accuracy to within one repeat unit with 100 or more informative reads.
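The two accuracy figures above correspond to two tolerances: exact agreement and agreement to within one repeat unit. The sketch below shows one way such figures could be computed, together with a helper for subsampling reads to a target depth. It assumes that "within one repeat unit" means every allele is at most one unit from its validated counterpart; that interpretation, the function names and the toy genotypes are assumptions for illustration and not the authors' evaluation code or data.

```python
import random

def genotype_distance(call, truth):
    """Largest per-allele difference, in repeat units, between two genotypes."""
    return max(abs(c - t) for c, t in zip(sorted(call), sorted(truth)))

def accuracy(calls, truths, tolerance=0):
    """Fraction of calls within `tolerance` repeat units of the truth."""
    hits = sum(genotype_distance(c, t) <= tolerance for c, t in zip(calls, truths))
    return hits / len(truths)

def subsample(reads, n, seed=0):
    """Draw up to n reads without replacement, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(reads, min(n, len(reads)))

# Toy genotypes, invented purely to exercise the functions.
calls = [(10, 12), (11, 12), (9, 15)]
truths = [(10, 12), (10, 12), (9, 13)]
print(accuracy(calls, truths, tolerance=0))  # exact agreement
print(accuracy(calls, truths, tolerance=1))  # within one repeat unit
```

Repeating the call on `subsample(reads, n)` for decreasing `n` and recomputing both accuracies gives a depth-versus-accuracy curve of the kind the text describes.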

Future work will extend the algorithm to genotype compound microsatellites, and will include further validation and comparison with other algorithms on whole-genome data sets.

References

1. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W: Initial sequencing and analysis of the human genome. Nature. 2001, 409 (6822): 860-921. 10.1038/35057062.
2. Gemayel R, Vinces MD, Legendre M, Verstrepen KJ: Variable tandem repeats accelerate evolution of coding and regulatory sequences. Annual Review of Genetics. 2010, 44: 445-477. 10.1146/annurev-genet-072610-155046.
3. Gatchel JR, Zoghbi HY: Diseases of unstable repeat expansion: mechanisms and common principles. Nature Reviews Genetics. 2005, 6 (10): 743-755. 10.1038/nrg1691.
4. Li YC, Korol AB, Fahima T, Beiles A, Nevo E: Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Molecular Ecology. 2002, 11 (12): 2453-2465. 10.1046/j.1365-294X.2002.01643.x.
5. Guichoux E, Lagache L, Wagner S, Chaumeil P, Léger P, Lepais O, Lepoittevin C, Malausa T, Revardel E, Salin F, et al: Current trends in microsatellite genotyping. Molecular Ecology Resources. 2011, 11 (4): 591-611. 10.1111/j.1755-0998.2011.03014.x.
6. Gymrek M, Golan D, Rosset S, Erlich Y: lobSTR: A short tandem repeat profiler for personal genomes. Genome Research. 2012.
7. Highnam G, Franck C, Martin A, Stephens C, Puthige A, Mittelman D: Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles. Nucleic Acids Research. 2012, gks981.
8. Cao MD, Tasker E, Willadsen K, Imelfort M, Vishwanathan S, Sureshkumar S, Balasubramanian S, Bodén M: Inferring short tandem repeat variation from paired-end short reads. Nucleic Acids Research. 2014, 42 (3): e16-e16. 10.1093/nar/gkt1313.

Author information

Authors and Affiliations

Life Science Computation Centre, Victorian Life Sciences Computation Initiative, Carlton, VIC, Australia
Harriet Dashnow
The University of Melbourne, Parkville, VIC, Australia
Harriet Dashnow & Alicia Oshlack
Murdoch Childrens Research Institute, Parkville, VIC, Australia
Harriet Dashnow & Alicia Oshlack
John Curtin School of Medical Research - Australian National University, Canberra, ACT, Australia
Susan Tan, Debjani Das & Simon Easteal

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Dashnow, H., Tan, S., Das, D. et al. Genotyping microsatellites in next-generation sequencing data. BMC Bioinformatics 16 (Suppl 2), A5 (2015). https://doi.org/10.1186/1471-2105-16-S2-A5
