An integrated approach for SNP calling based on population of genomes

Vo, Nam S; Tran, Quang; Phan, Vinhthuy

doi:10.1186/1471-2105-15-S10-P30

Volume 15 Supplement 10

UT-KBRIN Bioinformatics Summit 2014: Abstracts

Poster presentation
Open access
Published: 29 September 2014

Nam S Vo¹,
Quang Tran² &
Vinhthuy Phan¹

BMC Bioinformatics volume 15, Article number: P30 (2014)

Background

The identification of genetic variants such as single nucleotide polymorphisms (SNPs) is a critical step in many applications based on next-generation sequencing (NGS) technologies [1]. Although many SNP calling programs have been developed, calling SNPs accurately remains challenging, especially when coverage is low [2]. Moreover, SNP determination is performed through many separate steps and requires careful selection from a diverse set of tools [3, 4]. This separation has several disadvantages. For example, information from the read alignment step cannot be incorporated into the SNP calling step, or vice versa, to improve the accuracy of the called SNPs.

Materials and methods

We propose a novel integrated approach that detects more true SNPs while calling fewer false positives. Unlike current methods, which perform read alignment and SNP calling as separate steps, our method combines the two steps to improve the accuracy of SNP identification. To exploit information from a population of genomes, databases of confirmed SNPs, such as dbSNP, are used both in aligning reads to references and in calling SNPs. This strategy allows us to develop a novel read alignment algorithm that can differentiate sequencing errors from SNPs.
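The idea of letting known SNPs inform read alignment can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It is a toy, ungapped comparison under the assumption that a mismatch matching a known alternate allele is treated as a candidate SNP, while any other mismatch is counted as a likely sequencing error. The names `snp_aware_mismatches`, `best_offset`, and the `known_snps` dictionary are hypothetical, and the sequences are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical catalogue of known SNPs: reference position -> alternate alleles.
# In practice this would be derived from a database such as dbSNP.
KnownSnps = Dict[int, Set[str]]


def snp_aware_mismatches(read: str, reference: str, offset: int,
                         known_snps: KnownSnps) -> Tuple[int, List[int]]:
    """Compare a read to the reference at `offset` without gaps.

    Returns (penalized_mismatches, candidate_snp_positions). A mismatch whose
    read base is a known alternate allele at that position is recorded as a
    candidate SNP and not penalized; any other mismatch is counted as a
    likely sequencing error.
    """
    errors = 0
    snp_hits: List[int] = []
    for i, base in enumerate(read):
        pos = offset + i
        if base == reference[pos]:
            continue
        if base in known_snps.get(pos, set()):
            snp_hits.append(pos)
        else:
            errors += 1
    return errors, snp_hits


def best_offset(read: str, reference: str, known_snps: KnownSnps) -> int:
    """Pick the ungapped placement with the fewest penalized mismatches."""
    candidates = range(len(reference) - len(read) + 1)
    return min(
        candidates,
        key=lambda o: snp_aware_mismatches(read, reference, o, known_snps)[0],
    )


# Toy data (assumed for illustration only).
reference = "ACGTACGTAC"
known_snps = {5: {"T"}}          # position 5 (reference base C) has known allele T
read_with_snp = "TATGT"          # reference[3:8] with the known SNP at position 5
read_with_error = "TATAT"        # same read plus an unexplained mismatch at position 6
```

In this toy setting, the read carrying only the known allele aligns at offset 3 with no penalized mismatches, whereas the second read keeps one penalized mismatch that the SNP catalogue cannot explain.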

Results

Because the alignment algorithm can differentiate sequencing errors from SNPs, the method can call SNPs accurately and effectively even with low-coverage sequencing data. Our results on simulated data show that the method calls SNPs with very high precision and recall on low-coverage datasets.
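For readers unfamiliar with the two metrics, precision is the fraction of called SNPs that are true, and recall is the fraction of true SNPs that are called. The sketch below shows how they might be computed against a simulated truth set. The function name `precision_recall` and the example calls are assumptions for illustration and are not data from this study.

```python
from typing import Iterable, Tuple

# A SNP call is represented here as a (position, allele) pair.
Snp = Tuple[int, str]


def precision_recall(called: Iterable[Snp],
                     truth: Iterable[Snp]) -> Tuple[float, float]:
    """Return (precision, recall) of called SNPs against a truth set."""
    called_set, truth_set = set(called), set(truth)
    true_positives = len(called_set & truth_set)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(called_set) if called_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return precision, recall


# Made-up example: 3 calls, 4 simulated true SNPs, 2 in common.
called = [(100, "A"), (250, "T"), (400, "G")]
truth = [(100, "A"), (250, "T"), (300, "C"), (500, "G")]
precision, recall = precision_recall(called, truth)
```

With these made-up calls, two of the three calls are correct and two of the four true SNPs are recovered.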

Conclusions

With databases of confirmed SNPs available for a large number of sequenced species, our approach is a promising way to call SNPs accurately even with low-coverage sequencing data. It can also simplify SNP determination for researchers by providing a single integrated SNP calling tool.

References

1. Nielsen R, Paul JS, Albrechtsen A, Song YS: Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet. 2011, 12 (6): 443-451. 10.1038/nrg2986.
2. Yu X, Sun S: Comparing a few SNP calling algorithms using low-coverage sequencing data. BMC Bioinformatics. 2013, 14: 274. 10.1186/1471-2105-14-274.
3. Altmann A, Weber P, Bader D, Preuss M, Binder EB, Müller-Myhsok B: A beginners guide to SNP calling from high-throughput DNA-sequencing data. Hum Genet. 2012, 131 (10): 1541-1554. 10.1007/s00439-012-1213-z.
4. Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z: A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform. 2014, 15 (2): 256-278. 10.1093/bib/bbs086.

Acknowledgements

This work is partly supported by NSF CCF-1320297.

Author information

Authors and Affiliations

Department of Computer Science, University of Memphis, Memphis, TN, 38152, USA
Nam S Vo & Vinhthuy Phan
Bioinformatics Program, University of Memphis, Memphis, TN, 38152, USA
Quang Tran

Corresponding author

Correspondence to Nam S Vo.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Vo, N.S., Tran, Q. & Phan, V. An integrated approach for SNP calling based on population of genomes. BMC Bioinformatics 15 (Suppl 10), P30 (2014). https://doi.org/10.1186/1471-2105-15-S10-P30

