NGS-Logistics: data infrastructure for efficient analysis of NGS sequence variants across multiple centers
© Ardeshirdavani et al; licensee BioMed Central Ltd. 2015
Published: 28 January 2015
Next-Generation Sequencing (NGS) is a key tool in genomics, in particular in research and diagnostics of human Mendelian, oligogenic, and complex disorders. Multiple projects now aim at mapping human genetic variation on a large scale, such as the 1,000 Genomes Project and the UK 100k Genome Project. Meanwhile, with the dramatic decrease in price and turnaround time, large amounts of human sequencing data have been generated over the past decade. As of January 2014, about 2,555 sequencers were spread over 920 centers across the world. As a result, about 100,000 human exomes have been sequenced so far. Crucially, the speed at which NGS data is produced greatly surpasses Moore's law and challenges our ability to conveniently store, exchange, and analyze this data.
Data pre-processing is needed to extract reliable information from sequencing data, and it can be divided into two major steps: primary analysis (image analysis and base calling) and secondary analysis. When looking for variation in the human genome, secondary analysis consists of aligning (mapping) the reads against the reference genome and scanning the alignment for variation. Both raw data and mapped reads are large files that occupy significant disk storage space. The collection of files resulting from the analysis of a single whole genome study can take up to 50Gb of disk space. This raises significant issues for computing, data storage, and data transfer, with off-site data transfer currently being a key bottleneck.
Moreover, the analysis of NGS data raises the major challenge of how to reconcile federated analysis of personal genomic data with the confidentiality needed to protect privacy. In many situations, analyzing data from a single study alone is much less powerful than correlating it with other studies. In particular, when investigating a mutation of interest, it is extremely useful to obtain data about other patients or controls who share similar mutations.
However, personal genome data (whole genome, exome, transcriptome data, etc.) is sensitive personal data. Confidentiality of this data must be guaranteed at all times and only duly authorized researchers should access such personal data.
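The tension between cross-center analysis and confidentiality can be illustrated with a minimal sketch. It assumes, purely for illustration, that each center answers a variant query locally and returns only aggregate counts, so per-sample genotypes never leave the site. The names `Variant`, `Center`, `count_carriers`, and `federated_query`, as well as the toy data, are hypothetical and are not taken from NGS-Logistics itself.

```python
from dataclasses import dataclass


# Hypothetical types for illustration only; not the actual NGS-Logistics schema.
@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


class Center:
    """A sequencing center that keeps per-sample variant calls on site."""

    def __init__(self, name, samples):
        self.name = name
        # Maps sample id -> set of Variant; this mapping is never returned to callers.
        self._samples = samples

    def count_carriers(self, variant):
        """Return only aggregate numbers: (carriers, total samples)."""
        carriers = sum(1 for calls in self._samples.values() if variant in calls)
        return carriers, len(self._samples)


def federated_query(centers, variant):
    """Ask every center for its local counts and combine the aggregates."""
    per_center = {c.name: c.count_carriers(variant) for c in centers}
    carriers = sum(n for n, _ in per_center.values())
    samples = sum(t for _, t in per_center.values())
    return {"per_center": per_center, "carriers": carriers, "samples": samples}


# Toy data: the variant and sample ids are made up.
v = Variant("chr1", 12345, "A", "G")
center_a = Center("center_A", {"s1": {v}, "s2": set()})
center_b = Center("center_B", {"s3": {v}, "s4": {v}, "s5": set()})

result = federated_query([center_a, center_b], v)
print(result["carriers"], "of", result["samples"], "samples carry the variant")
```

In this sketch the caller learns how often a variant occurs at each center but never sees which samples carry it; a real deployment would additionally need authentication and access control so that only duly authorized researchers can go beyond the aggregate view.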
The pilot version of NGS-Logistics has been installed and is currently being beta-tested by users at the Center for Human Genetics of the University of Leuven. Two installations of the system currently exist: one at the Leuven University Hospitals and one at the Flemish Supercomputing Center (VSC). The development of NGS-Logistics has significantly reduced the effort and time needed to evaluate the significance of mutations from full genome sequencing and exome sequencing, in a safe and confidential environment. The platform also gives operators who want to expand their queries more opportunities for further analysis.
- Voelkerding KV, Dames SA, Durtschi JD: Next-generation sequencing: from basic research to diagnostics. Clin Chem. 2009, 55 (4): 641-658. 10.1373/clinchem.2008.112789.
- National Human Genome Research Institute: DNA Sequencing Costs. 2013.
- Next Generation Genomics: World Map of High-throughput Sequencers. [http://omicsmaps.com/]
- Human genome: Genomes by the thousand. Nature. 2010, 467 (7319): 1026-1027.
- DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP). [http://www.genome.gov/sequencingcosts/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.