COSMOS: cloud enabled NGS analysis

Souilmi, Yassine; Jung, Jae-Yoon; Lancaster, Alex; Gafni, Erik; Amzazi, Saaid; Ghazal, Hassan; Wall, Dennis; Tonellato, Peter

doi:10.1186/1471-2105-16-S2-A2

Volume 16 Supplement 2

Highlights from the Tenth International Society for Computational Biology (ISCB) Student Council Symposium 2014

Meeting abstract
Open access
Published: 28 January 2015

Yassine Souilmi¹,², Jae-Yoon Jung², Alex Lancaster², Erik Gafni³, Saaid Amzazi¹, Hassan Ghazal⁴, Dennis Wall²,⁵ & Peter Tonellato²

BMC Bioinformatics volume 16, Article number: A2 (2015)

Background

The dramatic fall in the cost of next-generation sequencing (NGS) in recent years has brought its price into the range of typical medical testing, so whole genome analysis (WGA) may become a viable clinical diagnostic tool. Modern sequencing platforms routinely generate petabyte-scale data. The current challenge lies in calling and analyzing this large-scale data, which has become the new rate-limiting step in both time and cost.

Methods

To address the computational limitations and optimize the cost, we have developed COSMOS (http://cosmos.hms.harvard.edu), a scalable, parallelizable workflow management system that runs on clouds (e.g., Amazon Web Services or Google Cloud). Using COSMOS [1], we have constructed an NGS analysis pipeline implementing the Genome Analysis Toolkit (GATK v3.1) best practices protocol [2, 3], a widely accepted industry standard developed by the Broad Institute. COSMOS performs a thorough sequence analysis, including quality control, alignment, variant calling and an unprecedented level of annotation using a custom extension of ANNOVAR. COSMOS takes advantage of parallelization and the resources of a high-performance compute cluster, either local or in the cloud, to process datasets of up to the petabyte scale, which is becoming standard in NGS.
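The stage ordering described above (quality control, alignment, variant calling, annotation) can be illustrated with a minimal sketch of a scatter-gather workflow. This is not the COSMOS API and does not invoke GATK or ANNOVAR: every function name below is a hypothetical placeholder that only returns the name of the artifact a real stage would produce, and the per-region split of variant calling is an assumption chosen to show where parallelization could apply.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholder stages (not COSMOS, GATK, or ANNOVAR calls).
# Each returns the name of the artifact a real stage would write.
def quality_control(sample):
    return f"{sample}.qc"

def align(qc_output):
    return f"{qc_output}.bam"

def call_variants(bam, region):
    return f"{bam}.{region}.vcf"

def annotate(vcf_parts):
    return "+".join(vcf_parts) + ".annotated"

def run_sample(sample, regions, pool):
    """Run the four stages for one sample, scattering variant calling by region."""
    bam = align(quality_control(sample))
    # Scatter: one variant-calling task per region, executed in parallel.
    # pool.map returns results in the same order as `regions`.
    vcf_parts = list(pool.map(lambda region: call_variants(bam, region), regions))
    # Gather: merge the per-region results and annotate them once.
    return annotate(vcf_parts)

def run_pipeline(samples, regions, max_workers=4):
    """Process samples one after another; regions within a sample run in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return {sample: run_sample(sample, regions, pool) for sample in samples}

if __name__ == "__main__":
    print(run_pipeline(["sampleA", "sampleB"], ["chr1", "chr2", "chr3"]))
```

In this sketch the samples are processed sequentially and only the regions are parallelized, which keeps the example free of nested task submission; a workflow manager running on a cluster would typically schedule both levels as independent jobs.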

Conclusion

This approach enables timely and cost-effective NGS analysis, allowing it to be used in clinical settings and in translational medicine. With COSMOS we reduced the cost of whole genome data analysis to under the $100 barrier, placing it at a reimbursable price point and within a clinical time frame. This significantly changes the landscape of genomic analysis and cements the utility of cloud environments as a resource for petabyte-scale genomic research.

References

1. Gafni E, Luquette LJ, Lancaster AK, Hawkins JB, Jung J-Y, Souilmi Y, Wall DP, Tonellato PJ: COSMOS: Python library for massively parallel workflows. Bioinformatics. 2014, btu385.
2. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011, 43: 491-498. 10.1038/ng.806.
3. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, DePristo MA: From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. 2013.

Author information

Authors and Affiliations

Department of Biology, Faculty of Sciences of Rabat, Morocco
Yassine Souilmi & Saaid Amzazi
Center for Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
Yassine Souilmi, Jae-Yoon Jung, Alex Lancaster, Dennis Wall & Peter Tonellato
INVITAE, San Francisco, CA, 94107, USA
Erik Gafni
Department of Biology, Mohamed First University, Oujda/Nador, Morocco
Hassan Ghazal
Department of Pediatrics, Division of Systems Medicine, Stanford University, Stanford, CA, 94305, USA
Dennis Wall

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Souilmi, Y., Jung, JY., Lancaster, A. et al. COSMOS: cloud enabled NGS analysis. BMC Bioinformatics 16 (Suppl 2), A2 (2015). https://doi.org/10.1186/1471-2105-16-S2-A2
