Alignment-free methods for metagenomic profiling

Gao, Shanshan; Pham, Diem-Trang; Phan, Vinhthuy

doi:10.1186/1471-2105-16-S15-P4

Volume 16 Supplement 15

Proceedings of the 14th Annual UT-KBRIN Bioinformatics Summit 2015

Poster presentation
Open access
Published: 23 October 2015


BMC Bioinformatics volume 16, Article number: P4 (2015)


Background

The primary goal of metagenomic studies is to analyze and evaluate the rich microbial communities present in all natural environments. The construction and utilization of a large index required by alignment-based methods for thousands of microbial genomes can be computationally prohibitive. To avoid this computational cost, we investigated three different variations of an alignment-free method for profiling abundances of microbial communities.

Materials and methods

The main idea of the method is to reformulate the problem of determining the abundances of microbial genomes as that of finding optimal solutions to a system of linear equations subject to specific constraints. A set of genomic markers for the entire collection of genomes is represented by a matrix F, where F_ij is the frequency of marker i in genome j. The occurrence vector b holds the frequencies of the markers in the reads. We seek an optimal abundance vector x, in which x_j is the abundance of genome j, by solving the linear system Fx = b. The choice of F and b is the key factor in finding the optimal value of x. We introduce the concept of a genome-specific marker (GSM), a k-mer that occurs in exactly one genome. We exhaustively determine such markers from the entire dataset and record their frequencies in the matrix F. Given a set of reads from a metagenomic dataset, we compute the frequencies of the GSMs in the reads as b. Three variations of the method are then formulated, respectively, as a linear programming problem (LP), a least-squares approximation problem (L2), and an L1-approximation problem (L1).
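As an illustration only, the pipeline described above can be sketched in Python on toy data. This is not the authors' implementation: the function names are hypothetical, the use of NumPy and SciPy is an assumption, and so is the non-negativity constraint on x (the abstract says only that solutions must satisfy "specific constraints"). Reverse complements are ignored, and the toy "reads" are whole copies of the genomes so that b is exactly consistent with F. The sketch covers the L2 and L1 variants; the LP variant is left out because the abstract does not state its objective or constraints.

```python
from collections import Counter

import numpy as np
from scipy.optimize import linprog, nnls


def kmer_counts(seq, k):
    """Count every k-mer in a sequence (forward strand only)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def genome_specific_markers(genomes, k):
    """Return the sorted k-mers that occur in exactly one genome, plus per-genome counts."""
    counts = [kmer_counts(g, k) for g in genomes]
    owners = Counter()
    for c in counts:
        owners.update(c.keys())  # number of genomes containing each k-mer
    markers = sorted(kmer for kmer, n in owners.items() if n == 1)
    return markers, counts


def build_F(markers, counts):
    """F[i, j] = frequency of marker i in genome j."""
    return np.array([[c[m] for c in counts] for m in markers], dtype=float)


def build_b(markers, reads, k):
    """b[i] = frequency of marker i over all reads."""
    total = Counter()
    for r in reads:
        total.update(kmer_counts(r, k))
    return np.array([total[m] for m in markers], dtype=float)


def solve_l2(F, b):
    """Least squares: minimise ||Fx - b||_2 subject to x >= 0 (assumed constraint)."""
    x, _ = nnls(F, b)
    return x


def solve_l1(F, b):
    """L1 approximation: minimise ||Fx - b||_1 subject to x >= 0, written as an LP.

    Slack variables t bound the residual: -t <= Fx - b <= t, minimise sum(t).
    """
    m, n = F.shape
    cost = np.concatenate([np.zeros(n), np.ones(m)])
    A_ub = np.block([[F, -np.eye(m)], [-F, -np.eye(m)]])
    b_ub = np.concatenate([b, -b])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (n + m), method="highs")
    return res.x[:n]


# Toy data: two made-up genomes sharing no 4-mers, sampled 3:1.
genomes = ["ACGTACGGTA", "TTGCATTGCC"]
reads = [genomes[0]] * 3 + [genomes[1]] * 1
k = 4

markers, counts = genome_specific_markers(genomes, k)
F = build_F(markers, counts)
b = build_b(markers, reads, k)
x_l2 = solve_l2(F, b)
x_l1 = solve_l1(F, b)
print("L2 abundances:", x_l2)
print("L1 abundances:", x_l1)
```

Because every marker here belongs to a single genome, each row of F has one non-zero entry and the system decouples by genome, so both variants recover the 3:1 ratio used to build the reads. With real, noisy reads the two objectives would generally give different estimates.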

Results

So far, our investigation of two data sets, consisting of 100 and 1105 microbial genomes, shows that the linear programming formulation (LP) yields the best prediction of the abundances of microbial genomes. This result is consistent across different levels of abundance. The LP variant also consistently outperformed a popular metagenomic profiler, FOCUS [1], which has itself been reported to be superior to other methods.

Conclusions

In future work, we will investigate a marker matrix F that consists not only of GSMs but also of k-mers that occur in more than one genome.

References

1. Silva GGZ, et al.: FOCUS: an alignment-free model to identify organisms in metagenomes using non-negative least squares. PeerJ. 2014, 2: e425.

Author information

Authors and Affiliations

Department of Computer Science, University of Memphis, Memphis, TN, 38152, USA
Shanshan Gao, Diem-Trang Pham & Vinhthuy Phan


Corresponding author

Correspondence to Vinhthuy Phan.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Gao, S., Pham, DT. & Phan, V. Alignment-free methods for metagenomic profiling. BMC Bioinformatics 16 (Suppl 15), P4 (2015). https://doi.org/10.1186/1471-2105-16-S15-P4

