Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities

Peabody, Michael A.; Van Rossum, Thea; Lo, Raymond; Brinkman, Fiona S. L.

doi:10.1186/s12859-015-0788-5

Comments from the authors

Michael Peabody, Simon Fraser University

16 December 2015

We have received questions from a few researchers about our paper. Below are some answers in case they are of broader interest.

---------

Q: Did you evaluate how well methods performed without clade exclusion (i.e. no species were removed from the database being compared to)?

A: Yes, we did; the results are shown in Figure S1 in Additional File 2.
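For readers unfamiliar with the term, clade exclusion can be sketched roughly as follows: before classifying, every reference sequence belonging to the clade under test is removed from the database. This is only an illustrative sketch, not the pipeline used in the paper. The toy taxonomy, the `reference` mapping, and the function names `is_within` and `exclude_clade` are all hypothetical.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "root": None,
    "Pseudomonas": "root",
    "Pseudomonas protegens": "Pseudomonas",
    "Pseudomonas aeruginosa": "Pseudomonas",
}


def is_within(taxon, clade):
    """Return True if `taxon` equals `clade` or descends from it."""
    while taxon is not None:
        if taxon == clade:
            return True
        taxon = PARENT[taxon]
    return False


def exclude_clade(reference, clade):
    """Return the reference entries that do not belong to `clade`.

    `reference` maps a sequence id to the taxon it came from.
    """
    return {
        seq_id: taxon
        for seq_id, taxon in reference.items()
        if not is_within(taxon, clade)
    }
```

Excluding a species this way leaves its relatives in the database, which is what makes it possible to see whether a method falls back to a higher rank or overpredicts to a neighbouring species.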

Q: Why are there so few species in your test datasets for the evaluations?

A: A smaller number of species in the test sets makes the system easier to comprehend, which enables us to better understand the performance of the different methods. The MetaSimHC dataset was used since it was previously published and proposed as an evaluation dataset, and it contains diverse taxa. For the in vitro dataset, making a mock community containing many species is non-trivial. Note that we chose species relevant to a larger study we were performing, and we encourage researchers to customize any test dataset to suit their analysis needs. We also purposefully included some more closely related organisms, to evaluate how well methods differentiated more closely related taxa.

Q: If I'm just starting out, which method should I try?

A: MEGAN5 has some great features for performing all sorts of analyses and visualizations; it is good for exploring your data and potentially a good first method for starting out. We did find in this analysis (and subsequent analyses) that MEGAN4 and MEGAN5 with default settings tend to overpredict reads (e.g. assign a read to a similar species in the same genus if the actual species was removed from the database); however, they otherwise perform quite well with BLASTX-like programs (BLASTX, RAPSearch2, Diamond). If the goal is to analyze community composition rather than classify all of the reads, a marker-based method (which we would expect to have high precision and run relatively quickly) such as MetaPhyler or MetaPhlAn should work well. MetaPhlAn2 was recently released and should be worth checking out (http://www.nature.com/nmeth/journal/v12/n10/full/nmeth.3589.html). Kraken is an incredibly quick method to run and should identify species if they are in the database used. CARMA3 and DiScRIBinATE were methods we found to be more conservative (less prone to overpredict reads) relative to other methods with similar sensitivity, so they are worth checking out if you are concerned about reads being overpredicted (and thus about predicting species that aren't actually in the sample). See the Discussion section of the paper for more information.

Q: Why do certain methods give false species predictions?

A: This is method dependent. For example, MEGAN4 and MG-RAST (using LCA) rely on relatively simple lowest common ancestor (LCA) approaches. If a read makes a single hit to one species and that hit reaches the bit-score threshold, the read will be assigned directly to the species level, regardless of how good the hit/alignment is in terms of other metrics such as % identity (although, depending on parameter choices such as the minimum support parameter, the read may be reassigned by MEGAN4).
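The single-hit behaviour described above can be illustrated with a minimal sketch of a naive LCA rule. This is not code from MEGAN4 or MG-RAST. The toy taxonomy, the function names, and the `min_bit_score` value of 50.0 are assumptions chosen for illustration, and filters such as minimum support are deliberately left out.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from any tool.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Pseudomonas": "Bacteria",
    "Pseudomonas protegens": "Pseudomonas",
    "Pseudomonas aeruginosa": "Pseudomonas",
    "Escherichia": "Bacteria",
    "Escherichia coli": "Escherichia",
}


def lineage(taxon):
    """Return the path from the root down to `taxon`."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path[::-1]


def naive_lca(hits, min_bit_score=50.0):
    """Assign a read to the LCA of all hits that pass the bit-score cutoff.

    `hits` is a list of (taxon, bit_score) pairs. Returns None when no hit
    passes the cutoff. The cutoff value is illustrative only.
    """
    kept = [lineage(taxon) for taxon, score in hits if score >= min_bit_score]
    if not kept:
        return None
    lca = None
    # Walk down the lineages rank by rank until they disagree.
    for taxa_at_depth in zip(*kept):
        if len(set(taxa_at_depth)) != 1:
            break
        lca = taxa_at_depth[0]
    return lca
```

With a single passing hit there is nothing for the lineages to disagree on, so `naive_lca([("Pseudomonas aeruginosa", 55.0)])` goes straight to the species, even if the true source was a different Pseudomonas species missing from the database. Two passing hits to different species in the genus would instead stop at `Pseudomonas`.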

Q: Why is MG-RAST not in the clade exclusion analysis? I see it only in the analysis without clade exclusion.

A: Many methods, including MG-RAST, could not be evaluated with clade exclusion because we didn't have access to manipulate their underlying database. Others were not evaluated simply due to time constraints, as there are a very large number of methods for performing metagenomics taxonomic sequence classification (and more keep coming out). Hopefully more methods will be evaluated in this way; we have already evaluated more methods since this publication (see below).

We also have a few additional comments:

Strain-level classification can be quite difficult, and is complicated by the notable within-strain variability that can occur, so we only looked at classifications down to the species level.

Pseudomonas fluorescens Pf-5 is now known as Pseudomonas protegens Pf-5.

Due to the way we evaluated sensitivity and precision, more conservative methods, which tend to assign reads to higher taxonomic levels if the species is not in the database, showed higher levels of sensitivity and precision. However, if you are only interested in more specific taxonomic ranks, such as species- and genus-level classifications, these methods may not be as useful if they end up classifying few reads to these taxonomic levels. Less conservative methods make the trade-off of assigning reads to more specific taxonomic levels, with higher rates of overprediction.
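To make this effect concrete, here is a minimal sketch under assumed definitions, which may differ from the exact ones in the paper. Sensitivity is taken as correct assignments divided by all reads, and precision as correct assignments divided by classified reads. An assignment counts as correct if it lies anywhere on the true species' lineage, so an overprediction to a sibling species is incorrect. The taxonomy, read ids, and function names are hypothetical.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Pseudomonas": "Bacteria",
    "Pseudomonas protegens": "Pseudomonas",
    "Pseudomonas aeruginosa": "Pseudomonas",
}


def ancestors_and_self(taxon):
    """Return the set containing `taxon` and every ancestor up to the root."""
    seen = set()
    while taxon is not None:
        seen.add(taxon)
        taxon = PARENT[taxon]
    return seen


def score(truth, predictions):
    """Compute (sensitivity, precision) under the assumptions stated above.

    `truth` maps read id -> true species; `predictions` maps read id -> the
    assigned taxon, or None when the method left the read unclassified.
    """
    classified = 0
    correct = 0
    for read_id, true_species in truth.items():
        predicted = predictions.get(read_id)
        if predicted is None:
            continue
        classified += 1
        # Correct if the prediction lies on the true lineage, at any rank.
        if predicted in ancestors_and_self(true_species):
            correct += 1
    sensitivity = correct / len(truth) if truth else 0.0
    precision = correct / classified if classified else 0.0
    return sensitivity, precision
```

Under these assumptions, a method that answers `Pseudomonas` or `Bacteria` for reads from an excluded species is counted as correct, while one that picks a sibling species is not. That is why conservative methods score well here even though they place few reads at the species or genus level.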

We have evaluated MEGAN5 with BLASTX, RAPSearch2, Diamond sensitive mode, RAPSearch2 fast mode, and Diamond for the MetaSimHC dataset with 250 bp simulated reads and overpredictions considered incorrect. As we move from BLASTX to Diamond in this list of heuristic search methods, we generally see a trade-off: slightly decreased sensitivity in exchange for a substantially improved (i.e. shortened) running time. See http://www.slideshare.net/Mpeabody/comparison-of-megan5-with-different-similarity-search-methods. Precision, however, stays about the same. We have also compared MEGAN4 vs MEGAN5 using default parameters for each, and find that MEGAN5 has slightly increased sensitivity and similar precision relative to MEGAN4.

Hope this is helpful.
-Michael Peabody

Competing interests

None declared

BMC Bioinformatics