Metabolomics data analysis and missing value issues with application to infarcted mouse hearts

Shah, Jasmit S; Brock, Guy N; Rai, Shesh N

doi:10.1186/1471-2105-16-S15-P16

Volume 16 Supplement 15

Proceedings of the 14th Annual UT-KBRIN Bioinformatics Summit 2015

Poster presentation
Open access
Published: 23 October 2015

Jasmit S Shah^1,2, Guy N Brock^2 & Shesh N Rai^2

BMC Bioinformatics volume 16, Article number: P16 (2015)

Background

High-throughput technology makes it possible to monitor metabolites across different experiments, and it has been widely used to detect differences in metabolites in many areas of biomedical research. Mass spectrometry has become one of the main analytical techniques for profiling a wide array of compounds in biological samples. Extracting relevant biological information from these large datasets remains a challenge. Missing values occur widely in metabolomics datasets and can arise from both technical and biological sources. They are most often replaced by the minimum observed value, but this substitution can change the results of downstream analyses, and different imputation methods tend to give different results. In this study we summarize the statistical analysis of metabolomics data both without and with missing values. For the data with missing values, we compare several imputation methods and examine the outcome of each.
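To make the point about substitution concrete, here is a minimal sketch with made-up values for a single metabolite. The two missing values in one group are filled with zero, the minimum, half the minimum, or the mean of the metabolite's observed values, and a Welch two-sample t statistic is recomputed each time. The data, the use of Welch's t, and the helper names `welch_t` and `t_by_substitute` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def welch_t(a, b):
    """Welch two-sample t statistic (no p-value, to avoid a SciPy dependency)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se

def t_by_substitute(control, treated):
    """Fill the missing values in `treated` with several single-value
    substitutes and return the resulting Welch t statistic for each."""
    # Substitutes are computed from the metabolite's observed values in both groups.
    observed = np.concatenate([control, treated])
    observed = observed[~np.isnan(observed)]
    substitutes = {
        "zero": 0.0,
        "minimum": observed.min(),
        "half minimum": observed.min() / 2,
        "mean": observed.mean(),
    }
    return {name: welch_t(control, np.where(np.isnan(treated), value, treated))
            for name, value in substitutes.items()}

# Hypothetical intensities for one metabolite; np.nan marks a missing value.
control = np.array([4.1, 3.8, 4.4, 4.0, 3.9])
treated = np.array([2.9, np.nan, 3.1, np.nan, 2.7])

for name, t in t_by_substitute(control, treated).items():
    print(f"{name:>12}: t = {t:.2f}")
```

For these toy values, zero substitution gives the smallest t statistic and minimum substitution the largest. Part of the effect is that filling with a single constant changes the group's variance as well as its mean.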

Materials and methods

Analysis was done on 276 metabolites from 10 samples (12 metabolites were excluded because they were not detected in either group); 204 metabolites had complete information in all samples [1]. We compared seven missing value (MV) imputation methods: Zero, Mean, Median, Half Minimum (HM), k-Nearest Neighbors (kNN), Random Forest (RF) and Probabilistic Principal Component Analysis (PPCA). Filtering, scaling and transformation were done with the interquartile range, Pareto scaling and log transformation, respectively. Downstream analyses included the t-test, fold change, partial least squares discriminant analysis (PLS-DA) and correlation analysis, among others.
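The simpler imputation methods and the pre-processing steps listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the samples-by-metabolites layout, the sample-based kNN variant with root-mean-square distance, the 10% IQR drop fraction, the generalized log2 (chosen so that zero-imputed values stay finite), and the order filter, then log, then Pareto scaling are all choices made here. RF and PPCA imputation are not sketched.

```python
import numpy as np

# Assumed layout: rows are samples, columns are metabolites, np.nan = missing.
# Each column is assumed to have at least one observed value (the study
# excluded metabolites not detected in either group).

def impute_simple(X, method):
    """Column-wise substitution: 'zero', 'mean', 'median' or 'half_min'."""
    X = X.copy()
    for j in range(X.shape[1]):
        col = X[:, j]                      # view into the copy
        miss = np.isnan(col)
        if not miss.any():
            continue
        obs = col[~miss]
        if method == "zero":
            col[miss] = 0.0
        elif method == "mean":
            col[miss] = obs.mean()
        elif method == "median":
            col[miss] = np.median(obs)
        elif method == "half_min":
            col[miss] = obs.min() / 2      # half the smallest observed value
        else:
            raise ValueError(f"unknown method: {method}")
    return X

def impute_knn(X, k=3):
    """Sample-based kNN: fill X[i, j] with the mean of metabolite j in the k
    samples nearest to sample i (RMS distance over metabolites both observed)."""
    n = X.shape[0]
    out = X.copy()
    for i in range(n):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        dist = np.full(n, np.inf)
        for r in range(n):
            shared = ~np.isnan(X[i]) & ~np.isnan(X[r])
            if r != i and shared.any():
                dist[r] = np.sqrt(np.mean((X[i, shared] - X[r, shared]) ** 2))
        order = np.argsort(dist)
        for j in np.flatnonzero(miss):
            donors = [r for r in order
                      if np.isfinite(dist[r]) and not np.isnan(X[r, j])][:k]
            out[i, j] = X[donors, j].mean() if donors else np.nanmean(X[:, j])
    return out

def iqr_filter(X, drop_fraction=0.1):
    """Drop the given fraction of metabolites with the smallest interquartile range."""
    q75, q25 = np.percentile(X, [75, 25], axis=0)
    n_drop = int(drop_fraction * X.shape[1])
    keep = np.sort(np.argsort(q75 - q25)[n_drop:])
    return X[:, keep], keep

def glog2(X):
    """Generalized log2 that stays finite at zero; c is an assumed constant
    (one tenth of the smallest non-zero absolute value)."""
    c = np.min(np.abs(X[X != 0])) / 10
    return np.log2((X + np.sqrt(X ** 2 + c ** 2)) / 2)

def pareto_scale(X):
    """Mean-centre each metabolite and divide by the square root of its SD."""
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))

def preprocess(X_imputed, drop_fraction=0.1):
    # Assumed order: IQR filter, then log transform, then Pareto scaling.
    X_f, keep = iqr_filter(X_imputed, drop_fraction)
    return pareto_scale(glog2(X_f)), keep

# Hypothetical example: 10 samples x 20 metabolites with three missing entries.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=5.0, sigma=1.0, size=(10, 20))
X[[0, 3, 7], [2, 5, 5]] = np.nan

processed = {m: preprocess(impute_simple(X, m))[0]
             for m in ("zero", "mean", "median", "half_min")}
processed["knn"] = preprocess(impute_knn(X, k=3))[0]
```

Every imputed matrix passes through the same filter, transformation and scaling, so any difference in the downstream t-test, fold change or PLS-DA traces back to the imputation step alone.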

Results

Zero imputation gave the fewest significant metabolites, whereas Mean imputation gave the most. In the volcano plots, 55 unique metabolites were identified across all methods, of which 28 were identified by every method.
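One reading of these counts is that 55 is the size of the union of the metabolites flagged by any method and 28 the size of their intersection. The sketch below shows that bookkeeping, together with the method-specific hits discussed in the Conclusions. The volcano cut-offs (|log2 fold change| of at least 1, p below 0.05), the metabolite names `m1` to `m4`, and all fold changes and p-values are hypothetical placeholders; the abstract does not state the thresholds used.

```python
import numpy as np

def volcano_hits(names, log2_fc, p_values, fc_cut=1.0, p_cut=0.05):
    """Metabolites passing both volcano-plot thresholds (cut-offs are assumed)."""
    log2_fc = np.asarray(log2_fc, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    passed = (np.abs(log2_fc) >= fc_cut) & (p_values < p_cut)
    return {name for name, ok in zip(names, passed) if ok}

def compare_methods(hits_by_method):
    """Union, intersection and method-specific hits across imputation methods."""
    sets = list(hits_by_method.values())
    union = set().union(*sets)
    common = set.intersection(*sets)
    # Hits found by one method and by no other method.
    only = {m: hits - set().union(*(h for k, h in hits_by_method.items() if k != m))
            for m, hits in hits_by_method.items()}
    return union, common, only

# Hypothetical fold changes and p-values for four metabolites under three methods.
names = ["m1", "m2", "m3", "m4"]
hits = {
    "Zero": volcano_hits(names, [1.5, -2.0, 0.3, 1.1], [0.01, 0.02, 0.04, 0.20]),
    "Mean": volcano_hits(names, [1.4, -1.8, 1.2, 1.3], [0.01, 0.03, 0.04, 0.03]),
    "kNN": volcano_hits(names, [1.6, -0.5, 1.0, 1.2], [0.02, 0.01, 0.30, 0.04]),
}
union, common, only = compare_methods(hits)
print(len(union), len(common), only)
```

With real results, `len(union)` and `len(common)` would correspond to the two counts above, and the non-empty entries of `only` to metabolites that only one imputation method flags.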

Conclusions

We have shown that the choice of method used to replace MVs can have a dramatic impact on the data, which makes the handling of missing values a crucial step in data pre-processing. Some metabolites, such as adenylosuccinate, caprylate (8:0) and N-acetylalanine, are detectable only with a specific imputation method and may be important in their metabolic pathways, so choosing an appropriate method is critical. Likewise, in PLS-DA, FAD was important for predicting class membership only with kNN imputation, whereas adenine and adenylosuccinate were important only with the Zero and Mean methods. In future studies we will further examine MVs and develop an appropriate method so that the correct significant metabolites are captured.

References

[1] Sansbury BE, et al: Metabolomic analysis of pressure-overloaded and infarcted mouse hearts. Circ Heart Fail. 2014, 7: 634-642. doi:10.1161/CIRCHEARTFAILURE.114.001151.

Author information

Authors and Affiliations

The Diabetes and Obesity Center, University of Louisville, Louisville, KY, 40202, USA
Jasmit S Shah
Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, 40202, USA
Jasmit S Shah, Guy N Brock & Shesh N Rai

Corresponding author

Correspondence to Jasmit S Shah.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Shah, J.S., Brock, G.N. & Rai, S.N. Metabolomics data analysis and missing value issues with application to infarcted mouse hearts. BMC Bioinformatics 16 (Suppl 15), P16 (2015). https://doi.org/10.1186/1471-2105-16-S15-P16
