  • Poster presentation
  • Open access

Development of large-scale metabolite identification methods for metabolomics


Large-scale identification of metabolites is key to elucidating and modeling metabolism at the systems level. Advances in metabolomics technologies, particularly ultra-high resolution mass spectrometry, enable comprehensive and rapid analysis of metabolites that is impractical with conventional methods. However, a significant barrier to meaningful data interpretation is identifying a wide range of metabolites, including unknowns, and determining their role(s) in various metabolic networks. Our recently developed chemoselective (CS) probes, which tag metabolite functional groups, provide additional structural constraints for metabolite identification, but their use remains limited by the lack of functional group-resolved metabolite databases.

Materials and methods

We have developed a novel algorithm that rapidly detects functional groups in the entries of existing metabolite databases, such as KEGG Ligand and the Human Metabolome Database (HMDB), in order to create functional group-resolved versions of both databases. These databases will allow queries that combine molecular formula with functional group information (from CS tagging), aiding metabolite identification from accurate mass information without a priori knowledge.
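A combined molecular formula and functional group query of the kind described above might look like the following minimal sketch. It does not reproduce the detection algorithm or either database: the `Entry` class, the `TOY_DB` list, the `query` function, and the hand-annotated functional group counts are all hypothetical and serve only to illustrate how a functional group constraint narrows a set of isomeric candidates.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One record of a hypothetical functional group-resolved database."""
    name: str
    formula: str
    groups: dict[str, int]  # functional group name -> count (hand-annotated here)


# Toy stand-in for a functional group-resolved database (assumption, not real
# KEGG or HMDB content). Group counts follow a simple illustrative convention.
TOY_DB = [
    Entry("L-lactic acid", "C3H6O3", {"carboxyl": 1, "hydroxyl": 1}),
    Entry("glyceraldehyde", "C3H6O3", {"aldehyde": 1, "hydroxyl": 2}),
    Entry("dihydroxyacetone", "C3H6O3", {"ketone": 1, "hydroxyl": 2}),
    Entry("L-alanine", "C3H7NO2", {"carboxyl": 1, "primary_amine": 1}),
    Entry("sarcosine", "C3H7NO2", {"carboxyl": 1, "secondary_amine": 1}),
]


def query(db, formula, observed_groups):
    """Return names of entries matching the formula and every observed group count.

    `observed_groups` maps a functional group name to the count inferred from
    CS tagging; an empty dict means a formula-only query.
    """
    return [
        entry.name
        for entry in db
        if entry.formula == formula
        and all(entry.groups.get(g, 0) == n for g, n in observed_groups.items())
    ]


formula_only = query(TOY_DB, "C3H6O3", {})
with_tag = query(TOY_DB, "C3H6O3", {"carboxyl": 1})
print(formula_only)  # three isomeric candidates
print(with_tag)      # a single candidate once the carboxyl count is known
```

In this toy case the formula alone leaves three candidates, while adding one functional group constraint leaves one, which is the effect the combined query is meant to achieve.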


An isomeric analysis of both HMDB and KEGG demonstrates a high percentage of isomeric molecular formulas (formulas shared by more than one entry), indicating the need for techniques such as CS tagging with detection via MS and NMR to help assign specific metabolites and their isotopologue and isotopomer distributions based on both molecular formula and distinct functional group composition. Furthermore, the two databases have only moderate overlap in molecular formulas. It is therefore prudent to use multiple databases in metabolite assignment, since each of the major metabolite databases represents a different portion of metabolism within the biosphere. In silico analysis of various CS-tagging strategies under different conditions for adduct formation demonstrates that combining FT-MS-derived molecular formulas with CS tagging can significantly increase the unique identification of isotopologues based on the entries in the KEGG and HMDB databases.
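The two summary statistics mentioned above, the share of entries with an isomeric molecular formula and the formula overlap between two databases, can be sketched as follows. This is an illustration under stated assumptions, not the analysis performed by the authors: the function names are hypothetical, the overlap is measured here as a Jaccard index (one of several reasonable choices), and the two short formula lists are toy data rather than KEGG or HMDB content.

```python
from collections import Counter


def isomeric_fraction(formulas):
    """Fraction of entries whose molecular formula is shared with another entry."""
    if not formulas:
        return 0.0
    counts = Counter(formulas)
    shared_entries = sum(n for n in counts.values() if n > 1)
    return shared_entries / len(formulas)


def formula_overlap(formulas_a, formulas_b):
    """Jaccard index of the distinct formulas in two databases (assumed metric)."""
    set_a, set_b = set(formulas_a), set(formulas_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Toy formula lists, one formula per database entry (hypothetical data).
db_a = ["C6H12O6", "C6H12O6", "C3H7NO2", "C3H6O3"]
db_b = ["C3H7NO2", "C3H7NO2", "C5H5N5", "C3H6O3"]

print(isomeric_fraction(db_a))      # 0.5: two of four entries share C6H12O6
print(formula_overlap(db_a, db_b))  # 0.5: two shared formulas out of four distinct
```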

Author information

Corresponding author

Correspondence to Hunter NB Moseley.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Mitchell, J.M., Fan, T.WM., Lane, A.N. et al. Development of large-scale metabolite identification methods for metabolomics. BMC Bioinformatics 15 (Suppl 10), P36 (2014).
