IQMNMR: Open source software using time-domain NMR data for automated identification and quantification of metabolites in batches
 Xu Song^{1, 2},
 Bo-Li Zhang^{1, 3},
 Hong-Min Liu^{2},
 Bo-Yang Yu^{1},
 Xiu-Mei Gao^{3} and
 Li-Yuan Kang^{3}
DOI: 10.1186/1471-2105-12-337
© Song et al; licensee BioMed Central Ltd. 2011
Received: 20 April 2011
Accepted: 12 August 2011
Published: 12 August 2011
Abstract
Background
One of the most promising aspects of metabolomics is metabolic modeling and simulation. Central to such applications is automated high-throughput identification and quantification of metabolites. NMR spectroscopy is a reproducible, non-destructive, and non-selective method that has served as the foundation of metabolomics studies. However, the automated high-throughput identification and quantification of metabolites in NMR spectroscopy is limited by severe spectral overlap. Although numerous software programs have been developed for resolving overlapping resonances, as well as for identifying and quantifying metabolites, most of these programs are frequency-domain methods, considerably influenced by phase shifts and baseline distortions, and effective only in small-scale studies. Almost all these programs require multiple spectra for each application, and do not automatically identify and quantify metabolites in batches.
Results
We created IQMNMR, an R package that integrates a relaxation algorithm, digital filter, and similarity search algorithm. It differs from existing software in that it is a time-domain method; it uses not only frequency but also relaxation time constants to resolve overlapping resonances; it requires only one NMR spectrum per application; it is uninfluenced by phase shifts and baseline distortions; and, most importantly, it yields a batch of quantified metabolites.
Conclusions
IQMNMR provides a solution that can automatically identify and quantify metabolites by one-dimensional proton NMR spectroscopy. Its time-domain nature, stability against phase shifts and baseline distortions, requirement for only one NMR spectrum, and capability to output a batch of quantified metabolites are of considerable significance to metabolic modeling and simulation.
IQMNMR is available at http://cran.r-project.org/web/packages/IQMNMR/.
Background
Metabolomics, which complements other "omic" technologies (genomics, transcriptomics, and proteomics), is a rapidly emerging field of post-genomic research. One of the promising aspects of this discipline is metabolic modeling and simulation based on automated high-throughput identification and quantification of metabolites [1, 2]. However, metabolomics does not feature well-defined methods for automated high-throughput identification and quantification of metabolites [3]. Until recently, numerous works on metabolomics have been restricted to qualitative studies, often the result of statistical model analysis rather than metabolic modeling and simulation [3, 4].
NMR spectroscopy has served as the foundation of metabolomics studies [3]. The primary advantages of NMR spectroscopy are high reproducibility, non-destructiveness, non-selectivity in metabolite detection, and the ability to simultaneously quantify multiple classes of metabolites [5]. However, the automated high-throughput identification and quantification of metabolites in NMR spectroscopy is limited by severe spectral overlap [5].
Motivated by the requirement described above, researchers developed numerous software programs for automated resolution of overlapping signals, as well as metabolite identification and quantification; these programs use one- or two-dimensional NMR spectra and databases of metabolite standards [6, 7]. However, most of the existing software programs are frequency-domain methods, considerably affected by phase shifts and baseline distortions [3, 5, 6, 8], and effective only in small-scale studies [7]. In addition, almost all these programs require multiple spectra for each application, and do not automatically identify and quantify metabolites in batches [3, 5, 7].
In the current study, we created IQMNMR, an R package that provides one solution that can automatically identify and quantify metabolites by one-dimensional proton NMR spectroscopy. It differs from existing software in the following aspects: it is a time-domain method, uninfluenced by phase shifts and baseline distortions; it uses not only frequency but also relaxation time constants to resolve overlapping resonances; and it requires only one NMR spectrum per application, but outputs a batch of quantified metabolites. These advantages are of considerable significance to metabolic modeling and simulation.
Implementation
Overview of program flow and critical issues
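The FID, $y = [y(0), y(1), \ldots, y(N-1)]^{T}$, is modeled as a sum of K damped complex sinusoids in noise [10]:

$$y(n) = \sum_{k=1}^{K} \alpha_k z_k^{n} + \xi(n), \qquad z_k = e^{-d_k + j\omega_k}, \qquad n = 0, 1, \ldots, N-1 \qquad (1.1)$$

where $\alpha_k$, $d_k$, and $\omega_k$ represent the nonzero complex amplitudes, damping factors (inverse time constants), and frequencies; $z_k$ represents the signal poles; and $\xi(n)$ denotes the unobservable additive noise.

Let $a(z) = [1, z, \ldots, z^{N-1}]^{T}$. Given estimates of all components other than the k-th one, the residual attributed to the k-th component is

$$y_k = y - \sum_{i=1,\, i \neq k}^{K} \hat{\alpha}_i \, a(\hat{z}_i), \qquad \hat{z}_i = e^{-\hat{d}_i + j\hat{\omega}_i} \qquad (1.2)$$

The frequency and damping factor of the dominant peak of the FID can be computed by searching for the maximum of $|a^{H}(z_k)\, y_k|^{2} / (a^{H}(z_k)\, a(z_k))$ over $(\omega_k, d_k)$. Then, the complex amplitude $\alpha_k$ can be calculated using $\hat{\alpha}_k = a^{H}(\hat{z}_k)\, y_k / (a^{H}(\hat{z}_k)\, a(\hat{z}_k))$.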
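The damped-sinusoid model and the single-peak estimate can be illustrated with a minimal Python sketch. It is not IQMNMR code: the function names `simulate_fid` and `estimate_dominant`, the grid search, and the grid spacing are all assumptions made for illustration, and the estimator is the ordinary least-squares fit of one damped sinusoid that the RELAX literature [10] builds on.

```python
import cmath
import math


def simulate_fid(components, n_points):
    """Noise-free FID: y(n) = sum_k alpha_k * z_k**n, z_k = exp(-d_k + 1j*omega_k).

    `components` is a list of (alpha, d, omega) triples (hypothetical layout).
    """
    return [
        sum(alpha * cmath.exp((-d + 1j * omega) * n) for alpha, d, omega in components)
        for n in range(n_points)
    ]


def estimate_dominant(y, omega_grid, d_grid):
    """Grid search for the pole that maximizes |a(z)^H y|^2 / (a(z)^H a(z)).

    Returns (alpha, d, omega), where alpha is the closed-form least-squares
    amplitude a(z)^H y / (a(z)^H a(z)) at the best grid point.
    """
    best = None
    for d in d_grid:
        for omega in omega_grid:
            z = cmath.exp(-d + 1j * omega)
            a = [z ** n for n in range(len(y))]
            proj = sum(an.conjugate() * yn for an, yn in zip(a, y))  # a^H y
            norm = sum(abs(an) ** 2 for an in a)                     # a^H a
            score = abs(proj) ** 2 / norm
            if best is None or score > best[0]:
                best = (score, proj / norm, d, omega)
    _, alpha, d, omega = best
    return alpha, d, omega


# Usage: one damped sinusoid whose parameters lie exactly on the search grid.
omega_grid = [2 * math.pi * i / 200 for i in range(200)]
d_grid = [0.01 * i for i in range(11)]
fid = simulate_fid([(2.0, d_grid[5], omega_grid[30])], 64)
alpha_hat, d_hat, omega_hat = estimate_dominant(fid, omega_grid, d_grid)
```

In this sketch the accuracy is limited by the grid spacing; a practical implementation would refine the maximum with a finer search around the best grid point.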
With the above-mentioned procedures, the RELAX algorithm can be summarized as follows [10]:
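Step 1. Assume that K = 1. Then, $\hat{\alpha}_1$, $\hat{d}_1$, and $\hat{\omega}_1$ are obtained from $y$.
Step 2. Assume that K = 2. $y_2$ is calculated with Eq. (1.2) using $\hat{\alpha}_1$, $\hat{d}_1$, and $\hat{\omega}_1$ derived in Step 1. $\hat{\alpha}_2$, $\hat{d}_2$, and $\hat{\omega}_2$ are then obtained from $y_2$. Then, $y_1$ is computed with Eq. (1.2) using $\hat{\alpha}_2$, $\hat{d}_2$, and $\hat{\omega}_2$. We then redetermine $\hat{\alpha}_1$, $\hat{d}_1$, and $\hat{\omega}_1$ from $y_1$.
The previous two substeps are iterated until practical convergence is achieved (refer to the help files of IQMNMR).
Step 3. Assume that K = 3. $y_3$ is computed with Eq. (1.2) using $\hat{\alpha}_1$, $\hat{d}_1$, $\hat{\omega}_1$, $\hat{\alpha}_2$, $\hat{d}_2$, and $\hat{\omega}_2$ obtained in Step 2. Subsequently, $\hat{\alpha}_3$, $\hat{d}_3$, and $\hat{\omega}_3$ are derived from $y_3$. Next, $y_1$ is recalculated with Eq. (1.2) using $\hat{\alpha}_2$, $\hat{d}_2$, $\hat{\omega}_2$, $\hat{\alpha}_3$, $\hat{d}_3$, and $\hat{\omega}_3$. $\hat{\alpha}_1$, $\hat{d}_1$, and $\hat{\omega}_1$ are then redetermined from $y_1$. After that, $y_2$ is recalculated with Eq. (1.2) using $\hat{\alpha}_1$, $\hat{d}_1$, $\hat{\omega}_1$, $\hat{\alpha}_3$, $\hat{d}_3$, and $\hat{\omega}_3$, and $\hat{\alpha}_2$, $\hat{d}_2$, and $\hat{\omega}_2$ are redetermined from $y_2$.
The previous substeps are iterated until practical convergence is achieved (refer to the help files of IQMNMR).
Remaining steps. The procedure continues in the same way, adding one component at a time, until K is equal to the desired value (see the help files of IQMNMR).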
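The stepwise procedure above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the IQMNMR implementation: the helper names (`steering`, `fit_one`, `residual`, `relax`), the coarse grid search, and the convergence test on the residual energy are all hypothetical choices, and each sweep refits the components in index order rather than in the exact order listed in Step 3.

```python
import cmath
import math


def steering(d, omega, n_points):
    """a(z) = [1, z, ..., z**(N-1)] with z = exp(-d + 1j*omega)."""
    z = cmath.exp(-d + 1j * omega)
    return [z ** n for n in range(n_points)]


def fit_one(y, omega_grid, d_grid):
    """Least-squares fit of a single damped sinusoid to y by grid search."""
    best = None
    for d in d_grid:
        for omega in omega_grid:
            a = steering(d, omega, len(y))
            proj = sum(an.conjugate() * yn for an, yn in zip(a, y))
            norm = sum(abs(an) ** 2 for an in a)
            score = abs(proj) ** 2 / norm
            if best is None or score > best[0]:
                best = (score, (proj / norm, d, omega))
    return best[1]


def residual(y, params, skip):
    """y minus every fitted component except index `skip` (None subtracts all)."""
    out = list(y)
    for i, (alpha, d, omega) in enumerate(params):
        if i == skip:
            continue
        a = steering(d, omega, len(y))
        out = [o - alpha * an for o, an in zip(out, a)]
    return out


def relax(y, n_components, omega_grid, d_grid, max_sweeps=20, tol=1e-9):
    """Add one component at a time, then refit all cyclically until the cost stalls."""
    params = []
    for k in range(n_components):
        # New component: fit what is left after removing all current estimates.
        params.append(fit_one(residual(y, params, None), omega_grid, d_grid))
        if k == 0:
            continue  # Step 1 has nothing to refine
        previous_cost = None
        for _ in range(max_sweeps):
            for i in range(len(params)):
                params[i] = fit_one(residual(y, params, i), omega_grid, d_grid)
            cost = sum(abs(r) ** 2 for r in residual(y, params, None))
            if previous_cost is not None and abs(previous_cost - cost) <= tol * max(cost, 1.0):
                break  # "practical convergence" in this sketch
            previous_cost = cost
    return params


# Usage: two noise-free components whose parameters lie on the search grid.
omega_grid = [2 * math.pi * i / 50 for i in range(50)]
d_grid = [0.0, 0.05, 0.1]
truth = [(4.0, d_grid[1], omega_grid[5]), (1.0, d_grid[2], omega_grid[20])]
fid = [sum(a * cmath.exp((-d + 1j * w) * n) for a, d, w in truth) for n in range(64)]
estimates = relax(fid, 2, omega_grid, d_grid)
```

Because every refit is a one-component problem, the sketch never solves a joint K-component search; the price is the repeated sweeps over all components.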
Simulation examples and practical applications have demonstrated that the RELAX algorithm is accurate and robust [10, 11]. The algorithm uses not only frequency to resolve overlapping resonances but also relaxation time constants [10], and has a resolution superior to that of FFT when FIDs are strongly damped or truncated [12]. As an iterative algorithm, however, its computational burden increases exponentially with the number of signals.
With the development of computer processor technologies, digital filtering has been increasingly used for NMR raw data processing [13]. A digital filter can suppress undesirable frequency ranges and maintain desired frequency ranges, as well as improve signal-to-noise ratio and overall sensitivity [13].
As the input file (FID) is filtered into subbands, the total number of steps required by the RELAX algorithm decreases, and the computation can be parallelized. Parallel computing can be performed efficiently by cloud computing. An example is Amazon's Elastic Compute Cloud (http://aws.amazon.com/ec2/), which has been used in the field of comparative genomics [14]. With cloud computing, the time consumed by IQMNMR is substantially reduced. Digital filtering and cloud computing enable IQMNMR to be a high-throughput method.
After resolving each subband into damped sinusoids, IQMNMR keeps only the damped sinusoids that lie within a specific frequency range. This range is narrower than the passband of the subband, to decrease the influence of the Gibbs effect, which stems from the digital filter. The passband of each subband overlaps with those of the adjoining subbands to avoid information loss.
Several metabolomic databases have emerged to serve as bioinformatics resources for identifying common metabolites from experimental data [15, 16]. The Madison Metabolomics Consortium Database [16] (http://mmcd.nmrfam.wisc.edu/), for instance, has collected information on more than 20,000 metabolites. Therefore, prior knowledge data sets containing the standard spectra of targeted metabolites can be created on the basis of these metabolomic databases.
The results of the RELAX algorithm are amplitudes, frequencies, and damping constants (the reciprocals of relaxation time constants). The initial time-domain amplitude of an NMR resonance is proportional to the frequency-domain area under the absorption-mode peak of the NMR spectrum. A cosine similarity measure [17] can be constructed on the basis of amplitudes (which are located in specific frequency ranges) and prior knowledge data sets. In this way, the targeted metabolites are identified by the similarity search algorithm. The total number of hydrogen nuclei that generate the resonance lines of a targeted metabolite is directly proportional to the sum of integrated signal areas of the targeted metabolite. The targeted metabolites and internal standard are components of the same sample, so both are subject to the same variation in receiver gain, probe design, etc. In this manner, the targeted metabolites can be quantified by comparing the amplitudes of the targeted metabolites and the internal standard.
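One way to picture the overlapping passbands and the narrower retained ranges is the following Python sketch. The equal-width layout, the `overlap` parameter, and the function names are assumptions made for illustration; the package's actual subband layout is described in its help files.

```python
def subband_plan(f_min, f_max, n_subbands, overlap):
    """Split [f_min, f_max] into equal subbands with overlapping passbands.

    Returns a list of (passband, keep_range) pairs. Each passband extends
    `overlap` beyond its keep_range on both sides, so neighbouring passbands
    overlap while the keep_ranges tile the spectrum without gaps.
    """
    width = (f_max - f_min) / n_subbands
    plan = []
    for i in range(n_subbands):
        keep_lo = f_min + i * width
        keep_hi = keep_lo + width
        plan.append(((keep_lo - overlap, keep_hi + overlap), (keep_lo, keep_hi)))
    return plan


def keep_components(components, keep_range):
    """Retain (amplitude, damping, frequency) triples whose frequency is in keep_range."""
    lo, hi = keep_range
    return [c for c in components if lo <= c[2] < hi]


# Usage: five subbands over 0-10 (arbitrary frequency units), 0.5 overlap per side.
plan = subband_plan(0.0, 10.0, 5, 0.5)
fitted = [(1.0, 0.1, 1.8), (2.0, 0.1, 2.5), (0.5, 0.1, 4.2)]
kept = keep_components(fitted, plan[1][1])
```

Components fitted near the edge of a passband, where filter ripple is strongest, fall outside the keep range of that subband and are instead taken from the neighbouring subband, where they sit closer to the centre.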
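The identification and quantification logic described above can be sketched in a few lines of Python. The function names and the per-proton normalization are assumptions for illustration (the usual internal-standard relation, in which the amplitude per contributing hydrogen is proportional to molar concentration); they are not taken from the IQMNMR source.

```python
import math


def cosine_similarity(u, v):
    """cos(theta) = u.v / (|u| |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def concentration(amp_metabolite, n_h_metabolite, amp_standard, n_h_standard, conc_standard):
    """Concentration from summed amplitudes, normalized per hydrogen nucleus.

    amp_*  : summed initial amplitudes of the assigned resonances
    n_h_*  : number of hydrogen nuclei producing those resonances
    """
    per_h_metabolite = amp_metabolite / n_h_metabolite
    per_h_standard = amp_standard / n_h_standard
    return conc_standard * per_h_metabolite / per_h_standard


# Usage: observed amplitudes at three expected peak positions versus the
# reference area ratios 3:2:1 from a (hypothetical) prior knowledge entry.
similarity = cosine_similarity([3.1, 1.9, 1.0], [3.0, 2.0, 1.0])
# Metabolite: amplitude 6.0 from 3 H; standard: amplitude 9.0 from 9 H at 0.5 mM.
conc_mM = concentration(6.0, 3, 9.0, 9, 0.5)
```

A similarity close to 1 indicates that the observed amplitude pattern matches the reference ratios; the threshold used to accept a match is a tuning choice that this sketch leaves open.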
Workflow overview
IQMNMR is a fully automated method. Identifying and quantifying targeted metabolites entails only two steps.
Step one: creating prior knowledge data sets of targeted metabolites
The prior knowledge data set consists of two tables: "lists_metabolites" and "space_x." The "lists_metabolites" table contains information on the molecular constitutions of targeted metabolites and the experimental conditions of standard one-dimensional proton NMR spectroscopy. The "space_x" table contains information on the chemical shifts of targeted metabolites and the area ratios of intramolecular peaks. The variable descriptions of these tables are listed in the help files of IQMNMR.
We created a prior knowledge data set using the Madison Metabolomics Consortium Database as a basis [16]. The aforementioned tables can be loaded by typing "data(lists_metabolites); data(space_x)" in the R command console. Furthermore, users can collect data and create their own prior knowledge data sets according to this paradigm.
Step two: identifying and quantifying metabolites
The function "identify_quantify" uses the RELAX algorithm, digital filter, and similarity search algorithm to automatically resolve overlapping signals, as well as identify and quantify targeted metabolites. Its arguments are listed in the help files of IQMNMR. This function outputs a table that presents the names, concentrations, and cosine similarity measures of targeted metabolites.
Results and Discussion
The results of identification and quantification
| Name | Measured concentration (mM) | True concentration (mM) | Relative error (%) |
|---|---|---|---|
| Acetic acid | 0 | 1.91 | |
| Adonitol | 0 | 0 | |
| Agmatine | 23.94 | 27.76 | 13.76 |
| Alanine | 0 | 0 | |
| beta-Alanine | 8.34 | 14.08 | 40.77 |
| alpha-Ketoglutaric acid | 1.83 | 1.81 | 1.27 |
| Methyl 4-aminobutyrate | 8.70 | 10.95 | 20.50 |
| 4-(2-Aminoethyl)morpholine | 0 | 0 | |
| Anthranilic acid | 0 | 0 | |
| L-Arginine | 0 | 0 | |
| L-Ascorbate | 0 | 0 | |
| L-Asparagine | 17.39 | 21.83 | 20.34 |
| Benzoate | 0 | 0 | |
| trans-Cinnamic acid | 7.22 | 5.086 | 42.03 |
| Citrate | 3.57 | 2.92 | 22.15 |
| Ethanol | 0 | 0 | |
| D-Galactono-1,4-lactone | 0 | 24.73 | |
| L-Glutamic acid | 0 | 0 | |
| L-Histidine | 0 | 0 | |
| Homogentisic acid | 0 | 0 | |
| O-Succinyl-L-homoserine | 0 | 0 | |
| Imidazole | 0 | 0 | |
| Inosine 5'-monophosphate | 0 | 0 | |
| L-Isoleucine | 20.35 | 21.03 | 3.25 |
| L-Kynurenine | 8.048 | 5.48 | 46.76 |
| Malic acid | 22.10 | 27.65 | 20.10 |
| N-Acetyl-D-mannosamine | 10.56 | 17.90 | 41.02 |
| L-Methionine methylsulfonium iodide | 9.60 | 8.84 | 8.64 |
| 3-Methyl-2-oxobutanoic acid | 0 | 0 | |
| Nicotinic acid | 0 | 0 | |
| Nicotine | 12.20 | 8.10 | 35.75 |
| 4-Nitrocatechol | 15.70 | 12.02 | 30.65 |
| N(alpha)-Acetyl-DL-ornithine | 0 | 0 | |
| Phenol | 0 | 0 | |
| Phenylacetic acid | 0 | 0 | |
| L-Phenylalanine | 10.85 | 21.75 | 50.10 |
| DL-Pipecolic acid | 0 | 0 | |
| Polygalacturonic acid | 0 | 0 | |
| L-Proline | 0 | 0 | |
| trans-4-Hydroxy-L-proline | 0 | 0 | |
| Pyridoxal-5-phosphate | 41.34 | 19.61 | 110.78 |
| Quinolinic acid | 16.69 | 16.51 | 1.04 |
| D-Ribulose 5-phosphate | 0 | 0 | |
| Sarcosine | 0 | 0 | |
| L-Serine | 0 | 0 | |
| L-Threonine | 10.45 | 13.12 | 20.37 |
| D-Trehalose | 0 | 0 | |
| Trigonelline | 0 | 0 | |
| Tryptamine | 0 | 0 | |
| Tyramine | 0 | 0 | |
| L-Tyrosine | 0 | 0 | |
| Uracil | 0 | 0 | |
| Uridine | 8.28 | 10.97 | 24.53 |
| L-Valine | 18.66 | 13.66 | 36.60 |
The relative error is calculated as $|m - r| / r \times 100\%$, where "m" and "r" are the measured and real concentrations of targeted metabolites, respectively. The identification rate is defined as the number of identified metabolites divided by the total number of targeted metabolites. A metabolite is identified if its true and measured concentrations are both higher than zero, or if its true and measured concentrations both equal zero.
Figure 3 shows clear phase shifts and baseline distortions. As a time-domain method, IQMNMR is stable against phase shifts and baseline distortions. Table 1 presents the result of IQMNMR. The mean of the relative errors is 29.52%; the standard deviation of the relative errors is 23.70%; and the identification rate is 96.36%. Given that the FID is filtered into subbands and the computation is parallelized, cloud computing [14] can substantially reduce the time consumed by IQMNMR. On the basis of these results, we conclude that IQMNMR provides one solution that can automatically identify and quantify metabolites in batches.
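The two evaluation measures can be written out as a short Python sketch. It assumes that the relative error is |m - r| / r expressed in percent, which reproduces the Agmatine row of Table 1 (23.94 mM measured against 27.76 mM true); the function names are illustrative only.

```python
def relative_error(measured, true):
    """Relative error in percent: |m - r| / r * 100 (requires r != 0)."""
    return abs(measured - true) / true * 100.0


def is_identified(measured, true):
    """True if both concentrations are positive, or both are zero."""
    return (measured > 0 and true > 0) or (measured == 0 and true == 0)


def identification_rate(pairs):
    """Percentage of (measured, true) pairs that count as identified."""
    hits = sum(1 for m, r in pairs if is_identified(m, r))
    return 100.0 * hits / len(pairs)


# Usage with four rows of Table 1: Agmatine, Adonitol, Acetic acid,
# and D-Galactono-1,4-lactone.
rows = [(23.94, 27.76), (0, 0), (0, 1.91), (0, 24.73)]
agmatine_error = round(relative_error(23.94, 27.76), 2)  # 13.76
rate = identification_rate(rows)                          # 50.0 for these four rows
```

Rows where the metabolite is present but reported as 0 mM (Acetic acid and D-Galactono-1,4-lactone in Table 1) count against the identification rate and have no relative error entry.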
Quantification in metabolomics is generally performed by either absolute or relative quantification. Absolute quantification uses an internal standard to determine the absolute concentration. The metabolites and internal standard are the components of the same sample. Hence, changes in receiver gain, probe design, etc. are the same for the metabolites and internal standard. The signal intensities in an NMR spectrum only depend on the molar concentrations of the sample [18]. Consequently, the absolute concentrations of metabolites can be easily obtained after using RELAX and similarity search algorithms by comparing the amplitudes of the targeted metabolites and the internal standard. In relative quantification, the metabolite signal intensity is normalized to that of a specific metabolite, which is the component of the same sample. In principle, absolute quantification encompasses relative quantification. If the absolute concentrations of the metabolites are known, their relative ratios can be easily calculated. Additionally, for relative quantification, an accurate determination of the internal standard concentration is unnecessary.
The quantitative error is affected by colored noise, white noise, the Gibbs effect of a digital filter, and signal overlap. The RELAX algorithm performs well in the presence of colored noise, white noise, and signal overlap [10, 11]. However, this algorithm is unable to deal with the quantitative error caused by the Gibbs effect. Oversampling techniques have been used in modern NMR spectrometry [13, 19, 20]. Oversampling allows a higher filter order, and consequently decreases the ripple and the proportion of the overshoot range in the passband. Therefore, oversampling can effectively reduce the influence of the Gibbs effect. However, the final FID generated by modern NMR spectrometers is reduced in size in order to avoid a large data set. For example, in 20-fold oversampling, the number of data points also increases by a factor of 20 [13]. For an FID size of 64 000 data points, 20-fold oversampling results in approximately 1.3 million data points [13].
Presently, IQMNMR only uses information on amplitude ratios and peak locations. In future editions, information on coupling will be used. We believe that coupling information decreases identification and quantification errors.
To highlight the resolution of the RELAX algorithm, the magnetic field strength of the simulated FID described above was set to 400 MHz. Some metabolomics studies were carried out at low magnetic field strengths (<600 MHz) [21], but a higher magnetic field leads to increased signal resolution, thereby improving the performance of the RELAX algorithm. We suggest that higher magnetic fields be used to generate FIDs for the application of IQMNMR.
Different NMR spectrometers must use different prior knowledge data sets acquired at the same magnetic field strengths as the NMR spectrometer settings. Before using this package, users must create a prior knowledge data set that matches the magnetic field strength of their NMR spectrometer.
Some unknown metabolites will inevitably exist in the sample. IQMNMR assumes that the FID is modeled as the sum of sinusoidal signals, autoregressive noise, and white Gaussian noise. Whether or not these signals are known, the digital filter separates the FID into subbands, the RELAX algorithm decomposes these subbands into their constituent signals, and the similarity search algorithm identifies the signal combinations that match the prior knowledge data set and quantifies them. Future editions will generate resultant NMR data that contain only the remaining sinusoidal and noise signals, so that further analysis can be performed by users.
IQMNMR reduces spectral data to a batch of quantified metabolites that is more beneficial than spectral binning. The batch of metabolites can be directly used as input variables in principal component analysis or metabolic modeling and simulation.
Although IQMNMR provides for metabolomics identification and quantification, validation via application to real samples (i.e., complex multicomponent systems) should be a prerequisite for practicality. Metabolomics reflects a paradigm shift from reductionism to holism [22]. The key to its success is multidisciplinary collaboration [22].
Conclusions
Metabolite identification is the foundation of metabolomics. The quantification of metabolites is a state-of-the-art approach. IQMNMR provides one solution that can automatically identify and quantify metabolites in batches by one-dimensional proton NMR spectroscopy. It is a time-domain method that features stability against phase shifts and baseline distortions. It uses not only frequency but also relaxation time constants to resolve overlapping resonances. It requires only one NMR spectrum per application, and produces a batch of quantified metabolites. These features are of considerable significance to metabolic modeling and simulation.
Availability and requirements
Project name: IQMNMR
Project home page: http://cran.r-project.org/web/packages/IQMNMR/
Operating systems: UNIX or MAC
Programming language: R
Other requirements: None
License: GNU GPL
Any restrictions on use by nonacademics: None
List of abbreviations
NMR: nuclear magnetic resonance
FID: free induction decay
RELAX algorithm: relaxation algorithm
FFT: fast Fourier transform
Declarations
Acknowledgements and Funding
This work is supported by the National Program on the Key Basic Research Project (973 Program) of China.
Authors’ Affiliations
References
 1. Bundy JG, Papp B, Harmston R, Browne RA, Clayson EM, Burton N, Reece RJ, Oliver SG, Brindle KM: Evaluation of predicted network modules in yeast metabolism using NMR-based metabolite profiling. Genome research 2007, 17(4):510–519. 10.1101/gr.5662207
 2. Wishart DS: Current progress in computational metabolomics. Brief Bioinform 2007, 8(5):279–293. 10.1093/bib/bbm030
 3. Schripsema J: Application of NMR in plant metabolomics: techniques, problems and prospects. Phytochem Anal 2010, 21(1):14–21. 10.1002/pca.1185
 4. Roessner U, Bowne J: What is metabolomics all about? Biotechniques 2009, 46(5):363–365. 10.2144/000113133
 5. Issaq HJ, Van QN, Waybright TJ, Muschik GM, Veenstra TD: Analytical and statistical approaches to metabolomics research. J Sep Sci 2009, 32(13):2183–2199. 10.1002/jssc.200900152
 6. Izquierdo-Garcia JL, Rodriguez I, Kyriazis A, Villa P, Barreiro P, Desco M, Ruiz-Cabello J: A novel R-package graphic user interface for the analysis of metabonomic profiles. BMC Bioinformatics 2009, 10: 363. 10.1186/1471-2105-10-363
 7. Lewis IA, Schommer SC, Markley JL: rNMR: open source software for identifying and quantifying metabolites in NMR spectra. Magn Reson Chem 2009.
 8. Aranibar N, Ott KH, Roongta V, Mueller L: Metabolomic analysis using optimized NMR and statistical methods. Anal Biochem 2006, 335(1):62–70.
 9. Li J, Stoica P: Efficient mixed-spectrum estimation with applications to target feature extraction. IEEE Trans Signal Process 1996, 44(2):281–295. 10.1109/78.485924
 10. Liu ZS, Li J, Stoica P: RELAX-based estimation of damped sinusoidal signal parameters. Signal processing 1997, 62(3):311–321. 10.1016/S0165-1684(97)00132-1
 11. Li J, Zheng D, Stoica P: Angle and waveform estimation via RELAX. IEEE Trans Aerosp Electron Syst 1997, 33(3):1077–1087.
 12. Bi Z, Bruner AP, Li J, Scott KN, Liu ZS, Stopka CB, Kim HW, Wilson DC: Spectral Fitting of NMR Spectra Using an Alternating Optimization Method with a Priori Knowledge. J Magn Reson 1999, 140(1):108–119. 10.1006/jmre.1999.1833
 13. Moskau D: Application of real time digital filters in NMR spectroscopy. Concepts Magn Reson 2002, 15(2):164–176. 10.1002/cmr.10031
 14. Dennis W, Parul K, Vincent F, Rimma P, Prasad P, Peter T: Cloud computing for comparative genomics. BMC Bioinformatics 2010, 11(259):1–12.
 15. Go EP: Database resources in metabolomics: an overview. J Neuroimmune Pharmacol 2010, 5(1):18–30. 10.1007/s11481-009-9157-3
 16. Cui Q, Lewis IA, Hegeman AD, Anderson ME, Li J, Schulte CF, Westler WM, Eghbalnia HR, Sussman MR, Markley JL, et al.: Metabolite identification via the Madison Metabolomics Consortium Database. Nat Biotechnol 2008, 26(2):162–164. 10.1038/nbt0208-162
 17. Korenius T, Laurikkala J, Juhola M: On principal component analysis, cosine and Euclidean measures in information retrieval. Information Sciences 2007, 177(22):4893–4905. 10.1016/j.ins.2007.05.027
 18. Verpoorte R, Choi YH, Mustafa NR, Kim HK: Metabolomics: back to basics. Phytochemistry Reviews 2008, 7(3):525–537. 10.1007/s11101-008-9091-7
 19. Wider G: Elimination of baseline artifacts in NMR spectra by oversampling. J Magn Reson 1990, 89: 406.
 20. Hauser MW: Principles of oversampling A/D conversion. J Audio Eng Soc 1991, 39(1/2):3–26.
 21. Constantinou MA, Papakonstantinou E, Spraul M, Sevastiadou S, Costalos C, Koupparis MA: 1H NMR-based metabonomics for the diagnosis of inborn errors of metabolism in urine. Analytica Chimica Acta 2005, 542(2):169–177. 10.1016/j.aca.2005.03.059
 22. Nobeli I, Thornton JM: A bioinformatician's view of the metabolome. Bioessays 2006, 28(5):534–545. 10.1002/bies.20414
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.