Objectives
MetaboLab is based on the previously developed NMRLab software [3], which is a general purpose software tool for multidimensional NMR data processing similar to other freely available software [4–6]. MetaboLab aims to facilitate post-processing steps necessary to prepare NMR spectra for statistical analysis similar to the script based software ProMetab [7]. The major objectives for the MetaboLab software design were,
-
1.
to create a simple and transparent software interface providing access to state-of-the-art algorithms,
-
2.
to provide automated or graphically facilitated algorithms, e.g. for phase correction, spectral referencing, glog transform optimisation, baseline correction of series of spectra,
-
3.
to create a graphical batch processing interface,
-
4.
to enable data output for statistical packages such as PLS Toolbox (Eigenvector Research Inc.),
-
5.
to enable semi-automated assignment of 2D-HSQC spectra,
-
6.
to provide tools capable of generating peak lists and intensities for series of spectra.
One-dimensional NMR spectra processing and batch processing
MetaboLab includes the processing of the original spectrometer data (implemented for Bruker and Varian/Agilent data and parameter files), followed by common processing steps. The free induction decay (FID) is Fourier transformed, phase corrected and referenced, a robust automated algorithm is included for phase correction, which has worked well for most spectra acquired with the Bruker 'baseopt' acquisition parameter. Automated referencing to chemical shift standards such as TMS, TMSP or DSS is available. If no chemical shift standard has been added referencing is performed on the assumption that the transmitter offset has been set to the water frequency [8].
For batch processing of NMR series of NMR spectra a script builder generates scripts using standard processing steps including zero filling, several algorithms for post acquisition water suppression [9], apodization functions, phase correction (two different automated algorithms or manual phase correction), DC offset correction, Gibbs multiplication and referencing. The file selection mechanism allows the simultaneous choice of NMR data from different directories. The script builder composes a MATLAB script, which can be further edited and executed. This script will read all the selected FIDs and process them subsequently (see also Additional File 1).
Processing scripts can similarly be composed for 2D spectra. Parameter sets for 2D J-resolved and for HSQC spectra have been included. For a series of 2D J-resolved spectra a projection using either the skyline or summation algorithm along the frequency axis can be applied after optionally calculating a tilted 2D spectrum. Apodization functions default to either exponential multiplication for 1D spectra or square cosine for 2D spectra. A window function composed of a shifted sine multiplied by an exponential (SEM window function [10]) is also available for 2D J-resolved spectra.
MetaboLab: Metabolomics post-processing
Metabolomics NMR data analysis requires additional post-processing applied to the data series in order to remove distortions causing artifacts in subsequent statistical data analysis or quantification. Post-processing includes spectral alignment (with respect to TMSP or a reference spectrum, the latter is typically performed after selected regions of the spectra have been excluded), various algorithms for baseline correction, the scaling of spectra (with respect to TMSP or total spectral area), bucketing and data export for subsequent multivariate statistical analysis. The graphical user interface (Figure 1) allows for the exclusion of spectra and the assignment of classes, which can be displayed in different colors. An interface is provided to transfer data and spectral classes into the commercially available PLS Toolbox software (Eigenvector Research Inc., Wenatchee, USA).
The MetaboLab graphical user interface provides easy access to post-processing tasks. Even on modern spectrometers the spectral baseline is often not ideally flat or shows a small DC offset. Such distortions typically may arise from the use of digital filters in the context of oversampling or from imperfections in the water suppression. Moreover, small baseline distortions in high-resolution 1D spectra are enhanced by non-linear scaling algorithms such as the glog transform [2], Pareto or autoscaling, which are typically used to enhance small signals in spectra. MetaboLab includes various algorithms with automated determination of baseline points for baseline correction [11, 12]. However, parametrization of these algorithms is often difficult and tends to cause artifacts at the edge of peaks. Best results are achieved by manually picking baseline points on the series of NMR spectra, using common zeros in all or most spectra, followed by calculating a spline function to model the baseline for each individual spectrum. The advantage of this is that exactly the same spline points can be used for all spectra, thus reducing the variability between spectra. Facilitated by a graphical interface this helps to minimize distortions introduced by the baseline correction.
MetaboLab includes a new algorithm to compute an optimized glog parameter that maximizes the variance in the spectral series of NMR spectra. For sufficiently large data sets this algorithm yields good glog parameters. MetaboLab also allows manual modification of glog parameters and facilitates its adjustment by updating the display of processed spectra in real time.
Graphical Assignment Tool
The assignment tool presents itself as a graphical user interface to assign signals in 2D-HSQC spectra based on metabolite peak lists in a spectral library (Figure 2). MetaboLab contains a limited number of assigned metabolites from public databases [13] but provides an easy method to add additional metabolites. The software allows for small differences in chemical shifts in a spectrum compared to those in the library and searches for the closest local maximum (similar to the region of interest based data analysis in the rNMR program [14]). The chemical structure including atom numbering of the carbon nuclei is displayed for an individual metabolite (see Additional File 1 for details). For ambiguous assignments alternative assignments available in the spectral library can be displayed along with peak intensities. Duplicate assignments are highlighted by color.